Let's get started. Welcome, everybody. I'm Cliff Lynch, the director of the Coalition for Networked Information, and I'll be introducing the session today. You have reached one of the project briefing sessions that's part of week three of the fall 2020 virtual CNI membership meeting. To remind you, week three is themed around technology, standards, and infrastructure, and all three of those are quite relevant here, I think. The session is being recorded, and the recording will be available subsequently. There is closed captioning available. There is chat available, and please feel free to use that throughout the session. We also have a Q&A tool. After we hear all of the presentations at this session, Diane Goldenberg-Hart from CNI will moderate a Q&A session. Please feel free to put questions in at any point during the presentation, and we'll try to address as many as possible at the end. I think that we have had a change in speakers here. Jessica Meyerson is not with us today, but Euan Cochrane from Yale is joining his colleagues Seth Anderson and Ethan Gates, both also from Yale.
A bit of background on this project. This is one of these really deeply strategic projects that's trying to build out a robust service for emulating software in obsolete environments. So it's a fundamental cornerstone of software preservation strategy. It's been going on for a while, it's received funding from both the Mellon and Sloan foundations, and it has very close connections to the wonderful work of the Software Preservation Network, although the Yale library has been the technological nexus of the work. I do want to note also, for those of you who were with us earlier this week, that Natalie Meyers and her colleagues at the University of Notre Dame have been doing some very interesting integration between the work that's being reported here and their work on PresQT.
So, if you're interested in what you hear here, you may want to go back and look at the video of the presentation from that session. And with that, I just want to thank our speakers from Yale for coming to update us on this really important project, and I'll turn it over to Seth to start the presentations.
Thank you, Cliff. Good afternoon, and greetings from New Haven, Connecticut, from my home office here. It's a shame we can't meet in person. I always enjoy the opportunity to see folks face to face and talk about our project, but we are thrilled to be here virtually to provide an update on the EaaSI program of work. So, as Cliff mentioned, I am Seth Anderson, the software preservation program manager at Yale University Library and the co-PI on the EaaSI program. I'm joined today by Euan Cochrane, who is the digital preservation manager at Yale and my co-PI, and who's joining us from New Zealand. I'm also joined by Ethan Gates, our software preservation analyst and user support lead for EaaSI.
So, in July of this year EaaSI entered its second phase of the program, thanks to the continued funding from the Sloan and Mellon foundations. We are excited to share an update on where we've come since our last presentation on EaaSI at CNI, which was in the spring of 2019. Then we're going to look ahead at our plans for this next two-year round of funding, and then demonstrate some of the work that we have completed over the last couple of years and some new functionality that we're hoping to incorporate into the system in the coming year and a half. So, let me just... okay, there we go. Cliff gave a pretty good summary of what EaaSI is, but I will venture to give a little more detail before diving into our project timeline.
So, we like to think of EaaSI as the combination of the Emulation-as-a-Service technology and the additional user-oriented functionality that we've been developing on top of that technology, the services that we provide to our users to support their use of the system, and then also the community of users that are aiding us in determining a roadmap for EaaSI: helping us figure out what new features we need to prioritize, but also just generally helping us chart a path through what is the still developing and maturing practice of software preservation and access through emulation.
So, for those of you who aren't aware, EaaSI is built on the Emulation-as-a-Service framework, which was originally developed at the University of Freiburg and continues to be developed by OpenSLX, which is a company of developers from that original team at Freiburg. The service allows users to create, manage, and access emulation environments through a web-based interface, so that they can then load their own digital collections materials into emulated computer systems that contain the underlying software required to render and manipulate those digital materials. So, for instance, if you have old word processing documents in a collection, you can load those into a computer environment that contains the compatible software and provide authentic, or as close to authentic as is possible, access to those materials using the emulation service.
The program is made up of the three of us from Yale, plus the developers from OpenSLX, and we also receive support from Jessica Meyerson at Educopia and from Katherine Thornton, who is the leader of the Wikidata for Digital Preservation program and serves as our semantic architect, helping us utilize and distribute the metadata that we have been creating as part of our ongoing work on EaaSI. So, just to give a quick update on where we've come since spring of 2019.
I believe at that time we were still in the early days of having just put out our first beta release of the EaaSI software, and that included the EaaSI Network, which I forgot to mention, which allows the users of EaaSI to exchange and reuse software installation materials and preconfigured emulation environments from other organizations to provide access to their materials. So, if you have need of a software title that you don't have locally, either at your institution or within your own EaaSI system, you can copy that to your infrastructure and run it and reuse it however you need.
So, in 2019, we worked on the beta release of EaaSI with the network, and we provided that to our initial set of network institutions, or nodes, which included Notre Dame, who are here today, Stanford, UC San Diego, let's see if I can get them all, Carnegie Mellon, and the University of Virginia, plus Yale acting as its own node. I don't think I left anyone out of that first group. And over the summer, shortly after the meeting in spring of 2019, we released a public version of the EaaSI system, which contains just example versions of open source operating systems and software that anyone on the web can access and poke around in, just to give an idea of the kinds of legacy software you can interact with through emulation. That'll come up in a moment when looking forward to the next one and a half years.
Then throughout 2019 we continued to work on an updated interface for EaaSI, and that was released in early 2020, around March or April. This was a huge undertaking that we were able to complete with the aid of the software development company PortalMedia, out of Wisconsin. It provides a more intuitive and user-friendly interface to search and discover the environments and software that are available in an EaaSI system. It will have additional features for documentation that we hope to implement sometime soon.
And then it also has additional administrative features. So, for instance, you can create user accounts now, and we have role specifications that control the actions that certain user types are permitted to perform. And, you know, it just generally looks and feels a little bit better than previous versions of the system. So that has been in the hands of our users since April, and we continue to tinker with it. Part of the focus of this next two years, which started in July, is to see what happens when users have access to this version of the system, and to learn from that and continue to improve the system.
So we have a number of targets that we've set for ourselves during this next two years. There are some more R&D-focused development features that we'll be working on, including incorporating emulation of mobile systems, in particular the Android operating system, which is open source and Google-developed. We'll be incorporating support for emulating computer networks. And we're going to continue to improve on a number of automation features that we've started prototyping, which we'll incorporate into the interface to further simplify the process of creating and customizing your emulation environments.
But I would say our real focus over this next two years is establishing EaaSI as a service. Even though it's there in the title, one of the things that we've learned over the first two and a half years of the project is just how difficult it is to support a growing number of users. As we continue to scale up the system, we want to make sure that we have a mature and well-thought-out service, and that we have the capacity to support a growing number of institutions that will hopefully be using EaaSI. So with that in mind, we're putting a lot of emphasis on ensuring the stability of the software and the code that underlies the EaaSI interface and the back end.
And we're taking a real look at the support services we provide, making sure that we have the mechanisms to allow users to report bugs, ask questions, or seek advice on the use of the system, and then using the expertise that we as a team have developed over a number of years to enable and empower the users of the system to implement it effectively. And then finally, we are really emphasizing sustainability, both from an organizational standpoint and from a technological standpoint. So, like I said, we're looking at making sure we have a stable system. But also, by the end of this funding period, we want in some way, and we don't exactly know the scope of this yet, to start providing an EaaSI service with a service model and payment mechanisms, so that we can continue to sustain the work that goes into EaaSI, continue to expand its functionality, and expand the community of users that are participating in using the system.
So, just a few other highlights from our timeline, to give you a sense of where we are headed. For the past six months we've been working on a new edition of the system, which will be the hosted edition. Currently all the users have EaaSI installed on their local infrastructure, but we've seen a need, certainly over the last year, for a version that is supported and has infrastructure provided by the EaaSI program, so that institutions that maybe don't have the funds or sufficient IT support can still use EaaSI through the hosted edition.
And to test that out, we have arranged with the Software Preservation Network to do a pilot program through most of 2021, wherein the current membership of SPN will have the option to elect to use this hosted edition, and we'll be monitoring and guiding activities with them to, again, learn what it looks like when we add more users into the system, and what it looks like for the team and our capacity to manage a larger user base. And then towards the end of, sorry, my screen's a little too crowded. Towards the end of this funding period, we also hope to turn that public sandbox version of EaaSI into a functional node, so that individuals who are unable to use EaaSI through their institution, or members of communities outside of institutional frameworks who want to use emulation, will have an opportunity to do so through the public node. That will again only include open source software. But we anticipate, given the breadth of software available from the open source community, that it would be a valuable platform both for exploring the history of open source software and for using it as a tool to access older collections.
So, those are our plans. And as I mentioned, we wanted to spend most of our time today demonstrating some of the work we're doing. So I will first be giving a demo of an upcoming service that we're going to be releasing at the library, the Yale University Emulation Viewer. Then Ethan's going to give a demo of the Universal Virtual Interactor, which is the engine of our automation functionality. And then Euan is going to show off our ability to emulate computer networks and talk about how that will be a powerful feature in the system in the next couple of years. So let me just take a breath before I dive into demo mode.
And of course, please feel free to post any questions that you have in the Q&A, and we'll get to them in a second. All right. The Emulation Viewer is a service that will be released to the Yale community starting next year. And it is based on some work, I'm going to pause that before it gets ahead of me. So this was started by Euan when he started at Yale, to create digital copies of the CD-ROMs in the library's circulating collection with an eye towards using emulation to provide access in the future. So, when I started in 2017, we were taking our first steps towards configuring environments featuring the CD-ROM titles from the library. What I'm trying to demonstrate here is just how many we have. There are thousands of discs in the circulating collection, which are rarely accessed because of the general obsolescence of the CD-ROM format. A lot of these are very interesting, and we hope that they will be helpful to researchers.
I'm going to walk you through the workflow that we foresee, even though it's still in the works in some cases. The idea is that the emulation environment and the viewer that users will access exist in the library's infrastructure in the same way as, say, an ebook reader. Users will search through Yale's various catalog formats, and there will be a link, where I'm circling here, where you now have this option to request the physical copy. Instead you would have a link that says "view the emulation"; I don't recall exactly what the text would be. Then, when you click through that link, you'll be taken to this viewer that's displaying right now, and users have a limited set of functionality. We're not offering them the ability to save changes or print or do any sort of robust manipulation of the contents of these discs.
But these environments, which have been set up by student workers, and we thank them for their service, have been set up to quickly get the user to the contents of the disc. So, as you saw there, the environment brought up this viewer, which then goes to an HTML page, and from there you can click through the materials in this title on neurochemistry. We're setting certain restrictions on the service. You do have to sign in through Yale's authentication service, so if you don't have a Yale account you will not have access to these materials. And we've also limited access to one user per environment at a time, so that if a user tries to log in while someone else is accessing it, they will not be able to, and will have to return later, when the other person is no longer taking their seat. So, like I said, that'll be made available in January, and we're very excited to see what the response looks like.
But I will now quickly change slides, because I'm already over time. What I want to point out is that we're using this work as well to serve as the basis of a separate service that we're going to make available to other institutions. All of these CD-ROM environments that we've configured will be available for institutions to, say, match up with their own collections, and then they'll be provided a link to a separate type of access page. It's not exactly the same as the one that Yale is using now. They'll be able to skin it and align the header and information to the look and feel of their institution. I'm really glossing over this, so please feel free to ask questions about that when we enter into the Q&A. So this is what the access page would eventually look like. Very quick, I realize, so, as I said, if you have questions we'll discuss a little later, but I'll now turn it over to Ethan to talk about the UVI.
Thank you. And can you hold off a second before you start playing that video?
Yeah, I'm really excited to share with you all our advances over the past year or so on what we call our EaaSI API, the Universal Virtual Interactor, or the UVI. What the UVI does is basically allow users to query the environments that are available in the EaaSI Network based on an input file or set of files. Once those files are uploaded or provided to the UVI, the UVI's automatic file characterization matches those input files with software that can render them. The API can either return that metadata, just providing the user with some useful information about what computing environments or applications could open and interact with that file, or it can even return the emulated environment itself, with the file mounted in it for interaction via emulation in the browser.
So if you could start up our demo video here, Seth, I'm going to show exactly what that means. If I upload an older spreadsheet file into our demonstration UVI interface, we can see that the UVI will return two results based on that automatic characterization, which concludes that this is in fact an Excel version 3.0 format file. One environment runs Windows 95 and Corel WordPerfect Suite, which is what we're loading up first. The other returned result is Windows 98 plus Microsoft Office 97, both of them capable of opening this file in theory. So if I click render on the first suggestion, as I just did in the demo video here, we receive a running Windows 95 emulation that even automatically mounts and opens the provided spreadsheet in the specified application, WordPerfect Quattro Pro. At that point you're free to interact with and view your given file, but you also have the option to compare with the UVI's other suggested results. So again, in the video I just showed, I clicked on alternative results, clicked on the alternative environment, and switched over to the Windows 98 plus Office 97 environment. And after a minute we'll get the exact same file.
This time it's just running in a different application. You can see in this case also that there is some content in the spreadsheet that was not properly rendered in the WordPerfect application a minute ago. So this ability to compare different emulations, different applications, and different renderings, even if there are multiple environments and multiple applications technically capable of reading the same file format, is really critical for assessment, evaluation, and proper access of legacy digital content moving into the future. And finally, at the tail end of the video there, you can see that because we allowed data export at the beginning of this process, we can even use Office 97 as a conversion or migration tool, saving out the Excel 3.0 file to a newer format, something that would be legible perhaps in Office 365, and receive that converted file as output. So again, we can use it as an assessment, an access, and a migration tool.
The UVI is also capable of working with more complex multi-file sets, as we are demonstrating here with this data set from the Yale Institution for Social and Policy Studies' data archive. Here we've selected a legacy Stata script from the archive, along with a number of additional data files that the script is intended to be run against. The automatic file characterization again identifies an appropriate rendering application and environment, in this case Windows XP with Stata 10, and renders that input. I'll admit that these videos are slightly sped up just for the sake of presentation, but at the tail end of this video you should glimpse the Stata script automatically starting to run, mounted in the Windows XP environment and automatically run by Stata 10. The Stata script just takes a little while to actually crunch the data and produce any output.
That's just not what we're showing you here today, but by doing so you can verifiably go back and check the results of the associated study that this data set was a part of, which was originally published, I believe, in about 2006. So this is really important for data reproducibility moving into the future. Next slide, please.
What you've been seeing in these videos is our demo interface for testing and demonstrating UVI functionality, but it is also important to emphasize that the UVI, as an API, is ultimately meant to be used with and integrated into broader preservation and data curation tools and workflows. To that end, we have been working with the University of Notre Dame on their IMLS-funded PresQT file transfer web application, as Cliff mentioned earlier. Natalie is, I believe, actually here on the call, and the Notre Dame team spoke more about the PresQT project at CNI earlier this week, so do go check that out. And Natalie, do call me out in the chat if I mischaracterize anything here, but the PresQT platform is intended to enhance reproducibility and open sharing of research data. Their initial work does involve an EaaSI service integration via the UVI API. So let's say you have a set of files stored in OSF, like perhaps a set of research about what it would take to build a Universal Virtual Interactor. Once you've transferred those files into PresQT, you'll have the option of selecting the EaaSI service, as this button shows. Next slide, please. If you click on that button, it submits the contents of that OSF project, or GitHub, or whatever the source project is, to the UVI for the exact same process of characterization that we just demonstrated, producing metadata about what software and environments could render those files, and gets that metadata packaged up and deposited along with the files for the long-term stability and characterization of that data.
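Editorial illustration (not part of the talk): the matching step described above, from a characterized file format to candidate rendering environments, might be sketched as follows. This is a minimal sketch under stated assumptions, not the UVI's actual API. The `Environment` class, the `CATALOGUE` table, the format labels, and the `suggest_environments` function are all hypothetical names invented here; only the operating system and application pairings come from the demo described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    operating_system: str
    application: str


# Hypothetical catalogue mapping a format label to environments that could
# render it. The labels and the table itself are assumptions made for this
# sketch; the talk describes the real service deriving formats through
# automatic file characterization rather than hand-written labels.
CATALOGUE = {
    "excel-3.0": [
        Environment("Windows 95", "Corel WordPerfect Suite"),
        Environment("Windows 98", "Microsoft Office 97"),
    ],
    "stata-script": [Environment("Windows XP", "Stata 10")],
}


def suggest_environments(format_label: str) -> list[Environment]:
    """Return every catalogued environment for a format, or an empty list."""
    return list(CATALOGUE.get(format_label, []))


if __name__ == "__main__":
    for env in suggest_environments("excel-3.0"):
        print(f"{env.operating_system} + {env.application}")
```

Returning all candidates rather than a single best match mirrors the point made in the demo: comparing several renderings of the same file is itself part of the assessment.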
Perfect. Last slide, please. Since so much of the UVI work is dependent on that whole process of automatic file characterization and matching of file formats with the software that created them or that can open them, I'll also note that Yale has been working with independent developer Ross Spencer to create a version of the file characterization tool Siegfried that integrates Wikidata identifiers. Wikidata is really important because it already associates hundreds, I mean thousands, of file formats with compatible software applications, and there are more such associations by the day thanks to our colleague Kat Thornton and the WikiDP project. So by piggybacking on that Wikidata work via the Siegfried development, even more of the process that I've been describing with the UVI can be automated, cutting down on the amount of redundant documentation needed to re-associate file formats with software applications. Between that and the growing number of environments and legacy software applications that we have in the EaaSI Network, the UVI's recommendations and functionality are just going to get better and better over time as more people join and upload to the network. That's all from me, so I'll hand it off to Euan to talk about networks.
Okay. Good afternoon. I just want to say, in relation to what Ethan was just talking about, that Ross has donated his time to do that work integrating Wikidata with Siegfried, and I want to thank him very much for that. It's been a lot of hours from him. He's going to release a report at some point, I think, which I'll share, with a lot of information in there about how much time he spent and the amount of work that's gone into this, but I'm really excited about what he's been able to do. So I just wanted to briefly talk about what we've got coming up around adding the ability to emulate entire networks in EaaSI.
This has come from some work that OpenSLX have done, and some other work I've been doing with the developers there on a kind of commercial offshoot. What you're seeing here is a screenshot from some of that work. At the bottom, you probably can't make it out very easily, but you can see there are two environments. This is a configuration page for what we're calling an emulated network. You can see there that you can add as many environments as you want, and then if you click on edit on one of those, a pop-up appears and you can change the settings for those environments and set things up. For example, if you want to map a port and access the environment from your local machine using a secure proxy, we can enable that. You can also set the network name, or the URL, for the machine within that network, so that if you browse to that in the browser and it's running a web server, it'll map to that exact address. So the basic idea is you combine some machines together, maybe even add some network services like a DHCP server, a file sharing server, or an Active Directory server, and then you save that as a configured thing. It's a new type of preservable thing: a network of environments. That network can then be started at any point in time, or it can be run permanently and accessed on demand.
So if you go to the next slide, please. Here's an example. What you'll see is Windows Server 2003 opening up, so this is the start of a network. We're logging in there. This particular machine is running SharePoint with SQL Server, all embedded in one machine. Yeah, SQL Server 2005 with SharePoint 2007, I think, or maybe 2003. In a minute you'll see the cursor move up to the top right there, where you can switch between the two machines that are in this network. So there's an XP machine running Internet Explorer 6, which is the one that was compatible with this version of SharePoint at the time.
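Editorial illustration (not part of the talk): the emulated network configuration described above, a named set of environments, each with a hostname and optional port mappings, plus shared network services, could be modeled roughly like this. It is a minimal sketch under stated assumptions. The `Node` and `EmulatedNetwork` classes, their field names, the example hostnames, and the default of no internet access are hypothetical choices made here, not the data model EaaSI actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    environment: str   # the emulation environment this machine runs
    hostname: str      # name other machines use inside the network
    mapped_ports: list[int] = field(default_factory=list)  # exposed via proxy


@dataclass
class EmulatedNetwork:
    name: str
    nodes: list[Node] = field(default_factory=list)
    services: list[str] = field(default_factory=list)  # e.g. "dhcp"
    internet_access: bool = False  # assumption: isolated unless opted in

    def add_node(self, node: Node) -> None:
        """Add a machine, refusing hostnames already used in this network."""
        if any(existing.hostname == node.hostname for existing in self.nodes):
            raise ValueError(f"duplicate hostname: {node.hostname}")
        self.nodes.append(node)


if __name__ == "__main__":
    network = EmulatedNetwork(name="sharepoint-demo", services=["dhcp"])
    network.add_node(Node("Windows Server 2003", "sharepoint.local", [80]))
    network.add_node(Node("Windows XP", "client.local"))
    print([node.hostname for node in network.nodes])
```

Treating the whole network as one saved object is what makes it "a new type of preservable thing": the machines, their names, and their services are stored and restarted together.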
And it is networked just to that other machine, so it's isolated from the internet, making it much more secure. Only those two machines can communicate with each other, but as you can see, it's able to access the server and you're able to interact with everything as you would have normally. This opens up a whole world of new options for preserving the content that we all have to deal with. I think one of the most exciting ones is simply being able to preserve these large-scale databases and have meaningful ways to access them. I mentioned briefly that you're able to access these remotely, if you want to, through a secure proxy service. So it's possible to, say, run SQL Server or any other large-scale database in emulation and access it remotely from your local machine. And if you have custom clients, you can add the client into the network as well, so that users can just use that client, maybe running on a desktop machine, as we're seeing here with Internet Explorer 6, to access whatever it is you've got on the server machine in your network. There are also all sorts of opportunities here for any sort of web-based content and complex websites, where if you're able to capture the servers, you'll be able to keep them available in a very similar way to how they were available originally. So that's just kind of a teaser for what we'll be working on with networks over the next phase of the grants.
All right. Okay, so I hope that got you all excited for the future of EaaSI. We have a lot more planned, obviously, and we hope to continue to expand and improve the functionality, but I would be remiss if I didn't point out, of course, that none of this would be possible without the support of our funders. I also want to make sure to thank Yale University Library for continuing to serve as the main host of EaaSI as a program. And with that, I'll say thanks and open it up for Q&A.
Excellent. Thank you. Yeah, that was very exciting.
Just tremendous examples of what this really powerful emulator can do. Thanks so much for that presentation to all of you, and thank you to our attendees for joining us here today. As Seth said, we have reserved some time here for questions, so I hope folks will jump in and share some questions with us now. Let us know what you're thinking. If you have comments as well, pop them right there into the Q&A box and we'll be happy to address those. I have a quick, probably very minor question. I was just wondering, you said that you're restricting access to one user at a time. Is that right?
Yeah, in the Yale Emulation Viewer. So that is its own separate service. It's powered by EaaSI, but it's not the EaaSI service. The EaaSI service, the interface and the back-end functionality, serves essentially as a management system for the environments that you create and provide access to. So, for instance, our students use EaaSI to find compatible environments for the CD-ROMs that they're configuring. They load the CDs into the emulator, and then they save that as an access environment. With that access environment, we then put the link, through the magic of the web, into the viewer page, so when the researcher clicks through from the catalog, it's calling the EaaSI service and saying, "I need to access this single environment," and that's what renders in the viewer page. And we have the controls, so let's say if Yale had two copies of a CD, we could set it up to allow two users to access it simultaneously. We wanted to make sure we had some constraint on this from a... well, speaking to lawyers, and also in considering the fair use arguments for this, because these were purchased items at some point in the library's past, we kind of adopted the one copy, one user policy.
So, while the technology would allow us to open it up to as many people at once as we wanted, we put some constraints around it. It is, you know, one seat for one license, essentially.
Okay, thank you. I appreciate that. Very good.
I mean, the EaaSI platform provides the scaffolding for whoever is implementing the service at their institution to make those decisions and constrain access in any number of ways, so it could be a limit on simultaneous sessions, or it could be IP restriction. EaaSI the platform provides the administrative interface by which any individual institution can come to its own conclusion, based on its analysis of its materials, about what it wants to do with them.
And I should say as well, we are planning to provide more granular permissions within EaaSI for end-user access. Right now we have a service outside of EaaSI that calls the back end, the emulation service API. We are looking to have settings within the EaaSI interface, so within that system. You can see how it gets confusing with all the different versions and components of the system. In the EaaSI system, you would hopefully have the capability, as you would with most access-oriented systems, like your video streaming service, or something similar, to say that only a specific user has access. So for special collections and reading room purposes, you could say this user account can access this specific environment for a week. We were also looking into potentially restricting it to specific IP addresses, so if needed you could have it available only to, like, a single terminal in the reading room. Or if you needed to restrict it to a range of IP addresses within, say, your university network, you could do that as well. But, you know, since everybody went remote, we had hoped to get a little further along than we are now.
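Editorial illustration (not part of the talk): the access constraints discussed above, a fixed number of seats per environment and an optional restriction to a range of IP addresses, might be combined as in the sketch below. This is a minimal sketch under stated assumptions, not EaaSI's implementation. The `AccessPolicy` class, its field names, and the example user names are hypothetical, and the addresses come from ranges reserved for documentation. The only library used is Python's standard `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AccessPolicy:
    seats: int = 1                         # one seat per licensed copy
    allowed_network: Optional[str] = None  # CIDR range, or None for any IP
    active_users: set[str] = field(default_factory=set)

    def try_open(self, user: str, client_ip: str) -> bool:
        """Grant a session if the IP is allowed and a seat is free."""
        if self.allowed_network is not None:
            network = ipaddress.ip_network(self.allowed_network)
            if ipaddress.ip_address(client_ip) not in network:
                return False
        if user in self.active_users:
            return True  # this user already holds a seat
        if len(self.active_users) >= self.seats:
            return False  # every seat is taken; try again later
        self.active_users.add(user)
        return True

    def close(self, user: str) -> None:
        """Release the user's seat, if they hold one."""
        self.active_users.discard(user)


if __name__ == "__main__":
    policy = AccessPolicy(seats=1, allowed_network="192.0.2.0/24")
    print(policy.try_open("alice", "192.0.2.10"))
    print(policy.try_open("bob", "192.0.2.11"))
```

Setting `seats=2` would correspond to the case mentioned in the answer where the library holds two copies of a disc.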
But we obviously foresee a future where it will be highly beneficial to researchers to be able to access digital collections without having to travel to the university. So we are excited, and I want us to be able to get there sooner, but I can't kill our developers. So yeah, that's what Euan tells me.
Very good. Well, I don't see any questions in the chat right now, so I'm just going to take one last opportunity to thank our presenters for bringing this to CNI and sharing it with our community. It's really a fascinating project, and we look forward to continuing to watch its development. So thank you again for being here, and thank you to our attendees for joining us today. I'll go ahead and turn off the recording portion of the presentation, but for any attendees who are still with us, I think our presenters will hang out with us a little bit longer for a chat. So if you want to join us, just raise your hand, I'll be happy to turn on your microphone, and you can join the conversation. So thanks, everybody, and take care. Bye bye.