Hello, my name is Stephen Davison, Head of Digital Library Development at the Caltech Library. I am joined by Tom Morrell, Research Data Specialist, and Tommy Keswick, Digital Technologies Development Librarian. I'm going to kick off our presentation with a brief introduction, and then Tom and Tommy will describe two projects that illustrate the Caltech Library's lightweight approach to digital content management and publication. There have been major changes in the way software is designed over the past decade or two. Significant characteristics of contemporary approaches include the embrace of open source software, the leveraging of hosted and cloud-based services, and modular software design and microservices relying on application programming interfaces, or APIs. The FOLIO library services platform, the Invenio digital repository framework, and ArchivesSpace are all good examples of software projects with strong support communities that leverage contemporary approaches to building flexible and sustainable software. At Caltech, these are the three pillars of our suite of library services. Moving forward, we are striving to fill the gaps without adding any new large software packages beyond FOLIO, Invenio, and ArchivesSpace. Instead, we would like to fill the gaps by building around the edges of these pillars with lightweight, reusable tools that prioritize modularity, flexibility, and continuous development. Consideration of issues around community resource development, system scope, and metadata has led us to a set of choices. As much as we can, we focus on tightly focused software tools that have been embraced by community peers, are open source, work well in shared and cloud environments, are API-oriented, and promote continuous development and data replication. Tom and Tommy are going to discuss two projects that illustrate aspects of this approach. Over to you, Tom. 
So the first project we're going to share today involves a cell atlas. It came to the library from a research group, and it was based on their vision. They had seen print cell atlases where you could flip through the pages and look at large images of cells, with the different types of cell components labeled, along with some supplementary text and references. Those were all for plant and animal cells; this lab was focused on bacteria and archaea. Traditionally those have been viewed as just a bag with some stuff in it, without much internal structure. But their lab focuses on electron tomography, and they had a lot of data showing that bacteria and archaea actually do have a lot of structure. The lab was interested in illustrating that structure, so they created a whole bunch of content, taking the raw data and annotating the different cell structure types onto the videos themselves. So you can look at the actual image of the cell as well as the annotations for what the different structures are. Catharine Oikonomou and Grant Jensen also wrote text to go along with each of these videos. Then the group came to the library and said: we've got all this video content and all this text; how can we make an interactive textbook that displays it? The library looked at a lot of the existing software solutions for digital textbooks, and we weren't able to find anything that could host this kind of large video content in the way the group wanted to work. So we started thinking: can we build something lightweight, using static HTML, CSS, and JavaScript, that would still give the interactive experience they wanted? How can we replicate the physical experience of flipping through a cell atlas, but in a sustainable digital form? 
So, our initial technical approach. We utilized services that we already had available at the Caltech Library. For the videos, we treated those as data files that we could put in our data repository. The group wrote up all the video metadata, such as what type of cell it was and who collected it. Using the CaltechDATA API wrapper, we were able to pull the files and the metadata together to automatically make records in CaltechDATA, our Invenio-based repository. So we were able to store all the videos with unique DOIs, which we could then use to access those specific videos later on. For the text component, the research lab started with a Word file. We did a little bit of scripting to automatically transform that into a Markdown document, which we could then feed into Pandoc. We rendered the initial version via RStudio's bookdown package, which has nice built-in textbook-style features: it handles chapters and sections, and we were able to very easily generate a rough draft of what an HTML book with embedded videos would look like. We brought this to the research group, and they looked at it and said, wow, this is really cool, but they wanted it more custom: they wanted the videos to expand out and be more interactive, and they wanted interactive navigation. So they hired a contract developer to do more custom work. The final technical approach, which is what you'll see today in the cell atlas, starts from a Markdown document, since by this point the group had become familiar and comfortable with working in Markdown. We use a custom Pandoc renderer that lives in GitHub, with a GitHub Action so that any time the text or the videos change, the Action runs and generates a preview of the HTML version of the book. 
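The Word-to-Markdown-to-HTML conversion described here can be sketched in a few lines. This is a minimal outline assuming Pandoc is installed and on the path; the file names and the `--standalone` flag are illustrative, since the real project uses a custom Pandoc renderer:

```python
# Sketch of the Word -> Markdown -> HTML pipeline; paths are hypothetical.
import subprocess
from pathlib import Path

def pandoc_command(source: Path, target: Path) -> list[str]:
    """Build a Pandoc invocation; Pandoc infers formats from extensions."""
    return ["pandoc", "--standalone", str(source), "--output", str(target)]

def convert(source: Path, target: Path) -> None:
    """Run Pandoc, e.g. atlas.docx -> atlas.md, then atlas.md -> atlas.html."""
    subprocess.run(pandoc_command(source, target), check=True)
```

In a GitHub Action, `convert()` would be called on every push that touches the source, regenerating the HTML preview automatically.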
Once the research group is happy with this, we do a publication step, where we push that out to the main public domain for the book. Through Pandoc we can also generate additional representations of the textbook, such as a PDF. Everything is served either by the CaltechDATA repository or by an AWS S3 bucket with a CloudFront CDN in front of it. One exciting thing about this project is that it is completely open. All of the source code is available on GitHub if you want to see exactly how we scripted this processing, and it is also archived in CaltechDATA so you can cite the specific code that we used. The videos and other media, like the images and 3D structures, are archived in CaltechDATA as well, along with the full offline version and the PDF. Another exciting thing is that no servers are required: all of the interactivity that I'm going to show you in a second is accomplished using standard JavaScript and CSS, and we hope this means the book will be much more sustainable into the future. Let me give you an idea of what this looks like. It is available today at cellstructureatlas.org. This is the splash page for the digital textbook. Each chapter gets a welcome page with a little bit about what that chapter covers. Each section has a video with interactivity, so it can pop out, and it's accompanied by text with citations and links. Each video is assigned a CaltechDATA DOI, a unique identifier for that given video, so you can get information about who it was collected by as well as what type of organism it is. You can use this menu to browse through all of the chapters and sections; we're going to look at this one here. Each section has a video, like I showed you, that highlights the different parts of the cell structure. We have text as well as audio narration for each section. For some structures, we also have a slider view. 
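A hedged sketch of what pushing the rendered static site to S3 might look like. The bucket name, key layout, and use of the boto3 SDK are assumptions for illustration, not the project's actual deployment code; setting `Content-Type` correctly matters so CloudFront serves pages rather than downloads:

```python
# Hypothetical static-site upload; bucket and paths are placeholders.
import mimetypes
from pathlib import Path

def content_type(path: Path) -> str:
    """Guess the Content-Type header so browsers render files in place."""
    guess, _ = mimetypes.guess_type(path.name)
    return guess or "application/octet-stream"

def upload_site(root: Path, bucket: str) -> None:
    """Upload every file under `root`, preserving relative paths as keys."""
    import boto3  # AWS SDK; imported here, used only when deploying
    s3 = boto3.client("s3")
    for path in root.rglob("*"):
        if path.is_file():
            s3.upload_file(
                str(path), bucket, str(path.relative_to(root)),
                ExtraArgs={"ContentType": content_type(path)},
            )
```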
For select structures in the cell, you have this slider, and you can add or remove the annotation to really see what the structure is. For select structures, we also have a learn-more section, which gives a more detailed view of that specific structure and overlays the three-dimensional protein structure of that element. This links to where these structures are described in the literature, and a three-dimensional viewer allows you to view, right in your browser, the full three-dimensional structure of the protein complex. There is a slider view of this content as well, so you can really see how it goes from the cell structure to the three-dimensional structure of the protein. We have linking throughout the textbook. One way we do linking is through the names of the bacteria or archaea: if you click on a name, you get an interactive tree of life, which shows all the archaea and bacteria included in the book. If you hover over each one, it links you to the sections that highlight that specific organism, as well as showing how they're all related. We have additional information, including profiles of the scientists who captured this data, references, and, in the about-the-book section, information about all the contributions to the book itself. And finally, we have an offline view. Because this is built from standard HTML, CSS, and JavaScript, we're able to bundle the entire textbook into a zip file that you can download and read if you don't have internet access. We also have the PDF version, if you prefer a more traditional book-reading approach without the videos. All of this is published from a single set of source material: we take the same source and generate all these different representations of the Atlas of Bacterial and Archaeal Cell Structure. So feel free to explore at your leisure. 
And now I'm going to turn it over to Tommy, who's going to talk about a similar project that we did with the Caltech Archives. Thanks, Tom. So, we've taken this data-centric infrastructure and started to use it for other applications, and the Caltech Archives oral history project is one of the first. We wanted to replace the current EPrints repository, which serves PDF copies of interview transcripts, with an interface that is more native to the web, offering HTML versions of transcripts with embedded images and audio clips. The components are very similar to the cell atlas project: a source Word file of the transcript, metadata from ArchivesSpace, Pandoc and GitHub Actions for transformations, Markdown files in the middle, and HTML and PDF documents as our outputs. So let me demo how all of this comes together for the oral history project. We start with a Word file, because much of our transcribing is outsourced, and Word is the preferred format for the transcribers and the initial editors. We keep the formatting minimal in the Word document, with basic placeholders for where images will go and headings to delineate the sections of the interview. A record is also created by archivists in ArchivesSpace that stores a title and a unique identifier, along with dates of the interview sessions, an abstract, and links to interviewer and interviewee records. This component unique identifier is really the key that ties everything together in this process. Once the interview sources are ready, archivists can go to our custom web application, which kicks off a mix of automated and manual processing that will get us to final publication. The first step is to upload a Word file with a file name matching the unique identifier that we saw in ArchivesSpace. The application uses this file name to retrieve the corresponding metadata, and the metadata is combined with the contents of the Word file to create a Markdown file with Pandoc. 
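The first processing step described here (file name → component unique identifier → ArchivesSpace lookup) could look roughly like this. The base URL, repository number, file-naming scheme, and session handling are illustrative assumptions; the `find_by_id` endpoint and `X-ArchivesSpace-Session` header come from the ArchivesSpace REST API:

```python
# Sketch of matching an uploaded Word file to its ArchivesSpace record.
from pathlib import Path

def identifier_from_filename(filename: str) -> str:
    """'OH_Smith_J.docx' -> 'OH_Smith_J' (hypothetical naming scheme)."""
    return Path(filename).stem

def fetch_component(base_url: str, session_token: str, identifier: str) -> dict:
    """Look up the archival object whose component id matches the file name."""
    import requests  # ArchivesSpace speaks JSON over HTTP
    resp = requests.get(
        f"{base_url}/repositories/2/find_by_id/archival_objects",
        params={"component_id[]": identifier},
        headers={"X-ArchivesSpace-Session": session_token},
    )
    resp.raise_for_status()
    return resp.json()
```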
The Markdown file then becomes the canonical version of the interview transcript. It gets uploaded to a private GitHub repository, and a digital object record is written back to ArchivesSpace to represent the transcript Markdown file, including a link back to GitHub. So you can see this is a new record for the Markdown file that was created, and there's a link here to GitHub that we can click to see our Markdown file. We chose GitHub as an intermediary partly because of its online editing capabilities: archivists are able to upload additional files and edit the Markdown directly in the browser, without needing Git or additional software on their own computers. So what I'm going to do is upload some of the image files that go along with this oral history transcript. They're uploading now, and I commit those files to the repository. Next, I'll update the Markdown file itself to link to those images using Markdown syntax. I open up the Markdown file, and past the metadata at the top of the file we can see where our first image placeholder is. We want to replace that with Markdown syntax, and I have it on my clipboard, so I'm just going to replace this whole file. If we look at the top again, we can see that we've got our Markdown syntax here, with an image that links to one of the files we just uploaded. Scrolling down to the bottom, we'll save this to the repository. Once a Markdown file is either uploaded or edited in GitHub, that triggers a workflow that automatically starts to generate HTML and PDF files from the Markdown file; we use GitHub Actions for this. We can see that this is running right now; it takes about a minute. 
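The manual placeholder replacement shown in the demo could also be scripted. This sketch assumes a hypothetical `[[image: …]]` placeholder convention and an `images/` path prefix; the actual transcripts may mark images differently:

```python
# Replace hypothetical image placeholders with Markdown image syntax.
import re

def replace_placeholders(markdown: str, prefix: str = "images/") -> str:
    """Turn '[[image: photo1.jpg]]' into '![photo1.jpg](images/photo1.jpg)'."""
    pattern = re.compile(r"\[\[image:\s*(?P<name>[^\]]+?)\s*\]\]")
    return pattern.sub(
        lambda m: f"![{m.group('name')}]({prefix}{m.group('name')})",
        markdown,
    )
```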
What's happening here is that the workflow is setting up all of the automation, which requires Pandoc, to create the outputs we are looking for: the HTML file and the PDF version. Once that's all done, we can go back to the publish section of our form so that we can get all of our HTML files up on the web, using the same identifier that we've been using. Once we click publish, a number of things happen. First, the HTML file that we just generated, all of the images that we uploaded, and anything else required to construct the published transcript are sent to our static file host; S3 is what we're using right now. It also creates a link in our persistent URL resolver pointing to the final location; we use a resolver in case we change hosts and the domain name changes over time. And finally, we write records back to ArchivesSpace that include pointers to the extra files we uploaded to GitHub or that were generated there. If you remember, the previous ArchivesSpace digital object record only pointed to a Markdown file; now the JPEGs and the PDF asset are also pointed to, with URLs to where they live. It's also marked as published now, because we have gone through the publish step on our web application. We can click through to the final location here and see the published version of our oral history transcript, with the images embedded and everything pulled in from the metadata as well as the Word file, so it looks really nice in terms of its presentation. So that is how we've used the same components and the approach that we've developed. It lends itself to many applications in the Caltech Archives; future plans include creating public pages for image-based digital objects that don't rely on a full-fledged digital asset management system. 
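The publish step (copying generated assets to the static host under the interview's identifier and recording where the persistent URL should resolve) might be sketched like this. The bucket, resolver base URL, and key layout are placeholders, not the real configuration:

```python
# Hedged sketch of the publish step; all names are placeholders.
from pathlib import Path

def object_key(identifier: str, path: Path) -> str:
    """Group published files under their interview identifier."""
    return f"{identifier}/{path.name}"

def final_url(base: str, identifier: str) -> str:
    """The location the persistent-URL resolver should point at."""
    return f"{base}/{identifier}/index.html"

def publish(identifier: str, assets: list[Path], bucket: str) -> None:
    """Send the HTML, images, and PDF to the static file host (S3)."""
    import boto3  # AWS SDK; used only when actually publishing
    s3 = boto3.client("s3")
    for path in assets:
        s3.upload_file(str(path), bucket, object_key(identifier, path))
```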
We've solved the metadata management piece with ArchivesSpace and the object storage piece with S3 for now, and we can adapt this infrastructure as needed going forward. Thanks, Tommy. With these examples, I hope that we have successfully demonstrated an approach that is flexible and sustainable. The tools we use to build workflows will change over time, so we must embrace continuous development and metadata migration. This has the benefit of better supporting the constant evolution of both staff and public requirements. That suggests the takeaways we would like to leave you with: the data is primary, and it is owned by its creators and managers; what we do should be empowering them. Software is ephemeral; it comes and goes, and its development should not be driving decision making. We also embrace constant change, in metadata, in content, in workflows and services, and in tools and software. To best support that, tools need to be focused, simple, and reusable, if at all possible. Thanks for watching. We would welcome your feedback and questions, so please feel free to get in touch.