And I think it's about time. Welcome everyone. I'm Cliff Lynch, the director of CNI, and you've joined us for one of the project briefings for our virtual Spring 2020 membership meeting. We will take questions at the end. There is a question-asking tool labeled Q&A down at the bottom of your screen. Please feel free to put questions in at any time during the presentation as they occur to you, and we'll let them collect and answer them at the end. Diane Goldenberg-Hart from CNI will come on and moderate the Q&A at the end of the presentation. Our talk today is about IIIF at the Getty. This is an architecture for managing and using image information, and we will hear from Stefano Cossu, who is a software architect at the Getty. So thank you again for doing this, Stefano. Welcome, and over to you.

Hello, good morning, good afternoon, good evening, wherever you are. I'm Stefano, I'm a software architect at the Getty, and I'd like to talk about adopting IIIF at the Getty from a strategic and tactical standpoint. Thanks to my colleague Rob Sanderson's keynote address, I hope I don't have to delve into what IIIF is at too much length, but I would like to point to a few things that we at the Getty really care about in what IIIF is and is not. IIIF is a set of standards meant to deliver and share images and other audiovisual media over the web, and there is a community of IT professionals, mostly in the cultural heritage and scientific communities, who work to develop these standards as well as related software tools, and who also keep a discourse and communication alive to develop new ideas and collaborations between institutions. As for what IIIF is not: in spite of its most visible part, the viewers, IIIF is not a simple software product.
Meaning that when I say we adopted IIIF, it doesn't mean we just bought or downloaded some software from an open source repository and dropped it onto our architecture; it means we adopted a standard and we are building tools around that standard. Also, as a note, you will notice that all the sidebar images in this presentation are taken from our amazing museum collections, and they are all available as IIIF URLs that you can visit.

Beyond the visual aspect of the viewer, IIIF offered several major advantages that were compelling enough to drive its widespread adoption in the institution. IIIF offers a logistical advantage because it gives us a methodology, almost a discipline, for organizing our media archives and collections. We don't have to figure out, every time a project comes up, which kinds of derivatives we have to create, how to create them, and where they go. It's all figured out from the outset, and the architecture takes care of the rest. Also, IIIF is designed for sharing, which is of course part of the core mission of the Getty. We share a lot of information and knowledge, and IIIF is an indispensable tool for sharing our visual materials. IIIF can be adopted by institutions of any size; you don't need to be the Getty to implement IIIF if you have a limited scope. And once you adopt IIIF, its advantages become more and more apparent as your collections grow in volume and complexity. There is definitely a learning curve and an upfront cost, but that upfront cost pays dividends as your collections grow, which is the case for us. IIIF is not technically linked open data, but the Getty has been investing in linked open data for a long time: we publish the Getty Vocabularies, we are one of the driving forces behind Linked Art, and we have a very strong presence at LODLAM.
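To make the comparison with linked open data concrete, here is roughly what a IIIF manifest looks like, sketched as a Python dict following the Presentation API 2.x shape. This is a minimal illustrative sketch: every URL, label, and dimension below is an invented placeholder, not a real Getty identifier.

```python
import json

# Minimal IIIF Presentation API 2.x manifest, sketched as a Python dict.
# All identifiers below are placeholders, not real Getty URLs.
manifest = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://example.org/iiif/manifest/obj1",
    "@type": "sc:Manifest",
    "label": "Example painting",
    # Human-readable descriptive metadata: label/value pairs, not RDF.
    "metadata": [
        {"label": "Artist", "value": "Unknown"},
        {"label": "Date", "value": "ca. 1650"},
    ],
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [{
            "@id": "https://example.org/iiif/canvas/c1",
            "@type": "sc:Canvas",
            "label": "recto",
            "height": 3000,
            "width": 2000,
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "resource": {
                    "@id": "https://example.org/iiif/obj1/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    # The service block points at the Image API endpoint.
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": "https://example.org/iiif/obj1",
                        "profile": "http://iiif.io/api/image/2/level2.json",
                    },
                },
                "on": "https://example.org/iiif/canvas/c1",
            }],
        }],
    }],
}

print(json.dumps(manifest, indent=2)[:120])
```

Note how the `metadata` block is plain label/value text aimed at viewers and humans, which is exactly why a manifest complements, rather than replaces, a semantic LOD description.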
So IIIF has been very compelling for us because it's conceptually very close to linked open data. I couldn't say it's technically 100% linked open data, because it doesn't share the semantic language: IIIF manifests are meant to be read mostly by humans, and most of them don't contain dereferenceable permanent URIs. But IIIF can be a visual support for linked open data. In this way you have a very clear separation of concerns: IIIF, which is optimized for presentation, and LOD, which is optimized for semantic discovery, work alongside each other.

A short word about my department before we go on. You could see Getty Digital as a traditional central IT department within an academic institution. We consolidate IT knowledge and centralize services that only a few years ago were maintained by individual Getty programs, and we also support the institution's role as a scholarly knowledge hub and a scholarly authority. IIIF helps fulfill these goals in several ways, some already in place and some still among our goals. It provides a standard access point for searching, viewing, and comparing Getty images and other audiovisual media collections. It offers a single destination for the media production pipelines, as I mentioned, for actually producing the images. And it is made up of modules, so it's a modular framework that allows us to build tools very efficiently and reuse them for many purposes, be it for scholarly use, for learning, or for fun. We can build applications on top of the architecture without having to change the underlying architecture.

So we invested heavily in IIIF. Actually, I was hired about a year and a half ago with the specific goal of implementing IIIF at the Getty, where it had already been used before I came on board. IIIF was initially promoted by Rob Sanderson when he joined the Getty several years ago.
And it was implemented for specific purposes. When Rich Fagan, our current VP of Getty Digital, joined the Getty, IIIF took on institution-wide importance and became part of the core mission of Getty Digital. This is very important because we don't have to answer questions about what IIIF is, what it does, and why we are adopting it; everybody knows about it. It's a very good thing. And as a sign of this commitment, we joined the IIIF Consortium as a full member in 2016. The IIIF Consortium was founded in 2015, and it's a group of institutions who are particularly committed to sustaining the IIIF community, financially and logistically.

Of course, there were challenges in adopting IIIF. We didn't really meet any significant resistance; on the contrary, we had too many expectations, too much excitement about it, and we actually had to go through an information and education campaign about what IIIF does and what it doesn't. One of the still-present challenges is managing and prioritizing project requests, because they are all very large projects and we have limited human capacity; that's the only bottleneck we have. So managing what gets implemented first, without overdoing it, has been part of the challenges, along with growing complexity and volume, which I will get to in a second. Also, there were some concerns about rights management. Some of our rights specialists were concerned about the openness, in some ways, of IIIF. Not about the openness of the images themselves, but the fact that IIIF allows you to implement any viewer on top of it. There were concerns, for example, that the rights information would not be displayed where the rights managers would like it to be, or that it wouldn't be displayed at all.
So we had to come to terms with what I call bad old habits, such as embedding EXIF metadata in the images themselves, which I find horrible and very unsustainable, but which was a way to double and triple ensure that the rights information always travels with the images. Also, we had to tighten our quality control, because by automating the production pipelines we needed to rely on the authority for image quality, which is our imaging department. We don't want to fix an image if it doesn't meet the quality parameters; we need to just pass the images through, because we can't take on the responsibility and the authority to change an image when we publish it. So we make very minimal changes, basically just to the format of the images: we format them to be served by IIIF, but we rely on imaging to give their blessing on the images.

So we have this scaling pattern that goes along three main directions. One is the volume of our archives and collections, one is the traffic from our users, and one is features, meaning applications that can be built on top of our IIIF architecture. When I started in my job, I took on this challenge with a scientific approach. I thought I could not just choose whatever looked good at first glance, or even after some testing. I had to do a field survey that would be as comprehensive as possible and that would foresee this huge growth, which is basically unlimited. So the first thing was sorting out hack boxes and black boxes. Black boxes being what I call systems that you just rely on: they just have to work, and you don't have to worry about them. Hack boxes being systems that you, on the contrary, want full control over, such as ETL systems, which are very ingrained in the Getty data model and IT architecture.
Then I started reviewing everything I could find that was a IIIF implementation: all the possible combinations of image quality, image compression algorithms, image formats, and so on, discarding what didn't meet certain initial criteria. When I came up with a short list of these elements, I ran a benchmark of all the combinations: a few different image servers, with a couple of different image types and formats, with different image compression algorithms, and came up with a clear performance winner. Once I had a clear winner performance-wise, there were some considerations not tied to performance, such as the sustainability of the software or the standard I was testing, the fit within our department's skill set, and so on. Then, when we came up with a clear winner, we started implementing it. The benchmarks I used for testing all these systems are available publicly; you can ask me for the link afterwards. I'll be glad to share the benchmark results and the tools, as well as the test data set, which is a set of open content images from the Getty.

The implementation didn't take that huge an effort: it was basically me and about half of another full-time employee for, I would say, less than a year, maybe ten months for the first phase. The most challenging and time-consuming tasks were building the ETL pipelines, because, as I mentioned, they were completely customized and written from scratch; mapping the metadata, which involved a lot of back and forth with stakeholders and the content managers from the different programs; and designing the autoscaling architecture that would sustain all the fluctuations in traffic. The challenge there was not so much the complexity of the architecture as our being new to an autoscaling architecture. There were some assumptions made initially in the software architecture that didn't fit the autoscaling paradigm, so we had to make a lot of adjustments.
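As a rough sketch of the selection process just described, the benchmark boils down to timing every surviving combination of image server, source format, and compression on the same workload and ranking by latency. Everything here is a stand-in: the server and codec names are illustrative, and `fetch_tile` simulates a request rather than hitting a real deployment.

```python
import itertools
import statistics
import time

# Illustrative candidates; the real short list came out of the field survey.
SERVERS = ["server-a", "server-b", "server-c"]
FORMATS = ["tiff", "jp2"]
CODECS = ["uncompressed", "lzw", "jpeg"]

def fetch_tile(server, fmt, codec):
    """Stand-in for one IIIF tile request against a test deployment."""
    time.sleep(0.001)  # replace with a real HTTP call when benchmarking

def benchmark(runs=5):
    """Time every combination and return (fastest_combo, all_results)."""
    results = {}
    for combo in itertools.product(SERVERS, FORMATS, CODECS):
        timings = []
        for _ in range(runs):
            t0 = time.perf_counter()
            fetch_tile(*combo)
            timings.append(time.perf_counter() - t0)
        # Median is robust against one-off hiccups during a run.
        results[combo] = statistics.median(timings)
    return min(results, key=results.get), results

winner, results = benchmark()
print("fastest combination:", winner)
```

The performance winner is only the first filter; as described above, sustainability of the software and fit with the department's skill set are weighed separately afterwards.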
But now it's live, it's out there, and we can start reaping the benefits; some advantages are already very apparent. This is an interactive application that will be launched soon. It's based on Ed Ruscha's Streets of Los Angeles project: Ed Ruscha took hundreds of thousands of images of Sunset Boulevard, building by building, and we have acquired them and rearranged them digitally in this application. Basically you can drive that little truck and the images will pan across the screen. So you can imagine that this will open a fire hose of images onto the users' browsers, multiplied by the number of users, which will peak when the application gets launched. It's a big scaling and I/O performance challenge, because we want this panning to be as smooth as possible. We ran some initial tests and the architecture actually responded very well, and so far we haven't had to make any adjustments to the architecture or do any tricks with caching or static workarounds outside of IIIF. It's just been working. This is preliminary, of course, but so far it's been very encouraging.

This is a load test we did with some simulated traffic using Locust, the traffic simulation tool. You can see that as the number of users grows, at the bottom part of the screen, we get a proportional growth in requests per second, and the response times stay pretty consistent. This is what we wanted to see. This graph shows 300 concurrent users downloading everything possible: large and small tiles and so on. It made us very comfortable about scaling up, because when we reach a certain amount of traffic, a new server comes up and helps sustain it, and when there's low traffic the extra servers go down and we maintain a baseline of servers able to sustain the regular, day-to-day load.
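The Getty used Locust for the real tests; as a standard-library-only sketch of the same idea, concurrent simulated users hammer an endpoint while we record throughput and tail latency. The request handler below is a stub with a fixed artificial delay, not the actual image server.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request():
    """Stub for one IIIF tile request; swap in a real HTTP call to test a server."""
    time.sleep(0.002)  # simulated network + server latency
    return 200

def load_test(users, requests_per_user):
    """Run concurrent simulated users and report throughput and tail latency."""
    latencies = []

    def one_user():
        for _ in range(requests_per_user):
            t0 = time.perf_counter()
            handle_request()
            latencies.append(time.perf_counter() - t0)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        for _ in range(users):
            pool.submit(one_user)
    elapsed = time.perf_counter() - start
    total = users * requests_per_user
    return {
        "requests_per_second": total / elapsed,
        # 95th percentile: the "response times stay consistent" check.
        "p95_seconds": statistics.quantiles(latencies, n=20)[-1],
    }

stats = load_test(users=20, requests_per_user=10)
print(stats)
```

The pass criterion mirrors what the graphs showed: requests per second should grow roughly in proportion to users while the p95 latency stays flat.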
I know almost nothing about Animal Crossing except that it's a video game and that there is a screenshot here with some blotchy images from our museum collection. The reason I bring this up is that our interpretive team developed an integration with this video game, using our museum images with IIIF, in less than a week. I was barely aware of this effort until it was launched. I didn't have to change anything about the way the API works; it just happened. And you can see from the bottom graph that we hit about six million requests in the first five days from the launch; I think we hit a peak of 380,000 images in an hour. The architecture didn't suffer from it, apparently.

Going forward, we can use more of IIIF, of this architecture, to do more things. There is nothing preventing us, for example (of course, it's not been decided yet), from using IIIF manifests to build the collection web pages, because a IIIF manifest has all the metadata that's human readable and all the links to the ordered collections of images that can make up the web page. We can also build more ephemeral interactive apps, and when we don't need them anymore we just retire the app only, not any data or any API. And we can also open up our API, because it's a public endpoint and it's well documented thanks to IIIF, and anybody can use our API to build other presentation tools.

So what are the next steps for us? Implementing authorization and authentication, so we can offer richer access to rights-protected images while giving the general public a certain amount of access to some resources. We want, of course, to load more and more datasets; there is a continuing digitization effort, especially at the GRI, the Getty Research Institute, which acquires a lot of archives.
We want to start using manifests for more complex data structures, including user-generated ones such as user collections. And we want to build more monitoring infrastructure so we can sleep better at night without getting those nasty wake-up calls; everything should be automated there. Also, we want to build an annotation server that allows scholars, or anyone, to annotate images; those annotations will be stored on our servers and can be shared with other users. That's a very powerful scholarly tool, of course. So this is it. I'll stop sharing my screen so I can see the chat box, and if there are any questions I will be happy to take them. Thank you.

Thank you, Stefano. That was really interesting. What a great architecture, and some of those load stats you were quoting were really impressive. I see that your talk has already inspired one question, so let's get started with that. Robin first comments that it was a great session, Stefano, and then asks: do you have persistent IDs for the IIIF images that are open for people to use? The reason I ask is that if someone refers to the image instead of having it embedded, the ID might change otherwise.

Yes. The reason most of our manifests and images haven't been advertised is that we are making the final decision on some of the manifest IDs. When those manifests and images are advertised as public, they will be permanent. We will also build a redirect service in case we have to retire or change some identifiers, so we have a reliable path for IDs that someone might have bookmarked: they either get to the current resource or to a 410, meaning the resource is gone because we retired or removed it. There has to be a very clear indication of what happened to that resource. So yes, the identifiers are permanent. Great. Thank you. Thanks for that response.
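A redirect service like the one just described can be sketched in a few lines. The identifiers in the lookup tables below are invented examples; a real service would sit in front of the IIIF endpoints as an HTTP layer.

```python
# Sketch of a persistent-ID redirect service: renamed identifiers get a
# 301 to the current resource; retired ones get 410 Gone. Example data only.
MOVED = {"/iiif/old-id-1": "/iiif/new-id-1"}
RETIRED = {"/iiif/old-id-2"}

def resolve(identifier):
    """Return (http_status, location) for a requested IIIF identifier."""
    if identifier in MOVED:
        return 301, MOVED[identifier]  # permanent redirect to the current ID
    if identifier in RETIRED:
        return 410, None               # 410 Gone: removed on purpose
    return 200, identifier             # identifier is current, serve it

print(resolve("/iiif/old-id-1"))
```

The point of 410 rather than 404 is exactly the "clear indication" mentioned above: it tells the client the resource existed and was deliberately removed, not that the URL was mistyped.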
And Robin, thanks for that question. I'd just like to remind you that the floor is open for questions. If you're interested in asking a question, just type it right into the Q&A box, and Stefano will be happy to answer it live. We have another question now coming in from Charles Blair, who asks: I'm curious why you would not use EXIF metadata to embed metadata. I can say I might not, but I'm curious about your reasons. Thanks.

The reason is that it's hard to support change in metadata. Imagine you have everything about an artwork embedded as EXIF metadata in an image, which is our case; we actually have artwork metadata in the image. So if we change the title of the artwork, or any other metadata that's embedded in the image, we have to go back, fetch the image, and change it, and that's not sustainable. One solution I suggested is to embed only the basic technical metadata, and possibly the rights metadata, because you can't get away without that with our legal team, plus a permanent link to a web page that has all the artwork information. Unless you retire that link, you don't have to touch the image if you change the title of the artwork.

Thanks, Stefano. And Charles has a follow-up question. He's also curious to know more about the API you mentioned. Well, the IIIF API is basically defined by the IIIF specs. What we are doing is putting a gateway in front of it, a sort of shim that takes care of things that are not IIIF, such as caching the source images, or implementing auth when that's in place. As for the API itself, you can look it up: the iiif.io website has all the specs. Basically, there are four different APIs. There is an Image API that allows you to retrieve an image, or a derivative of it, based on your parameters.
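Concretely, an Image API 2.x request is just a URL whose path segments encode those parameters, in the order fixed by the spec: region, size, rotation, quality, and format. The base URL and identifier below are placeholders, but the URL template itself comes from the specification.

```python
def image_url(base, identifier, region="full", size="full",
              rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API 2.x request URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Full image, and a 512px-wide rendering of the top-left 1000x1000 region:
full = image_url("https://example.org/iiif", "obj1")
crop = image_url("https://example.org/iiif", "obj1",
                 region="0,0,1000,1000", size="512,")
print(full)
print(crop)
```

Because every derivative is addressable this way, any client, from a deep-zoom viewer to a video game, can request exactly the tile it needs with nothing but a URL.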
And the Presentation API, which allows you to see the context of the image. You can actually create a context around multiple images, create a structure around the images, and embed the descriptive metadata. Okay, thank you. I'm not sure if that answers your question, Charles? Charles says thanks. So Charles, let us know if there was something else you were asking about and we will field that as well.

Moving on to Carmelita Pickett's question, she asks: how do you track images in copyright or images in the public domain? That comes from the metadata in our systems of record. Since we gather images from different programs, we have different sources: for the museum, it's TMS, so there is some rights metadata in TMS, and there is some rights metadata in Rosetta, which is the GRI's, the Research Institute's, archive, from which I get the images and the metadata. There are some specific fields. What we are doing is determining the copyright status, the rights, and the access criteria based on that metadata, as well as on the user's parameters: whether the user comes from an internal IP address within the Getty, for example, which is assumed to be someone who has full access, or, if they're not coming from the Getty, whether they are logged in as someone who has particular rights. Based on the intersection of all these parameters, we make a decision about what quality of image we can provide. Thank you. And thanks for that question, Carmelita.

Patrick Yacht has a question for you, Stefano, wondering which IIIF server implementation you chose. We used IIPImage, which turned out to be the most resilient, the most efficient, and the most compact solution. That was the best solution for us. I'm not claiming it's going to be the best solution in general for everybody, but that's what we were aiming at.
I have to say we had some challenges fitting that particular software into the AWS architecture, because it makes assumptions such as being able to access a file system, and that file system has to have a local path. It can't access S3 directly, for example. That's why we had to create a shim, one that actually downloads the images from S3 and puts them into a fast cache, which we have to do anyway, because we can't pull the images from S3 all the time; it's too slow. Also, we had to adjust our auto-scaling architecture so that we have a shared volume, because these IIPImage instances can come up and down, and they all have to point to one shared file server. There is something in the AWS ECS architecture that doesn't really play well with that, but we worked around it, and it worked out in the end. If you need more details, you can contact me; I'll be happy to go over them.

That's great. Thanks. Thank you, Patrick, for the question, thank you, Stefano, for the answer, and thank you all for these great questions. I see we have another one from Ray, who asks: is the Getty thinking about future possibilities with the next version of the IIIF server and other imaging innovation possibilities here, i.e., video, etc.? Also, is the Getty thinking about integrating IIIF with any other software for further innovation for its audience?

For future possibilities, I assume you mean version 3. We are actually implementing IIIF version 2. IIIF version 3 will allow us to deal with audiovisual material, so moving image and audio. It's a hot topic at the Getty. I realistically don't see it coming very soon, because we have other priorities, i.e., getting stuff out there, and it will be a major effort. However, it's definitely a big advantage, and it's on our high-priority list. We want to implement it, also being part of the consortium.
We want to, you know, be kind of at the forefront in adopting it and setting an example. And just to confirm, Ray, it was version 3 he was asking about. So okay, thank you for that. I see folks thanking you for this wonderful presentation, and I would also like to thank you on behalf of CNI for being here today, Stefano, and sharing with us the Getty's implementation of IIIF. It's really fascinating, and so amazing to see this standard applied to this incredible collection, so we're looking forward to exploring it further. I see other folks are thanking you, and saying hi from Chicago. So thanks so much. Thank you to our attendees. I just want to say that we do have the capacity to turn on your microphone. I will turn off the recording and end the public version of this presentation. If you'd like to stay back, raise your hand and approach the microphone, I'm sorry, approach the podium as it were, and have a chat with Stefano. If you're thinking of implementing IIIF for your collection, or have some questions, or want to make a comment, we'll hang around for a little while and be watching for those hands. So with that, I will close the public version of this and the recording. Thank you so much, Stefano. Thank you, Diane. Thanks, Cliff. Thanks so much, Stefano. That was a great talk. Very, very enlightening. Thank you.