And I think it's about time to get started, so let me welcome everybody. I'm Cliff Lynch, the director of CNI, and you have reached one of the project briefings for the spring 2020 CNI virtual member meeting. This afternoon we have a presentation with four speakers dealing with Archival Resource Keys (ARKs) in the open. You can see the speaker list here: John Kunze from the California Digital Library will lead off, and then we'll hear from Beth Messel, Karen Hanson, and Tom Creighton, from the Smithsonian, ITHAKA/Portico, and FamilySearch International respectively. We will take questions at the end of all four presentations. The questions will be moderated by Diane Goldenberg-Hart from CNI. There is a Q&A tool at the bottom of your screen; feel free to enter questions at any point during the presentation as they occur to you, and we'll batch them up and get to them all at the conclusion. So with that, I think we can get started. I welcome all of our attendees again, I thank our speakers for joining us, and it's over to you, John.

Great, thank you, Clifford. Welcome to this panel on ARKs in the open. ARK identifiers were born at the California Digital Library, which is located at the University of California Office of the President, where we represent the ten campuses of the University of California, along with hundreds of museums, archives, galleries, and so on. So why should we care about ARK identifiers? The lifetime of a URL was once said to be 44 days. At the end of its life, a URL link breaks, meaning that it gives you the dreaded 404 Not Found error, which most of us have seen. Irritating as that may be, it's extremely awkward when prior results are unavailable as you try to renew your grant, and it's a minor disaster for memory organizations like archives, libraries, and museums. Their links live among web links, many of which were never meant to last forever.
A persistent identifier, as I'm sure you know, is a link that in principle keeps working far into the future. Services that do research discovery and interlinking between artifacts and literature prefer persistent identifiers because of their stability. Surprisingly few archival service providers support persistent links to content, which is especially noticeable when you change vendors and all your old content links break as a result. So, ARKs are meant to be persistent, reliable links to any kind of thing, whether digital, physical, or abstract. Conservatively, 3.2 billion ARK identifiers have been assigned by over 550 institutions across the world. ARKs are decentralized, so this counts just the ARKs that we're sure about. Who is registered to create ARKs? A wide variety of memory organizations, from archives to libraries to publishers, including nonprofits, for-profits, government agencies, and more; a few of them are listed on this slide. So, what does an ARK look like? At first glance, it's a particular kind of URL bearing an internal label. How can you tell if you have an ARK? Well, here's the internal label, "ark:". This next part is the name assigning authority number, and it identifies the organization that created or assigned the ARK. The part after that names the thing that the ARK is assigned to. The hostname over on the left makes it actionable, meaning something you can click on to get to the thing; it's also known as the resolver. ARKs are unusual among persistent identifier solutions in allowing organizations to run their own resolvers. The part after the hostname is the core globally unique identity, which doesn't depend on the host being available. It doesn't even depend on the future of the World Wide Web. So, what are ARKs used for? Here's a list in descending order by number of ARKs assigned. The heaviest user of ARKs is the largest genealogical research organization, the one that Tom Creighton belongs to.
It's called FamilySearch International. Next is the archive of all the source files for mainstream published content, which is what Karen Hanson will talk about in a minute. They have about 2 billion ARKs, but only about 100 million of them are meant to be interesting. The Internet Archive has been assigning ARKs to scanned books since 2008. The Smithsonian is rapidly expanding its use of ARKs from 15 million, with a target of 100 to 150 million, and you'll hear from Beth in a few minutes. The French National Library was an early adopter, and here's a wonderful image depicting the biblical Ark, and of course it has an ARK. Speaking of images, qualifiers make ARKs well suited to IIIF vendors, who don't want to have to register 100 different identifiers for 100 different views into an object. They can just register one identifier and have access to 100 or 1,000 different ARKs underneath it. ARKs are also important for people, especially historical figures who weren't alive in time for ORCIDs. For example, there are ARKs for Ada Lovelace and Isaac Newton. ARKs are also important for abstractions used in the Semantic Web, things like diseases, vocabulary terms, and time periods; these all get ARKs. So why ARKs and not DOIs or Handles or PURLs or URNs? Well, ARKs were started in 2001 in response to flaws in earlier persistent identifier solutions. If you want a very terse and somewhat snarky summary, you can Google for "10 persistent myths about persistent identifiers." Some of this comes from a history of openness, but there's also this whole concept of flexible resolution: ARKs are unusual in that you can use them in a centralized way via our global resolver, N2T.net, or via your own server. In this University of California tradition of openness, you have things like the Free Speech Movement in 1964, and then 1986, twelve years before the word "open source" was coined.
I was fortunate enough to be doing exactly that, open source, on Berkeley Unix, which is now running on everyone's Mac. The groundbreaking open access research policy of 2013 was a response to the University of California publishing 50,000 articles per year with only 15% of them freely available. And in 2019 there was the bold termination of the contract with the largest academic publisher, Elsevier, in the hope that the rest of the world would follow. Consistent with all this is the ARK effort to challenge the notion that making your scholarly content reliably available for the long term requires you to give over control of your URLs and to pay for a publishing solution that nonetheless requires you to maintain its resolver tables as well as your own content. Instead, since 2001, the University of California has been offering these non-paywalled and decentralized identifiers called ARKs. There are probably still some questions about ARKs versus DOIs. DOIs are Digital Object Identifiers, made popular by the publishing industry. ARKs are similar to DOIs in being persistent identifiers for content and metadata, but ARKs are also similar to URLs in being decentralized and without fees or limits. Many institutions in the world turn to ARKs for affordable and flexible ways of providing long-term access to their scientific and cultural assets. Like DOIs, you can find ARKs acting as persistent links in the Data Citation Index, which is linked to the Web of Science, in the HathiTrust collection of digitized books, Wikipedia articles, Wikidata records, Internet Archive collections, ORCID researcher profiles, and more. ARKs break the siloed, paywalled paradigm followed by most persistent identifier schemes: you don't pay to create URLs, and you don't pay to create ARKs. You can run your own resolver if you want to, or you can hire a vendor to do it. It's easy to get started using ARKs; we have a comprehensive frequently-asked-questions document in English, French, and Spanish.
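As a rough illustration of the ARK anatomy John describes, here is a minimal sketch (the NAAN and name below are made up; n2t.net is CDL's global resolver):

```python
import re

def parse_ark(url):
    """Split an actionable ARK URL into resolver host, NAAN, and name.

    The hostname on the left is swappable (any resolver can serve the
    same ARK); the 'ark:' label, the NAAN, and the name together form
    the core globally unique identity.
    """
    m = re.match(r"https?://([^/]+)/ark:/?(\d+)/(\S+)", url)
    if m is None:
        raise ValueError("not an actionable ARK: " + url)
    resolver, naan, name = m.groups()
    return {"resolver": resolver, "naan": naan, "name": name}

# Hypothetical ARK under a made-up NAAN, routed through the N2T.net resolver:
parts = parse_ark("https://n2t.net/ark:/12345/x54xz321")
# parts["naan"] identifies the assigning organization, parts["name"] the
# thing; swapping "n2t.net" for a local host changes neither of them.
```

The point of the sketch is the decentralization John emphasizes: resolution depends on the hostname, but identity does not.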
To get started assigning ARKs, you just fill out a request form. It's a one-time thing; it takes about a day and you'll be all set. About 18 months ago, the California Digital Library, together with DuraSpace, now LYRASIS, started the ARKs in the Open project, with the goal of creating a community-owned scholarly infrastructure for ARKs. CDL remains committed to ARKs, but is forming a global community to ensure shared and redundant infrastructural support for them. We welcome you to visit our website and to get involved. And with that, I'm going to turn it over to Beth.

Beth, I think you're muted. Okay, can you hear me? Yeah, you're good. Okay, wonderful. Thank you. Hi, I'm Beth Messel. I'm a librarian at the Smithsonian Libraries, and I'm going to talk about our implementation of ARKs at the Smithsonian. This is just an overview of what I'll be covering: who we are, what we assign ARKs to, when we started and how many are assigned so far, why we chose ARKs, and why we are involved in ARKs in the Open. So, I work with the Smithsonian Libraries. We are a network of 21 specialized libraries which support the Smithsonian Institution. We have central support services, which include Smithsonian Research Online, a bibliography of Smithsonian publication citations, and our repository. We're part of the Smithsonian Institution, the world's largest museum, education, and research complex. We have 19 museums plus our zoo, and because we are so large, any SI-wide project like this becomes very complex. So we are assigning ARKs to objects in our collection systems, and this is just a sampling of the things that we assign ARKs to. We have scientific specimens from the National Museum of Natural History, cultural artifacts from the National Museum of American History, sculpture from the Freer Gallery of Art and Arthur M.
Sackler Gallery, photographs from the National Museum of African American History and Culture, and paintings from the Smithsonian American Art Museum, and there are many more. We began assigning ARKs in 2015 with the National Museum of Natural History, which began assigning ARKs to its scientific specimens, the metadata records, and later included multimedia as well, the photographs of the specimens. Currently they have over 10 million ARKs assigned to metadata records and 3 million ARKs assigned to multimedia records. Then this past year the Smithsonian began its open access project, which released 11.5 million metadata records and 2.8 million multimedia records into the public domain on February 26 of this year, and we chose ARKs to be our globally unique identifier, our GUID, for this project. So we have over 15 million ARKs, and we'll continue to add to that as we add collections to our museums. ARKs were chosen for several reasons. Scale: we needed a large number of ARKs, over 15 million and growing, and ARKs scale very well. Cost: ARKs have a low annual fee, as opposed to, as John said, DOIs, for instance, which have a cost per identifier, so the cost was great. Implementation: we were already using them at the Natural History Museum, knew how to implement them, and knew they were going to work for the project we were doing. And finally, the growth of the ARKs in the Open project encouraged us to choose ARKs as a viable, sustainable identifier. The Smithsonian Libraries were chosen to manage the ARKs; that's my job, in fact. I issue, register, and maintain the ARKs for our Smithsonian collection systems. This was a logical assignment for us because we also register DOIs for Smithsonian publications and research.
We maintain a Smithsonian GUID webpage for SI staff and researchers, which the public can also see, listing all the different identifiers in use at the Smithsonian and the types of data they are used with. This slide shows one of our ARKs up at the top. The yellow part is the resolver, which takes a web call over to EZID, where we have registered our NAAN, our name assigning authority number; the Smithsonian's number is 65665. What we've been doing with the open access project is to assign dataset IDs, and that's this three-character ID. We have done dataset IDs for every collection management system at the Smithsonian, and each system is actually getting two: one for metadata records and one for multimedia records, because they often resolve to different servers. So we have a specific URL registered with each dataset ID, and this slide is just showing how these point to different places. The VK7 is registered for the Smithsonian American Art Museum for their metadata records; if you replaced it with BJ9, that would point instead to the image delivery server for American Art. This is a screenshot of the EZID web client, just to show you: here's our NAAN, and this is where I can actually assign a dataset ID to go with it. We, the Smithsonian, wrote a schema for our dataset IDs, which I follow, and I keep track of all the IDs we have so that we don't repeat any. We use two randomly selected lowercase letters, excluding lowercase L and the pairs "rm", "nm", and "fu", which we were told by programmers could be read as commands, plus one randomly selected number from two through nine. This is a screenshot of our Collection Search Center, which is one of our databases, and you can see in the highlighted yellow area one of our ARKs. The PY2 in the ARK down here is associated with the Hirshhorn Museum and Sculpture Garden collection.
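A minimal sketch of the dataset-ID minting scheme Beth describes; the forbidden pairs are the ones mentioned in the talk, and the function name is made up for illustration:

```python
import random
import string

# Letter pairs flagged by programmers as possibly reading as commands, per the talk.
FORBIDDEN_PAIRS = {"rm", "nm", "fu"}
# Two lowercase letters are allowed, excluding lowercase L.
LETTERS = [c for c in string.ascii_lowercase if c != "l"]

def mint_dataset_id(used, rng=random):
    """Mint a three-character dataset ID: two letters plus one digit 2-9,
    skipping forbidden letter pairs and anything already issued."""
    while True:
        pair = rng.choice(LETTERS) + rng.choice(LETTERS)
        if pair in FORBIDDEN_PAIRS:
            continue
        candidate = pair + rng.choice("23456789")
        if candidate not in used:
            used.add(candidate)  # record it so it is never reissued
            return candidate

issued = set()
dataset_id = mint_dataset_id(issued)  # something shaped like "vk7" or "bj9"
```

Keeping the `used` set persistent is what corresponds to Beth's "I keep track of all the IDs we have so that we don't repeat any."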
And one of the things we did with the open access project: all of the Smithsonian collection systems that were involved are now configured so that when a metadata record or an image file is saved, an ARK with the appropriate dataset ID is automatically assigned and created for that record. We did have some challenges with our implementation. We had a tight schedule to meet: we pretty much started the implementation in the fall, in September, and we had to launch in February. There was also the size of the institution. We have multiple collection management systems, supported by various administrators, and those systems get their IT support in different ways. We have a central Smithsonian IT office, OCIO, but some museums have their own in-house IT and some museums contract out. So it was definitely a challenge to get everyone communicating and on the same page with the rollout. Probably the biggest problem I encountered was having the IT folks identify the correct syntax for the URL that needs to be registered with EZID. Early on we figured out we needed to have the dataset ID in the URL, but because our collection systems are managed by different IT folks, each system is a little different. The example here shows you one URL format which works with the Collection Search Center. However, a different museum, like American History, had its configuration set up a little differently, so it might have a question mark in here, it might not have the word "record" in here, and it needed to change to resolve properly. So there was a lot of me registering URLs, testing to see if they worked, changing them, and so on until we got them working. We are moving on to phase two of open access now. We're going to be implementing ARKs with our archival records; we did not do this in the initial launch because they are more complex records.
One of the issues is that we have collections that are split between two collecting systems; we also have times when objects are moved from one collecting unit to another. We wanted to make sure we had time to deal with those correctly, so we waited for phase two. The museums that have gone live are already asking about the ARK inflections option. This screenshot is actually from the CDL N2T.net website describing inflections: you can put a question mark at the end of the ARK, and it will go to, for instance, a policy statement or a brief metadata record. So we have folks who definitely want to try playing with that; that's in phase two also. My last slide is just showing you the data flow one more time. This is one of our ARKs going to the N2T.net resolver, which goes to EZID, where this URL is registered for our NAAN and this dataset ID, which you also see here. That happens to be for the American History Museum, and it's a metadata record, so it serves up the American History collection web page. If this were instead for a media resource, it would go to this other server and would have a different dataset ID. This shows you the idea that we have lots of systems. We work with TMS, The Museum System, and we also have Mimsy XG, and for those we have multiple different types of installations. So thank you for listening, and if you have any questions, I'd be happy to answer them at the end. Now we're going to go on to Karen's presentation.

Okay, so hopefully you can see that. I'm Karen Hanson. I work at Portico as a senior research developer, and I'm going to talk about how we use ARKs in the Portico archive. Portico is a community-supported preservation archive.
We work with both libraries and publishers to preserve electronic scholarly publications and make sure they're accessible for future scholars. We assign ARKs to every package that goes into the archive, and to a whole bunch of other things inside those packages, which I'll go into in a moment. We started assigning ARKs around 2006, and we have created over 2 billion of them. We chose ARKs because they're flexible, opaque, and unique, and we can easily generate a lot of them, which is very helpful; we also wanted something that was recognized by the community. We're involved in the ARKs in the Open project because of our extensive use of them, and we'd like to see that community perpetuate and grow over time. We may also adopt some of the updated technical specifications that are being developed, so we wanted to add our voice to that. I'm going to explain how Portico uses ARKs, and in order to do that I need to explain a little bit about our process. Once we have an agreement with a publisher to preserve their content, they start sending us batches of it; for example, we might get zip files that have a bunch of articles in them in XML and PDF format. Then, for each publisher, we have a customized workflow that takes all those files, checks them, and reorganizes them into packages. Each one of those packages is usually either a single article or a single book, though we have other kinds of digital content as well. The resulting packages are called archival units, and they are placed in the archive, where they're replicated and monitored for changes. So that's the very high-level process, but I want to talk specifically about what goes on inside these packages and how we think about them. This is our conceptual model for an archival unit. There's no need to pay attention to the terminology we have across the top, just to understand what these layers represent.
I mentioned the archival unit, which is the top-level package; in this case it represents a single article. Within that, there might be more than one version of the article: say one of them had an error in it, or something was missing, then you might get an updated file that's a correction. Within that, content is grouped according to files that are intellectually equivalent. I think the easiest one to understand here is a figure graphic: you can envision that a publisher might have the original high-quality image and then a lower-resolution version that was embedded in the article. It's intellectually the same content, but there's more than one format of it. And finally, all the way to the right, you see the files that make up the package. The reason I wanted to show you this is that every green box in this image represents something that would have an ARK ID assigned to it, and of course each of these is globally unique. Though we don't currently have a URI resolver for them, theoretically it would be technically possible to connect one at any of these levels, without any risk of colliding with another identifier. If you open up an archival unit, its folder structure inside is flat, and what you would see is the files that make up the package, each of which has been renamed to reflect its ARK. In addition, during the process I mentioned earlier, a metadata file is generated that describes how all of those things fit together in that content model. Just returning to this: what the ARKs have done is provide fixed anchors that can be used to articulate all of this structure and apply metadata at a very specific level of detail. So at the article level you'd have, you know, the general title, abstract, and DOI; at the version level you might have the date the update was provided, what files were changed, and why.
At the functional unit level, which is more structural, organizing the content, there's not a lot of metadata; but if we started to see very rich metadata coming in for supplements from the publisher, then that could go at this level. And finally there's the file level, where you would have things like technical metadata, checksums, and so on. All of this helps describe the structure of the package. With this background, what I'm trying to highlight is that the advantage of having all these ARKs is that they support our self-describing archive. In the Portico archive, the files are the archive; there is software over it that manages things and supports different user interfaces, but if you take that away, you just have the files, and all of the content is in there. Its structure is documented by this preservation metadata, and each piece is uniquely identified by its ARK. The fact that you can assign these ARKs at any level, to concepts and sections of metadata and digital objects, is really critical to the way we use them. Here's a tangible example of how we use them to support the expression of a package. When a publisher provides full-text XML for an article, we normalize it to a format called JATS, which is an NLM XML format for articles. When we do this, we take out the references that the publisher puts in there, which usually point to, you know, their local drive or a URL on their website, and we replace those with ARK IDs, so there's no ambiguity about what belongs at that point in the article. If you're still wondering how we created over 2 billion ARKs, here's a breakdown of the numbers. We have 110 million archival units; those are the interesting ones that John mentioned, and maybe the first ones we'll resolve to. Within those we have 121 million versions of content.
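The reference replacement Karen describes might look something like this in miniature. This is a sketch only: the element and attribute names are simplified stand-ins (real JATS uses namespaced xlink:href attributes), and the mapping and ARK are hypothetical:

```python
import xml.etree.ElementTree as ET

def rebind_refs_to_arks(article_xml, href_to_ark):
    """Rewrite publisher-local file references in a simplified JATS-like
    article so that each points to the ARK of the preserved file."""
    root = ET.fromstring(article_xml)
    for el in root.iter():
        href = el.get("href")
        if href is not None and href in href_to_ark:
            el.set("href", href_to_ark[href])  # replace local path with ARK
    return ET.tostring(root, encoding="unicode")

# Hypothetical mapping from a publisher's local path to a Portico-style ARK:
doc = '<article><graphic href="images/fig1-lowres.jpg"/></article>'
out = rebind_refs_to_arks(doc, {"images/fig1-lowres.jpg": "ark:/27927/phz12345abc"})
```

After the rewrite, the reference no longer depends on the publisher's drive layout or website; it names the preserved file unambiguously.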
From that you can extrapolate that there have been about 11 million updates to packages. Then we have 1.8 billion files that have ARKs. We also assign ARKs to other things, like sections of metadata; for example, as things go through the workflow, different things happen to them, and events are generated to describe them, and those each get an ARK as well. I don't have a count for that last one, but the others add up to 2 billion, and I expect those would add another half billion or so. Historically, we've chosen not to reference the resolvable URL form of an ARK, partly because we want the archive to be as independent from software as possible. But we're starting to see some use cases recently that make us wonder whether we might implement a resolver to support rendition of certain kinds of publications. So I wanted to finish with an example from a current project I'm working on called Enhancing Services to Preserve New Forms of Scholarship. This is a Mellon-funded grant; it's a collaboration between NYU Libraries, Portico, CLOCKSS, and a number of university presses that are generating these really interesting new kinds of scholarly work. These works go way beyond just text and images: they have enhanced features such as interactive visualizations, embedded multimedia, annotations, and things like that. The purpose of the project is to identify what about these could be preserved at scale. One specific challenge we looked at very early on was EPUBs that have remote multimedia resources that are visually embedded in the book. If you're not familiar with EPUBs, the name stands for "electronic publication," and it's a format used for e-books; but if you open them up, they actually look a bit like a website inside.
They have a bunch of XHTML and XML, and this is actually helpful for us, because we transform a lot of XML, so it makes them possible to transform using XSLT. For example, this is a book from the University of Michigan's Fulcrum platform, and this book has quite a lot of long, high-quality embedded videos. Although the publisher has the video files and can provide them, if they were embedded inside the EPUB, even at a lower resolution, they would make the EPUB very large and not very portable. So instead they host the media files on the Fulcrum platform and just reference them. If we received these and left them as they were, then eventually, if we ever needed to make this available because it was no longer available on the publisher's site, you'd get this 404 box in the middle of the publication. On the one hand, we could do something like a PDF and move the resources inside the file, and have a really large file. Alternatively, one other option for access might be, if we had an ARK ID resolver, to replace the embedded links with links to the Portico-preserved version of the resource. That way it would be presented in a form closer to the original. Note also that this would point directly to the video file, which is similar to the XML example I showed earlier, where the ARK ID pointed directly to the image file. So that's one example of how we're thinking about implementing a resolver beyond just the top level. And just a quick plug: I'll be part of a panel about this later in May, if anybody's interested in watching that CNI panel. So hopefully you can see we use a lot of ARK IDs. They've served us really well because of their flexibility and how easy it is to create a lot of them. I'll just leave it there, say thanks to everyone for organizing this, and hand things off to Tom.

Tom, can you turn on your microphone? Tom? Yes, sorry about that.
I think the problem is having multiple screens. This is Tom Creighton, and I wanted to just introduce myself: I'm the CTO of FamilySearch, and we have been using ARKs for some time. This is a quick overview of our use of the ARK standard. I'll point out that we started minting ARKs in 2012, and it really got going in 2013. We had already implemented our own mechanism for supporting long-lived IDs, but we wanted to transition to ARKs because we wanted to let our partners, who depend on our links, know that we really stand behind keeping those links alive, and going with ARKs gave us a stronger message in that respect. Since that time, we're up around the 20 billion ARK publication level. Originally we were the Genealogical Society of Utah, which was founded in 1894, and of course we immediately put everything on the web. In about 1998 or 1999, there was a decision that we needed to remake ourselves so that we could allow people easier access to the content that we had been collecting for many, many decades. So we now publish access to source materials gathered from around the world that are of use in doing genealogical research. That sounds so difficult, and in some ways it is. I think it's important that I switch and say "family history" rather than "genealogy," because not only is it easier to spell, it also indicates something a little more approachable than genealogy. Our site is open to anybody at no cost; all of our content is typically available to everyone, and it is supported by the Church of Jesus Christ of Latter-day Saints. This is a quick overview of the process we go through to publish source materials: things like census records or church records or civil records that we've gathered. Up here, this picture is a depiction of one of dozens of portable imaging stations that we've developed, and these are found all over the world.
Often we set them up in a very small archive someplace, some out-of-the-way place, and the resulting images and the metadata gathered with them go into a set of servers hosted in the cloud. Almost everything we do is hosted in the Amazon cloud, and that's where the images get processed; we do some image enhancement and so forth. We also have an extensive collection of about three and a half million rolls of microfilm covering similar kinds of data, just different collections and different data sources. These are archived, literally, in a cave bored into the mountains east of Salt Lake City. We've retrofitted some of that cave to put a small data center in there, and that small data center hosts one of our preservation stores. We have two full preservation stores, each of which stores at least two copies of everything we have. So the images go off in that direction, and they are also reduced in size to produce thumbnail images, which are placed in our online access store in Amazon. We can then engage, through a crowdsourcing model, lots of people, typically volunteers, who do partial to full transcriptions of the records so that they become searchable. In very recent times we've also managed to make use of machine learning technologies to enhance that process. And then finally, everything is available on FamilySearch. So let's go through and look at how ARKs play into this. This is a screenshot of a search page. In this case I've done a very simple search: Thomas P. Creighton, my grandfather, birthplace Texas, and we get a hit list. The top one here is the 1940 census, an entry for his household with my grandmother, my father, and my uncle. Hovering over this hyperlink causes the URL down here to appear, and you can see that it's an ARK. If you click on it, that then gets resolved, and we're going to talk about how.
Here's the ARK we were just looking at. We've done something somewhat similar to Portico: we call these things sub-namespace providers, or sub-name assigning authorities, and this gives us a couple of things. One is that we have a very distributed environment, so the minting of these ARKs, the actual identifier part, is distributed to multiple servers with different teams managing them, and this lets each of them run largely independently of the others. It also helps in resolving to the artifact, and I'm going to show that in a second. If you entered this request into a browser, you would get a page that looks a lot like this. This page shows the data about the person I was looking for, Thomas P. Creighton, as shown in the census record. In addition, there are several other elements, and these all tend to have ARKs assigned to them. The ones down in the lower-left corner are other members of the household for that census entry; I've cut off the locator portion of these ARKs and am just showing the differences. Each of these "1:1" pieces references what we call a persona, which is an entry in a source record for a person. In addition, there is this thumbnail of the original document; that's one of the things, as I mentioned earlier, that we produce during image processing: a series of thumbnails plus the full image, which I'll show you in a moment. This particular thumbnail has a link associated with it, this hyperlink right here, represented by this URL. So the actual image URL is a different ID; notice the "3:1". All of this is opaque to the end user, of course, and has meaning only within our systems, but it indicates that the minter and the resolver have to do with the image management portion of our environment, and notice the ID is much different.
That's because as these images are captured out in the field, you can't communicate with a central service for a centrally managed ID. So it's much more like, in fact they really are ultimately, a form of UUID. This other portion is an optional query parameter attached to the reference. The web specifications say that such things, if they are themselves URLs, must be URL-encoded, and this encoding is exactly that persona ID. Now we're going to virtually click through this URL. Oops, sorry, I skipped one thing: this is another ARK. It references what we call our conclusion tree, or shared common pedigree, where the name Thomas Percy Creighton has a bunch more genealogical details associated with this entry on this page. Clicking on this would take you to a different application, where we have our managed pedigree. We're not going to look at that today, I just don't have time. Instead we click through the other ARK, which shows the image. Over here, really quickly, is the ARK that got us here, broken into pieces so you can see them. This piece, of course, is the locator: it's basically the domain where the resolver lives. The resolver is within our infrastructure, and I'll show you a picture of how this works in a little bit. This part is the main part of the ARK, and it gets you to this overall image. This other part says the context for this particular view should be based on Thomas P. Creighton, so it brings up the record data associated with him and shows it down here. I've highlighted the portion of the image where the transcription came from. In addition, there's this information tab. The information tab is another portion of our system, also based on the ARKs: we do a lookup in a different part of our system for the metadata. 
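The encoding rule Tom mentions can be shown in a few lines. This is a sketch only: both ARK names and the `personaUrl` parameter name are invented for illustration, not FamilySearch's actual identifiers or API.

```python
from urllib.parse import quote, urlparse, parse_qs

# An image ARK plus a persona ARK used as its viewing context
# (both name portions are made up for this example).
image_ark = "https://www.familysearch.org/ark:/61903/3:1:9392-XYZ"
persona_ark = "ark:/61903/1:1:ABCD-123"

# Because the context value is itself a URL-like string, RFC 3986 requires
# percent-encoding it before embedding it in a query parameter, so its
# colons and slashes don't confuse the parsing of the outer URL.
full_url = image_ark + "?personaUrl=" + quote(persona_ark, safe="")

# The resolving application decodes the parameter to recover the context
# ARK intact and render the image with that persona's record data.
context = parse_qs(urlparse(full_url).query)["personaUrl"][0]
assert context == persona_ark
```

The field-generated image IDs themselves would come from something like Python's `uuid.uuid4()`, which needs no central coordination, exactly the property Tom describes.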
And we create the material up on top saying where it comes from, and then a properly formatted citation, because many of our researchers want to have a correct citation. Now, this is the same ARK we started looking at (I typed this wrong; it should say Thomas P. Creighton). If we made the request asking, via what's called an Accept header, for JSON instead of HTML, we would get a JSON document back. It's also possible to ask for XML. You'd get exactly the same data. This is the actual transcription of the record, formatted in a form that we invented and have published to our industry, called GEDCOM X, for representing genealogical information. In this case we've represented it in JSON format, but it could also be represented in XML, as I said. So it's the same URL. Portico used a portion of their ARK identifier, right up in here, to direct traffic to either metadata or different representations of the same resource; we do that slightly differently, but it's the same idea. We make use of the Accept header to do it. I wanted to summarize our use of ARKs, so I went through the other day and added some things up. As of the 15th of April, we are at about 8.8 billion ARKs referencing what we call historical record personas. That's like what we started looking at, Thomas P. Creighton: any individual who is referenced from a source record we internally call a persona, as opposed to a person, because person records, in our vernacular, are part of the family tree, the shared pedigree. In addition, we have a number of uploaded personal pedigrees. Each one has personas, and each tree also has its own ARK; that's running around 1.5 billion right now. Digital images are at 4.3 billion, and it makes sense that we would have more personas than images, because you tend to have lots of personas on each of the original documents. 
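The content negotiation described above can be sketched as a small dispatch on the Accept header. The record below is a toy stand-in for a real GEDCOM X document, and the function is an illustration of the idea, not FamilySearch's implementation.

```python
import json

# A toy record standing in for a real GEDCOM X transcription document.
record = {"persons": [{"name": "Thomas P. Creighton", "birthplace": "Texas"}]}

def negotiate(record, accept):
    """Return (media_type, body) for one resource based on the Accept header.

    The identifier never changes; only the negotiated representation does,
    which is how a single ARK can serve a web page to a browser and
    machine-readable data to a program.
    """
    if "application/json" in accept:
        return "application/json", json.dumps(record)
    if "application/xml" in accept:
        people = "".join(
            '<person name="%s"/>' % p["name"] for p in record["persons"])
        return "application/xml", "<record>%s</record>" % people
    # Browsers send Accept: text/html..., so an HTML page is the default.
    return "text/html", "<html><body>%s</body></html>" % record["persons"][0]["name"]

media, body = negotiate(record, "application/json")  # same data, JSON form
```

This is the design choice Tom contrasts with Portico's: Portico encodes the desired representation in the identifier itself, while FamilySearch keeps one identifier and varies the representation by header.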
Then I can show you, in millions, the growth rate for each of those things. In total we're managing something north of 21 billion ARKs right now. Very quickly, this is the last slide I have, and it's my way of looking at how we do resolution (believe me, this is a simplified view). If a request for one of these ARKs, like this one right here, which is the persona ARK for my grandfather, comes from a browser, it comes in through our gateway and is handed to one of over 100 reverse proxy servers. They look at the ARK referenced as well as the Accept header. In this case, if we're asking for an HTML page, it routes the request straight through; it's not redirecting, it's doing reverse-proxy forwarding. That goes out into the Heroku cloud, where the application lives that knows how to paint the page we were looking at initially, the one that shows the record with the thumbnail and all of that. If on the other hand you ask for JSON or XML, the DTM, which is what we call our reverse proxy, will say: the media type being requested is different, and I know how to map that onto a particular fleet of microservices that knows what to do with it. The same goes for the images: a request for the image of the original census sheet comes in to a fleet of microservices handling images, and they know how to look up that image at the particular resolution being asked for, if those parameters are present, and return a signed S3 URL that a client can then resolve immediately. We do it that way, rather than simply returning the image straight through this infrastructure, for a variety of reasons having to do with scale and availability. But in any case, that's how we handle ARK resolution. 
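The routing decision described above, by ARK name shape and by media type, can be sketched in a few lines. The backend names here are invented labels for the fleets Tom describes; the real DTM proxy layer is of course far more elaborate.

```python
def route(ark_name, accept):
    """Choose a backend fleet for an ARK request (a simplified sketch).

    - image names (the "3:1:" sub-authority) go to the image fleet, which
      answers with a signed S3 URL that the client fetches directly;
    - persona names asking for JSON or XML go to the record microservices;
    - everything else is forwarded to the HTML page-rendering application.
    """
    if ark_name.startswith("3:1:"):
        return "image-service"
    if "application/json" in accept or "application/xml" in accept:
        return "record-service"
    return "page-renderer"

# Example routing decisions (ARK names invented for illustration):
assert route("1:1:ABCD-123", "text/html") == "page-renderer"
assert route("1:1:ABCD-123", "application/json") == "record-service"
assert route("3:1:9392-XYZ", "image/jpeg") == "image-service"
```

Handing back a signed URL instead of proxying image bytes through the resolver keeps the resolution layer thin; the heavy transfer happens directly between the client and the storage tier.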
We chose to do it all ourselves, in part because we can, and in part because we have very stringent requirements: many of our partners require certain service-level agreements with us in terms of how fast we will respond to requests, so we couldn't afford to hand that off anywhere else. And that's the end of my presentation. I'm happy to hand it back to John, take questions, and go from there. Thank you, Tom, and thanks to all of our panelists. That was a really interesting presentation on ARKs: persistent identifiers, a nice cross-section of the institutions using ARKs, and the billions of ARKs that have already been minted. So thanks so much for bringing us up to speed on that. At this time, I would like to invite our attendees to type any questions you might have for our panelists into the Q&A box; you should see a little button at the bottom of your Zoom screen that says Q&A, and if you click on that, a box should open up where you can type in your questions. There's also a chat box, if that's easier for you; feel free to type your questions there and I will happily pass them along to our panelists. I'll just note that we already had a request in the chat box for the slides from today's presentation. Once we get those slides from the panelists, we will put them up on the meeting website, and we'll also embed the video of today's webinar on that same project briefing page. While we're waiting for folks to type in their questions, I just want to take a moment to remind everyone that this webinar is part of CNI's spring 2020 virtual meeting. We'll be running this meeting through the end of May, and I'm sharing with you there in the chat box the direct link to the schedule for the rest of the meeting. I hope you'll check that out; there are lots of offerings yet to come in the four or five weeks left in the meeting. 
So there's lots of good stuff coming up. Tomorrow we'll have a presentation from Micah Vandegrift and Shelby Hallman on Immersive Scholar: development, documentation, display, and dissemination of experiential research and scholarship. That will close out our week of webinars, with many more coming the following week. So please do take some time to check out that lineup and join us for more offerings from our CNI spring 2020 virtual meeting. I also want to mention that in this Zoom webinar environment we do have the ability to turn on your microphone so that you can engage directly with our panelists or make a statement. Are you considering adopting ARKs for your collections or your project? Do you have questions about their applicability for your collection? Now is a great time to ask some folks who have firsthand experience with this resource and with the scheme. Please feel free to raise your hand or type in your questions, and we'll be happy to field them live to John, Best, Karen, and Tom. Any final thoughts from you on ARKs and your experience with this project? I'll just say how humbled I am at the numbers I apparently got wrong; my panelists have corrected a number of the wild numbers I pulled up. I was way undercounting with FamilySearch, and perhaps overcounting with Portico, although for Portico I did say 40 billion because Sheila Morrissey, your predecessor, had mentioned that they'd used up 25 different minters, and I had computed the capacity of a minter. So I'm guessing maybe 38 billion ARKs were assigned to temporary files and thrown on the floor; that's my guess, or they were just discarded. But you have about 2 billion then, right, Karen? Yeah, about two and a half billion, and that's based more on counting the actual objects, where she may have gone directly to the minters. So it might just be a different counting method. Yeah. 
And Tom, you had mentioned... we've been saying 3 billion, and now you're saying 8.8 or 21 billion, depending on how you count. That's correct. The family tree application does support ARKs, although we don't show them to the end user. But if you typed in an ARK built from what you do see as the end user, you would get that thing, and our partners know this and are able to use this technique. So even discounting the tree, which is at least another 1.28 billion or so, we're up there a little ways. Okay, well, let's get together and talk about those numbers; we can improve our estimate for the audience. On the experience: as I mentioned, we looked quite a bit at different techniques. Handle was one that we considered, but knowing how many artifacts we would have, we knew we would end up building our own substructure to a large degree and purchasing only a few handles, because the cost would be prohibitive. And then also because of the redirects: we wanted to make sure we dealt with those ourselves rather than having Handle handle them (overloading that term). But the final takeaway is this: no matter what technique you use, and ARK is a great approach, we're very pleased with it, ultimately your ability to stand behind a URL has everything to do with the commitment of your organization and far less to do with the technology. The technology is an enabler, and it helps us, but ultimately it's up to the organization. We don't have time, but I could tell you about cases where we've made blunders as an organization and caused ourselves trouble in spite of the technology we were using. Yes: your identifier is not a magic bullet. You can still screw up big time with whatever identifier system you choose. Yeah. All right. 
Well, with that, I guess I would like to thank you one last time for coming to CNI, filling us in on what's happening with ARKs these days, and sharing your experience with them; we very much appreciate your being here. Thanks also to our attendees, who took time out of their day to join us for this live webinar. I hope you'll come back for more offerings from CNI's spring 2020 virtual meeting. I'm going to go ahead and turn off the recording here and end the public portion of this session now. For anyone who remains among the attendees, if you would like to come up to the podium and speak informally with any of our presenters, please go ahead and raise your hand and I'll be happy to turn on your microphone and facilitate that. So, one last time: thank you, and be well, everyone. Yeah, thanks to my panelists, and thank you, Diane and Clifford. Thank you. Thank you, everyone. That was really fascinating.