So back in the spring, we put up a little note on the web. It was a hopeful note. And it said: if you would like to build something that might be the DPLA, or a little bit of it, or a big bit, or a medium-sized bit, just start. Tell us by June 15 that you want to do it, with a letter of intent. And then by September 1, you send the steering committee your results. Please just send us a URL. It can be big or small, code or not code. But the basic idea is: give us a glimpse of what this might be.

And what we'll do is have a review panel of people who will review these on behalf of the steering committee, people without any conflict of interest among the sprinters. And then we will invite the most promising of those, according to this review panel, to present at the National Archives, which is where we are today.

So we're going to have six full presentations. Each group will speak for no more than seven minutes, and there will be about seven minutes of discussion with each one. Then our review panel also chose three for lightning-round presentations. They'll go five minutes each, together in a series, and then we'll have a commentary for that group after that.

I want to thank the Blue Ribbon Review Panel. They were wonderful. They came to Cambridge after having done a ton of homework reviewing these 40 submissions, which were unbelievably great and rich. They spent time locked in a conference room and came out with recommendations that I think are fantastic. They were John Weiss, Patsy Baudin, Michael Santangelo, Maeve Clark, Eli Newberger, Laura DeBonis, who I saw here today, David Rumsey, and Jessamyn West. So I'd like to have a round of applause for the volunteer group that did the reviewing.

All right, so the first of the groups is one that was referenced earlier.
It's called Digital Collaboration for America's National Collections, representing three organizations that have worked collaboratively together: the Smithsonian, the National Archives, and the Library of Congress. Martin Kalfatovic, Jin-Sing Wang, and Pam Wright are here to present. So over to you guys.

The main focus of our Beta Sprint entry was actually to show that these three large institutions could collaborate. So that was the quick thing, and I think that's the proof of concept that we did. Everything else is just the special-sauce part of it. Again, we've heard already some of the statistics of our different collections. What we primarily did in this case was take some records from each of our three institutions and put them into our existing Smithsonian Collections Search project, which Jin-Sing will talk about in a minute, primarily just to show how these different types of data from museums, archives, and libraries could all interact in one large data system and provide different types of access to the data. So I'm gonna turn it over to Jin-Sing, who's gonna talk a little bit about how we did this, and then you can ask some questions about the details.

I think Martin has alluded to the fact that we're using the existing Smithsonian Collections Search Center and extended it for the Beta Sprint project to see how things turned out, with the three major organizations participating. And we have all sorts of interesting materials. We wanted to make sure that this is something we could use not only for one institution or one interface, but for many. We started this project by making sure that we have a dynamic, extendable metadata model.
And this is something the Smithsonian has spent time on: reviewing all sorts of national standards and looking for commonalities among them, and looking at specialties anywhere from standard printed materials in MARC formats to scientific specimens, mosquitoes and little bugs, and making sure that the taxonomy information is properly addressed. So by looking over the various national cataloging standards, they ensured that the metadata covers all of the commonalities.

This is one of the examples, a tiny snippet of what the metadata model actually looks like. This particular section addresses the geographic-location tags. For example, a photograph could be taken at a particular location, and therefore we could have latitude and longitude information, which later on can be used for searching purposes. A bug can be collected at a particular location, so this can help scientists track the migration patterns of a bug or a bird or a whale, anything like that. So this is just a snippet of what the metadata looks like.

Obviously, we had great cooperation from the three organizations. These two examples show the 11 records from the National Archives and, on the right, from the Library of Congress; this was under a time constraint, so we did manual data conversion for both. After we got those records, we ingested them into the existing Smithsonian collections-online environment of 7.4 million records, with the existing mechanism and data-ingest process; nothing else changed. And of course, as a result, we now have the library, archive, and museum collections all in one place. The key element is that we wanted to make sure everything gels together, so that they're not separate, independent little dots floating around; rather, we wanted them to work together.
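As a rough illustration of the kind of unified record the speaker describes, here is a minimal sketch in Python. The field names, identifier, and values are invented for illustration, not the Smithsonian's actual schema; the point is only that one geo-location block can serve a photograph's location or a specimen's collection site, and then drive spatial search.

```python
# Hypothetical unified metadata record; all names and values are made up.
record = {
    "id": "si-nmnh-ento-0001",        # source-scoped identifier (illustrative)
    "source": "Smithsonian / NMNH",   # contributing institution
    "type": "specimen",               # specimen | photograph | book | ...
    "title": "Aedes aegypti, pinned specimen",
    "subjects": ["Mosquitoes", "Entomology"],
    # Geographic tags: where a photo was taken or a specimen was collected.
    "geo_location": {
        "place_name": "Panama Canal Zone",
        "latitude": 9.08,
        "longitude": -79.68,
    },
}

def in_bbox(rec, lat_min, lat_max, lon_min, lon_max):
    """Crude bounding-box filter using the record's latitude/longitude tags."""
    loc = rec.get("geo_location") or {}
    lat, lon = loc.get("latitude"), loc.get("longitude")
    if lat is None or lon is None:
        return False  # records without geo tags simply don't match
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

print(in_bbox(record, 5, 15, -85, -75))  # True
```

A real system would index these coordinates for spatial queries rather than scanning records, but the same latitude/longitude fields are what make migration-pattern searches possible.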
So the higher-level system architecture basically includes, as you can see on the bottom, data from various independent sources, each with specialized databases and focuses. Each of these large organizations has a similar kind of challenge. But the goal, and what we ended up doing, is to let the metadata model be the guide and the standardized data format. The data is then ingested into a central search index, a temporary metadata repository. This is not permanent; it is purely for indexing's sake.

Coupled with that concept, we then expose several web services that use the data. For example, the metadata delivery service focuses on searching and retrieval of the textual information. On the right side of the screen, you will see an image delivery service, which is for the purpose of manipulating images: zooming and resizing. And finally, the tagging service is there to gather public input and let people tag our materials. With these three services, we then allow people to start developing their own web interfaces and mobile interfaces. So by no means is this a single interface or architecture.

I don't have a whole lot of time, but this is a high-level system architecture. Our processing goes from the bottom up. You will see all of the databases; they get ingested into a raw index, a holding place. In between, you will see a little box with up-and-down arrows in the middle there: a pre-processing stage. That is where we do extensive data scrubbing and standardization, making sure that data coming from different sources actually conforms to certain standards. Then we push that into our master index, and the master index gets replicated to a number of slaves to handle high levels of traffic. At this point, the data has no particular face until some web applications start to happen.
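The bottom-up flow just described — source records land in a raw holding index, pass through a scrubbing/standardization stage, and get pushed to a master index that is replicated for traffic — can be sketched as follows. This is a toy model under stated assumptions, not the actual system: the function names, the in-memory lists standing in for indexes, and the single example fix are all invented ("slaves" is the talk's own term for the read replicas).

```python
# Toy model of the ingest pipeline described in the talk; all names invented.
raw_index = []      # temporary holding place for freshly ingested records
master_index = []   # authoritative, standardized search index
slaves = [[], []]   # read replicas that absorb query traffic

# A one-line stand-in for the real term-standardization rules.
TERM_FIXES = {"Mosquitos": "Mosquitoes"}

def ingest(source_records):
    """Pull records from a source database into the raw holding index."""
    raw_index.extend(source_records)

def preprocess(rec):
    """Pre-processing stage: scrub and standardize one record."""
    rec = dict(rec)
    rec["subjects"] = [TERM_FIXES.get(s, s) for s in rec.get("subjects", [])]
    return rec

def publish():
    """Move scrubbed records into the master index, then replicate to slaves."""
    while raw_index:
        master_index.append(preprocess(raw_index.pop()))
    for slave in slaves:
        slave[:] = master_index  # full refresh; a real system would sync deltas

ingest([{"id": "loc-1", "subjects": ["Mosquitos"]}])
publish()
print(slaves[0][0]["subjects"])  # ['Mosquitoes']
```

The design point the sketch captures is that the master index is never queried directly: end-user and third-party applications hit the replicas, which is what lets the same data feed many independent interfaces.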
You see the red dotted line that indicates the firewall. Outside of the firewall is where any number of applications can be developed. And I will end there for questions.

Hello. Hi, my name is TJ. I guess my question would be: in the final implementation, how often would that indexing happen? Would it be a one-time-per-record thing, or would it be updated as records change?

The answer is: at different intervals for different organizations. For NARA and the Library of Congress, so far we have only done it one time, but I'm sure that if we progress down this road, we can negotiate a refresh schedule. For internal Smithsonian sources, we range from daily updates to weekly to monthly to quarterly. It all depends on how often the data is refreshed and updated, and also on how often the contributing organizations feel they can handle it on their side. But the system is built in such a way that it accommodates all levels of frequency. Good question.

Jerry Simmons, National Archives. I'm the team lead for authority cataloging, and I'm interested to know how you got these data to interact.

The system part actually is not difficult to build. Ultimately, what really glues everything together and gets the data interacting is the metadata itself, and I think the professional catalogers are the people we need to thank for that. The three organizations we happen to be dealing with all use at least internal standards, if not national standards, so we have the benefit of taking advantage of the AAT (Art & Architecture Thesaurus), LCSH, and standard name and subject headings; all of these terms are ultimately very important. However, as I mentioned, there is a lot of data scrubbing, and the scrubbing really is about standardizing terms. There are times when it's simply misspellings.
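A minimal sketch of this kind of term scrubbing, assuming a simple line-per-exception table that maps a misspelled or overly specific term to its standardized form. The file format, terms, and function names here are invented for illustration; the real exception database described in the talk runs to tens of thousands of entries.

```python
# Invented "nonstandard|standard" exception lines standing in for the real
# scrubbing table; the actual format and contents are not documented here.
EXCEPTIONS_TEXT = """\
Missisippi River|Mississippi River
Mosquitos, Yellow-fever, pinned|Mosquitoes
Photgraphs|Photographs
"""

def load_exceptions(text):
    """Parse 'nonstandard|standard' lines into a lookup table."""
    table = {}
    for line in text.splitlines():
        if "|" in line:
            bad, good = line.split("|", 1)
            table[bad.strip()] = good.strip()
    return table

def scrub_terms(terms, table):
    """Flip any term found in the exception table to its standard form."""
    return [table.get(t, t) for t in terms]

table = load_exceptions(EXCEPTIONS_TEXT)
print(scrub_terms(["Photgraphs", "Maps"], table))  # ['Photographs', 'Maps']
```

Once terms from every source are flipped to the same controlled vocabulary, records from different institutions match on the same subject headings, which is what makes them "interact" in a shared index.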
Other times the terms are at way too detailed a level, and we just match those terms up to the national standards and flip them from there. We have a database designed to do exactly that kind of thing: we identify exceptions, and we identify what to flip the terms to. Right now we have probably about 50,000 lines of exceptions that we process through to scrub the data and make it standardized. Once the data is standardized, the records naturally start interacting with each other.

Jerry McGann, University of Virginia. Can you speak to the kind of traffic you'd have to expect if this becomes the platform for the DPLA? It's a gigantic upscaling of the amount of traffic you can expect.

Okay, that's kind of a hard question to answer; I don't have the numbers on hand. However, the Collections Search Center, which is the main website serving content to end users, has in the neighborhood of about 25,000 unique visitors per month. But as I mentioned, this system is not built for one interface. We have five different individual web applications using the data, and two mobile applications, each with a different focus, accessing the data. I don't have their statistics, but I know that the majority of the traffic actually comes from those other applications rather than the main one that we provide. So in other words, if the DPLA takes off and local towns want to write their own applications to tell the history of their town or a particular story of their culture, they can really draw on a lot of materials from other organizations.

One example I'd like to point out: the Smithsonian has the well-known National Museum of the American Indian, which has wonderful collections. But to their pleasant surprise, they found out that the Natural History Museum has an anthropology department with a tremendous amount of collections that they find extremely useful and complementary.
And also, there is the National Anthropological Archives at the Smithsonian, which has even more cultural and language materials that they find useful. So there are all these pleasant surprises that we didn't realize before and are now discovering. Please join me in thanking the two of them. Thank you.