So the afternoon sessions, as you have seen, include two substantive sessions on the technology development that is very actively underway right now in advance of the April 2013 launch. We are keeping expectations low, with the hope of outperforming, of course. We've got some wonderful partners who are working with the secretariat and with the DPLA technical aspects work stream to do that, and one of the people who has been at the core of this effort is a man named Jeffrey Licht, who's on his way up to the podium right now. Jeffrey is a partner to the effort as a consultant and really the core project manager for technology all the way around, trying to implement what we have heard from the community in the technical aspects work stream led by Martin and SJ and others. Jeffrey comes from a group called Pod Consulting. He's a partner at Pod, a Cambridge, Massachusetts-based firm that we've had a lot of experience working well with. He is also at Sapient, one of the big integrator firms, in an international setting, and we're very lucky that he has devoted himself so substantively to this project. Jeffrey Licht.

I guess the purpose of this is to really start getting specific about what it is that we're actually going to be building for April and the process for getting there. We've been talking a lot about big visions, big goals, and wonderful content. At some point we actually need to build something specific, and what I want to do is try to give as clear a picture as possible of what we're planning to do for April, so that we can all go out and communicate that to the world, and also so that, if it's wrong or you have questions, you have enough detail to try to answer those questions.
So, as I said, I want to give a concrete picture. I also want to talk about our progress towards that goal, how we're moving, and what the plan is between now and April, and, very importantly, about the different opportunities for getting different groups involved. We know there are a lot of people who want to contribute to the DPLA in some way, whether by attending these meetings or otherwise. There's a very good opportunity for contributing very solidly on the technical front, and we need to make those opportunities as clear as possible and make sure they actively contribute to our April launch.

So this is the big picture of what the DPLA platform for April 2013 is going to look like. At the core of what we are building is a metadata repository, which is going to be aggregated from the various service hubs and content hubs. There's an ingestion process, which is going to take the aggregated content from those hubs and put it in the repository in a form that can be easily used. And, most importantly, there's an API on top of it. From what I've heard, that's something we haven't actually talked about a lot today, but the API is really the mechanism by which we're going to expose the great content within the DPLA to the world. For those of you who are not familiar, API stands for application programming interface: it's basically a set of transactions, or calls, that you can make to do things with the repository. I think this is one of the key distinguishing factors of the overall DPLA platform. We're not telling you that you can only look at it through this website, and we're not restricting the API to a subset of people with special privileges. Everything goes through the API, everybody has the same level of access, and that access covers all the information in the repository.
As you can see, there's also a front end, which is going to be built on top of the API and will be the place where people can go in April to browse the content. Then there are lots of colored boxes, which represent all the great stuff we expect people to build on top of the API. One of our main success criteria, one of the ways we'll know we've done the right thing, is that people want to build things on top of the API, that they have the ability to do so, and that they actually build things people want to use. We're going to talk a little later about how we want to make that happen, but it depends critically on the community of developers we want building on the DPLA participating and getting engaged.

The other piece here is that there are things we expect people will want to do with the content in the DPLA that aren't necessarily practical to do through the API. Think about HathiTrust, for example: if you want to analyze the entire corpus of material in the repository, that's not something you can easily do through the API. So we're providing a mechanism by which you can just take a copy of all the data and do what you want with it, and since I believe the rights for all the metadata are going to be CC0, that is something we can actually do. Finally, we want to package all of this up, not just the content but the code and the entire infrastructure, in a way that lets anybody take the entire DPLA platform and run it locally or on a cloud server. We hope that will foster the development of tools that might be useful in a more local environment, and it has other benefits we'll talk about in a bit.
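To make the bulk-download idea concrete, here is a minimal Python sketch of corpus-level analysis over a downloaded copy of the metadata. It assumes, purely for illustration, that the dump is newline-delimited JSON with one item per line and a `provider` field; the actual dump format, file name, and field names are not specified in this talk.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_field(lines: Iterable[str], field: str) -> Counter:
    """Tally items per value of `field` across a newline-delimited JSON dump.

    Hypothetical dump format: one JSON object per line. Blank lines are
    skipped; items that lack the field are counted under None.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        counts[item.get(field)] += 1
    return counts


if __name__ == "__main__":
    # Stand-in for iterating over an open dump file line by line.
    sample = [
        '{"id": "a1", "provider": "Hub A", "title": "Map of Colorado"}',
        '{"id": "a2", "provider": "Hub A", "title": "Mining photograph"}',
        '{"id": "b1", "provider": "Hub B", "title": "Newspaper, 1945"}',
        "",
    ]
    print(count_by_field(sample, "provider").most_common())
```

Reading the dump one line at a time keeps memory use flat however large the corpus grows, which is the main reason to work from a full copy rather than paging through the API.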
The main thing to keep in mind is that when I talk about the DPLA platform today, I'm referring to the purple stuff; that's the core of what we're putting together.

In terms of the metadata repository, and this is a little remedial for everybody who's been involved in this: we are building a metadata repository, not a full content repository. It's just metadata, that is, information about the content, and as we've talked about today, it's going to be aggregated from the various service hubs and content hubs that participate in the DPLA. There are different types of entities, or objects, that we're going to be storing metadata about. We have the concept of an item, which is an object we have metadata for, like a newspaper, an image, or a piece of text. We also want to keep track of collections: we often have many items that come from a particular collection, and we want to keep the relationship between those items and that collection, along with some information about the collection itself, which gives valuable context to people who are looking at an item, trying to understand what it is, or trying to find it. We also want to keep track of contributors, that is, who actually contributed the object to the DPLA, so we know the sources. And something that is a little farther out on the roadmap, but opens up some really interesting possibilities, is the idea of events around objects. If we're dealing with book data, for example, an event might be that somebody has read or borrowed the book, or that someone has viewed the object.
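The four kinds of entities described here (items, collections, contributors, and events) can be sketched as plain Python data classes, along with the "most used items" calculation that events make possible. The class and field names are hypothetical, chosen only to show how the relationships might be represented; the item fields loosely echo Dublin Core element names, but this is not the DPLA draft schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical entity classes; item fields loosely follow Dublin Core names
# and are not the DPLA draft schema.


@dataclass
class Contributor:
    """Who contributed an object to the DPLA (e.g., a content hub)."""
    id: str
    name: str


@dataclass
class Collection:
    """A grouping of items; its description gives context to every member."""
    id: str
    title: str
    description: str = ""


@dataclass
class Item:
    """Metadata about one object; the object itself stays with the contributor."""
    id: str
    title: str
    contributor_id: str
    collection_id: Optional[str] = None
    creator: List[str] = field(default_factory=list)
    date: Optional[str] = None
    subject: List[str] = field(default_factory=list)
    item_type: Optional[str] = None  # e.g. "text", "image"


@dataclass
class Event:
    """Something that happened to an item, such as being viewed or borrowed."""
    item_id: str
    kind: str


def most_used(events: List[Event], n: int = 3) -> List[str]:
    """Return the ids of the n items with the most recorded events."""
    counts = Counter(event.item_id for event in events)
    return [item_id for item_id, _ in counts.most_common(n)]
```

Items point to their collection and contributor by id rather than embedding them, so collection-level context is stored once and shared by every member item.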
Keeping track of events like these enables us to do some interesting things, such as highlighting the most-used items and providing additional guidance for navigating the repository: finding things other people find interesting, and things that people who like what you like might find interesting.

This is just an example of metadata, to be super explicit about what we mean when we talk about metadata. Our schema is based on Qualified Dublin Core. We published the draft schema on the DPLA wiki, and we're starting to get some very good feedback on it, which we're going to be addressing, which is great. I would encourage everybody who has an opinion to take a look and send us your thoughts. These are pretty much basic Dublin Core elements. We're also planning on keeping previews with the items in the repository. As we talked about in the discussion around service hubs, part of our expectation is that the service hubs are doing a certain amount of quality control, standardization, and normalization before the data comes to us, because one of the big challenges in aggregating lots of other repositories is inconsistent metadata at various levels of detail. We obviously don't have the people at the DPLA to go through and fix all the metadata, so we're going to be relying heavily on the service hubs to do the best job they can, and we'll talk a bit more about that in terms of ingestion. Can anybody guess where this comes from? Nobody? Clemson, South Carolina, that's right. Obviously the thumbnail doesn't really work blown up to six feet high, but there you go.

Okay, content sources. These are the service hubs and content hubs that we've been discussing so far. They are primarily cultural heritage objects, and there are lots of newspapers.
We are definitely planning to include books in the first iteration by April, once we get the data sorted out. We are starting out by doing OAI-PMH harvesting to get the metadata from the service hubs. That's going to work for four or five of them; some don't have that harvesting set up right now, which will give us a good chance to test out some of the other ingestion mechanisms we want to use. We're not starting with OAI-PMH because we think it's the greatest thing in the universe, but there's a lot of content out there available through it, so in the interest of getting things moving and getting real content into the system, we want to start with it so we can get going as quickly as possible.

So, ingestion. We are building an ingestion pipeline, a process that is going to manage the retrieval of content from the various content and service hubs, normalize it, clean it up, and put it into the repository. There are different elements to this: actually getting the files, doing format translations where necessary, and then enrichment and normalization. One of the things we expect not to do is really try to manage or massage the metadata within the DPLA itself, because we simply do not have the people or the time, and we're not the people closest to the content. We want that cleanup to be done before it gets to us. But we know there are always going to be inconsistencies or differences between the different aggregators and service hubs we're dealing with; that's just a fact of life. So we want to have a library of modules, cleanup routines, and enrichments that we can apply to the content as it comes in.
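Since OAI-PMH is the starting point for harvesting, here is a minimal Python sketch of the loop it implies: issue a `ListRecords` request, collect the records, and follow the `resumptionToken` until the repository stops returning one. The verb, parameters, and namespace come from the OAI-PMH 2.0 protocol; the base URL is a placeholder, and the fetch function is injectable (an assumption made so the parsing can be exercised without a live endpoint). Error responses, deleted records, and retries are left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL (OAI-PMH 2.0)."""
    if token:
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def http_get(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url: str,
            fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens page by page."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        yield from root.iterfind(".//oai:record", OAI_NS)
        token_el = root.find(".//oai:resumptionToken", OAI_NS)
        text = (token_el.text or "").strip() if token_el is not None else ""
        if not text:
            return  # a missing or empty token marks the final page
        token = text
```

Because `harvest` is a generator, each record can be handed straight to the mapping and cleanup steps as it arrives instead of holding a whole hub's output in memory.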
A cleanup routine for dates is a good example: if two separate date formats are coming in, we can figure that out and standardize it ourselves, rather than trying to get all the different service hubs and content hubs to standardize what they're providing us, because that would be very difficult. The other reason is that there are certain functions we want to provide through the API that depend on certain types of data being available. For example, one way of browsing the data in the repository could be a timeline: let's see what's happening around 1950, between 1945 and 1955. If we have content that comes in with a date range of, say, "20th century", does that go on the timeline or not? We need to massage the data to the point where we can use it effectively in the visualizations and in the navigational paths we provide, through the API and into the front end, so people can actually get to the content. Another example is something as simple as a city: if we get a city, we need to figure out its latitude and longitude and make sure we have that in our repository, so that when we do geographic searches we can pick it up. We have a lot of ideas around this, and it's not new; a lot of people are doing it. But the goal is to gradually grow the library of enrichments we apply, so that over time we can reuse them for each of the service hubs and content hubs we bring on board instead of rebuilding them.
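A library of reusable enrichments can be as simple as a list of functions, each taking a record and returning an improved copy, applied in order to everything a hub sends. The sketch below assumes a flat dictionary record and shows two enrichments along the lines described: turning a few date spellings into a year range a timeline could use, and attaching coordinates for a known city. The function names, record fields, and the tiny hard-coded gazetteer are all hypothetical; a real geocoder would consult an external gazetteer service.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
Enrichment = Callable[[Record], Record]

# Hypothetical gazetteer with approximate coordinates; a real module would
# query an external gazetteer service instead of a hard-coded table.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "clemson, sc": (34.68, -82.84),
    "chattanooga, tn": (35.05, -85.31),
}


def parse_year_range(raw: str) -> Optional[Tuple[int, int]]:
    """Map a few common date spellings to an inclusive (start, end) year range."""
    text = raw.strip().lower()
    if m := re.fullmatch(r"(\d{4})(-\d{2}(-\d{2})?)?", text):          # 1950, 1950-06-01
        return int(m.group(1)), int(m.group(1))
    if m := re.fullmatch(r"\d{1,2}/\d{1,2}/(\d{4})", text):             # 6/1/1950
        return int(m.group(1)), int(m.group(1))
    if m := re.fullmatch(r"(\d{3})0s", text):                           # 1950s
        start = int(m.group(1)) * 10
        return start, start + 9
    if m := re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th) century", text):    # 20th century
        start = (int(m.group(1)) - 1) * 100  # treated as 1900-1999, a convention choice
        return start, start + 99
    return None  # unrecognized: leave for a later module or a person


def normalize_date(record: Record) -> Record:
    span = parse_year_range(str(record.get("date", "")))
    return {**record, "year_range": span} if span else record


def geocode_place(record: Record) -> Record:
    coords = GAZETTEER.get(str(record.get("place", "")).strip().lower())
    return {**record, "coordinates": coords} if coords else record


def enrich(record: Record, steps: List[Enrichment]) -> Record:
    """Apply each enrichment in order; each step returns a new dict."""
    for step in steps:
        record = step(record)
    return record
```

Because every module shares one signature, bringing on a new hub is mostly a matter of choosing which steps to run, which is the kind of reuse the talk is aiming for.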
Another piece of the ingestion process addresses one of the concerns that came up yesterday when we were talking about metadata and the technical architecture: how do we make sure we don't unnecessarily lose the richness of the data we're getting? For example, we're getting MARC records, which have a lot of data that isn't going to be part of our core schema. So we're going to preserve the entire metadata record we receive as part of our record. We'll do all the mapping to get to a standardized set of metadata, but the original record will still be there for anybody to retrieve through the API. Ideally, we'd also like to index it, so you can do faceted search and faceted browsing on fields that may be specific to a particular collection, repository, or type of data. We're not exactly sure how we're going to do that yet, but it's one of the goals, and we figure that keeping the complete original metadata gives us a lot more flexibility to do interesting things in the future.

As for the API, what it's basically going to let you do is search and do faceted search; that's pretty much it. You can do keyword searches, date range searches, and location searches, and faceted browsing is an absolutely key part of this, so you can narrow your search down to everything within Colorado, then narrow it by date, and so on. That's the key functionality, and in the types of functionality it exposes, it's pretty similar to the alpha version of the API that came out back in April. It's going to be a REST API: pure HTTP to connect and retrieve data. We're going to let you specify whether you want the data back in JSON or XML, and perhaps as schema.org markup or HTML. Linked data is something we're not fully addressing right now, but we know it's in the future.
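Keeping the original record alongside the mapped fields might look like the following Python sketch: a crosswalk copies selected source fields into core-schema names, and the untouched source record is carried along under its own key. The source keys are simplified, MARC-inspired placeholders and the target names loosely follow Dublin Core; neither reflects the actual DPLA mapping.

```python
import copy
from typing import Any, Dict

# Hypothetical crosswalk: simplified MARC-style tags -> core-schema names.
CROSSWALK: Dict[str, str] = {
    "245a": "title",
    "100a": "creator",
    "260c": "date",
    "650a": "subject",
}


def map_record(source: Dict[str, Any]) -> Dict[str, Any]:
    """Map known fields into the core schema and keep the full original record."""
    core = {target: source[tag] for tag, target in CROSSWALK.items() if tag in source}
    return {
        "core": core,
        # Deep copy, so later cleanup of the mapped fields never alters what the hub sent.
        "original_record": copy.deepcopy(source),
    }
```

Fields the crosswalk does not know about are not lost; they remain in `original_record`, available for later indexing or for API clients that want the full source data.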
We also want to make the API useful to people, so we want to provide full documentation so it's easy to use, API sample code, and potentially libraries for common languages, like a Python or Ruby library, that let you access the API very easily. Those are also things members of the community might be able to build. One of the questions that came up yesterday was how somebody who is not particularly technically savvy, or doesn't have technical people on staff, can take advantage of the API. As an example of the kind of thing that's possible, someone could certainly build a JavaScript widget that takes a title or a search box, goes to our API, gets the data back, and displays it, and that could be dropped into any web page anywhere without any particular programming experience. I'd actually expect someone to build that as soon as we make the API fully available, because it's pretty easy to do, but we would want to be able to put it out there so that we build easy on-ramps for people to start using the API.

The point of the API is to get people to use it, and one of our big challenges is making it easier for people to do that. You may be familiar with the Beta Sprinters, who have been doing work over the past year. We're looking at continuing that effort so people can build more substantial things on top of the API, prove it out for us, and let us know what we've done right and what we haven't. If someone is trying to build something on top of the API that looks like it's going to be valuable to the community, but they can't because of X, Y, or Z, we want to know that, and the only way to find out is by getting people to use it. We can do all the theoretical design we want, but that's really the proof.
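A thin client library of the kind mentioned here mostly hides URL construction and JSON decoding. The Python sketch below is illustrative only: the `/items` path and the `q`, `page`, and `facets` parameter names are hypothetical, not the DPLA API's actual interface, and the HTTP call is injectable so the class can be exercised offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Optional, Sequence


def _http_get_json(url: str) -> Dict[str, Any]:
    """Default transport: HTTP GET, then decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


class Client:
    """Minimal wrapper around a hypothetical REST search endpoint."""

    def __init__(self, base_url: str,
                 get_json: Callable[[str], Dict[str, Any]] = _http_get_json):
        self.base_url = base_url.rstrip("/")
        self.get_json = get_json

    def search_url(self, query: str, page: int = 1,
                   facets: Optional[Sequence[str]] = None) -> str:
        # Hypothetical path and parameter names.
        params: Dict[str, Any] = {"q": query, "page": page}
        if facets:
            params["facets"] = ",".join(facets)
        return f"{self.base_url}/items?{urllib.parse.urlencode(params)}"

    def search(self, query: str, page: int = 1,
               facets: Optional[Sequence[str]] = None) -> Dict[str, Any]:
        """Run a keyword search and return the decoded JSON response."""
        return self.get_json(self.search_url(query, page, facets))
```

A drop-in JavaScript widget like the one described would do the same two jobs, building a URL and rendering the JSON it gets back, inside the browser.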
We also have an AppFest, scheduled for November 8th and 9th in Chattanooga. I encourage you all to sign up for it and to share your ideas for what you want somebody to build for you. That continues in the vein of the hackathons that have been part of the DPLA for a while now. And, as I mentioned before, there will be documentation, sample code, client libraries, all those good things to make it as easy as possible for people to come on board and start using the API.

In terms of the front end, somebody mentioned earlier that iFactory has been retained to build it. I guess we need to be really careful what we say about the front end, because it's not the front end, it's a front end to the DPLA: a gesture to the types of things you could do, a gesture to the possibilities. Its scope is basically to provide search, faceted search, and detailed views of the content in the repository, and also to let people form collections of their own and share them: tag the things they want to put in a group, then share that group and let other people use it. A really important point here is that this front end does not have any privileged access to the DPLA or to the repository compared to any other tool people might build on top of the API. In theory, anybody could be building this front end right now. But we expect that having a fairly substantial web application built on top of the API will help us figure out whether we're actually exposing the right calls, and will also help us understand whether the cleanup we're doing in the ingestion process is working.
If you're trying to build a front end on top of our repository, it may become very obvious that browsing by subject is impossible because the subject headings haven't been normalized properly. That's something we'll find out very quickly, and then we'll decide whether it's something we need right now or later, and how we can address it.

So, in terms of the roadmap: these are the things we're trying to build for April 2013. The wet clay metaphor is a good one; some of our clay is almost slurry, you might say. Our approach to building this out is iteration: moving the ball forward, getting real data, seeing what works, and pushing along. We don't have a day-by-day plan from now until April, but we have a set of priorities and a sequence in which we want things to happen. And we want feedback and input. That's the one thing I've been banging the drum about throughout this project, maybe not very effectively, because we haven't gotten huge amounts of it, but we really want as many people as possible engaged and looking at what we're doing, so we can make sure we're doing the right thing.

This past April, an alpha version of the API was put together, which some of you may have worked with or seen. It contained many, many book records from Harvard's collection, as well as assorted other metadata. We are following on in the spirit of that API, with many of the same types of transactions, but really focusing on making the ingestion process less handcrafted, less boutique, and more industrial-strength, and on putting the whole thing on more of a production footing over the next months and maybe years.
So our immediate milestone is really early November: to have a version of the API up that lets you do some things and has some cultural metadata in it, because we have an AppFest on November 8th and 9th, and it's going to be a really, really sad AppFest if there's no API or no data. It really concentrates the mind when you have a specific deadline. Previously DPLA Midwest was my deadline, but then this came up, and I thought that's better; that'll be a good thing to work toward.

In terms of the front end, the design process is kicking off on Monday. iFactory is going to go through a typical design process: doing some discovery, looking at personas, doing wireframes, and doing a visual design. The goal is to get to a fully fleshed-out front-end visual and interaction design by the end of December, at which point it'll be put into HTML, and development will continue in parallel with the API development in preparation for our launch in April. A couple of things about this. April 2013 is just around the corner, even more so than it was yesterday, so the front end design process is going to have to move fairly swiftly. We are still trying to figure out the best way to get community involvement, so people can see the direction we're taking, without compromising the schedule, which means decisions need to be made fairly quickly. So I'm going to apologize in advance: that process is going to be plowing ahead at full speed between now and December. But if you want to provide input and you don't think the opportunities are there, just keep chasing us and we'll make sure that happens.
Once the API is done, and in parallel with the front end being developed, we're going to keep iterating on and developing the platform. After the first release, new versions of the API with the features we develop will be made available on the public-facing site on an ongoing basis, so people can keep working against it and seeing what's in there. Our general approach is: take one of our content sources, ingest it, see what doesn't work, fix that, and get it to an appropriate level of quality; then move on to the next content source. Hopefully, as we move through this process, subsequent content sources will be easier to do, and if they aren't, we're obviously doing something wrong. It's basically a continuous process of iterating and improving until April. This is a challenging plan, because we're building the front end and integrating, and lots of things are going on at the same time, so we're going to do our best to keep on top of it.

There's a lot of other stuff we have in the hopper for April, and again, it's going to be prioritized on an as-needed basis. If none of the service hubs is trying to provide us data using EAD, then we're not going to do that right now, but this gives you a sense of the types of things that are out there. As Don said earlier, it's all about opportunity cost and prioritization. There's a ton of stuff to do; some of it can happen before April and some after, and I don't know which will be which at this point. The main priority is to get the core functionality in place, get the ingestion process working, and get the front end to a point where it's compelling and interesting.
So, essentially, we want people to get involved. The dev portal on the DPLA wiki is really the best place to find out what's going on. We post status reports there on what's happened in the past week, and it's the hub where you can find all the information about the schema and all our development documentation. Our week-by-week plan is tracked in Redmine, a project management and issue tracking system. It's open and public, so you can go and see what we're doing and harass us about it if you want to; if you really want the details, that's where they are. The code is on GitHub. We have two separate repositories set up now under github.com/dpla, platform and ingestion, which are the ones we're starting with. Once code is developed, we push it there, so you can take a look, and if you're adventurous, you can download it yourself and see what's going on.

In terms of contributing, the AppFest is a great place. There's a spot on the DPLA wiki where you can suggest things you want people to build, which seems like a great idea: if they don't build it, you haven't lost anything, and if they do, you've got something. If you can attend, great; register and sign up. And just build on top of the API: build something, try something out, and see if it works for you. If it doesn't, let us know, and if it does, great, that'll be exciting; let us know what features you want. There are also a couple of areas where we think other people could contribute, though we haven't figured out exactly how yet, so I'll just let you know they're on our radar. One is that we're planning and building a library of utilities and modules for data enrichment, cleanup, and so forth. The way we're architecting this, each of those modules is accessed over a REST API that we use internally, so in theory it's possible to build a module that does, say, entity extraction, or some interesting type of enrichment we haven't thought of, and plug it in. We'd like to give people the tools to build those modules and plug them in, but we haven't quite figured out the best mechanism for that yet.
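One way to picture an enrichment module "accessed over a REST API" is a small web service that accepts a record as JSON and returns the enriched record. The Python sketch below uses only the standard library's WSGI interface; the POST-a-record contract, the `/enrich` path, and the toy place-name tagging (a stand-in for real entity extraction) are assumptions for illustration, not the platform's actual module interface.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], Any]

# Toy stand-in for entity extraction: known place names to look for in titles.
KNOWN_PLACES = ("Colorado", "South Carolina", "Chattanooga")


def tag_places(record: Dict[str, Any]) -> Dict[str, Any]:
    title = str(record.get("title", ""))
    found = [place for place in KNOWN_PLACES if place in title]
    return {**record, "places": found} if found else record


def enrichment_app(environ: Dict[str, Any],
                   start_response: StartResponse) -> Iterable[bytes]:
    """WSGI app (hypothetical contract): POST a JSON record to /enrich, get it back enriched."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/enrich":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        record = json.loads(environ["wsgi.input"].read(length))
        if not isinstance(record, dict):
            raise ValueError("record must be a JSON object")
    except (ValueError, KeyError):  # bad length, malformed JSON, or missing input stream
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"expected a JSON record"]
    body = json.dumps(tag_places(record)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, enrichment_app).serve_forever()` would serve it. Keeping every module behind the same small HTTP contract is what would let outside contributors write a module in any language and plug it in.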
We also want to make it easy for people to discover what others are building on top of the DPLA API. Maybe that's a shared space on GitHub, maybe it's something else; we don't really know yet. What we do know is that by the time people start building things at the AppFest, we need a place where you can go to see what people are doing on top of the API. So these are the folks who've been working on the development. I've been part of a team that's been reviewing and monitoring progress as we go, and it's been great. And that's pretty much all I have; I've left a lot of time, I think, for questions. I see one right here, but you'll want to wait for the microphone.

It says it's on? Okay, there you go. A lot of libraries use Drupal to develop front ends for various applications, and it seems like if there were a Drupal module for the DPLA, you could really leverage a lot of front-end development. Has there been any thought of trying to interest someone in the Drupal community, or somebody in the core coding group, in coming out with a Drupal module by April 2013 or shortly thereafter?

I think that's a great idea; it would be an excellent AppFest project.

As someone in the Drupal community, I would say yes, that's great.
We don't have any specific plans to build one ourselves right now, but it would be a great AppFest project. There's a question over there, or a comment, I should say.

Jeff, given that most of us have to do a lot of reporting to justify the work we're doing, I'm wondering what has been discussed on the infrastructure side as far as analytics and metrics.

I think it's been talked about in the sense that we want reporting and analytics; it's one of the things we know we need to do, but we haven't drilled down into the details. I expect that the first time we need to issue a report, we'll say, okay, we need some analytics and reporting on this, and we'll start building it at that point. But if there are specific analytics or reports you think would be useful, that's definitely something we'd like to know about. Any other comments or questions?

When you said "search the DPLA", what exactly does that mean? What do you search?

That would be either a full-text search of all the metadata, a search constrained by a particular field, a date search constrained by a date range of some kind, or a location search. Actually, one thing I forgot to mention earlier, in terms of the technology we're using to build this: there's a whole layer of discussion underneath this that I don't really want to get into, because it's pretty time-consuming and probably of interest to only a smaller subset of people. But we are using Elasticsearch to provide the core search and faceted search functionality, the repository is going to be in CouchDB, and we're also using a bunch of code from the Recollection project at the Library of Congress to kick off the ingestion work, so we're relying on that.
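To make the four kinds of search concrete, here is a Python sketch that builds an Elasticsearch request body combining a full-text query, a date-range filter, a location filter, and facet counts. It uses the query DSL of current Elasticsearch releases, in which aggregations have replaced the older facets feature; the field names (`title`, `description`, `date`, `coordinates`, `state`, `subject`) are hypothetical and assume `date`, `geo_point`, and `keyword` mappings, not the DPLA index as actually configured.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_search(text: Optional[str] = None,
                 years: Optional[Tuple[int, int]] = None,
                 near: Optional[Tuple[float, float]] = None,
                 radius: str = "50km",
                 facet_fields: Tuple[str, ...] = ("state", "subject")) -> Dict[str, Any]:
    """Assemble an Elasticsearch search body; every criterion is optional.

    Field names are hypothetical and assume suitable index mappings.
    """
    must: List[Dict[str, Any]] = []
    filters: List[Dict[str, Any]] = []
    if text:
        # Full-text search across several metadata fields.
        must.append({"multi_match": {"query": text, "fields": ["title", "description"]}})
    if years:
        start, end = years
        filters.append({"range": {"date": {"gte": f"{start}-01-01", "lte": f"{end}-12-31"}}})
    if near:
        lat, lon = near
        filters.append({"geo_distance": {"distance": radius,
                                         "coordinates": {"lat": lat, "lon": lon}}})
    query = {"bool": {"must": must, "filter": filters}} if (must or filters) else {"match_all": {}}
    # One terms aggregation per facet; the bucket counts drive "narrow by state, then date" browsing.
    aggs = {f"by_{name}": {"terms": {"field": name}} for name in facet_fields}
    return {"query": query, "aggs": aggs}
```

The returned dictionary would be sent as the JSON body of a `_search` request. Putting the date and location criteria in `filter` rather than `must` keeps them from affecting relevance scoring, which only the keyword part should influence.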
Any other comments? I don't want to cut anyone off. Go for it.

So there's one email we got from David Rothman, who's been a frequent participant in the DPLA discussions online, and I thought I'd take this opportunity to share his idea and float it up. Hello, thank you. It's not exactly a tech dev idea, but it's one you might react to, and it gets it on the record. David's suggestion is that the DPLA eventually, not necessarily in the first version, think about serving as a digital locker for books and other items, so that library patrons can use and enjoy eternal access to content they bought from Amazon, Barnes & Noble, and other places, even if it were a spin-off idea. This is clearly not core to where we've been going, and I know it raises a lot of other issues, but I wonder, using this as an example, how we might think about keeping ideas like this one from Mr. Rothman in a parking lot. I don't know whether it's someday-desired functionality or how you're thinking about ideas like this, which obviously we'd have to consider as a policy matter: would we ever accept DRM, for instance? Lots of things would flow from it. But it would let us capture thoughts early in the process that aren't going to be in the April 2013 version but that we might want to return to conceptually.

I think that'd be great. It would definitely be useful to have a place for all the things we've thought about and have on the radar, where we've said either that it's our understanding we're not going to do this, or that we want to do it at some point in the future, and giving people a way to contribute to that would be great. That's something we can set up.

Yeah, okay. I know this isn't a question that applies to the situation between now and April, but it's a question that's been talked about for a long time. We talk about orphan works in a paper environment; I often think of this whole world of bastard works in a digital environment, that is to say, digital archives of material that have been built in HTML and that cannot be ingested.
of work with incredibly rich repositories that just cannot come in. What kinds of ideas are out there for automating the ingestion of this kind of material?

So this is something that has been on our radar, and we've basically called it the web harvester. The idea is that you would point it at a collection, essentially a website that contains a collection, and then (it's not easy, it's hard) go and harvest that and try to work out the structure of the content from clues such as the URLs and the hierarchy of the pages, then pull out the data and prepare it for ingestion into the DPLA. Obviously there are a lot of pieces to that. There's probably a certain amount of tweaking involved: read the website, look at the data, come back, and tweak the settings so we can identify which fields map to which DPLA fields. It's more of a cleansing process. Another idea that came up yesterday was to look to the service hubs. We have service hubs right now that are essentially geographically based, but there's no reason why there couldn't be a service hub whose job was to work with smaller groups who have websites with rich data in them. That hub could build the tools to ingest that data, deal with a lot of the hands-on communication and manual work that would be required (maybe crowdsourcing it, or however you want to handle those interactions), and then give the DPLA a cleaned-up feed that we could use. That does introduce some other complications, because one of the things we're doing with the data we get from service hubs, and we think it's part of our remit, is to go and make sure that the original content is still there and get
changes and so forth. So in an environment where you're doing a one-time ingestion of a website, cleaning it up and then feeding it to the DPLA, that becomes a little more problematic. But I think that's a great example of where a service hub could spring up and take on that type of role.

That concludes the session. Jeffrey, great job. We will reconvene here at about 3:45. Thank you.
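The web harvester idea from the last answer can be illustrated with a minimal sketch. It assumes pages have already been fetched as (URL, HTML) pairs and that any useful metadata sits in `<title>` and `<meta name=... content=...>` tags. It uses one of the "clues" mentioned (the URL path) to guess the collection hierarchy, applies a per-site field mapping (the part that gets tweaked after looking at the data), and routes pages with no recognizable metadata to a manual-review list. `FIELD_MAP`, `harvest`, `collection_path` and the `isPartOf`/`isShownAt` keys are hypothetical placeholders, not the actual DPLA ingestion code or metadata schema.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical per-site mapping from the site's own metadata names to placeholder
# target fields. This is the part that would be tweaked after inspecting each site.
FIELD_MAP = {
    "title": "title",
    "dc.creator": "creator",
    "author": "creator",
    "dc.date": "date",
}


class MetaExtractor(HTMLParser):
    """Collects <title> text and <meta name=... content=...> pairs from one page."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.fields[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.fields["title"] = self.fields.get("title", "").strip()

    def handle_data(self, data):
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


def collection_path(url):
    """Heuristic: treat the URL's directory segments as the collection hierarchy."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[:-1]  # drop the item's own page name


def harvest(pages, field_map=FIELD_MAP):
    """Turn already-fetched (url, html) pairs into candidate records.

    Pages with no recognizable metadata are returned separately for manual review.
    """
    records, needs_review = [], []
    for url, html in pages:
        parser = MetaExtractor()
        parser.feed(html)
        parser.close()
        record = {}
        for source, target in field_map.items():
            value = parser.fields.get(source)
            if value and target not in record:  # first matching source wins
                record[target] = value
        if not record:
            needs_review.append(url)
            continue
        record["isPartOf"] = "/".join(collection_path(url))
        # Keep a link back so the original can be re-checked for changes later.
        record["isShownAt"] = url
        records.append(record)
    return records, needs_review


pages = [
    ("https://example.org/letters/1862/item42.html",
     '<html><head><title>Letter to Mary</title>'
     '<meta name="DC.creator" content="J. Smith">'
     '<meta name="DC.date" content="1862-05-01"></head>'
     '<body>...</body></html>'),
    ("https://example.org/about.html", "<html><body>No metadata here</body></html>"),
]
records, review = harvest(pages)
print(records)
print(review)
```

Sending unrecognized pages to a review list rather than guessing mirrors the hands-on cleansing step described in the answer; a service hub taking on this role would presumably own both the mapping table and that review queue. Keeping the source URL on each record is what would make the later "is the original still there?" check possible.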