Good afternoon, everybody. Today I'm talking about OAI-PMH, which is really about metadata harvesting. First of all, what is OAI-PMH? OAI-PMH stands for the Open Archives Initiative Protocol for Metadata Harvesting. From my point of view, OAI-PMH is just about harvesting records from one repository or archive into another. The first version of OAI-PMH was released in January 2001, followed by a minor revision in July, which became version 1.1. The latest one is version 2.0, which was released in June 2002, so it has been around for quite a while. Let's start with a technical overview of what sits underneath OAI-PMH. From my point of view, there are two technologies here: the first is HTTP and the second is XML. I'll explain both in detail. First, the protocol: to transfer metadata we use HTTP. You are all experts, so you know this is the typical client-server computing model: you send a request and you get a response, all carried over HTTP. Specific requests are encoded as GET or POST operations, and the response is always a well-formed XML document. In principle, OAI-PMH can support any metadata format, so it can be Dublin Core, MARC, MODS, RIF-CS, anything. In practice, though, the ANDS OAI-PMH harvester only supported RIF-CS. Last year we successfully extended our harvester to support ISO 19115, which my colleague Alicia will be happy about, because she doesn't need to prepare another format for our translations. Of course, the default format for OAI-PMH is Dublin Core. Now let's talk about the two key components in OAI-PMH: the first is the data provider and the second is the service provider. From my point of view, OAI-PMH divides the world between data providers and service providers. Specifically, from the data capture project point of view, the data provider is normally your institutional repository.
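As a rough illustration of the HTTP side described above, the sketch below encodes the same OAI-PMH request once as a GET and once as a POST, using only the Python standard library. The base URL and the helper names `as_get` and `as_post` are assumptions made up for this example, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL, used only for illustration.
BASE_URL = "http://repository.example.org/oai"


def as_get(base_url, **args):
    """Encode an OAI-PMH request as HTTP GET: arguments go in the query string."""
    return Request(f"{base_url}?{urlencode(args)}", method="GET")


def as_post(base_url, **args):
    """Encode the same request as HTTP POST: arguments go in a form-encoded body."""
    return Request(
        base_url,
        data=urlencode(args).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


get_req = as_get(BASE_URL, verb="ListRecords", metadataPrefix="rif")
post_req = as_post(BASE_URL, verb="ListRecords", metadataPrefix="rif")

# The request objects are only built and inspected here, never sent.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Either way, the data provider would answer with a well-formed XML document; only the place where the arguments travel differs.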
What you do is expose your metadata to the ANDS harvester, and possibly you also supply free access to the metadata, maybe to item-level data. The service provider here is us, the ANDS OAI-PMH harvester. Basically, it's a client application that issues OAI-PMH requests. Now let's talk about another topic: sets. The purpose of sets is to allow for harvesting of sub-collections. Unfortunately, there is no single guideline on how to define a set, so this is really up to your institutional repository managers. For example, in this diagram there are three sets. The first one is the set for ANDS. The second one is a bunch of images, maybe sent to Picture Australia. The third one is a bunch of journals, maybe sent to, say, the National Archives of Australia or somewhere else. Given there is no single guideline on how to define sets, this becomes an issue similar to the one for ANDS collections: how do you define a unique collection? So it sort of becomes a BA's task: you have to negotiate, or engage with your clients, to get a mutually agreed set for ANDS. Finally, sets can overlap, which means the same item can be in the ANDS set and also in the image set, so it fits in two different places. Now that we've talked about service providers and data providers, let's talk briefly about how it works. As we said, in the data capture projects the service provider is mainly the ANDS harvester, and the data provider can be your institutional repository: say, the Monash ARROW repository, Swinburne Research Bank, NOVA in Newcastle, VIVO at Uni Melbourne, or somewhere else. What happens is that our harvester sends requests using six important verbs to your data provider, saying, just give me data based on these different verbs, and all your institutional repository gives back is a well-formed XML file. So there are six predefined verbs in OAI-PMH. Let's look at them in detail.
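Before going through the verbs one by one, here is a minimal sketch of the six verbs as a lookup table, with the arguments each one takes in OAI-PMH 2.0. The function name `check_request` is hypothetical, and the exclusive `resumptionToken` argument of the list verbs is deliberately left out to keep the sketch short.

```python
# Required and optional arguments for each of the six OAI-PMH 2.0 verbs.
# The exclusive resumptionToken argument of the List* verbs is omitted here.
VERBS = {
    "Identify": {"required": set(), "optional": set()},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}},
    "ListSets": {"required": set(), "optional": set()},
    "ListIdentifiers": {
        "required": {"metadataPrefix"},
        "optional": {"from", "until", "set"},
    },
    "ListRecords": {
        "required": {"metadataPrefix"},
        "optional": {"from", "until", "set"},
    },
    "GetRecord": {"required": {"identifier", "metadataPrefix"}, "optional": set()},
}


def check_request(verb, args):
    """Return a list of problems with a request; an empty list means it looks valid."""
    if verb not in VERBS:
        return [f"unknown verb: {verb}"]
    spec = VERBS[verb]
    names = set(args)
    problems = [f"missing argument: {a}" for a in sorted(spec["required"] - names)]
    allowed = spec["required"] | spec["optional"]
    problems += [f"unexpected argument: {a}" for a in sorted(names - allowed)]
    return problems


print(check_request("ListRecords", {"metadataPrefix": "rif"}))
print(check_request("GetRecord", {"metadataPrefix": "rif"}))
```

A harvester could run a check like this before sending anything, so that obviously malformed requests never reach the data provider.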
The first one is Identify. What Identify does is return general information about the institutional repository (IR), and maybe its related policies. Let's look at an example. For this command I'm using the Griffith MetaHunter Hub project. They have successfully configured OAICat to support RIF-CS. What I'm doing here is simply saying verb equals Identify, so hopefully it gives me your institutional information and the related policies. This is a screenshot of the response. As you can see, let me just point it out, this is your repository name, and this is your base URL, which is very important in OAI-PMH. This tells you the protocol version is 2.0. It also tells you who to contact if you have any problem. The second one is ListMetadataFormats. Its purpose is to list all the available metadata formats in your institutional repository. It can also show you the schema location and the namespace. Let's look at an example of this request and its response. The request just says ListMetadataFormats. Apologies, you may not be able to see my cursor moving, it's too tiny. In this response, as you can see, the metadata prefix is rif, which means Griffith Uni's OAICat only supports the RIF-CS metadata format. Next, let's look at ListSets. It just provides a list of the sets in which records may be organized. Again, in this command I'm asking Griffith Uni's OAICat, hey, show me all the sets in your repository. This is the response; it shows we only get one set, and the set name is research project. Now let's look at ListIdentifiers. This lists all the unique identifiers corresponding to records in your IR. As Nick pointed out this morning, such an identifier can be a globally unique persistent identifier or it can be a local identifier.
Of course, you can also pass parameters: for a certain period of time, show me all the identifiers you created; or, for the ANDS set, show me how many identifiers I minted. Let's look at an example. In this command I'm asking OAICat at Griffith Uni, hey, list all your identifiers where the metadata format equals RIF-CS, which means show me all the RIF-CS identifiers. This is just part of the screenshot; it gives me the whole list of identifiers. The last two start with ListRecords, which means you can retrieve the metadata for multiple records. Of course, you must say which metadata format you want for these records. You also have the choice to ask for records from a certain period of time, or to specify which set you want. Let's look at this example. It says I want to list all the RIF-CS records. This is part of a screenshot; apologies, I'm just not confident enough to show you the real one because it takes time, so I'm only showing you half of my screenshot. If I can guide you to the end of this XML file, you can see the registry object group is equal to Griffith Uni, and that this is a party record. Of course, there are lots of records there; because of the limitations of the screen, I can't show you everything. The final verb I'm going to talk about is GetRecord. This one returns the metadata for a single identifier. For example, MyTARDIS may mint an ANDS identifier like /002. So if one day Steve and Alex say, I just want to get this record back, you can say my command is GetRecord, my metadata format is RIF-CS, and my identifier is this one. That will give you the single record that meets your requirement. Okay, now let's talk about another topic, which is the datestamp.
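To make the shape of a ListRecords response a little more concrete, here is a small sketch that reads the header of each record (identifier, datestamp and set membership) from a response using Python's standard XML parser. The sample XML, its identifiers, dates and set names are invented for illustration; only the element names and the namespace come from the OAI-PMH 2.0 schema.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, heavily shortened ListRecords response (metadata bodies left empty).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2011-09-01T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="rif">http://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2011-08-15</datestamp>
        <setSpec>ands</setSpec>
      </header>
      <metadata/>
    </record>
    <record>
      <header>
        <identifier>oai:repository.example.org:2</identifier>
        <datestamp>2011-08-20</datestamp>
        <setSpec>ands</setSpec>
        <setSpec>images</setSpec>
      </header>
      <metadata/>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_headers(xml_text):
    """Extract identifier, datestamp and set membership from every record header."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        headers.append(
            {
                "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
                "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
                "sets": [s.text for s in header.findall("oai:setSpec", OAI_NS)],
            }
        )
    return headers


for h in list_headers(SAMPLE):
    print(h["identifier"], h["datestamp"], h["sets"])
```

The second record in the sample sits in two sets at once, which is the overlap between sets mentioned earlier in the talk.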
In OAI-PMH, each record needs a datestamp. What is it for? Two purposes: one is to show when the record was created, and the second is to tell you when the record was modified. Using this datestamp, the OAI-PMH harvester can harvest by date range. For example, if one day the ANDS harvester is pointing at Anthony's OAI-PMH MyTARDIS repository at Monash Uni, we can just say, show me all the records from 1st of August 2001 to 31st of August 2001, and it gives me that range of records. Even better, it can support incremental harvesting. So let's talk about what incremental harvesting is. Again, as we always said, the service provider in the data capture projects is always the ANDS harvester. I'm assuming you just contribute your records; you don't harvest them back. Your data providers can be different. What the ANDS harvester does, in general, is say, hey, what's new since the last time I came in? Show me all those records. Different repositories return different XML files. A response can contain all the new records and all the modified records, or it may flag that a record has been deleted, and then the action needed on our harvester side is to delete that record. Now let's talk about a slightly more complicated topic, which is the resumption token. The purpose of the resumption token is this: maybe one day your repository just has too many records. You may one day contribute 10,000 records to ANDS, but you may worry that network traffic could lose part of your records. So what you can do is have your data provider generate resumption tokens. Let's look at how it goes. The ANDS harvester points at your provider and says, hey, I want all your new records. I also say I want all your RIF-CS records; as you can see, that's rif. And I also say the time I need is after January, of August... ah, sorry, the first of July.
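A request like the one just described, for records in a given format since a given date, might be put together as in the sketch below. The base URL, the function name `selective_request` and the year used for the "first of July" example are assumptions for illustration; the `from`, `until` and `set` argument names are the ones defined by OAI-PMH.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE_URL = "http://repository.example.org/oai"


def selective_request(base_url, metadata_prefix, from_date=None, until=None, set_spec=None):
    """Build a ListRecords URL restricted by datestamp range and, optionally, by set."""
    args = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        args["from"] = from_date.isoformat()  # day granularity, YYYY-MM-DD
    if until is not None:
        args["until"] = until.isoformat()
    if set_spec is not None:
        args["set"] = set_spec
    return f"{base_url}?{urlencode(args)}"


# Harvest by date range: the August 2001 example from the talk.
print(selective_request(BASE_URL, "rif", date(2001, 8, 1), date(2001, 8, 31)))

# Incremental harvest: everything new or changed since the last visit
# (the year here is an assumption; the talk only says "first of July").
last_harvest = date(2011, 7, 1)
print(selective_request(BASE_URL, "rif", from_date=last_harvest))
```

For incremental harvesting, the harvester would store the date of each successful run and pass it as `from_date` the next time it visits.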
The provider says, I have 250, but given network traffic, or given your policy on how many to give at once, you can define the number of items you send to ANDS each time. For example, Monash Uni might decide, each time I always give you 100 records. Then the issue becomes: you have 250, but this time I only give you 100. So what Monash Uni does is generate a token for me: I give you 100 first, and please remember your token is MON1 or something else. Then our harvester reads your records and goes back to your data provider to say, I need more data. This time I give you the token MON1, which means your provider knows you've already contributed 100 records. Then your response will be, I have 250, this time I give you another 100, and I generate a new resumption token for you. Finally, the ANDS harvester finishes harvesting with the remaining 50 records. This time your token is empty, which tells our harvester it's the end of the harvest. Finally, I'd like to talk about some existing OAI-PMH solutions at ANDS, which have already been used in some projects. The first one is jOAI; you can find the detailed jOAI solution on the ANDS website. This solution has been adopted by Monash Uni and Melbourne VIVO, am I right, SARS? Okay. The second one is OAICat, which has been successfully adopted by the Griffith Uni MetaHunter Hub project. The final one is PROAI for Fedora. We know it's troublesome, but luckily our friends from SARS successfully developed a solution to configure PROAI to support RIF-CS and give records to ANDS. Of course, you may ask: there are plenty of OAI-PMH solutions, and here I've only listed three. Does that mean you have to choose from these three? The answer is definitely not. Whatever solution best meets your internal requirements, you just go with it. Correct me if I'm wrong, Andrew. I'm sort of doing.
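Going back to the resumption token exchange described a moment ago (250 records handed over 100 at a time), the sketch below simulates it end to end without any network. The `provider` function is a made-up stand-in for a data provider, and the MON1/MON2 token format is taken from the talk's example; real tokens are opaque strings that the harvester should simply hand back unchanged.

```python
# 250 hypothetical records held by a simulated data provider.
RECORDS = [f"record-{n}" for n in range(1, 251)]
BATCH_SIZE = 100


def provider(resumption_token=None):
    """Simulated data provider: return one batch plus the token for the next one."""
    # In this toy provider the token encodes how many batches were already sent.
    start = 0 if resumption_token is None else int(resumption_token[3:]) * BATCH_SIZE
    batch = RECORDS[start:start + BATCH_SIZE]
    more = start + BATCH_SIZE < len(RECORDS)
    next_token = f"MON{start // BATCH_SIZE + 1}" if more else ""
    return batch, next_token


def harvest(fetch):
    """Keep requesting batches until the provider returns an empty token."""
    records, tokens, token = [], [], None
    while True:
        batch, token = fetch(token)
        records.extend(batch)
        tokens.append(token)
        if not token:  # an empty token signals the end of the list
            break
    return records, tokens


records, tokens = harvest(provider)
print(len(records), tokens)
```

The harvester never needs to know the total in advance or how the provider sizes its batches; it only repeats the request with the last token it was given until that token comes back empty.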
So you can choose; on the market there are plenty of OAI-PMH solutions. Okay, too much. Finally, I'm talking about the ANDS-supported metadata formats, which I have sort of already covered. The first is RIF-CS; the second is ISO 19115, which is a geographic information metadata format. Of course, you may say that in your institutional repository you have other formats: DC, MODS, MARC, or something else. What that requires is crosswalking from those formats to RIF-CS, so inevitably you have to write XSLT to translate from, say, DC or MARC or another format to RIF-CS. My message here is that ANDS and some projects have already written some XSLT translations. So before you decide to write a new translation, please talk to our BAs; we may be able to give you an existing one, so you just need to modify it or adopt it as is. That saves you time, unless you really think it's enjoyable to do it yourself. Finally, I'm just giving you a screenshot from the ANDS sandbox and ANDS production. Suppose you already have an institutional repository and have successfully configured your OAI-PMH to support RIF-CS. We need a few pieces of mandatory information from your side to set up your data source on our side. The first one is the key: what key you want to name your data source. The second one is the title. What's the title? It can be ANDS, or Monash MeRC, or the Uni Melbourne VIVO project, or something else. The third one is the base URL; you only supply the base URL of your OAI-PMH. The fifth, sorry, the fourth one is the provider type. As we said, OAI divides the world into the provider side and the harvester side, so you really need to tell us which provider type you supply. There are three types here. The first one is RIF, which indicates you are going to use direct harvesting. As Andrew mentioned before, we just use a direct HTTP GET to get your records.
The second one is RIF OAI. The third one is RIF OAI-PMH. Under harvest method, you can also say whether it's direct harvesting, harvesting based on the OAI-PMH protocol, or something else. Of course, if you want to, you can also set up some sets, so you clearly tell us, I'm going to contribute this set to ANDS, this set to Picture Australia, and this set to somewhere else. You can also define how frequently the ANDS harvester gets your records: once per day, once per month, or once per year. We also recommend you leave your contact number and your name, just in case your server is down or something else goes wrong, so we can contact you and say, hey, could you please take a look? Okay, I guess that's the end of my talk. I'd like to leave you with some useful links.