So, welcome everybody. We've got the queen of TIM matching here, Natasha Simons, who very generously, yet again, has agreed to give us the benefit of her wisdom and experience. Natasha is wearing an extra crown because of the recently awarded Stanford Prize, and I'll hand over now to Natasha to tell us a bit about that so she can bask in her glory.
Okay, so I'm the project manager for the Griffith Research Hub, and it's nice that Simon's let us have a little bask in the glory of the commendation of merit in the Stanford Prize for Innovation in Research Libraries. The whole place is kind of abuzz with that news at the moment, and we're really hoping that the people who have the money also hear it, because our project finishes on the 30th of June. We'd really like to keep extending the hub, and we've got a list as long as your arm of things we could do to it, but we need some internal funding to do that. So we're having a celebration here, and I'm inviting all the people who have some money to come along, and hopefully they will give us some more funding. But we were really amazed that they gave us that prize, and there are some really awesome comments from the judges about it.
I'm not here to talk about that, though; I'm here to talk about TIM, as Simon asked me to do a TIM matching workshop. I worked at the National Library for about eight years, and I worked on the party infrastructure project, and I left just before that project finished. I'm not talking today as a National Library person, because I haven't been there for nearly two years now. I'm just talking about Griffith's experience of the party infrastructure, plus a hands-on look at how to match using TIM, which is the Trove Identities Manager. There are lots of different paths into the party infrastructure and getting NLA party IDs, and in this session I'm just going to talk about our experience here.
It follows on from a couple of discussions that we've had about the party infrastructure at this clinic: Hoylen gave an excellent presentation a few weeks back, and Simon's done a question and answer about it. The things I'm going to cover are setting up harvesting of party records by the NLA; signing up for and getting access to the Trove Identities Manager; hand matching records using TIM, and I'm going to use their test service to do that live as we speak; and getting the NLA identifiers back, putting them in your metadata store and providing them to RDA. I'll end with a summary of our experience, so I'm going to mention the successes and some of the issues as well.
So, first of all, why bother? What's the point? It's worth taking a moment to ask why we're actually doing this. Researchers have a lot of name variants. They publish under a lot of different names: it could be their first name or their initial, or they could change their name. There might even be a spelling error in the name that they publish under. They also have a number of identifiers that are assigned by different publishers, by the university and so on. Those variations and multiple identifiers make it difficult for other researchers to find all the works by a single researcher. They also make it difficult to find a specific work by a specific researcher if you're searching through a repository that holds those items.
Historically the focus of research has been on publications and not on data, and it's also not been on the people who authored the research. But information about a person provides a lot of contextual information that's really valuable, particularly on the web: people can find out more about the person who published the article and get a whole list of other articles by that person that they might be interested in using.
The other reason we're doing it is persistence. Web pages and URLs come and go, and we need persistent identifiers for records about people and organisations so that they're managed in the long term. The National Library service is an identity resolution service, which means that you can have multiple identifiers which all sit under the one NLA party identifier. That groups all of those identifiers, and the records contributed from different organisations, under the one NLA party ID. The National Library is government funded and has a commitment to maintaining these identifiers over time. That's the point about persistence.
So, just briefly, this is a bit of an eye-watering diagram, so I'll try and talk you through it. It comes from the ARDC. Actually, the acronym for the Party Infrastructure Project was the ARDC PIP, the Australian Research Data Commons Party Infrastructure Project. There's a document which describes the whole process and how you can contribute records and so on, available on that project wiki, and this is a screenshot from that.
Basically the project built on existing National Library infrastructure. The National Library already had a service called People Australia. That was about identifying prominent Australians and their biographical information, their occupations, just a whole lot of contextual information about them: their different name variants, their different publications and so on. In the end, the front-end service for making that available became the Trove People and Organisations Zone. Trove is the National Library's search service and it has a number of different zones, for things like digitised newspapers and pictures, and there is a zone for people and organisations. That is for exposing that type of information about prominent Australians and so forth.
So, the National Library already had that, and it sourced contributions to that service from the Australian Name Authority File in the first instance. That is a library name authority file that's contributed to by libraries across Australia, and it became the basis for the party infrastructure. Most of those names are for people who have written books. Journal articles and so on have been a little outside that realm because they've been in the hands of commercial publishers, whereas books clearly print people's names, birth dates and so forth. So the name authority file has been based on people who write books, while a lot of what we are contributing through the party infrastructure as institutions is for researchers who write journal articles. That means a lot of the records that we give will be new records that don't actually exist in the current infrastructure.
Anyway, the National Library service also has a number of other contributors: the Australian Dictionary of Biography Online, VIAF, which is the Virtual International Authority File, and so on. So there's actually a massive number of records already in the NLA infrastructure, and I'm going to talk through how we have contributed to it.
Basically, you have to set your records up for harvesting by the National Library. The National Library has a custom-built harvester that they built in-house, and it's used to harvest your metadata records. You can provide your party records in either RIF-CS format or EAC-CPF format. EAC-CPF is the native format of the party infrastructure, of the Trove People and Organisations Zone. It was put together by a range of organisations, from the Society of American Archivists to the State Library of Berlin. And Basil Dewhurst, who was the project manager for the party infrastructure project, was actually on the board for developing that format.
It's a specific metadata format for describing people and organisations, but I'm not going to go through it in detail here. At Griffith, we provide our records in RIF-CS format. You don't need to produce a special feed of party records. You can just give the NLA the regular RIF-CS feed and their harvester will ignore the non-party records. So we just give them everything that we give to Research Data Australia, and their harvester just pulls out the party records.
You have to provide your base URL for harvesting to the NLA once you've got your RIF-CS records ready and you've made them available for harvesting. That little screenshot there is a list of the records that we've made available for harvesting, so you can actually see them in a web browser. The NLA then requests those records. You also need to advise the NLA of your ISIL, which is your... I can't remember what the acronym stands for... International Standard Identifier for Libraries, something like that. It is basically AU for Australia plus your National Union Catalogue symbol, which you can find in the ILRS guide. We'll show you online how to get your NUC symbol, but all your library people should know what it is, because it's used widely in libraries. So for Griffith it's AU for Australia, then QGU, which is Queensland Griffith University. The NUC always starts with the first initial of your state and then goes into your university code.
Okay, so once you've got your records set up, you ask the NLA to harvest them. If you like, you can ask the NLA to put your personal email address into their harvester, which means that you will receive automatic notification from the harvester of the status of your harvest. That's a really useful way to see that the harvest went ahead, how many records were harvested, how many errors there were, that sort of thing.
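As a rough illustration of the two points above (the ISIL being a country prefix plus the NUC symbol, and only party records being picked out of a mixed RIF-CS feed), here is a minimal Python sketch. It is not the NLA harvester: the function names, the sample feed and its keys are made up for illustration, and writing the ISIL with a hyphen ("AU-QGU") is an assumption based on the usual ISIL convention.

```python
import xml.etree.ElementTree as ET

# RIF-CS registry object namespace.
RIFCS_NS = {"r": "http://ands.org.au/standards/rif-cs/registryObjects"}


def make_isil(nuc_symbol, country="AU"):
    """Join a country prefix and a NUC symbol, e.g. 'QGU' -> 'AU-QGU'.

    The hyphenated form is an assumption, not something stated in the talk.
    """
    return f"{country}-{nuc_symbol.strip().upper()}"


def party_keys(rifcs_xml):
    """Return the keys of registryObjects that contain a <party> element.

    Collections, activities and services are skipped, which mimics a
    harvester that ignores non-party records.
    """
    root = ET.fromstring(rifcs_xml)
    keys = []
    for obj in root.findall("r:registryObject", RIFCS_NS):
        if obj.find("r:party", RIFCS_NS) is not None:
            keys.append(obj.findtext("r:key", default="", namespaces=RIFCS_NS))
    return keys


# Hypothetical two-record feed: one party record, one collection record.
SAMPLE_FEED = """<registryObjects xmlns="http://ands.org.au/standards/rif-cs/registryObjects">
  <registryObject group="Example University">
    <key>example.edu.au/party/1</key>
    <party type="person"/>
  </registryObject>
  <registryObject group="Example University">
    <key>example.edu.au/collection/7</key>
    <collection type="dataset"/>
  </registryObject>
</registryObjects>"""

print(make_isil("qgu"))
print(party_keys(SAMPLE_FEED))
```

Running a filter like this over your own feed before asking for a harvest gives a quick count of how many party records to expect, which can then be compared with the numbers in the harvester's notification email.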
Now, in this harvest, we are expecting those record errors, because the record errors are for the non-party records: they failed because they're not party records. First, the NLA harvests into their test environment, so that's completely safe. It does not connect into production at all. The first test harvest checks your records; it doesn't actually get them, so it basically just checks the quality of your harvest. If that goes ahead successfully, then you go ahead with the production harvest, which basically gets all the records into the National Library's test system.
As they come in (you can see that little diagram, it's a bit hidden for me) the harvested records hit what's called the National Library of Australia's identity service. That's where the matching rules are applied, to see if your incoming records match the records that are in the Trove test system. We're just talking about the test system at the moment. Records that fail automatic matching appear in the Trove Identities Manager for hand matching. The matching rules are fairly conservative: a record basically has to match exactly with an existing record to be co-located with it. And if there's definitely not a record in there with that surname and that first name, or the initial of that first name, then a new record is automatically created. The ones that fail then go into hand matching. They're possible matches, matches that couldn't be worked out automatically.
If you read any of the literature around this, even from the name gurus overseas, particularly in Europe, you'll see that you can't actually get a matching algorithm that's better than about 95% to 97% accurate. To improve that matching, we would probably need a lot of contextual information in the records, richer records to build up the picture: is this Jane Smith the same as that Jane Smith?
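The three outcomes just described (co-locate on an exact match, create a new identity when nothing shares the surname and first name or initial, otherwise send the record to hand matching) can be sketched in Python as below. This is only an illustration of the idea, not the NLA's actual rules: the `Name` type, the `triage` function and the definition of "exact" as one single candidate with an identical given name are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    surname: str
    given: str


def _norm(text):
    """Normalise a name part for comparison."""
    return text.strip().casefold()


def is_candidate(incoming, existing):
    """Same surname, and the given names share a first initial.

    A missing given name cannot rule a match out, so it counts as a candidate.
    """
    if _norm(incoming.surname) != _norm(existing.surname):
        return False
    a, b = _norm(incoming.given), _norm(existing.given)
    return not a or not b or a[0] == b[0]


def triage(incoming, existing_records):
    """Return 'new', 'auto-match' or 'hand-match' for an incoming record."""
    candidates = [e for e in existing_records if is_candidate(incoming, e)]
    if not candidates:
        return "new"  # nothing plausible exists: create a new identity
    exact = [e for e in candidates if _norm(e.given) == _norm(incoming.given)]
    if len(candidates) == 1 and len(exact) == 1:
        return "auto-match"  # one unambiguous exact match: co-locate
    return "hand-match"  # possible matches only: leave for a person to decide


existing = [Name("Smith", "Jane"), Name("Smith", "J."), Name("Fitzgerald", "Ross")]
print(triage(Name("Nguyen", "An"), existing))
print(triage(Name("Fitzgerald", "Ross"), existing))
print(triage(Name("Smith", "Jane"), existing))
```

Being conservative here is deliberate: a wrong automatic match merges two people's records, whereas a record left for hand matching costs only a little human time.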
Is the Jane Smith who wrote a cookbook in 1980 the same Jane Smith who published on biology in 1982? You need a lot more information to actually determine that match. So that's just an overview of the components and how it works.
You then check your auto-matched records in the Trove test system and check your remaining records in TIM Beta; I've put the URLs up there. If you're happy, the National Library can repeat that harvesting in their production environment. There's a little tip there: you can view your records in Trove by entering your ISIL or your NUC code in the People and Organisations Zone. This is a snapshot of Trove where I've just typed in QGU and it's brought up all of Griffith's records, although we've increased the number since then.
Okay, so to sign up for the Trove Identities Manager, you need to sign up to Trove and then send a request through the Trove Contact Us page, which has to say "request from ANDS contributor for access to TIM". You give them your university name, your ISIL code and your username, and then they will set you up for access to TIM.
So, I was just going to demonstrate TIM now. Let me get to it. Right, okay, so this is the Trove Identities Manager. This is the TIM Beta service. I'm just going to get set up for this. You just log in, once you've got your login. Okay, so now I'm in the Trove Identities Manager. You only have access to action the records of your institution. You can't action records from other institutions, for very obvious reasons. It has very simple drag-and-drop functionality that's quite easy to use. It was developed by a company external to the National Library, and we did quite a lot of usability testing on it, and I think you'll find it's actually quite good. Basically there's a left-hand screen for your unmatched records and a right-hand screen for records that already exist in the infrastructure.
I'm just going to have to put that down. Okay, so we'll just ignore this tab at the moment. This is actually used for disambiguation, that is, for correcting mistakes. If you've accidentally matched the wrong person to a record, you can split them apart using this tab, but I'll leave that for the moment because I'm not going to show it today.
So, I can only see Griffith's records. If I click on Griffith University, it opens all of our records. Now, most of the ones that failed matching for us were our corporate records, so our party group records. It's actually almost impossible to match party group records. Pretty much no one outside libraries, and possibly even within them, sticks to a proper way of describing organisations, and that makes it very hard to match them.
So I'll have a look at the first one, which is for our Eskitis Institute. If I open the details tab I can see the history, which is a little biographical statement about Eskitis. This link here goes back to our hub record for Eskitis, because that record is coming from our Research Hub, and here we have a link back to the record in the Research Data Australia service. If I want to search for possible matches, I click on the Eskitis name here, and over on the right-hand side it has executed a search for that full name string as well as for any records with that identifier. Nothing has come up, but that doesn't convince me that there's nothing in the infrastructure. So I can use the general search box to type in just the word Eskitis, and here it's kind of a keyword-y search: it has searched all name fields and records for Eskitis. This person, Catherine, is somehow connected to the Eskitis Institute, you can see it down there, so that's why she's come up in the search results. But that's not the record I'm after, and no other Eskitis records have come up.
So, I'm pretty confident that there's no record for Eskitis in the party infrastructure, and I'm going to create a new record for it. I just click on it and drag it up to the top, where it says "drag here to create new identity", and the tab turns orange and I let go. It asks, "Are you sure you want to do this?", so you've got time to change your mind if you've accidentally dragged over the wrong thing. I do want to, so I click "create new identity", and it's instant: an instantly created record with an NLA party ID here. So it's instantly assigned and instantly available in Trove. If I go into the Trove test service, the URL for which is up there, open the People and Organisations Zone and type in Eskitis, there it is. It comes up instantly once you've created a record. So you can see that it's really quick to do and very simple; the time-consuming part is working out whether there are any possible matches and so forth.
One issue with the Trove Identities Manager, which Hoylen pointed out too, is that the NLA party IDs in the test system are not real party IDs. They're just sort of fake ones, so they can lead you to the wrong party record, which might be a bit confusing. The way to get around that is to go into the Trove test service like I did just then and do a manual search by name, and then the record will come up. That's rather than clicking on this link, which will open, in this case, just a production record with that NLA party ID. If I didn't know that, it would have been a bit confusing to me. That's obviously something that needs to be fixed, but it's something to be aware of.
So that was a group record; let's have a look at a person record. I'm going to have a look at Ross Fitzgerald. Some of you might have heard of Ross. He's quite a famous academic, retired now, but he's linked to Griffith University. We've put a little bio for him in there and a link to his web page. So to search for him I just click
on that name field, and again it's searching for either his name or that identifier. But the problem with this is that it's actually searching for "Ross" and "Fitzgerald" and "Emeritus" and "Professor", so I'm not convinced that he's not in the system. So if I put into the search box... look, I so can't spell Fitzgerald... "Fitzgerald", and search identities. It's just taking its time; you can see there's a little wormy there that turns while it's thinking about it. So there are 297 results, and I don't want to go through 297 results, so I'm going to use advanced search by clicking on that. Then you can narrow down the search to the name field, which covers names and name variants. So if I put in Fitzgerald (well, I've already done it before, here's one I prepared earlier) and search identities, that record has come up.
All I've got here is "Ross Fitzgerald, 1944". Over here I don't have a birth date, because Griffith can't provide birth dates; it's against our privacy policy. So if I'm not sure, I can open this record. I'm going to click on that one. Now, this by chance has opened the right identifier, but you'll see at the top that it's in the production service. And just to confuse you, we actually matched these yesterday in production, so I know that it will show the Griffith record there even though I haven't matched it in the test system yet. I can see from here, though, that under resources for Ross Fitzgerald he's got My Name Is Ross: An Alcoholic's Journey, blah blah blah, and a number of his publications on politics.
Back in the Trove service, if I open his record in the Research Hub, he doesn't have a long record because he's retired now, and it's not public: you have to get there through the hub link, you can't actually search the hub for Ross Fitzgerald. That's a policy issue, another issue I'm not going to talk about at the moment. But I can see here that I can go through his books and see whether he was the same person or not, judging on
what he'd written. I happen to know that he's the same professor, so I'm going to match our unmatched record with the one in the Trove service. I click on this and move it across, and this record then goes that sort of yellowy colour, so I let go. It's just click and drag, that's all I'm doing. It asks you if you want to add a note. Probably the only reason you'd add a note is if you were splitting records rather than adding them. I'm sure, so I just click "move record", and there you go: the Griffith record has gone straight over there on the right-hand side, and it's co-located under that one NLA party ID. So you've got Libraries Australia contributing and Griffith University contributing, and on the record itself in Trove you'll see Griffith University and Libraries Australia, and you'll see the different contributions. So that's basically how the matching works.
I could do one more, though. And I have to say that we are going to make some improvements to our corporate records, having looked at this, because this sort of corporate heading is not very useful. It comes from an internal Griffith system, and I think it needs to be better than that before we give it to the NLA.
You can also search your own records here; it will search your unmatched records. So this is one for Professor Parlo Singh, and if I click on that, it opens this particular one. It got a hit because the name is exact, and when I open it there's a Libraries Australia record and there's already a Griffith record there. Now, this was before I was at Griffith, when I was at the NLA actually, and it's just come back to bite me: this key here is griffith.eud.au instead of edu.au. Someone made a mistake in the identifiers at Griffith, and that's why this record hasn't overlaid that record. If those identifiers had been exact, then the new harvest would have just overwritten the old record with the new one. It hasn't, and I want this one to replace it, so I'm going to move it
over here, and then I'm going to hide this one from public view. I can't delete it, but I can hide it. If I click "hide", that means that in the public view of Trove you will only see this record and this record, and not this record, which has got the wrong key in it. So I think that was all I was going to show as a basic overview of how to use the Trove Identities Manager. I'll just go back into my slides.
So, you can get your records back from the NLA. You can get them back through OAI harvesting: you can harvest your records back as a set, and the set is named with your ISIL, so for us it'll be AU-QGU. You can get them back through the SRU search and retrieval interface. Or you can get them back manually by looking in Trove, although that will be a very time-consuming way of doing it. Then you can store them in your metadata store and provide them in the feed to Research Data Australia.
So, just a summary of things that I think would be good. The problems, as I mentioned, include the NLA identifiers in the test system resolving to the production pages, which is quite confusing. Also, the NLA would like RIF-CS name parts to be in a certain order, and I don't think we should have to do that. If they're labelled given name and surname (I've forgotten the RIF-CS code for it), then really the NLA should be able to deal with that, and we shouldn't have to order them in a particular way so that they appear in Trove in a particular way.
Other tips: ask your librarians to do the hand matching, because they're familiar with authority control for names and researchers and they've got the skill set already. I would also get more than one person to use TIM, to spread the knowledge and the skills. Yesterday myself, Sam Searle and Stacey Lee sat down and did the matching for our records, and it took about an hour and a half to get through 30-odd, 33-odd records, and that was including
explaining to them how to use TIM. So it wasn't a really time-consuming thing for us, and a lot of the issues that we had were actually to do with our own records and the quality of our own records. Embarrassing, like. You can also, as I said, set up automatic notification from the NLA harvests so that you get emails sent to you personally. Don't change your identifiers unless you really, really have to, because you can see what happened with the Griffith ones: you have to go through and hide the old ones and change them, and you muck up matching that way.
Nice-to-haves: it would be really good to be able to get into the NLA harvester and execute our own harvests, but you can't do that at the moment. You have to ask them to do it, although they can also set it up to harvest at particular intervals, weekly, daily, whatever you want. It would also be good to get automatic notifications from TIM on statistics, so how many you've got left to match and so forth. But you don't have a time frame to match these in. The unmatched records will just sit there, ready for your hand matching, until you get in there and do it. So I'm going to leave it there.
Okay. I know that there are often questions arising. We've got a set of instructions: there are lots of training modules on the ANDS website, and links to those from the metadata stores blog, where there are detailed instructions on obtaining ISIL codes if you haven't done that already. The other thing is that we have a running list of unexpected behaviours, some of which Natasha has mentioned in her problems list. That list, I know, is read by the NLA, and it is a working list, so the NLA are aware of these issues. They're not just going nowhere; they may form a case for the NLA to get more resources to try and address some of them. But the thing is, we don't want you to overwhelm Natasha with questions outside of here. Probably your first port of call will be us at ANDS, and beside me is Julie, just there she is, Julie McCulloch, and
Julie has written training modules and has been taking detailed notes here, so if you have questions, please contact us. Now, Natasha is the reigning queen of matching, but we don't want to bother her unless we've run out of ideas. Is that okay with you, Natasha?
Yep. Look, my main job is managing the Research Hub, so I kind of have to do that one first. But I was just thinking, if there's a discussion thread that people want to use, I could answer some questions on there, because I know it's a bit frustrating if you can't get answers very quickly to the things you need. I do have some insider knowledge, but I can't always answer, because sometimes a question requires someone looking into a technical problem, or it might be that something's changed in the NLA system and I might give the wrong answer. So it's good if they're on a list, and then people can share the questions. I know it's a bit hard, because you sometimes feel a bit embarrassed, but I think this is one of those areas where there are lots of questions, and really a lot of first-time questions, because it's quite a complex thing to get right.
Well, with that, Natasha, I'd just like to thank you again for your generosity of time and spirit and the benefit of your wisdom. Thank you for that. It was a full house, there's a lot of interest, and we'll try not to bother you with follow-up questions, although I know that you just keep solving people's problems. So thank you very much.
OK, well, thanks from me, and I hope it was useful for people.