Okay, so to start us off this month we have Tim Sherratt. As I said, he has been looking at ways that Wikidata can include Australian government data, such as linking Wikidata to the National Archives of Australia. For those who use Twitter, you may have seen his regular tweets on that very subject and the amazing work that he's been doing creating resources around his own GLAM Workbench website. He'll also be at the WAO conference next week, so don't miss that. I will put the GLAM Workbench into the chat, and a recent blog post that he wrote for us as well, which you can read later on. Don't read it now; it's about to be a presentation. And then I'll also put in a link to the WAO, just in case you haven't registered, because you will be able to watch it online later. And now I'll hand you over to Tim. Thank you.
Thanks James, and thanks everybody. It's great to be here with you. And thanks of course to Wikimedia Australia for supporting the work that I've been doing. Although I'm sort of reporting on it tonight, it's not at all that I feel it's finished. It's not the end of the project. It's really the beginning for me in some ways, because working on this has opened up all sorts of interesting questions and possibilities that I want to explore further, and I'll be talking a bit more about that as I go on. Apologies if I'm a bit croaky. I've already given a six-hour workshop today, so I'm hoping I just don't start speaking gibberish at some point. But I might start sharing my screen. I don't have slides today, but I do have some things that I want to show you, if I can find where Firefox has gone to. There we go. Okay, so that's the blog post that James was referring to, where I've talked in detail about some of what I'm going to be describing tonight. But I thought I'd use tonight to give a bit of context to this: what I've been doing and where I think it might go in the future. 
So first of all, I suppose this particular project starts for me with the collections of the National Archives of Australia, which I've felt very passionate about for a long time. As a baby historian it was really the place where I started to do my first research, and the richness of the collections is just amazing, astonishing. And they make, obviously, their data available. That's one of the annoying things about RecordSearch. They make their data available through their online database, RecordSearch. And again it has very rich data, not only about their own collections but also about the agencies that create the records in their collections. I'll talk a bit more about that structure in a minute. So there's fabulous data there, but RecordSearch itself as an interface to this material is quite limited. I should also say I did work at the National Archives for a number of years, and again was frustrated by really lacking ways to make people understand the richness and the possibilities of the material that's held by the National Archives. So over the years I've been involved in a number of projects which have tried to expose that material and make it available for people to use and see and understand in different types of ways. For example, The Real Face of White Australia is actually an ongoing project that I'm involved in. Back in 2011, Kate Bagnall, historian of Chinese Australia, and I downloaded I think about 12,000 digitised images from the National Archives of Australia, ran them through a facial detection script and created this wall of faces. This has been updated a little bit since then, but it's pretty much the same. And to show you the documents that these photos come from: these were documents which were used in the administration of the White Australia policy. 
So basically, if you were deemed not to be white and you travelled overseas, you had to carry special documentation with you, or else you wouldn't be let back into the country. You would face the dictation test, and that was a mechanism of exclusion. So there are thousands of these documents, and they're amazing, and they're sitting in the National Archives of Australia, and we wanted to make them more visible. We wanted people to understand them, so we created that wall of faces. We've since done other things. Just a couple of weeks ago we launched the latest version of a transcription project, which is getting people to transcribe information from these certificates so that we can explore in more detail the population and their movements across that White Australia period. I've also spent a bit of time downloading publicly available ASIO files, which are also available through RecordSearch, and wrote a little program which found redactions in ASIO files and assembled a collection of something like 200,000 redactions. I do have a Redbubble store where you can actually buy scarves which are covered in redactions, amongst other things. And again, it's giving people a different perspective on those collections, these things that are in GLAM collections, giving people different ways of seeing them and using them and understanding them. One of the best things that happened in terms of pulling out these redactions was I got to discover a whole new genre of art, which is redaction art. Some of the people who were doing the redactions of the files obviously got a bit bored, and they started turning the redactions into little animals and ships and things like that. So these are actually in files in the National Archives of Australia. 
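As a rough illustration of what a redaction-finding program might do, here is a minimal Python sketch. It is not the program described in the talk, whose method is not given here. It assumes a page image has already been reduced to a grid of grayscale values, and it treats any large, solidly filled dark rectangle as a candidate redaction. The function name `find_dark_boxes` and the `threshold`, `min_area` and `min_fill` parameters are hypothetical choices made for this example.

```python
from collections import deque


def find_dark_boxes(pixels, threshold=50, min_area=6, min_fill=0.9):
    """Return bounding boxes (top, left, bottom, right) of solid dark regions.

    pixels is a list of rows of grayscale values (0 = black, 255 = white).
    All parameter names and defaults are illustrative assumptions.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or pixels[y][x] > threshold:
                continue
            # Flood fill to collect one connected region of dark pixels.
            queue = deque([(y, x)])
            seen[y][x] = True
            count = 0
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and pixels[ny][nx] <= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep regions that are big enough and nearly fill their bounding box,
            # which favours solid blocks over text or specks.
            box_area = (bottom - top + 1) * (right - left + 1)
            if count >= min_area and count / box_area >= min_fill:
                boxes.append((top, left, bottom, right))
    return boxes


# A tiny synthetic "page": white background, one 2 x 4 black block, one stray speck.
page = [[255] * 8 for _ in range(6)]
for row in (1, 2):
    for col in range(1, 5):
        page[row][col] = 0
page[5][7] = 0  # too small to count as a redaction
print(find_dark_boxes(page))
```

On real scans the thresholds would need tuning, and hand-drawn shapes like the redaction art mentioned above would likely need a looser fill test than this one.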
So as well as creating these alternative interfaces, I suppose, to RecordSearch data, I've also created a series of tools to help people get data out of RecordSearch. This is the latest version of this data scraper. RecordSearch doesn't have an API, an application programming interface, so to get data out of it, it's a matter of screen scraping the data, and it's quite a complicated business because of the nature of the technology behind RecordSearch. But versions of this scraper have been around now for, I don't know, nine or ten years, and I've used it in a number of these projects, such as the redactions and the White Australia policy one. So this is available now. The software is a Python package, and you can download it and pull data out of RecordSearch to do all sorts of analysis or whatever sorts of projects you're interested in. There are some examples of that in the RecordSearch section of the GLAM Workbench. The GLAM Workbench really brings together a whole lot of stuff that I've been working on over the past 10 years or more in terms of helping people make use of the data that's available through GLAM collections. So it's not just the National Archives of Australia. There are a lot of resources and tools and examples of working with data from Trove, for example, both the newspapers and other bits of Trove, and various other libraries and archives, mostly focused on Australia but some other collections. There's quite a big collection of tools and resources that help you use web archives, for example, both the Internet Archive as well as the UK Web Archive and the Australian Web Archive. So this is something that I'm continuing to develop, and I'm very pleased we've added a Wikidata section as part of the work on this project. I'll show you what's in there shortly. But this brings together a number of examples of using that data from RecordSearch, so 
various things that I've developed over the years, ways of looking at the data that exists within RecordSearch. Now, just to talk about the nature of the data in RecordSearch a bit more, and I descend into archives nerdery a bit here, because the data structures within there are really fascinating. RecordSearch is based on something called the Commonwealth Record Series System. It's a way of organising the collections within the National Archives which was quite revolutionary at the time in terms of describing archival collections, because they didn't just describe a body of records and say that it was created by a particular agency. They actually documented the little histories of agencies, how they related to each other, how they changed over time, and also described the functions that different agencies carried out within the context of government over time. And then connected to those agencies, which are all interconnected themselves, are the records they create, which are organised into series. So an agency performs certain functions and it creates certain records, or record series. I can actually show you a little thing I created when I was at the National Archives, this little pyramid. It's still there, which shows you how much RecordSearch has changed; it's been 10 years. But it just shows you the arrangement of the entities within RecordSearch. So agencies, which are the ones I've been working with lately, are connected up to organisations, they perform functions, they create series. But then of course there's a whole lot of relationships within the agencies. Agencies have predecessors and successors, they can be controlled by other agencies, or they may control sub-agencies. So there are some really rich data connections and relationships there. But, and again this is getting back to the limits of the
interfaces that we have currently to work with these sorts of collections, it's difficult to actually search across these sorts of relationships, or explore or visualise the relationships between agencies, within RecordSearch itself. It's very much type your text into a search box and get back a list of results. You've got no way of understanding that totality, that rich model of Australian government which is effectively documented within RecordSearch. So that's really why I suggested this as a project of working with Wikidata, because if we could get some of that data out of RecordSearch and into Wikidata, and linked with data which is already available in Wikidata, we would then open up new ways of querying that data. And of course the data model within Wikidata, having properties and statements and relationships, fits very well with that entity-relationship model which is at the basis of the series system within RecordSearch. So I thought, if we could, as a first instance, start to get... well, I should also point out that each agency within RecordSearch has its own unique identifier, which is obviously a crucial element to this. Indeed, all of the different parts have their own identifiers, so organisations, agencies, series and Commonwealth persons (there are people as well) all have identifiers. As part of this project we've got those entity identifiers as a property now in Wikidata. I've been focusing on the agencies, but those identifiers can now also be added to people as well, those Commonwealth persons identifiers. So the idea was getting those identifiers first of all into existing Wikidata entries, but then also adding entries that are not there. And when you're talking about government agencies or government departments, first of all I suppose the top-level agencies, those sorts of changes are
documented when we have machinery of government changes. When a new government comes in, or there's a shake-up of departments, you have a machinery of government document which shows which pieces of legislation are being looked after by different government departments, so you have those sorts of changes. It can get quite complex at times, as we'll see. The first stage of this project was getting the property into Wikidata, accepted as a Wikidata property. Then, using the tools that I had, I harvested all of the agency data from RecordSearch. There's data on about 8,000 agencies within RecordSearch. Then I started to think about how that could be modelled within Wikidata, and that's when I started to really grapple with some of the complexities and the limitations of some of the data. The first thing I suppose I should say is that RecordSearch identifies different types of agency. This is under agency status down here. So we've got department of state, head office, regional or state office, and these other ones as well. My first focus was departments of state. These are the government departments which do the core work of government. But I've also been adding head offices and some regional or state offices as well. So in the first instance I've been taking those identifiers from RecordSearch for government departments, then using tools which you'll be familiar with, OpenRefine and QuickStatements, to match those agency records with ones which were already in Wikidata, add the identifier, and also add some other data from RecordSearch, specifically the existence dates of departments. So in RecordSearch all of the records have the dates when an agency started and finished
within them, so making sure that's all added into Wikidata, as well as just a few other things, making sure they're all associated with Australia and things like that, and, in particular, instances of departments of state and government. Having got those identifiers in, I added departments which weren't already in Wikidata. And this is when things start to get complicated: you may have a government department which has the same name over a period of time, but whose functions have actually changed. Machinery of government has changed some of their responsibilities over time. So you'll see there's a number of different departments of defence, for example. I think there's a couple of treasuries as well. I think pretty much the only department which has gone through unchanged is the Attorney-General's Department. So that meant that some agencies would be in Wikidata as the current agency with a particular name, but I had to add the historical versions of that, which related to those previous groups of functions and legislation. Having added the entities, I then added the relationships between them, to get the predecessors and successors. That was a bit complicated because in many cases agencies aren't fully replaced by another agency. The functions that they perform may be split up between a number of agencies. They may even go to an agency which is already in existence, so some of their functions are hived off and given to another. All these sorts of complexities. Now, I was hoping to be able to map those changes a bit more accurately than I was in the end. Where RecordSearch has that sort of relationship, where one agency goes to another, it does have the dates when those changes happened, and it sometimes does include a note about what sort of
responsibilities shifted. But there's a bit of a mismatch there, because really what RecordSearch does is describe the activities of government as functions. They've got a set, a thesaurus of functions, which are used to describe those, but they're not currently used in describing the transitions from one government department to another. So there's a bit of a gap there. What I would like to do some more analysis of... I'm getting into far too much detail here, I'm sorry, I'm getting a bit enthusiastic about it. But the functions are associated with agencies, and they do have date ranges, though past analysis has shown that even they can be a bit inconsistent. But there is the beginnings of that data, which could actually break down the activities of government to individual functions and map those over time as they move between departments. There's a lot more data analysis and cleanup, I think, that would be needed in order to get to that point, but that would be a really exciting thing to do. As it is, we've got that data about departments into Wikidata now, and I was able to start doing some visualisation of those relationships. So this is within the GLAM Workbench, and these are now public, so you can go in and use them. The GLAM Workbench is a collection of Jupyter notebooks. I know that Wikimedia itself has a notebook environment where people can work with data, so you may have seen something similar. But it's really just showing the sorts of SPARQL queries that you can do of Wikidata across the government agencies, and then ways you can start to explore and analyse it. I'm just hitting shift-enter on these little cells here in order to run them. This was an example of a Gantt chart of agencies
showing the years that they started and then ended, and you can mouse over to see the department. Up the top here we have the original departments like the Attorney-General's Department. It slides off here, so you can see the Attorney-General's Department going through to the present day. So that was an early attempt to start playing with that data. This one is a bit more interesting. This creates a network diagram. I need to run it again. It's pulling the data from Wikidata using SPARQL queries and then developing a customised visualisation. So this is the network graph, and I find this quite interesting. The colours here represent decades, and it's basically going down in time, so we're starting at 1901 and heading towards the current departments down here. You can see that things were relatively stable for effectively the first 50 or 60 years of the Australian Commonwealth government. This is the World War Two era, and there was some changing of departments. Defence split up into different departments during the war, and we had the Department of Post-War Reconstruction and those sorts of things, so there are a few changes there. But what you can see is when we get to the 1970s, and particularly the Whitlam government, we had a lot of rapid changes in the structure and organisation and functions of government departments, and that has continued on ever since. The size of these nodes indicates how long they lived for, effectively, so you can see most of them don't have a very long lifespan, as they get merged and re-divided and things like that. So it was an interesting exercise to explore how you could visualise these sorts of things. I've created another notebook which enables you to do this for an individual department, and this actually runs as an app, effectively, using a thing called Voilà, which is a way of representing Jupyter
notebooks. So in here we can just choose a department from the drop-down to get an idea of its context and change over time. Again, this is pulling data live from Wikidata and showing the connections. So this is the agency we selected, and then it goes three steps in either direction to see its predecessors and successors. This was the start of that work, and since then I've added more agencies. I've gone down that list of different types of agencies to look at the head offices and others, so I think there's about 1,400 now in Wikidata which have those National Archives identifiers. I've still got some relationships to add between the head office type agencies. Head office is a bit misleading, because it includes things like royal commissions, for example, and various other authorities and committees and bureaus and things like that. So yes, I've got some more relationships to add for those. But I've also started to get a bit fascinated by some of the other possibilities, particularly for HASS researchers, humanities and social sciences researchers, which the GLAM Workbench was developed for, to enable them to think about how they can use Wikidata in their research. So I did start playing around with looking at data about... that's not the one... data about individuals, people. So just pulling out year of birth, for example, of people born in Australia who are in Wikidata. I did some work around occupations, and occupations over time. Again, this is pulling data from the SPARQL interface and then building some custom visualisations on top of that. These are Australian rules footballers' first names, some word clouds. So I just started exploring different ways we could represent some of that data, in terms of thinking about uses for humanities research. I've also become rather fascinated with the range of identifiers which are already in use,
which you have probably been adding to these records. The range of them was really interesting. So looking at what's already there, and then thinking about what becomes possible when other collections are adding to that and linking their own records, and that network of relationships that they can then pull back from Wikidata in order to enrich their own collections. There are obviously some fabulous possibilities there, which would be great to explore further. And I suppose getting back to that point: in creating those visualisations of the government agency data in RecordSearch, what has really become possible is that we can query RecordSearch, effectively, within Wikidata. We can understand what's in that collection by building these Wikidata queries which use those identifiers. So I started to think about other ways of doing that with other collections. There are obviously a lot of entries within Wikidata which have Australian Dictionary of Biography identifiers. The ADB doesn't actually make much of its own data available openly, but by using Wikidata we can build queries around the ADB content in a way we can't within the ADB site itself. So from the outside we can build more complex and different types of queries and questions than we can within the interfaces which are provided by these sorts of services. Just as an example of that, this is a SPARQL query which shows us the youngest people who have entries within the Australian Dictionary of Biography. This is obviously not a query you can do within the ADB site itself, but because those identifiers have been added to Wikidata, we can build a parallel interface to the ADB and explore it in different types of ways. So this has got me really excited about thinking about other ways we can build these parallel interfaces by adding
these identifiers into Wikidata. And it's all because of the great work of people like you, who have been adding these identifiers in, that we've got to this point now where we can really start to make use of this and explore it in different types of ways. So from here I'm continuing to add some of that agency data. At the moment the GLAM Workbench, as I said, has a Wikidata section, so it's got those three notebooks which visualise information about government departments within it, and you can run these live online. But I'll be adding a number of other notebooks, so the ones around people, the ones around the ADB and things like that will be added over coming weeks. And again I'll be trying to use this as a way of getting humanities and social sciences researchers thinking about the data that's available there and the perspective it gives them on these existing collections. So yeah, I think I'd better wrap it up there. Again, thanks very much for supporting this, and I'm really looking forward to continuing to develop some of these things, specifically in making new resources available through the GLAM Workbench.
Excellent, thank you very much Tim. Does anyone have any questions for Tim? I saw there were some put into the chat. There's a question there about whether the model is the same for state archives.
Generally, yes. The series system is used across other government archives within Australia. The level at which things are described, the amount of data they may make available, can differ. But the series system is also used in other sorts of archives as well, so not just government archives. The series system is more broadly used.
Sam just put a question in the chat here, asking if the data model is compatible with govdirectory.org.
I don't know the answer to that.
Sorry, I probably could have just looked it up myself, couldn't I? It's a seemingly new idea of using Wikidata as a back end for a government
directory. It sounds exactly like what you've done for Australia.
So there was actually a tweet about this too. Somebody responded to the post, I think from the people who are doing the gov directory, asking about some of that data. I haven't had a chance to look at the answer to their question yet. So I don't know exactly, but it's obviously the same sort of data and the same sort of intention behind it.
I think I mentioned it to you in an email.
Oh yes, yes, that's right, yes.
It's not a question, but, and I think I might have mentioned this to you in an email too, something similar I had in mind but haven't had a chance to work on, given my not copious free time, was ministries. So not departments, but the ministerial portfolios or positions. That's quite similar, in that usually, I guess, they're almost parallel: when the department changes name or splits or merges, with the ministry it's the same thing. So it'd be interesting to try and tie that to what you've done there, to make it a bit easier to see the history of that over time.
Yeah, and some of those Commonwealth people relationships might also help in that. I've never been quite sure of the extent to which that is used within RecordSearch. I don't think it's complete, basically. But certainly there are some of those relationships, both at the ministerial level, but also departmental secretaries are sometimes named as Commonwealth people and associated with agencies as well.
And then Amanda's just raised her hand. She's got a question.
Yeah, I was just thinking, as I put in a comment in the chat, about an issue like arts or science. Sometimes they're in, sometimes they just disappear. With the previous government there
was no science anywhere, and that was a big kind of brouhaha about that, and then it got added in. Is there a way to see where, say, something like arts... presumably when you look at your history, when they only had, what, five agencies at the beginning in 1901, arts wasn't there. Somewhere along the line, probably in 1971, it got added in, and then it's a classic thing that it gets shunted around between different agencies or drops off. Is there a way to see some of that?
Yeah, so you can see some of it. If I just go back to that network graph, for example, I can show you...
With the things that you're showing it's a bit hard to see, because there's nothing that was identifiable as a particular agency. Could you track the agency as it went down through time, kind of thing?
So yes, these are individual agencies, and if you click on any of these nodes in the network graph, you can zoom in and you can see where the functions of that agency have gone to different agencies. So you can start to track those connections through time. In terms of the named functions, that's what I was saying: it would be great to have that sort of data about the named functions in there, but at the moment it's not consistent enough within RecordSearch to pull out to show. Then you could say, the arts function, and you could trace that through individual departments. What you're really working with here are the names: looking at the departments and seeing what sorts of departments then followed them and how that tracks through time. And similarly with those individual ones, as I was showing here, you can choose a department from here and start to see where the connections are from that, how the sort
of responsibilities have flowed from there. So there are possibilities there. The data's not quite complete, but I think it's certainly something to explore further.
So we've got two questions. I think Margaret may have raised her hand first, then we'll get to Sam. Margaret, you're muted.
I popped my question into the chat, but I was interested in the connection between your departments and legislation, and just hoping that it might tie in beautifully. So I was hoping you might expand on that, perhaps.
Yeah, that was something I've certainly been thinking about, and I was going to mention it, so thanks for raising it. Machinery of government changes link departments to specific pieces of legislation, so they have responsibility for certain pieces of legislation. That data is in the machinery of government documents, wherever they might be. It doesn't seem to be available directly through RecordSearch at the moment. But it's not just government departments. What's also very interesting, I think, is the range of other types of agencies that are specifically created by legislation, so various boards or councils or commissions or whatever. It's often quite easy to find the connections, because the name of the agency is in the piece of legislation. I was finding this as I was trying to disambiguate government agencies: I was getting the legislation pop up. So I think, certainly with a number of those lower-level agencies, linking them to the legislation that created them would be fantastic.
It would be, because one of the things that bothers me about Wikipedia is we're really very ahistorical. It's all the present, and anybody born after whatever thinks that life has always been exactly as it is now. I was thinking about writing an article about a person, and one of the things
that was important was that, in the times when we didn't have dual citizenship, heads of department who reported to ministers were required to be Australian citizens. But that's all lost in the mists of time, and I'd love to be able to tie it all together, because life has changed enormously and we don't see it in Wikipedia as well as I would like to see it, nor in Wikidata.
Yeah, and the work that has been done on the Australian legislation is just fantastic. I was looking at that too, as something to point humanities researchers to. To have that available in that sort of form, to be queryable, again opens up some really interesting possibilities. Sam, did you want to go ahead?
Yeah. I think your creation of an API for RecordSearch is wonderful work. You're no longer at the National Archives, but do you know what their plans are? Is RecordSearch ever going to change? Is there going to be an API? And also, what is the legal status of your scraper?
Well, all of their data is now available CC BY, so...
Oh, cool.
So yeah, I don't think there are issues there. They've aligned themselves with other government departments in terms of that sort of metadata. Look, I don't know. Even when I was at the National Archives there was money being spent on planning for a RecordSearch replacement, which never happened. And I've heard there have been a couple of projects which have come and gone, and nothing's actually happened, so I'm not quite sure where things are right now. I did have discussions about an API probably six or seven years ago. There was a short-lived API to some of the World War One records, which now seems to have gone away. So I wish I had an answer for you, but I really don't. And
the copyright thing: importing into Wikidata, obviously CC BY isn't public domain. Is there an issue there? I mean, not that data is copyrightable anyway, so it's the wrong license for a database, but...
Well, the data imported so far is dates and identifiers, so I don't think there's an issue with those. There might be, I don't know, if they have text descriptions of departments; there may be other issues if you were going to import those. But you probably don't want to, because they're a bit of a mess anyway. For that sort of basic metadata, I don't think there's a problem. Pretty had a question?
Yeah, thanks Tim. Lots of questions, but I guess, if we were going to give more grants, two questions. One, what does Wikidata need to make it more accessible for people to work in? And then, how do we actually attract people to want to do that work? What kind of things could be attractive to HASS researchers, for instance?
Yeah. Since doing this work I've been talking to people about different sorts of projects, and it's really changed my thinking around a number of things, thinking that they should be going to Wikidata first. For example, today at this workshop there was somebody who was talking about bringing together resources relating to designers in Australia. And yes, I know there's already DAAO, that directory of, uh, archives, sorry, my words are really going now. But they want something a bit different.
Thanks for looking to DAAO.
Yeah. But I really thought that, in terms of what they were wanting to do, which was actually just to create linkages between collections and information about people and things like that, it was
something to be done within Wikidata, and then they could build any interface on top of that data rather than create yet another platform or yet another database or whatever. And obviously there's been work done by GLAM organisations across the world making use of Wikidata more and more, perhaps less in Australia than in Europe or the US, specifically around putting their own identifiers within Wikidata in a way that enables them to find those connections with other people in collections and things like that. I've had some conversations around that recently, about encouraging organisations to do more of that, just to understand the value they get in terms of enrichment by sharing their own identifiers and information. I know you're doing all that already, but I suppose it's about reinforcing those sorts of opportunities. And what I can do through things like the GLAM Workbench is demonstrate some of the value that accrues from those sorts of things, by showing how you can explore linkages and create alternative pathways and interfaces to some of this material. In terms of HASS researchers, particularly in my discipline, history, we're still at the beginning of really understanding how to work with a lot of data collections. So a lot of my work is basically pointing to things like Trove newspapers and saying, okay, let's not think of this as a list of search results, let's think of this as a huge data corpus that we can explore in all sorts of ways. People are starting to do that and starting to explore those data possibilities, and I think Wikidata then sits alongside those other platforms and projects in terms of humanities data, and seeing the possibilities again for linking
some of these things up, which aren't possible in the platforms themselves. And that's sort of what I'm going to be talking about at the WAO conference: those possibilities for building around GLAM collections using a variety of tools like Wikidata, and creating these alternative ways of seeing and using and exploring them. So yeah, I think that's probably what I had to say.
Thank you, and thanks for your advocacy, because you have an important voice in that area. So really, happy for you to connect people with Wikidata or with us if they need further support.
Great, awesome.