Hi, everyone. My name is Maria Weaver. I'm a librarian in research services at Griffith University. For the context of this presentation, I'm just going to give a brief description of my role. I'm a client-facing librarian. I do training workshops and consultations in literature searching, reference management, researcher profiles and publishing. My clients are PhD candidates, researchers and academics.

My presentation today is based on my experience in Marco Fami's program for librarians, a 12-week mentoring program called the Digital Librarian in Residence program. This year I was able to participate in it, so I'll be talking about my experience in it and Marco's work on the data co-op platform project.

Before I go any further, I'd like to acknowledge the traditional custodians of the lands on which we are meeting, pay my respects to elders past and present, and extend that respect to Aboriginal and Torres Strait Islander people. I live on the Gold Coast, which is part of the Kombumerri people's lands.

Okay, so the data co-op platform. This is a project funded by a LIEF grant to support data science research by making spatio-temporal datasets from different domains available in a uniform way. Combining datasets requires the values of properties from multiple data sources to be consistent, by having the same format (syntax) and the same meaning (semantics). The project started with the ABS Census data, with the aim of making 200 to 300 datasets available by the end of the project.

This is the methodology as pictured on the data co-op website, so the information is out there. As you can see, the data will be coming from various sources: census, urban data, health services, public health and jobs data from governments, and data from not-for-profit organizations such as family service providers, community services, humanitarian organizations and backbone institutes.
There will also be data from community and social media. The platform intends to use the stakeholders' use cases to engender community engagement and to perform some data engineering on the datasets. It will allow the users, or consumers, to do data analytics and visualization, and it will then provide data products in the form of secure data packages and also openly available linked data packages, plus access to interactive dashboards.

This is also information from the data co-op website. From left to right, the first column is about the open data providers, the sources. These datasets will be very varied: they will come from different domains and will probably have different structures, formats, etc. So they will require data harmonization, which will then facilitate data linkage and lead on to open access as well as secure data publishing products.

So those are the things I was talking about. I'll mainly be focusing on the back-end side of things, and these are the activities that I was able to observe during my placement as digital librarian in residence. I was shadowing Marco and attending data co-op meetings and trying to understand. This is all new to me, including the vocabulary, and I'm very happy to learn more.

The different data providers will be providing a lot of datasets, and the data co-op is focusing on social science researchers. They've started with ABS Census data, which is a very good place to start because this is information that is highly valued by researchers. The activities I've observed have mostly to do with semantic harmonization, plus the use of semantic technologies like JSON-LD, referring to schema.org for properties to match the controlled vocabulary that the data co-op collection team was putting together in order to harmonize the ABS Census properties, to start with. The ABS Census data can be accessed using TableBuilder.
I have a screenshot a little later. I'm not all that familiar with TableBuilder, but I've had a look and a play with it over the last couple of days. The team has processed the data and produced JSON files based on the properties of the census data. The dictionary used is the ABS Census Dictionary, 2901, which defines all the variables. The variables can be combined to produce thousands of attributes. For example, single variables would be age, sex and occupation, and so you would have properties that say, for example, females in the range of 45 to 55 in a specific occupation, say professional. The raw data may be further computed to produce statistics. This is what the consumers of the data will be trying to do.

This is just a screenshot of the census TableBuilder, with the dataset "dwelling characteristics" selected. I've focused on this particular section because it shows a lot of information. That section shows that there are three variables in there: mortgage repayments, with the mnemonic MRERD; the variable with the ID STRD, dwelling structure; and then the states.

This is just a screenshot of the ABS Census Dictionary. In the inset picture at the lower right you can see the age property, or the age variable, with the mnemonic AGEP. That will be invoked in the JSON and JSON-LD taggings.

Here's a sample of typical JSON data statements. The text in red is actually the names of the variables that are in the dictionary, for example age persons, combined with the qualifier median. To the right of the red text, which are the variables, there's a colon, and then you're given the values for them. So we have variables like age persons, mortgage repayment monthly, total personal income weekly, rent weekly, total family income weekly, number of persons per bedroom, total, I'm not sure what this is, weekly, household size, etc.
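As a rough illustration of the kind of JSON statement being described, here is a minimal Python sketch. The key names are only loosely modelled on the variable labels read out above, and every value is an invented placeholder rather than a real ABS figure or an actual data co-op file.

```python
import json

# Hypothetical record: keys imitate the census variable labels from the talk;
# the numbers are made-up placeholders, not real census statistics.
record = {
    "Median_age_persons": 38,
    "Median_mortgage_repay_monthly": 1800,
    "Median_tot_prsnl_inc_weekly": 650,
    "Median_rent_weekly": 330,
    "Median_tot_fam_inc_weekly": 1700,
    "Average_num_psns_per_bedroom": 0.8,
    "Average_household_size": 2.6,
}

# Serialise to JSON: each variable name is followed by a colon and its value.
text = json.dumps(record, indent=2)
print(text)

# Reading the text back gives the same names and values, but nothing in plain
# JSON says what a name means; that still has to come from the dictionary.
parsed = json.loads(text)
```

Plain JSON like this carries the names and the values only, which is why a shared definition for each name is needed before datasets can be combined.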
And then there's the value. So that's what a sample JSON statement would look like. We're going to use the dictionary, of course, to define the attributes. For age persons, in the dictionary it's identified as AGEP, age. The text in red is what appears as labels in the CSV files or the spreadsheets as the property, for example total personal income. This is the mnemonic in the ABS dictionary, and this is the actual name of the variable. There is also household size, which refers to average persons per household, with a further qualification added to it, and number of persons per bedroom. This is sort of a new variable for 2016 and it's a derived item. I think it's derived by dividing the number of persons living in a household by the number of bedrooms, something like that.

So we need to use JSON-LD, which is JavaScript Object Notation for Linked Data, to provide a context where each attribute will be well described. The definition can be provided by the organization in its very own context, or it can use an existing one that is widely used, which is what is advised, for example schema.org, which some people refer to as a de facto standard for scaling up on the web. Each attribute is associated with a URI, which provides the identifier as well as the context for the variables. And one context can be used for all JSON-LD collections, so that there is an association between all the data corresponding to the URI for that particular type of property.

This is just a screenshot of the schema.org website. It does say that the vocabularies within schema.org are produced through community consultation, and it has been very widely used in web pages, making it easy for data to be published to the web.

So this is an example of tags. Three tags, age, sex and occupation, are among the 16 tags. I'm talking about this in a roundabout way. So, on to the ABS Census.
The team clustered the properties, which amounted to over 15,000 properties just for the ABS Census. Because of that volume, it was decided to create a group of tags, or concepts, that would cover most of those properties. So Marco used OpenRefine to cluster those 15,000-plus properties down to 16 concept tags. Three of those are age, sex and occupation. They were then matched to equivalent or near-equivalent properties in schema.org. That's what we're seeing on the slide: for age, the closest match in schema.org is the property typicalAgeRange; for sex, it's gender; and occupation is one that is occupation in schema.org.

This is an example of JSON-LD coding using the @context property, which is the one that plain JSON doesn't have. This one is going to provide the context for the properties and sort of apply meaning, and also syntactic consistency, to the datasets. In the @context there is, for example, name, the name of the property.

On the web, when we look for recipes, they usually have a common structure. There's the name of the recipe, the ingredients list, sometimes a yield, like good for four people, and then a set of instructions. For every set of instructions you'd have the steps for the recipe, and then further on there could be a description. These are the ingredients that are measured there. There's the use of integers, for example two cups, let's just say, so the integer is also defined, like here with XSD, which provides the URI for the exact definition of XSD as used in this context.

If you were, for example, tagging a recipe for a web page, then you would use those properties. I'm sorry about the bullet points, they weren't meant to be there. I was pretty sure I deleted them but they're here again. So the name of the recipe is mojito. Ingredients: it's already been defined, so it accepts the proper values that we want to see on the web page.
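A minimal Python sketch of a recipe tagged in this way may help. It is not the slide itself: tying the terms to schema.org URIs is an assumption made for illustration, and the ingredients and instruction texts are placeholders. The helper `term_uri` is a hypothetical, hand-rolled lookup, not a JSON-LD processor.

```python
import json

# Context in the style described in the talk: each short term is tied to a URI.
# Mapping these terms to schema.org URIs is an assumption made for this sketch.
context = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "name": "https://schema.org/name",
    "ingredient": "https://schema.org/recipeIngredient",
    "yield": "https://schema.org/recipeYield",
    "instructions": "https://schema.org/recipeInstructions",
    # Steps are numbered, so their values are typed as XSD integers.
    "step": {"@id": "https://schema.org/position", "@type": "xsd:integer"},
    "description": "https://schema.org/text",
}

recipe = {
    "@context": context,
    "name": "Mojito",
    "ingredient": ["white rum", "lime juice", "sugar", "mint leaves", "soda water"],
    "yield": "1 cocktail",
    "instructions": [
        {"step": 1, "description": "Muddle the mint with the sugar and lime juice."},
        {"step": 2, "description": "Add the rum and ice, then top with soda water."},
    ],
}


def term_uri(ctx, term):
    """Return the URI a context assigns to a term (simple lookup only)."""
    definition = ctx[term]
    return definition["@id"] if isinstance(definition, dict) else definition


print(json.dumps(recipe, indent=2))
print(term_uri(context, "yield"))
```

The data part of the document is ordinary JSON; the `@context` block is what ties each term to a shared definition.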
So there's the ingredients list inside the square brackets, and then yield. There is meaning in it, so we know what yield means in this context, we know what instructions mean in this context, and then there's the step, which belongs to the instructions. So whenever you go and look for a mojito recipe, or any recipe for that matter, they would typically, not always, have this structure: ingredients, method, etc. Excuse me, I just have to drink some water.

Okay, how to create a context for ABS Census data. ABS Census data is available on the web in HTML as well as in a downloadable document format. But the data co-op project is looking at converting that into SKOS with the ARDC Vocabulary Service. With SKOS there will be URIs, so it will be easier to use it in the tagging and the JSON-LD syntax. So the data co-op project is working with the ARDC Vocabulary Service, and at the moment there are some demo versions available already. However, it's been mentioned that there could be questions of governance and quality assurance, so that is under discussion.

Okay, so that's just ABS Census data. We still have to think about harmonizing ABS Census data with other datasets from many other providers, possibly from different domains as well. This is a visualization of that. You begin with ABS Census data. You've got a controlled vocabulary for that, which you could easily convert to linked data using the schema.org contextual URIs, for example. But then you have to contend with having many different datasets from different sources, so there would be a lot of variation. So semantic harmonization across the data co-op providers' datasets is something the project is still looking at. At the moment, there are demo versions of these classifications in a SKOS version, a demonstration version, from the ABS Census vocabulary.
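To give a feel for what a classification looks like once it is in SKOS, here is a small Python sketch that builds two concepts as JSON-LD. The base IRI is hypothetical and is not the ARDC Vocabulary Service address, the helper `concept` is invented for this example, and the codes and labels are illustrative and should be checked against the published classification.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/vocab/anzsco/"  # hypothetical base IRI


def concept(notation, label, broader=None):
    """Build one SKOS concept node; 'broader' links it to its parent code."""
    node = {
        "@id": BASE + notation,
        "@type": "skos:Concept",
        "skos:notation": notation,
        "skos:prefLabel": {"@value": label, "@language": "en"},
        "skos:inScheme": {"@id": BASE},
    }
    if broader is not None:
        node["skos:broader"] = {"@id": BASE + broader}
    return node


doc = {
    "@context": {"skos": SKOS},
    "@graph": [
        concept("2", "Professionals"),
        concept("21", "Arts and Media Professionals", broader="2"),
    ],
}

print(json.dumps(doc, indent=2))
```

Each concept gets its own IRI, and the hierarchy is carried by the `skos:broader` links, which is what makes the entries easy to point at from a JSON-LD context.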
So for example, when we look at the Australian and New Zealand Standard Classification of Occupations, this is what it might look like. It would have these various inclusions within its hierarchy. And then you've got the IRI, which is going to link the definitions of the variables, when you use JSON-LD, to other properties, and also facilitate connecting it with schema.org or another standard. At the moment it is schema.org that the project is using. So it might look like this.

I'm sorry about the noise. I'll just close the door. It's moving day in my complex, hang on.

So we've got the usual contextual statements within JSON-LD. At the top is the use of the vocab from schema.org. But for the ABS Census, the SKOS versions of the variables from the dictionary will be referred to by their SKOS URIs using the URL property.

My apologies for interrupting, Maria, just to let you know that we are running a little over time now with your presentation.

Okay, this is nearly done. Sorry. So we've just tackled the ABS Census. However, the problem is, as I've mentioned, harmonizing across the different datasets from different providers for the platform project. This is just a picture of the 16 properties from the ABS Census and their corresponding schema.org URIs. So, for example, for age, this is the additionalProperty. For that variable, we've got a URI pointing to schema.org, connecting the age persons variable from the SKOS version of the ABS Census dictionary to schema.org.

We would like to thank the data co-op project partners: the Australian National University, Griffith University, the University of Melbourne and the University of Tasmania. And of course, Swinburne University.