Screen shared, hopefully. There we go. Does everyone see that, and can you hear me? "Yes, we can, and we can see. Thank you." Fantastic. Okay, so I'm Lizzy Wenk, I'm at UNSW, and I will be talking today on documenting the AusTraits ontology. I'm speaking on behalf of our entire team, with a special thanks to the ARDC, which is providing our funding and helping us develop into a national data asset.

Briefly, I just want to introduce AusTraits, Australia's largest plant traits database. What I'm talking about today is, much like Edmund, a very specific case example rather than a large, high-level ontology: it's very domain specific. I've been working on documenting the AusTraits structure in an ontology, and this is still in progress. It is very much an exercise in using existing vocabularies to document our database structure, in turn hoping that we can have a larger impact and have the ontology we develop reused by others. Secondly, at the very end, I'll touch upon a separate project, which is establishing an Australian plant traits dictionary, merging many existing resources into a unified, referenced dictionary that's actually missing among plant ecologists. In both cases we're really hoping that by building an ontology we will be increasing our impact.

So why do we need a big plant database? Research and conservation require big, accurate, accessible data sets. If you want to understand how leaf size varies across the globe, or if you want to look at the evolution of floral traits, you need to have trait data in a readily accessible tabular format for analysis. Conservation also depends upon these resources. Following the 2020 bushfires, having AusTraits, and being able to rapidly look up which plant species resprouted following a fire and which were killed, allowed people to prioritize the use of funding to protect the most endangered species. But, as across most disciplines, data in plant ecology is scattered across endless resources
in endless formats, and coming from a university background I know very much that the concept of an ontology is foreign to people, and that they're going to prioritize their own research agenda much more than trying to create a standardized output. So we've been hoping to do that through AusTraits.

AusTraits was first released in September last year, with a concurrent release of a data paper in Scientific Data, and the data set was released on Zenodo. It incorporates more than 300 data sets contributed by more than 200 contributors. In sum, there are nearly 500 unique traits and more than one million rows of data. We have at least one trait value for nearly all of Australia's 30,000 plant species. As the graphic in the middle suggests, some traits have good coverage: we have plant growth form data for nearly every species, and for others we might have coverage for just 50 to 100 species.

Our workflow is all open source, hosted on an open GitHub repository, and it harmonizes and standardizes those data sets: first taking the original data formats sitting on each person's computer and wrangling them into the raw data inputs that our scripts can use, and then aligning trait names to a controlled vocabulary, our trait dictionary, updating taxonomy to match the Australian Plant Census, which follows Darwin Core, adding common units, et cetera, to create our harmonized database.

The output itself is a series of relational tables. There's a core traits table. This is a bit of a stylized version, but it holds core information like the taxon name, a location, a trait name and a value, and it links to a series of ancillary tables by the highlighted column headers. I will be talking about a few of these as I talk about our ontology, but pay special attention to locations and context.

So that's where we stood a year ago. One of our big projects over the past year has been compiling our workflow into a standalone R package, which will very shortly be released, allowing any group of ecologists for any
taxonomic group to build their own trait database. The impetus for this was when a group of invertebrate ecologists approached us and asked if they could reuse AusTraits to make an invertebrate trait database. So a group simply needs to add data; they need to have their own trait dictionary and their own taxonomic resource, but everything else required is provided.

This of course led to the next thing: if another group was using our data structure, we really needed to establish a true ontology to more properly define what each and every term meant. And in fact two ontologies: one is an ontology for the database structure, and the second, much more specific to AusTraits and to plant trait data, is an ontology of our trait dictionary.

Now I will say that, to ecologists, the concept of an ontology is completely foreign. When I started working on AusTraits I had never thought about this, and still, when I talk to anyone within our department about what we're doing, they kind of look at us blankly. Darwin Core and taxonomic standards are the only things that cross their minds.

From the beginning, AusTraits was broadly aligned to OBOE. Daniel Falster, the project lead, had been involved in the original development of OBOE, and this has worked fantastically well for a trait database, especially for ecology. There are some core classes. The entity is the object of interest, that which is being measured. On that entity there is an observation made. The observation is a cluster of measurements, a single group of rows of data. Each measurement is a single row of data that is a measurement of a single trait and references some trait concept within our controlled data dictionary.

OBOE being extensible is perfect, and it's one of the reasons we were really attracted to it, so we can link that core observation of a plant to its location and to any other contextual information. Within ecology these contexts are absolutely essential: a measured trait value has no meaning without knowledge about the location and the
context. The location and the context are in and of themselves observations of entities: the location is an entity, the context property is an entity.

Building the ontology has been a very iterative process for us, and one of the first questions was, well, wait a minute, what is the entity? We didn't have an entity in our original database structure. As we talked about this, we realized that within AusTraits we actually have three types of entities: sometimes the observation is on an individual, sometimes it's on a population, sometimes it's for an entire species. So we extended the backbone of our ontology, our OBOE ontology, such that each of these represents a broader subset, a broader number of rows of data, through to narrower and narrower groupings.

This brought the next thought bubble for our group: where do we link contextual information in our expanded hierarchy? At different points, is the answer. Quite a bit of it we have linked to the level where you make an observation at a population level: so the location, and the plot context. With plot context, many researchers will give us a specific geographic location, but within that they identify some level of stratified variation, such as the top and bottom of a slope. There are also actual manipulative treatments: you've added nitrogen to the soil for some groups of individuals and not others. We also have temporal context, if there are multiple observations on a single individual over time, perhaps during the wet and the dry season. OBOE has continued to allow us to add all these things. This diagram could go on and on, adding identifiers, adding other links, but it's worked well for us.

But I'm going to jump ahead. When we actually look at our output table, it also aligns remarkably closely with another ontology that was recently published, the Ecological Trait-data Standard, which is itself an extension of Darwin Core, the well-established framework for biodiversity databases. The Ecological Trait-data Standard has
quite similar tables to ours, with quite similar headers. Ours is an extension in that they don't provide the detailed information on locations or context that we are able to capture.

So I've taken this, and I am very much a novice still and welcome feedback, but I have taken my much more elaborate mind maps than I've shown here and built an ontology in Protégé. It's consistent with OBOE, the Ecological Trait-data Standard and Darwin Core. It reuses terms from many different ontologies. Among them: all our sources align with BibTeX, contributors are identified by ORCIDs, the contributor information aligns to DataCite, and there are various other individual terms I have used. We've tried as much as we can to always reuse terms, developing quite few of our own. As part of the process I also spent a lot of time looking at three actual database ontologies that build upon OBOE, to try and understand how one does this.

There have been some difficulties and a lot of time spent. I think one of my hardest ones, and this will probably be familiar to many of you in the room, is understanding conflicts between reused terms. The example that's frustrated me the most is that within OBOE, measurement value, the actual value of a trait, is a class, while within the Ecological Trait-data Standard it's a datatype property. In ETS each of their tables is considered a class, but all the columns within them are datatype properties. So here I have the same identical concept, and a fairly core one for a scientific trait value table, but the two are fundamentally different. It's been hard for me even to reconcile, because value is in some ways both: it's the property of a trait, but it also has some of its own properties, such as units and value type. And of course, since I'm borrowing from these two ontologies to build our own, I sort of need the context around both of these terms. At the moment they're mapped as equivalent, which I'm quite certain is not an ideal solution. I guess this is part of a broader
tug of war, and it's been very interesting to listen to the talks. As I said, I came at this from the perspective of a field ecologist, and the more I look at ontologies the more I appreciate them, and they've really forced a lot of semantic clarity in our project. But I also realize that the very simple table format that ETS has is much more approachable to most ecologists.

Now, for the last few minutes, I'm just going to jump across to our plant traits dictionary. For those 500 traits we have a trait dictionary that's associated with AusTraits. Every trait is defined; allowable trait values for a categorical trait, or ranges for a numeric trait, are included; and it's transparent, accessible and easy to update. I've presented this to a number of international audiences and they've been saying this is now the new gold standard of a plant trait dictionary. So that's been heartening, but we know it can get even better.

Is another resource really needed? Much like with the ontology for the database structure, where we see an ontology being necessary not just for our own database but to make the actual database structure reusable by other groups, we're also hoping that if we publish a trait dictionary, those definitions can also be more easily reused.

So what's out there? Anyone who's a plant ecologist is probably familiar with TRY. It's got at least 10 times as many records as AusTraits does, but it has definitions for only a small subset of its 2000 traits, and from an Australian perspective it has quite poor coverage for traits, related to fire for instance, that are interesting to Australian plant ecologists. There are publications, trait handbooks, that exist for about 50 traits, but certainly nowhere near 500. I've delved deeply into the published ontologies and there's an absolute wealth of data within them, but again, ecologists don't go to these, and I think in part because of how constrained each definition is to a certain
hierarchy and meaning. There does exist one published trait dictionary for plants, TOP, but it's again quite incomplete and hasn't been updated since its inception. We've been in touch with the group that developed TOP, and they'd be very excited to sort of jump-start that again using our trait dictionary as a basis.

Okay, I have two slides left. Going forward from here, we're trying to make our trait dictionary more FAIR and prepare it for publishing. We're adding keywords and trait categories, so broader and narrower concepts. We're adding references wherever possible. We're adding links to whatever trait databases and ontologies do exist, such that our trait dictionary will be interoperable. And then we are getting ready to publish our definitions, hopefully through Research Vocabularies Australia, and a special thanks to Rowan Brownlee for helping me navigate this and get a draft pulled together.

I'll just end by saying we're very mindful of what our community wants. I can now navigate an ontology, but we want to make sure we put out a trait dictionary that truly does get reused. So I'm just going to say thank you. Here are different resources for AusTraits, and, if anyone does ever want to go look it up, my very much of a draft OWL file. Feedback is most welcome. Thank you.
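
The structure described in the talk (an entity that is an individual, population or species; an observation made on it, with location and context attached; and measurements that each reference a trait in a controlled dictionary with allowable values or ranges) can be illustrated with a small sketch. This is not the AusTraits code or schema. Every name, trait entry, unit and range below is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical trait dictionary: categorical traits list allowed values,
# numeric traits give units and an allowed range. Entries are illustrative only.
TRAIT_DICTIONARY = {
    "plant_growth_form": {"type": "categorical", "allowed": {"tree", "shrub", "herb"}},
    "leaf_area": {"type": "numeric", "units": "mm2", "min": 0.1, "max": 1.0e7},
}

# The three entity types mentioned in the talk.
ENTITY_TYPES = {"individual", "population", "species"}


@dataclass
class Measurement:
    """One row of data: a single value for a single trait."""
    trait_name: str
    value: str
    units: Optional[str] = None


@dataclass
class Observation:
    """A cluster of measurements made on one entity, plus its location and context."""
    entity_type: str
    taxon_name: str
    location: Optional[str] = None
    context: Dict[str, str] = field(default_factory=dict)
    measurements: List[Measurement] = field(default_factory=list)


def check_measurement(m: Measurement) -> List[str]:
    """Return the problems found when checking one measurement against the dictionary."""
    entry = TRAIT_DICTIONARY.get(m.trait_name)
    if entry is None:
        return [f"{m.trait_name}: not in the trait dictionary"]
    if entry["type"] == "categorical":
        if m.value in entry["allowed"]:
            return []
        return [f"{m.trait_name}: {m.value!r} is not an allowed value"]
    try:
        number = float(m.value)
    except ValueError:
        return [f"{m.trait_name}: {m.value!r} is not numeric"]
    problems = []
    if not entry["min"] <= number <= entry["max"]:
        problems.append(f"{m.trait_name}: {number} is outside the allowed range")
    if m.units != entry["units"]:
        problems.append(f"{m.trait_name}: expected units {entry['units']}, got {m.units}")
    return problems


def check_observation(obs: Observation) -> List[str]:
    """Check the entity type, then every measurement in the observation."""
    problems = []
    if obs.entity_type not in ENTITY_TYPES:
        problems.append(f"unknown entity type: {obs.entity_type}")
    for m in obs.measurements:
        problems.extend(check_measurement(m))
    return problems


# A population-level observation with plot context (top of a slope).
obs = Observation(
    entity_type="population",
    taxon_name="Acacia longifolia",
    location="site_1",
    context={"slope position": "top"},
    measurements=[
        Measurement("plant_growth_form", "shrub"),
        Measurement("leaf_area", "850", "mm2"),
    ],
)
print(check_observation(obs))
```

Keeping the value as a field of a `Measurement` object that also carries units mirrors the OBOE-style view of value as something with its own properties; a flat table column, as in the ETS-style view, would instead store only the bare value.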