Welcome, everyone. My presentation covers a very different side of things: small data and what researchers have asked us to do, rather than big data or large data sets. My rule of thumb is that if it fits on a laptop, it's not big data. So Jenna's image set probably won't fit on the laptop, and big data has different approaches, whereas mine is hundreds of records, usually from field data collection.

The FAIMS project started in 2011. We've been building field data collection modules, or notebooks, for the last decade. FAIMS has been, and is, an offline, multi-user, multi-entity, GIS-capable, multimedia data collection system. Lots of capabilities there. FAIMS 2 ran from roughly 2012 until this year, and with any luck FAIMS 3 will be out next year. We've delivered 70 data collection workflows, that is, distinct scientific data collection methodologies used in the field. Most of them are up on our GitHub repository, and we've calculated that there have been about 11,500 person-hours in the field using our app, give or take.

More to the point of this presentation, we've gone through the CSIRO ON Prime innovation incubator, and that shaped a lot of our design work for FAIMS 3. It turns out that one of the fundamental issues with designing an app for field researchers is that if you ask them what features they want, they'll tell you. Unfortunately, the features they tell you they want are not the features they end up using. So in many ways, this presentation is about a number of the blind alleys we went down, and how vocabularies fit into them. I also plan to work this presentation up into a blog post on our website, in case people want to comment there; if you have thoughts about how I can make it a better blog post, please email me.

This is a conversational presentation. I was planning a fairly straightforward talk, and then I read a really interesting retrospective on the market failure of an AI-based literature review service. Strasser is in my bibliography, and this presentation and my bibliography are available on GitHub. A quote of his really stuck with me: most knowledge necessary to make scientific progress is not online and not encoded. He's saying that this year, and I stopped and thought, wait, no, that's about right. The researchers using our tool have a huge tacit understanding of what it means to do research in their domains. They are experts in their domains, they are collecting data useful to them, and they are responding to market pressures. When I say market pressures, I don't necessarily mean money, although money, as a function of funding, is there. I mean the structural nature of what it means to be a researcher: what it means to get prestige, what it means to continue to hold down a job. It is those market pressures they respond to, rather than a theoretical approach where we say, yes, we'd love to have your research encoded in a machine-readable format so that other researchers can benefit without crediting you. That last bit we don't usually say aloud. And all of us here, I imagine, have tried to say, well, no, people can cite your data.
There are ways that FAIR data and open data benefit you, the researcher, except that at the end of the day, again quoting Strasser, "technological utopians and ideologists (like my former self) underrate how important context and tacit knowledge is." This ties directly into credit. A researcher brings their tacit knowledge: their understanding of what gets published, their understanding of how to do research within their discipline, in a very Lakatosian philosophy of science. Trying to map that onto other vocabularies, or more to the point, trying to draw theoretically established vocabularies into their work, is actually really hard. Strasser also notes that research knowledge graphs "are always designed around the domain and approach of a company and are tightly integrated into their infrastructure, proprietary data sets and IP." He was talking about biomedical companies and biomedical literature, but we can just as easily apply that to research data.

When I was looking at presentations on YouTube for this group, there were a great many presentations on standards and building approaches, and I very much got a vibe of "build it and they will come": if we build something that is theoretically appropriate and well specified, our users will show up to use our wonderful standard. It turns out this hasn't worked for us at all, for three reasons. One, the pain points and temporal horizons of researchers preparing to go to the field: they've got about a month, usually, and they don't have time to alter their ways of understanding when it comes to vocabularies or theoretical knowledge models. Two, the research reward structures I talked about before. And three, well, raise your hand if you've had to deal with CIDOC CRM; I'm just interested in whether anyone here has. It turns out that none of the projects going into the field derived any benefit from CIDOC CRM, even though it's one of the standards for archaeological data recording. The risk of "build it and they will come" is an imprecise identification of what "it" is and who "they" are. Being market-driven, by contrast, means we have a population into which we can try to sneak better value propositions.

So, talking about where we've been: a lot of this design work is a decade old. Back in 2011 and 2012 we ran a workshop and asked archaeologists to agree on some standards for data collection. It turns out they couldn't. We asked them to name ten things they would want to see in a data collector. They couldn't. So we had to build a generalized way of collecting data in the field, and that's what framed our old version of things. Then in 2016 we got some money from a New South Wales grant and thought, great, let's build in semantic data, linked open data, all of that, at some cost: we're talking $20,000 to $30,000 for that feature. No one has ever asked to use it, at all. It was an utter waste of time and money, because it doesn't solve any of our primary users' needs and it doesn't factor into their reward structures. We really wanted linked open data working, and no one cares. What field researchers want from vocabularies is something that is pragmatically useful in the field.
They want rapid and, more to the point, consistent selection from complex hierarchies of terms. Why not a standard hierarchy? Well, it turns out that it's their complex hierarchy of terms, as appropriate for their research. They're not interested in using other people's hierarchies, although they'd be delighted for every other field researcher to share their data first; there's no paradox there. They were collecting colours, or animal types, or soils, or textures, or bones, but specific subsets of these for their research environment, subsets compatible with their way of working. As I said before, most researchers already have data collection methodologies, and FAIMS 2 wasn't particularly compatible with a researcher just starting out; we recommended that researchers in that position run a pilot before spending weeks or months developing a data collection module. Researchers don't have time to figure out new methodologies while they're preparing to go into the field, and their time for luxurious data cleaning comes afterwards, because then they can throw grad students at the problem. Again, this is incompatible with published vocabularies, because published vocabularies are for slightly, or very, different environments. Only one team in ten years has pointed us to a published vocabulary online and asked us to cull terms from it, and that's a team that has asked to use FAIMS 3. It comes down to editorial control and the compatibility problem.

Most researchers have wanted CSVs or shapefiles; some projects have wanted KML outputs. We're talking about Excel or, at the extreme, Python- or R-based data analysis workflows. We're not talking about anything that machine-encoded knowledge would be useful for.

We're in the middle of FAIMS 3 development. The reason I don't have any screenshots is that our UI is rapidly changing; it is a ground-up rewrite, and we are trying to come at this problem from a pragmatic perspective. We asked ourselves: can we encourage structured vocabulary adoption while specifically addressing pain points that will lead to uptake? Rather than theoretical or ideological compatibility, what specific issues have researchers actually dealt with? It turns out that copying and pasting terms from a vocabulary is tedious, and the ability to import a list of terms and then prune them in a UI is quite valuable, so we plan to do that (there's a sketch of what that might look like below). However, we have no plans for advanced knowledge representations until someone pays us to do them, and we're using willingness to pay as our gate. Now, look, when we say willingness to pay, it's probably not going to cover the cost of implementing the feature; but if someone is willing to pay for a feature, that shows it's valuable to them.

What this is all in support of is that researchers vote with their feet. If we are responsive to researchers and their pain points, they will continue to use us, and we will be able to support them and sneak in good data and good vocabularies where we can. We really don't want to build theory-downwards, because theory-downwards doesn't address researcher-specific value propositions. This is the xkcd "Standards" problem: we don't want to be yet another competing standard that we can't enforce and that researchers don't care about. We need to make sure that the people who are paying us are the primary beneficiaries of their own work.
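To make that import-and-prune idea concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than the FAIMS 3 implementation: the flat parent/term CSV layout, the file name, and the function names are all assumptions for the example.

```python
import csv
from collections import defaultdict

def load_terms(path):
    """Read a flat CSV of (parent, term) rows into a parent -> children map.
    Rows with an empty parent column are treated as top-level terms."""
    children = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            children[row["parent"] or None].append(row["term"])
    return children

def prune(children, keep):
    """Drop every term the team doesn't use, preserving the hierarchy.
    A branch survives only if its parent is a root or is itself kept."""
    return {
        parent: [term for term in terms if term in keep]
        for parent, terms in children.items()
        if parent is None or parent in keep
    }

# e.g. cull a published soil-colour list down to the handful of terms a
# team expects at their site ("munsell_colours.csv" is hypothetical)
vocab = load_terms("munsell_colours.csv")
site_vocab = prune(vocab, keep={"10YR 3/2", "10YR 4/3", "7.5YR 4/4"})
```

In practice this would sit behind a UI with checkboxes rather than a `keep` set, but the data shape is the point: a pruned subtree of a published vocabulary, kept under the team's editorial control, and exported back out alongside the rest of the project's CSVs.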
That focus on immediate beneficiaries is also why we're not doing really fancy triples or linked open data to start with: the primary beneficiaries of those things are repositories and publishers rather than researchers. That's not because they aren't valuable to researchers, because they can be; it's because researchers don't get credit for them right now. Until we can fix that reward cycle, we have to focus on things of specific and immediate value to the people who are choosing to use us. Finally, the thing guiding our development is figuring out what people are willing to pay for, and do pay for, in time or money, rather than what they say they want. When we listen to what they say they want, we spend five or six digits building it, and they never end up using it.

Here's our bibliography, which is again available on GitHub. And we want to thank all of our partner organizations.