best practices for the OGC publication on semantic resources. He has worked across many domains, including academia, NGOs, consulting, and government, implementing distributed systems. Rob has led and co-authored a number of international standards and has been engaged with the OGC and W3C in interoperability initiatives since 1998. Rob is going to talk about the role of vocabularies in intersystem interoperability. Thank you, Rob.

Thanks, Corinne. This is a big topic, so we're going to skim over just a few issues, but it's timely because the OGC itself is trying to improve the way it looks at semantic interoperability, so I wanted to give everybody a heads-up on where we're at in our thinking, and to compare and contrast notes.

I'm going to start off by quickly talking about what the current OGC reference model is — get the dirty washing out first. Then I want to discuss, very quickly, the fact that there are now multiple different implementation patterns: how do things get used, including vocabularies? It varies; there are a lot of different types of systems out there in the world these days, many more than before. Then I'll dive into the nature of interoperability a little, looking at it not so much as a monolithic thing but in terms of multiple aspects, and then draw conclusions and look at where we're moving.

The current OGC reference model was adopted a long time ago, in 2011, and was actually written in 2006. It's based on the old Geos architecture, and it was written in a much, much simpler time, when XML was something you basically fought with 19 hours a day and life was beautiful. Things have changed quite a lot since then. We have much more awareness and visibility of the way technology changes and of the trends, and we're aware that the geographic information community is not really an isolated bubble; we're just a piece of a bigger story. What people wanted out of standards back then was based on limited-scope systems. There were no complicated, nasty things like metaverses to worry about. The focus was on the emerging wonderful new things — service-oriented architectures — and things like security were afterthoughts. Semantics wasn't even a thought at all.

The patterns shown here are just lifted from the OGC reference model. Two things to note: first, there is a small number of them; second, they are not particularly complete against today's world. And notice that the meta-model for architectural patterns is a mud-map diagram, and every one of them is different depending on which community you come from. I'm not going to go into detail on any of those patterns, because it's not particularly relevant and there isn't time. I think I must have missed a slide, but it doesn't matter: there's a whole bunch of other architectural patterns which people are now familiar with — sensor networks, cloud computing, low-bandwidth environments, all the concerns about low availability of networks in battlefield situations, or if you're exploring on Mars. All sorts of different architectures exist now. So I want to focus on the science-y sort of stuff a little bit.
And this is lifted from a paper — a fairly typical setup where some piece of science, done by some technique (in this case a bit of AI; it doesn't really matter what), takes a set of inputs and derives some output. In this case it's an algal bloom prediction. What's happening here is that the input data has dimensions: what time it is, what sensor is used, what spatial and temporal resolution. There's also a body of information — a training set, if you like — about the spatial features that have been observed at previous blooms. The effectiveness comes down to some model for how we model our models: there will be related models, processing, and chains of models before the science is actually done and evaluated, and that allows you to compare with an earlier state. Typically those sorts of things are expected to be similar. And in this particular case there's a whole bunch of in situ samples — say, phosphate concentrations. But I could potentially rerun that model and ask: what happens if I also include nitrate concentrations, or water temperature, or something else? All those things become variables I could change to repeat that science. So it comes down to the reproducibility and reusability of the science.

Let's break it down. This is our typical service-oriented architecture view of the world. We get some data, we expose it with a service, we throw it into a metadata catalog, and the record references some large document which describes the data, or maybe some scientific papers. Then we start to process this data with some model, pull in other data, produce some new data, and produce metadata for that new data. That's great. But if I want to ask a question about reproducibility and reusability: what other data could I put into that model? I want to run it again. I want to run it at a finer resolution. That was done in Ireland; I want to run it in Denmark. What happens if I introduce something else, like the previous count of algae in the water?

And how would I reuse that? At the moment we have a metadata record which, with current standards, has no reference to anything else, and when you spin up another science project you start from scratch. One of the things it would be useful to understand is what the model requires of the data in order for that model to be reused. What is the characterization of the data products that model can use? Maybe a machine-readable specification and a machine-readable model description would help. And if we actually had the provenance captured — how did that metadata relate to the original metadata, the model, and the input data? — maybe that would be useful. I'll just mention that we don't actually have a profile of any of our metadata records that formalizes how we do provenance in a reliable way, and that still seems kind of strange. And then if you look at the characterization of the data — how the statistical dimensions of that data allow you to characterize it and determine whether or not it fits into a model like this — that's probably where the RDF Data Cube stuff potentially comes in.
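As an illustrative aside, here is a minimal sketch of what captured provenance for that derived dataset might look like — a PROV-O record linking the prediction back to the model run and the inputs it used, built with rdflib in Python. All of the URIs and dataset names are placeholders invented for the example.

```python
# Minimal sketch: a PROV-O provenance record linking a derived dataset
# to the model run that produced it and the inputs it consumed.
# All URIs below are illustrative placeholders, not real identifiers.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("https://example.org/")

g = Graph()
g.bind("prov", PROV)

bloom_prediction = EX["data/algal-bloom-prediction"]
model_run = EX["activity/bloom-model-run-42"]
model = EX["model/bloom-predictor-v1"]
satellite_obs = EX["data/satellite-chlorophyll-scenes"]
in_situ = EX["data/in-situ-phosphate-samples"]

# The prediction is an entity generated by a model run (an activity)...
g.add((bloom_prediction, RDF.type, PROV.Entity))
g.add((model_run, RDF.type, PROV.Activity))
g.add((bloom_prediction, PROV.wasGeneratedBy, model_run))

# ...which used the input datasets and the model description itself.
for used in (satellite_obs, in_situ, model):
    g.add((used, RDF.type, PROV.Entity))
    g.add((model_run, PROV.used, used))
g.add((bloom_prediction, PROV.wasDerivedFrom, satellite_obs))

print(g.serialize(format="turtle"))
```

With a record like this, "run it again with nitrate concentrations added" becomes a query over the provenance graph rather than a forensic exercise.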
So you can start thinking about components through that FAIR lens. Metadata? Sure, everybody can do it — and we have lots and lots of flavors of metadata records, with lots of equivalences between them. Some facets are common and shared between them, and there are vocabularies, like Dublin Core and schema.org, which handle those aspects. But many of the facets are domain-specific, like your observable properties, which means we have lots and lots of different ways of doing that, and we have an interoperability gap there: a lot of reliance on naming conventions for things like observable properties.

Obviously lots of people are thinking about this. The European Interoperability Framework has a particular breakdown, which is a reasonably useful start: it says there is legal, organizational, semantic, and technical interoperability. The OGC has previously focused on technical interoperability — APIs and so forth — and the W3C on things like SKOS and data standards. Semantic interoperability is something we know we need to move into. And legal and organizational interoperability — things like machine readability of licenses, and all sorts of issues around security arrangements — many of these aspects will probably sit under organizational, but they fit under different headings here.

Breaking it down a bit further — and I think I even used the same colors as somebody else's FAIR slides; that happened by accident, as far as I was concerned, but it seems to match — under legal we have agreements, regulations, licenses, all that sort of stuff, which could potentially become machine-readable, interoperable things. Organizational covers things like credentials and who has authorization; identity is the big thing from the organizational perspective — the ability to trust and identify something by understanding its organizational context — and that's actually foundational for the semantic stuff. The semantic layer gives us different types of things: conceptual models (we'll talk about those in a second), vocabularies, vocabulary crosswalks and so on, and they have different roles according to whether they're about findability, accessibility, and so forth; some of them may be shared. And then all the technical stuff: formats, APIs, schemas, infrastructure behaviors. We're not too bad at the technical interoperability, but we're still pretty poor on the semantic stuff — though this community is hopefully on the right track; its heart's in the right place, at least.

Okay, so with a limited amount of time and a horribly big diagram: the way we've started to characterize this is by looking at the different types of models and things we have to deal with. On the left-hand side we have conceptual models, logical models, implementation profiles, vocabularies, and, further down, transactional models. Then we can think about the roles they have, which is the core concept here. I haven't really seen this articulated clearly, so I've attempted to pull it out here: what is the role of a conceptual model? You can choose between literally hundreds of definitions of "conceptual model".
But the way I've envisaged it — and this is the thinking behind various related efforts — is that the conceptual model is all about whether we understand why a particular instance is given an identifier. Why do we think this thing is different from that thing? Just the identification: forget about all its attributes, that's not part of the conceptual model. Simply, can I give it an identifier, and do I understand the basis on which some authority gave that thing an identifier? Whether that's a de jure or de facto authority doesn't matter; it's one I recognize. And that allows us to do object joining; it allows us to link things, which is an important part of data integration.

The logical model allows us to add attributes to that and declare its state: what do we care about for that object? And there can be multiple logical models — multiple ways of looking at a thing. For example, do I look at a cat as a thing that goes in and out of my vet clinic, or as a thing that is impacting the biodiversity of a local area? Completely different sets of attributes when I look at it those two ways. The schemas basically realize the logical models — or the logical model supports building the schemas. But the way we use those schemas, the values we put into them, is where your controlled vocabularies come in, and they tend to be the way we profile those generic structures with a shared understanding of that state. So the vocabulary is where the shared understanding of the state of an object lives. You actually have to have those different layers of modeling in your head to do this at any sort of scale.

Then you can see there are integration activities that happen between systems of systems. I'll just pull out some obvious ones: object joining and linking; extract, transform, and load, which is where data and schema transformations occur; validation — is the content in the right spot, is the content what I expect it to be; query translation; harvesting and aggregation, that sort of stuff. And then we can see different types of systems. In one vertical we might have data lakes, where we basically pull all the data together. We might then pull that data into a dimensional data warehouse, like your observation collections. We may then run workflows and generate data marts — data products designed for usage in some particular application: decision-ready or analysis-ready data. But typically there's not one system; there are going to be multiple systems, and these kinds of activities are going to happen between them.

So this really says we've got to find a way of putting vocabularies into that framework, and it's going to fit somewhere into it, because there is only a small number of standards where all those things are conflated and every single aspect is specified. If you go that route, everybody ends up with a different standard and they don't work together, which is kind of where we are at the moment. Separating them out, we can potentially have reusable building blocks. But we don't have a formal mechanism for joining the vocabularies to the logical models, and that's one of the reasons why we have a problem.
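As an illustrative sketch of what such a join could look like in practice — one simple mechanism, not a standard — the snippet below constrains a field in a record to values drawn from a small SKOS concept scheme and checks it with rdflib in Python. The vocabulary, the scheme, and the field names are placeholders invented for the example.

```python
# Minimal sketch: one way to "join" a controlled vocabulary to a logical model,
# by checking that a record's value is a concept from an agreed SKOS scheme.
# The vocabulary content and field names are illustrative placeholders.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

VOCAB = Namespace("https://example.org/vocab/land-cover/")

# A tiny stand-in for a published SKOS vocabulary.
vocab = Graph()
scheme = VOCAB["scheme"]
for term in ("forest", "wetland", "urban"):
    concept = VOCAB[term]
    vocab.add((concept, RDF.type, SKOS.Concept))
    vocab.add((concept, SKOS.inScheme, scheme))
    vocab.add((concept, SKOS.prefLabel, Literal(term, lang="en")))

def value_in_scheme(value_uri, vocab_graph, scheme_uri):
    """True if the value is declared as a SKOS concept in the given scheme."""
    return (value_uri, SKOS.inScheme, scheme_uri) in vocab_graph

# A record whose 'landCover' slot is constrained, by the logical model,
# to take values from the vocabulary above.
record = {"featureId": "plot-17", "landCover": VOCAB["wetland"]}
assert value_in_scheme(record["landCover"], vocab, scheme)
print("landCover value is a recognised term:", record["landCover"])
```

The point of the layering is that the logical model only says "this slot takes a land-cover term", while the vocabulary, published and governed separately, says which terms those are.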
So, just to wrap up: if we think of two different domains and what's potentially common between them, we can break it down into a series of different facets of interoperability. It's not just "this interoperates with that". They may have some commonality between the metadata they use. They may have schemas which are either common or can be mapped between them; that's relatively easy. They may have some common APIs, or this one knows how to talk to that one's API and you can put adapters in. They may have some common vocabularies, which is a rare and wonderful thing — and I think it's being recognised more and more that, most often, we'll need to manage crosswalks between the vocabularies the different domains have. That's probably a bare-minimum set of facets to think about for intersystem interoperability, and the vocabulary is right up there in the middle. It's fairly poorly supported in terms of mechanisms for sharing vocabularies and mechanisms for sharing and discovering crosswalks. Standardising APIs is kind of easy — it's low-hanging fruit — but ultimately we are going to have to start working on the semantic interoperability in more detail.

We can start thinking about this in the reference architecture by thinking about the components of an architecture and understanding that each component has multiple aspects. Each of those aspects can potentially be characterised by FAIR, but it could also be characterised by the CARE principles, or the TRUST principles, or any other way of breaking it down and characterising it you like. You can identify its conformance, the formal machine readability of that aspect, or its documented description. At the moment most aspects are tacit: the documentation is somewhere, if you can discover it, and that dotted line is a very tenuous dotted line. But the move towards making more aspects machine readable gives us an approach to incrementally improving interoperability.

So what we're looking at with the OGC reference architecture is an evolution: from a document which has a list of standards, a list of components, and a list of architectural patterns set in stone, with a static set of vocabularies, models, and schemas, to an adaptive one where we understand the role of different standard types and have a model of those as linked resources. Then we can start, aspect by aspect, breaking down the interoperability challenge, and look at how architectural patterns realise those different aspects. The big-ticket item for this community is understanding how the vocabularies are linked to those standards in a practical way, and particularly understanding how provenance is handled consistently, in terms of what standards conformance each of the various pieces of these architectures requires or exploits. And then, finally, some user-friendly views of these complicated things, by profiling: this is how you use this architecture for this particular purpose.

So, I haven't been pinged about time yet — how do I put 35 years of other work into 15 minutes? This is just a very high-level view of the fact that there is an opportunity to rethink some of the fundamental architectural patterns of how we start joining the dots. There's lots and lots of really good stuff happening, and we saw from Simon Hodgson's talk that, yes, we're ready to adopt something once it's proven to be working.
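To illustrate the crosswalk facet, here is a minimal sketch of mappings between two hypothetical domain vocabularies expressed as SKOS mapping relations, with a small helper to translate a term across them, using rdflib in Python. The two vocabularies and their terms are invented for the example.

```python
# Minimal sketch: managing a crosswalk between two domains' vocabularies as
# SKOS mapping statements, then using it to translate a term from one domain
# to the other. Both vocabularies and their terms are illustrative placeholders.
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

MARINE = Namespace("https://example.org/marine-vocab/")
HEALTH = Namespace("https://example.org/public-health-vocab/")

crosswalk = Graph()
crosswalk.add((MARINE["harmful-algal-bloom"], SKOS.exactMatch, HEALTH["algal-toxin-event"]))
crosswalk.add((MARINE["chlorophyll-a"], SKOS.closeMatch, HEALTH["water-quality-indicator"]))

def translate(term, mapping_graph):
    """Return the terms in the other vocabulary reachable via SKOS mappings."""
    matches = set(mapping_graph.objects(term, SKOS.exactMatch))
    matches |= set(mapping_graph.objects(term, SKOS.closeMatch))
    # Mappings are symmetric in intent, so also look in the other direction.
    matches |= set(mapping_graph.subjects(SKOS.exactMatch, term))
    matches |= set(mapping_graph.subjects(SKOS.closeMatch, term))
    return matches

print(translate(MARINE["harmful-algal-bloom"], crosswalk))
```

If crosswalks like this were published and discoverable alongside the vocabularies themselves, the "rare and wonderful" case of shared vocabularies would matter less.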
But we've still got some gaps in the middle. What are the mechanisms? There's no mechanism at the moment for sharing a vocabulary — or rather, there might be thousands of mechanisms, but no one mechanism. There are some standards in fairly good use, like SKOS, but SKOS is fairly complicated, and telling you where a term came from is actually optional in SKOS. If I'm sharing a vocabulary, that's probably mandatory. So I potentially need a profile of SKOS which says: here's my bare minimum — tell me where this term comes from. That's the bare-minimum profile of SKOS that allows me to share that term across systems (a small sketch of such a check follows below). Little things like that — a whole suite of little baseline mechanisms around how we use these emerging capabilities in a consistent way — are still missing, because we kind of lack a governance framework and a process for targeting resources at the common problems, while we focus on the sexier, more visible, high-level problems. Yes, I can get any amount of funding for machine learning and artificial neural networks, but I can't get any funding for working out how to make that reusable. So I'll just throw that out there as a challenge: to think about how we bridge that gap.
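As a final illustration of that bare-minimum SKOS profile idea, here is a tiny sketch of the kind of check it implies: every concept being shared must say where the term comes from. Using dcterms:source for that statement is an assumption made for the example, not an existing profile, and the vocabulary content is a placeholder.

```python
# Minimal sketch of a "bare minimum SKOS profile" check: every concept being
# shared must state where it comes from. Modelling that with dcterms:source is
# an assumption for this example; the vocabulary below is a placeholder.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF, SKOS

EX = Namespace("https://example.org/vocab/")

vocab = Graph()
good = EX["sea-surface-temperature"]
vocab.add((good, RDF.type, SKOS.Concept))
vocab.add((good, SKOS.prefLabel, Literal("sea surface temperature", lang="en")))
vocab.add((good, DCTERMS.source, URIRef("https://example.org/authority/ocean-terms")))

bad = EX["turbidity"]
vocab.add((bad, RDF.type, SKOS.Concept))
vocab.add((bad, SKOS.prefLabel, Literal("turbidity", lang="en")))
# No dcterms:source: under this profile, the concept is not shareable.

def missing_source(graph):
    """Return the SKOS concepts that do not state where the term comes from."""
    return [
        concept
        for concept in graph.subjects(RDF.type, SKOS.Concept)
        if (concept, DCTERMS.source, None) not in graph
    ]

print(missing_source(vocab))  # expect the 'turbidity' concept to be flagged
```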