I'm Andrew Hancock. I'm a principal analyst in the standards team at Stats New Zealand, based in Christchurch. We've been working for the past 34 years on classifications, concepts, metadata and coding systems. I'm also currently the chair of the United Nations Committee of Experts on International Statistical Classifications, and we're just going through the process at the moment of organising our first face-to-face meeting in six years, next month in New York. A lot of topics around modernisation and improving the way classifications are developed and maintained globally, which will help us domestically, because it's time to change: the way we do things has become rather antiquated and difficult. So I'm really going to talk about international statistical classifications, why we need them and the issues they create, a bit about governance, practice-change considerations around the use of concepts for classification management, a bit about future thinking and direction, and a bit about concept-based classification management, because that's where Stats New Zealand is at and is heading. Then an overview of SKOS and RDF, which I'm sure you know inside out, but I'll just reference them, and I'll finish up with a visualisation of what we're talking about and then wrap up with conclusions, so hopefully we can get it all done in the 25-30 minutes. So the international classifications really came around in 1947 with the creation of the United Nations and the creation of the Statistics Division. They set up an expert group to develop industry and product classifications to help with the development of global economic policy in the post-war recovery period. That expert group on industry and product classifications continued its work for about the next 50 to 55 years, updating the classifications to reflect changes in industry activities and, obviously, the range of products that became available through changes in technology.
The expert group has morphed into what's now called the Committee of Experts on International Statistical Classifications. It is the central body for the current and future work on classifications that are the responsibility of the United Nations Statistics Division, but it also takes on board classifications developed by the International Labour Organization, the World Health Organization, the World Customs Organization, the World Tourism Organization and the OECD; they tend to end up in our area. The Committee is mandated by and reports to the UN Statistical Commission. The Commission meets every year, and it is a collection of all the national chief statisticians around the world plus executives from the major international agencies like the OECD, the International Monetary Fund and the World Bank, and they talk about all things statistics for several days on end. Every two years the Committee puts forward a report saying this is what we've been doing, requests permission to undertake revisions, and gets direction on the work programme. The Committee also coordinates the international review programme for all agencies and administers the International Family of Statistical Classifications, which can be found on the UN website. That's basically a list of all the known international classifications used for official statistics, whether they're international standards or not. It's sorted by domains under the Classification of Statistical Activities, which is also embedded in the SDMX framework. We've just done an exercise revising that statistical activities classification, so the website's going to update in a wee while. But fundamentally the group is there to support principle nine of the Fundamental Principles of Official Statistics, which states that the use by statistical agencies in each country of international concepts, classifications and methods promotes the consistency and efficiency of statistical systems at all official levels.
So why do we have international classifications? Some of us think it's logical and common sense that we have them, and others really do question the value of them. But it comes back to the need for standard concepts, definitions and classifications, so that we get a consistent approach to classifying statistical data to support global policy initiatives such as the Sustainable Development Goals, to help us understand climate change, or to define emerging discussion areas like the digital economy and the blue economy. In theory they provide a simplification of the real world and a framework for collecting, organising and analysing data, both statistical and administrative. So we have classifications which are used purely for official statistics, for the collection of data, and we also take on classifications which may be used for analysing data but aren't used for collection purposes. The main thing is that they provide a framework for international comparability and a basis for national development. Countries can pick them up and use them as is, or they can choose to adapt them, or they can create their own things, as Australia and New Zealand often do. Fundamentally they're there for collecting and organising statistical information in a standard way, for aggregating and disaggregating data sets in meaningful ways for complex analysis, for supporting policy and decision making at the international level, and, most importantly, for assisting developing countries with their official statistical programmes. A lot of countries don't have the capacity or the classifications experts to develop their own classifications, so we work to try and enable that. But there are undoubtedly numerous issues with classifications. Firstly, just understanding the need for them, and the lack of encouragement or support by international agencies and national statistics offices to adopt them.
We're all special, we're all unique, why should we do the same as everyone else, because we like to do things our own way. So there is that challenge, and it's also not helped by the fact that it's very difficult to undertake a stocktake or audit of what's being used, how it's being used and why it's being used, despite the best efforts of the Statistics Division to run a regular questionnaire of national statistics offices. Obviously, obtaining international consensus and input into development and maintenance is challenging when you've got 194 recognised countries in the UN. Not all of those turn up to the Committee of Experts or are involved in discussions, but we do have a massive challenge with cultural, legal and language issues in trying to get consensus. And then there's the length of time taken to implement classifications at the international level. We have countries still using classifications developed in 1958 or 1968, let alone getting them onto the current versions for industries or occupations or qualifications or research. We're also hampered by the fact that there's no central global repository for them. Yes, we've got the UN Statistics Division web page and the International Family, but they're all stored in different formats, primarily in HTML and PDF on the UN website. There's obviously still hard copy, and all the time series concordances and mappings are in Excel. They're spread all over the world depending on the agency, and that has a massive impact on search, discovery and dissemination. So we're looking to try and sort some of that out with this new approach.
The traditional approaches of sequential codes, parent-child category relationships and single labels mean the development of classifications is still constrained by the mentality that you've got an A4 page in front of you, or a computer screen, or even the structure of an output table with its stub headings and column headings: you can only have so many characters in your labels, you can only have one category, and on it goes. You've got no flexibility, and you also can't get context and fluidity into the classifications. They're very rigid and very hard to change, but also very difficult to implement because of that. There's also the ongoing issue of what is the official standard to be used. Metadata is a good example, where you have SDMX, DDI, ISO 11179 and Dublin Core. What is the standard? They are all standards. Which one should I use? Well, I want the standard, and we get into this whole circular debate around what is the standard and what are you trying to achieve. And here are examples of metadata standards: I don't like SDMX so I'll create DDI; I don't like either SDMX or DDI so I'll create ISO 11179; I don't like those three so I'll create Dublin Core. What we end up having is a proliferation rather than a rationalisation, because everyone's concerned that doesn't quite meet my needs, I want something different, and my requirements for a standard are different from your requirements for a standard. So that is an ongoing challenge for us, and there's no simple solution other than let's just take the whole lot, get rid of them, create one for everyone and see what happens. But the critical issue here is that we've got cyclical review processes, and these can be on a 5, 10, 15 or 20-year basis depending on the organisation. The International Labour Organization releases a new occupation classification every 20 years.
They've just commenced a 10-year programme of work to update it, and they say that getting it done in 10 years is quite fast. We've just done the international industry classification in two years, and it'll be released in March next year. So you've got this great disconnect in how long things take to do, because of the processes and because organisations stick to a process as if it were sacred. But most importantly, it's because these review processes are designed for human consumption. We need a page in front of us for humans to read, understand and interpret, as opposed to putting more of a focus on what it means for machine-to-machine consumption. Machines don't care about codes and descriptive text or what's on the page; they just need a line of code that says do this, do that. So we're trying to bring a bit more of that mentality into the whole process. Key problems. Well, I like to talk about the Field of Dreams mentality: we create a standard and it will get used, the notion that you build it and they will come. There's a lot of that behind what is currently being done. There is a need, let's create something, everyone will use it. And domestically we're having this massive debate around how we resolve that issue, because we're currently going through a process in Stats New Zealand, across the data system, of introducing mandated standards. But we've already had statistical standards out there, and because no one's complained about them we assume they are fine. We don't actually know how they're being used, where they're being used, or whether the content's right. Are they of any value? Yet we still keep producing them, because we are creating the standard and it will get used. And then there's the Lord of the Rings mentality: one standard to rule them all, here is the standard, you must use it. And we're all special, we're all unique, there are reasons why we can't use it, whether it's cost, resource, IT systems, whatever.
And ultimately we've now reached a critical point in the game of Jenga, where we've continued to create these rigid square boxes, we've continued to stack them on top of each other, and now we're at the point where they're all starting to fall over, because our classifications have run out of space. We can't add new content to most of the international standards because we've used up all the code patterns, we've used up all the available spaces at the current levels. So you then get into the issues of, do we create more levels? If we create more levels, you have to change the code patterns; that affects the IT systems, that affects the time series, and you get into a greater level of detail which may be beyond the actual need. There's no scope for multiple contexts. There's no flexibility in approach. We can't have multiple concepts or entities. There's no ability to create aggregated or derived linked views, because everything's standalone and needs to be mapped using a concordance or correspondence. So if you want to take a data story and start with a research classification and want to know how that links to qualifications, well, you can do that, you can map the two. But if you want to bring in occupations, you then have to map both those classifications, research and qualifications, to occupation. If you want to bring in industry, you then go and map occupation to industry, industry to research and industry to qualifications, and it becomes very messy and complicated. You've also got the issue of, is it an operational concordance to move data between systems, or is it a theoretical concordance that tells you how the classifications map to each other, or is it a predominant concordance, where you take the proportion of data and say, okay, this is the most likely single match, and have one-to-one matches as opposed to one-to-many.
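The pairwise mapping problem described above grows quadratically with the number of classifications, while mapping each classification once to a shared concept layer grows only linearly, and any pairwise view can then be derived by composing through the hub. A small sketch, with invented codes (R1, O7) and an invented shared concept, illustrates the difference.

```python
from itertools import combinations

classifications = ["research", "qualifications", "occupation", "industry"]

# Pairwise approach: every classification must be mapped to every other one,
# giving n*(n-1)/2 concordances to build and maintain.
pairwise_maps = list(combinations(classifications, 2))
print(len(pairwise_maps))  # 6 concordances for 4 classifications

# Hub approach: map each classification once to a shared concept layer.
hub_maps = len(classifications)  # 4 mappings, and any pair is derivable

# Deriving research -> occupation by composing two hub mappings
# (codes and the shared concept are purely illustrative):
research_to_concept = {"R1": "data science"}
occupation_to_concept = {"O7": "data science"}
concept_to_occupation = {v: k for k, v in occupation_to_concept.items()}
research_to_occupation = {
    code: concept_to_occupation[concept]
    for code, concept in research_to_concept.items()
    if concept in concept_to_occupation
}
print(research_to_occupation)  # {'R1': 'O7'}
```

With ten classifications the pairwise approach needs 45 concordances, the hub approach ten, which is the practical argument for a shared concept layer.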
And there's no uptake of current ontological and taxonomical thinking, as we're still driven by statistical processes, statistical silos and the needs of the statistical production system within the national statistics offices. So in terms of governance, here are just a couple of illustrations of the types of governance models that currently exist. At the international level we have the Committee, we have national statistics offices, regional agencies and international organisations. They all sort of work together, but they also all work independently. The national statistics offices may report to the international organisations or the regional agencies, and then overlaying that is the Statistical Commission, which is sort of the all-encompassing, all-powerful controller of all this. But there aren't definitive direct links and mandatory requirements to go through the Commission, because the international organisations have their own governance models and their own processes, which may run in parallel or at odds with the Statistical Commission's processes. In the national context we have a standard approach: we identify stakeholders, we create advisory groups out of those stakeholders, and they work together to provide information to the project team doing the development work. That project team has a steering committee as oversight, to tell it whether it's doing the right things, whether it's going in the right direction, whether it's on time, et cetera. And then in the New Zealand context that will go up to the chief government data steward if it's going to be a mandated standard for use across the full data system, or to the government statistician if it's for official statistics only. The problem with both these models is that they are neither responsive nor dynamic. They are time-consuming and problematic because it takes so long to get through the process. Both models also have different interpretations of what a revision means.
Is it a major revision? Is it a minor revision? Is it a targeted update? Is it a refresh? Do we call it version 1.0 or version 1.1, depending on the scale of change? Is it Rev.1 or Rev.4? Is it a 2013 or a 2018 edition? We've got no consistency in how we define and label our classifications. And also, there's no big stick for compliance; it's all very much a slap over the wrist with a wet bus ticket if you don't comply. We're having, again, a big debate around that here in New Zealand, because our legislation's just changed: the new Data and Statistics Act supposedly gives the chief government data steward and the government statistician more powers to mandate. But we've still got the ongoing issues of timing, IT systems and cost, and have we got enough people to actually justify doing it? We still can't compel agencies; we can only shame them and say, oh, you haven't adopted it. So what's the point? So in terms of practice-change considerations, we're still heavily in a Eurocentric, Western model of mutual exclusivity and statistical balance, where we don't encourage or adopt the thinking of Indigenous contexts. For example, in New Zealand, te ao Māori, the Māori worldview, is very fluid, and it's about familial relationships and contexts, whereas our classifications are colonialistic and Eurocentric: one category, one code, that's what you must use, with no scope for change or fluidity or context. Then there's the ongoing issue of IT systems where abbreviations and version numbers are hard-coded. In our overseas trade system, we tried to change the abbreviation of our trade classification, NZHSC, by putting a year on the end of it, to simplify things in the system. There were 400 stored procedures where the abbreviation NZHSC was hard-coded, so it was a massive piece of work to change that. And then there's the ongoing reluctance to change: why should we change, it's working for us, what's the point, the robots are coming, all those sorts of scary things.
We're looking at the impact of digitalisation and digital metadata. Machine-readable formats are the way forward, so, greater use of something like an SDMX API versus a standalone classification. There's the ongoing issue of, what about the time series? We can't change because of the time series. But there are ways and means of mitigating time series impacts which allow you to do stuff, so you don't get hung up on that. We need people to let go. We want to enable users to describe and classify their data how they want, to match their data sets within consistent guidelines, as opposed to imposing rigid classification structures and saying, this is what you must use. But the biggest challenge is educating users in this new way of thinking and the benefits that will come with it. So, obviously, data is sourced from a greater variety of places than it was 10 or 20 years ago: GPS, ATMs, supermarket scanners, you name it, it's there. There's greater volume and variety, and the standards don't keep up. This is sort of a reflection of the social media world, where everything's about now; people's attention spans are only focused on a piece of information while it's up on Instagram or TikTok or whatever, and then something else comes along and they've moved on. We can't keep up with that, so we've got to find ways to do so. Some of the best ways are using relational databases and matrix-based software, taking greater account of ontological engineering, and, most importantly, using semantic web technology and putting the focus back on the words and not the codes. So, concept-based classification management systems are the way forward.
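As a hedged illustration of the SDMX-API direction mentioned above: the SDMX REST convention addresses a codelist by agency, identifier and version, so a classification becomes an addressable machine-readable resource rather than a standalone document. The host and identifiers below are placeholders, not a real endpoint.

```python
# Sketch: consuming a classification as machine-readable metadata through an
# SDMX-style REST endpoint. Host, agency and codelist IDs are placeholders.

def sdmx_codelist_url(host, agency, codelist, version):
    """Build a codelist query following the SDMX REST URL pattern."""
    return f"https://{host}/rest/codelist/{agency}/{codelist}/{version}"

url = sdmx_codelist_url("example.org", "UN", "CL_ISIC", "4.0")
print(url)  # https://example.org/rest/codelist/UN/CL_ISIC/4.0

# A consumer would fetch this URL (e.g. with urllib.request) and parse the
# SDMX-ML or SDMX-JSON response into code/label pairs, instead of scraping
# a PDF or retyping a spreadsheet.
```

The point is not the URL itself but the contract: any system that speaks the convention can pull the current version of a classification on demand.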
Stats New Zealand has a vision and philosophy which it's been working on for the last 10 years, and which has evolved into ARIA, the tool. We haven't evolved the philosophy as far as we would like, because of the need to accommodate existing legacy systems, but we have been working with the vendor, who are based in North America, on evolving ARIA into a global system. So we've developed ARIA from a vision and philosophy with a global perspective, for which Stats New Zealand is the first cab off the rank to use it. Statistics Canada are also using it, and we've got other agencies, like the Food and Agriculture Organization and the OECD, starting to look at how they could take on that vision and philosophy. In the New Zealand context we've also now got the New Zealand strategy for a digital public service, which is going to impact the way we do a lot of our work in the digital space, and which the methodology we're looking at will support. So, it's about moving to user-driven dynamic content, trying to do stuff in real time so that we're not doing five-year cyclical revisions on a classification that's out of date by the time we're finished. It's also about enabling us to use the technology to get more market intelligence, using wikis, discussion forums, that sort of thing. It's about adding value to the data by increasing the content and metadata that can be created. An A4 page in a book doesn't allow you to do anything. So we want to expand the data narrative by bringing in more content and more metadata, and to do that, semantic web technology and the entity models behind it are the way forward. It's about getting greater integration of administrative and statistical concepts, using concepts as our starting point for semantic consistency across the standards, and getting greater use of existing content by storing it once and sharing it across multiple locations.
So, if you change something in one place, it will flow through to the other places it needs to be. The System of National Accounts, for example, defines institutional sector. The Balance of Payments Manual also defines institutional sector, but it hasn't cut and pasted the definition from the System of National Accounts; someone has copied it, retyped it and made slight changes. So, if the SNA changes how it describes institutional sector, that change could then flow directly across into the Balance of Payments Manual. That's the sort of direction we want to take. Scrap the stupid, and I'll use the word stupid, cyclical revisions, because they're labour-intensive, costly, time-consuming and resource hungry, and, as I said, at the end of the exercise we're still standing at the starting point; we haven't actually made any progress. So, our move is towards greater use of APIs and integration of systems, and more conceptual modelling and metadata modelling; that's the methodology behind it all. Better use of service-oriented architecture, and open source where we can, so we're not constrained to a single platform, which poses problems when it comes to change. Greater use of SKOS and XKOS: we haven't quite moved into the XKOS space yet, but we're starting to look at that. Integration of metadata standards such as SDMX and ISO 11179. It's very much about taking the best bits of each rather than saying we're going to be strictly SDMX or strictly DDI or strictly ISO 11179; we take the bits that are useful to us. Greater use of taxonomies and thesauri, ontological engineering and concept management, so we get a mix of structured and semi-structured data. And looking at multiple output views, because for a lot of the classifications you can't get cross-cutting or sectoral views of them, particularly industry. If you want to look at what is tourism, what is health, what is biotechnology, you can't actually get those cross-cutting views. So, we're trying to enable that.
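The store-once, share-everywhere idea behind the institutional sector example can be sketched very simply: both manuals hold a URI-like reference to one shared definition instead of their own pasted copy. The keys, names and definition text below are illustrative, not the real SNA or BPM content.

```python
# A minimal sketch of "store once, share in many locations". All keys and
# definition text are invented for the example.

definitions = {
    "concept/institutional-sector": (
        "Groups of institutional units with similar objectives and behaviour."
    )
}

# Each manual references the shared concept rather than embedding its own
# retyped copy of the text.
sna_manual = {"institutional sector": "concept/institutional-sector"}
bop_manual = {"institutional sector": "concept/institutional-sector"}

def resolve(manual, term):
    """Follow a manual's reference to the single shared definition."""
    return definitions[manual[term]]

# Update the definition once...
definitions["concept/institutional-sector"] = "Revised 2025 definition."

# ...and every manual that references it sees the change immediately.
assert resolve(sna_manual, "institutional sector") == resolve(bop_manual, "institutional sector")
```

The design choice is reference over copy: the divergence the speaker describes (retyped definitions drifting apart) cannot happen when there is only one stored definition to drift.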
And also to get multiple labelling options. One of the challenges of working with the ABS, the Australian Bureau of Statistics, when coming up with a joint classification is how we describe things. Australia and New Zealand are similar in what we have in our labour markets and our qualifications, but our terms aren't quite the same. We use the term roofer for someone who does everything to do with constructing a roof. So, what do we do in the classification? Do we have three separate categories just because Australia has three, or do we have one, or do we try and embed the term roofer within one of those other three, or what? What we now want is the ability to get away from that sort of constraint, so that we can just put terms in as appropriate. Everything's XML-based, so you can transform it into SDMX, DDI or whatever else as an output of the transformation, and bring in more automation. And it's about educating and changing international thinking. It's a slow process, but we're starting to get some traction. So, metadata modelling is the fundamental thing we're using. We start with a concept, which may have relationships to any other concepts or sub-concepts; each concept is unique. It has a scope in terms of having a definition, and then within that definition you can create a set of all the words, looking at both the intensional and the extensional approaches. With the intensional approach, you list the properties or characteristics that something must have to be part of the set: for the concept of country, for it to be a country it must be independent, and it must be a geographic entity and/or an administrative region. On the extensional side, what is a country? Well, it's Australia, New Zealand, South Africa and so on. That's the distinction, which we can't easily do for a lot of classifications at the moment.
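The intensional/extensional distinction above can be made concrete with a small sketch. The property names (independent, geographic_entity, administrative_region) paraphrase the talk's country example and are not drawn from any real standard.

```python
# Two ways of defining a concept's set membership, using the country example.

# Intensional: the properties something must have to belong to the set.
def is_country(entity):
    return entity["independent"] and (
        entity["geographic_entity"] or entity["administrative_region"]
    )

nz = {"independent": True, "geographic_entity": True,
      "administrative_region": False}
assert is_country(nz)

# Extensional: the set defined by enumerating its members.
countries = {"Australia", "New Zealand", "South Africa"}
assert "New Zealand" in countries
```

An intensional definition lets new members qualify automatically as the world changes; an extensional one must be edited every time, which is one reason cyclically revised flat lists fall behind.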
But the critical thing is trying to enable everyone to talk about the same concept, category and content in the same way, something that theoretically is happening with the standards, but in reality, when we talk about ethnicity or when we talk about religion, we're actually talking about different things. And the same with the occupation I just alluded to: a roofer, what's a roofer, what's a roof tiler? Are we actually talking about the same thing? We think we are, but perhaps we're not. Most importantly, when we talk about the same thing, we need to know we actually mean the same thing. So this is where SKOS comes in, and I'm sure you're all familiar with SKOS. The idea of getting to a networked model, with the use of Uniform Resource Identifiers on each entity, gives us so much more flexibility. We can organise content into informal hierarchies and networks using predefined concept schemes. The URIs remove the constraint of single descriptors and mutually exclusive labels; we can do so much more, introducing synonyms and aliases for categories, getting more granular data, and enabling easier integration and sharing of concepts and content. And work is underway on the digitalisation of the major international economic standards, so, as I said, the System of National Accounts and the Balance of Payments Manual. The UN have a massive work programme around digital formats, so that you can keep the standards consistent and you don't have to have 1,000-page volumes sitting on a shelf, and if you update one, the change flows through to the other and vice versa.
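The SKOS ideas above, a URI-identified concept carrying one preferred label plus any number of alternative labels, can be sketched with plain triples; the URIs and labels below are invented for the example, not real identifiers. The same triple shape is what RDF builds on, and the pattern match at the end is the kind of question a SPARQL query expresses.

```python
# Library-free sketch of SKOS-style labelling over RDF-like triples.
# All URIs and labels are illustrative.

triples = [
    ("ex:occupation/roofer", "skos:prefLabel", "Roofer"),
    ("ex:occupation/roofer", "skos:altLabel", "Roof Tiler"),
    ("ex:occupation/roofer", "skos:altLabel", "Roof Plumber"),
    ("ex:occupation/roofer", "skos:inScheme", "ex:scheme/occupations"),
]

def match(s=None, p=None, o=None):
    """Triples matching a pattern; None behaves like a query variable."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

def labels(concept):
    """Every label, preferred or alternative, attached to a concept URI."""
    return {o for _, p, o in match(s=concept)
            if p in ("skos:prefLabel", "skos:altLabel")}

# Synonyms all resolve to one underlying concept, instead of competing
# mutually exclusive category labels.
assert labels("ex:occupation/roofer") == {"Roofer", "Roof Tiler", "Roof Plumber"}

# The kind of question a SPARQL query would ask:
# "which concepts are in the occupations scheme?"
in_scheme = [s for s, _, _ in match(p="skos:inScheme", o="ex:scheme/occupations")]
assert in_scheme == ["ex:occupation/roofer"]
```

In production this would be an RDF store queried with SPARQL rather than Python lists, but the data shape and the query idea are the same.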
RDF, the Resource Description Framework: again, we get unique web identifiers for each resource or entity, and the triple, which gives us the subject, the predicate and the object. That is so powerful for us. It allows classification content to be disassembled into its component parts, and we can reassemble it back as it was, or into other shapes and forms, so that we can give users defined views and assist them with their data, rather than trying to force them onto a standard and then saying, well, why don't you use the standard, and going through all those processes because the standard doesn't meet their needs. We get into graph networks and query languages such as SPARQL to retrieve and manipulate data. But most importantly, and sorry, I keep using the words most importantly because everything is important here from my perspective, it enables faster and more dynamic updating. We get away from 5-to-10-year cyclical processes. We want to be dynamic. We want static copies available for use, with working copies underneath which can be updated on a regular basis through a governance model, and you just refresh on an annual basis, for example, or whatever you agree with the key users. The basic model behind the ARIA system for the classification entity is just your standard metadata model: classification, codes, levels, versions and how they link. And then, as a rough visualisation of where we're heading with the system, this is roughly what it looks like. You have the concept of industry, which has an agreed definition. The concept of industry has relationships to other concepts, which could be things like agriculture, construction, energy and manufacturing, or more macroeconomic-type things like the CPI, GDP and non-profits. Those related concepts can also be part of the category set. The category set is all the words you could use to categorise industry, and it's a dynamic list, constantly updated. Eventually we're looking at a stock exchange approach: the ones most regularly and commonly used
would stay in there, and the ones whose usage you could see diminishing over time would drop out, to try and automate the process. We're still having the philosophical debate about whether everything should be a concept or whether things can be split into concepts and categories. There's this idea of having related concepts and that sort of linkage analysis to produce views like GDP by industry, which is very difficult to do at the moment because it's all standalone. The notion of the code bank is there because we want to use codes as placeholders for data rather than building blocks for classifications. That code bank would be a user-defined set of available codes. Ideally we'd standardise and say, okay, for the concept of industry here are the sorts of codes you should use in terms of length and type, whether they're alphabetic or numeric and how long, but users could define what they want, apply it to the words and build their hierarchies. And out of all that you can get things like a standard classification, such as the international industry classification or the North American industry classification, or you can create a sectoral view such as tourism, and it becomes so much easier doing it this way. So the benefits are that everything's time-stamped and each entity has a unique URI. We bring in APIs to enable integration of systems and to link content, whether the API is out of our ARIA system to other systems or whether we consume APIs from those systems. We're currently working with the Tertiary Education Commission on a project around occupation where we will drop the ANZSCO classification and put in the US O*NET system, which is currently stored in ARIA. We would use APIs to link to the online career platform that the Tertiary Education Commission has, and that way we can manage the classification, and the business will be able to link, using that classification, from one system to the other to get all the skills and
competencies and roles and career guidance information that sits on the Tertiary Education Commission website, stuff that our classification can't hold. We're also taking that on board for some of the international work as well. We've got customisable life cycles and approval processes. You can restrict content to internal groups, and you can have external groups contribute under control. Content can then be disseminated in multiple formats, into Word, Excel, SDMX, DDI, PDF, HTML, SAS, Stata, you name it, we can do it, because it's all in XML and everything's linked. And the correspondences or concordances can now be automated, as opposed to manually done between versions, because you're starting at the concept base. And ultimately we're bringing more AI and machine learning in to reduce human interaction in the future. So to finish up: it's obvious that traditional methods of managing classifications no longer work, and semantic web metadata modelling is the best way forward. We need to modernise and change our governance models and take them away from a very bureaucratic, hierarchical approach. We will be able to bring in dynamic, real-time change, and the ultimate goal of cost reduction is around doing ourselves out of a job. Eventually, if all this works, on the one hand there's probably no need for a national stats office, and to a lesser degree there's probably no need for classifications experts, because it can be automated and done through AI and machine learning and the algorithms, and you'll get greater consistency and greater automation. So that's about half an hour. If there are any questions, I'm happy to take them.