Thank you for joining me this morning. As I said, my name is Darren Barnes. I'm here today to talk about a passion I have, if you will, for helping my organisation, the Office for National Statistics in the UK, build better data that will support better insights, better analysis, and make our statistical data as accessible and as inclusive as it can be. The power behind our data needs to be leveraged as easily as it can be by the people and the services that want to consume it. If we get the data right at source, with good structure and good metadata, that prepares a foundation for a better data-driven future for our organisation and our users. But think about it: if you're building foundations, you don't just dig a hole, pour concrete into it, and hope for the best that it's not going to crack and the house fall down. You need to strengthen it with supporting materials. For us, those supporting materials are open standards, consistent structures and rich metadata, and the statisticians who produce the data need to be part of that change. We know the value of our data and the effort that goes into producing it, but that value is only realised if people can actually find it, understand it and trust it. So let's talk about the data.
So where does the problem start? A lot of it is culture: the way we've always done things, the path of least resistance. Much of our data still goes out as formatted spreadsheets, tables built for the human eye, with merged cells, footnotes and labels that only make sense if you already know the context. A machine can't do much with that, and neither can a user trying to compare datasets. Take an example from one of our tables: the time column is simply labelled 2019. Is that a calendar year? Is that a financial year? I have no idea, because I can't see it in any of the current metadata that exists for it. Are the labels for the industry types the same thing? Can I compare them? Can I not? I don't know.
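To make that ambiguity concrete, here is a minimal sketch. The period codes and interval definitions below are illustrative, not ONS's actual identifiers; the point is simply that a dereferenceable period code carries its own explicit interval, whereas a bare "2019" does not.

```python
from datetime import date

# Illustrative only: explicit period definitions that a published code list
# could carry. "2019" alone is ambiguous; a proper identifier pins the interval.
PERIODS = {
    "year/2019": (date(2019, 1, 1), date(2019, 12, 31)),                  # calendar year
    "government-year/2019-2020": (date(2019, 4, 1), date(2020, 3, 31)),   # UK financial year
}

def resolve_period(code: str) -> tuple[date, date]:
    """Return the explicit (start, end) interval for a period code."""
    return PERIODS[code]

start, end = resolve_period("government-year/2019-2020")
print(start.isoformat(), end.isoformat())  # 2019-04-01 2020-03-31
```

With codes like these, a consumer never has to guess which "2019" a column means.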
I have to go off somewhere else to identify where these codes have come from. So we want our suppliers to be using common labels and consistent code lists, providing the information about the attributes and annotations in a consistent manner. This gives us the opportunity to build relationships between the data and other datasets that exist out there. And it adds levels of confidence, right? A user can be more certain, they can appreciate what they're looking at, perhaps get the data they want much more quickly, and actually see whether there's data elsewhere that might meet the same or similar criteria. And we think that CSVW, or CSV on the Web, is the format of choice for us, and we see it being quite a good standard to support that change, that consistency, within the organisation. I'm hoping that some of you have heard of CSVW. It's a standard from the World Wide Web Consortium. But just in case: it's not a single file format. It's made up of a standard tidy CSV with a JSON file attached to it, into which we squeeze all that really good metadata. And it's basically a subset of a standard document format for linked data, right? It's JSON-LD. The standard defines and validates the structure, and supports the transformation from that CSV into the JSON or RDF world. It allows fully extensible metadata and uses externally defined vocabularies. And that's really important, right? Because we want to be able to use standards from across the web, not just define new ones all the time. So, crucially, it allows us to turn everything, all the data, all the code lists and things, into URIs. And that basically makes our data an API. So what does CSVW look like? On the left is the observation file, and it's expected to be in a tidy format. If you use R, you might be familiar with that term tidy: it's a row-per-observation kind of thing.
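As a sketch of what "tidy" means here: a spreadsheet often spreads one variable (say, years) across many columns, while the tidy form has exactly one observation per row. The table contents below are made up for illustration.

```python
import csv
import io

# Illustrative wide table: one row per area, one column per year,
# as a spreadsheet might publish it.
wide = """area,2019,2020
E12000001,105.2,101.7
E12000002,98.4,99.1
"""

def to_tidy(wide_csv: str) -> list[dict]:
    """Reshape a wide table into tidy, one-row-per-observation records."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    tidy = []
    for row in reader:
        area = row.pop("area")
        for period, value in row.items():
            tidy.append({"area": area, "period": period, "value": value})
    return tidy

for obs in to_tidy(wide):
    print(obs)
```

Two input rows become four observation rows, each carrying its area and period explicitly, which is the shape CSVW expects the observation file to be in.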
It prefers codes rather than human-readable labels. And the codes and the code lists are important, right? Because we bring them in from these canonical lists, these external vocabularies, and that makes sure there's no ambiguity around what it is you're identifying: a geography code, or some other classification out there like the industry stuff I showed earlier on. So that gives the user the source of truth that they need. And obviously all these canonical lists have their own human-readable formats anyway, which we display to users on the websites. For the JSON file that contains all the metadata, we've defined a kind of application profile that we believe covers all the metadata that we expect our users to need. In other words, we've done some user research. We found out what the key elements of the data are that you want described, right? What are the bits of information you need that will make it easy to interpret this data? And we've reused other standards like DCAT, SKOS, RDF, XKOS; there's a whole bunch of other standards that already exist. So we're cherry-picking the best of those to make sure it's more consistent across the piece. My mouse has gone a bit nuts. OK, an important point to labour on, though, is all this reference data. CSVW is a precursor for us to meet our linked data ambitions, and supporting the interoperability of our data with other data on the web is vital. We're trying to build a mighty knowledge graph of data, a web of data, if you like. And once we publish all our information, we make all that metadata open and published for others to reuse as well. And that's really important, right? Because we want people to be reusing the same sets of classifications and code lists.
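To give a feel for that JSON metadata file, here is a minimal CSVW metadata sketch built in Python. The `@context`, `tableSchema`, `aboutUrl`, `propertyUrl` and `valueUrl` keys come from the W3C CSVW standard; the file name, column names and example.org URI templates are my own illustrative assumptions, not ONS's real profile.

```python
import json

# Minimal CSVW metadata for a hypothetical observations.csv.
# Spec keys are real CSVW; the URIs and columns are illustrative only.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "observations.csv",
    "tableSchema": {
        "columns": [
            {"name": "geography", "titles": "geography",
             "propertyUrl": "http://example.org/def/geography",
             "valueUrl": "http://example.org/id/geography/{geography}"},
            {"name": "period", "titles": "period", "datatype": "string"},
            {"name": "value", "titles": "value", "datatype": "decimal"},
        ],
        # Template for the URI identifying each observation (row).
        "aboutUrl": "http://example.org/id/obs/{geography}/{period}",
    },
}

print(json.dumps(metadata, indent=2))
```

Note how the `valueUrl` template is what turns a bare code in a cell into a URI pointing at a canonical code list entry.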
And where we have our own, or we're using other external vocabularies, we're making sure we link to those. We're making sure they are known entities, so we're not reinventing the wheel for the sake of it. So once we've created our CSVW, we've got the JSON file and we've got the CSV. There's a tool that we're using, it's open source and available, called csv2rdf. It adds that semantic layer to our data to provide that linked data representation, so you can see the subject, predicate and object at the bottom. It provides an easier way for our data to connect to other data on the web as well. So how are we going to make this happen? As I said earlier on, we need to support our users, support them in a way that helps them on that path. And so my team in the ONS has created a kind of data pipeline, a system of tooling, that will help change that source data, whether it comes in tabular formats, through another API, or through a database extract, to create that CSVW. We're building this heavy-lifting equipment, if you like, to help move us towards structured data and metadata. And this is the project we've come up with. Again, it's all open source; you can find it on GitHub, and I've put links in at the end. So this CSVCube project is an open source project available to all. It provides a command line interface tool to turn your CSVs into that five-star linked data format. And we're hoping that this standards-based approach will unlock network effects, right? To accelerate data analysis and make it easier to collate, compare and contrast data from different sources over the web. And all of it depends on these open standards.
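To show the shape of that transformation, here is a toy sketch, not the real csv2rdf tool, of how one tidy CSV row plus CSVW-style URI templates yields subject, predicate, object triples. All the URIs are illustrative example.org assumptions.

```python
# Toy sketch of the CSV-to-RDF step: one row in, triples out.
# Templates mimic CSVW's aboutUrl/propertyUrl; URIs are illustrative only.
ABOUT_URL = "http://example.org/id/obs/{geography}/{period}"  # subject template
PROPERTY_URLS = {                                             # predicate per column
    "value": "http://example.org/def/measure/index",
}

def row_to_triples(row: dict) -> list[tuple[str, str, str]]:
    """Expand one CSV row into (subject, predicate, object) triples."""
    subject = ABOUT_URL.format(**row)  # extra keys in row are ignored by format
    return [(subject, prop, row[col]) for col, prop in PROPERTY_URLS.items()]

triples = row_to_triples({"geography": "E12000001", "period": "2019", "value": "105.2"})
for s, p, o in triples:
    print(s, p, o)
```

The real tool does much more (datatypes, virtual columns, full RDF serialisation), but the core idea is exactly this template expansion.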
So fast forward a few years down the line: we've got all this wonderful linked open statistical data on the web, and we can offer this well-formatted data to everyone, to anybody who wants to leverage the power of all this statistical data we have. Whether it be through APIs or data explorers, the data is there in that common, consistent approach, or we can start training things like large language models to support even better answers. But now we have the tools, we just need the education and the buy-in from the statisticians to help us drive this, so we can build that solid foundation from which we want to build. As a publishing organisation, having this structure and metadata means that we can automate our processes, so they're repeatable, they're less manually intensive, and our internal processes are much better. So we're developing other tools for this: a new CMS, which is block-based, so that we create content once and use it through many different channels. We're also building the capability to query the data in the repository to drive those visualisations, the tables, charts, maps and other things. We want that data to enable better storytelling, feeding social media directly and reaching new audiences. More importantly, to help make our data so understandable that even less data-literate people can understand what they're seeing. From this well-organised data we can start to derive better insights with a wide range of resources, and visualise the data in many ways that at the moment are very, very resource-intensive to attain.
But of course we can still have the tabular formats. The best thing is that the system is building those tabular formats for us, and that introduces a consistency which means people can use that data more reliably across the piece, and it meets accessibility requirements as well, which is really important. Now, with the CMS, we're building the queries so that we can generate these images, and the best thing about having a data query drive these visualisations is that we can power them automatically: the next time we release more data, these charts are updated without any interaction or manual intervention from us. And it'll enable the communities out there to use the data, because the best things to be done with our data will be done by other people, right? As my old mate Tim says, it's serendipitous reuse. I think there's a few shout-outs on the last slide. Links and credits: we've got the CSVCube project, some information about CSVW, some stuff on the ONS, and csv2rdf. But a big shout-out to my team; I basically nicked all their work, I've nicked all their words, I nicked all their slides, right? I've just been lucky enough to be able to come to Argentina and talk about it. So big thanks to Ross, Rob, Andrew and Gareth and all the rest of the team, obviously, whose slides these mostly are. But I think that is it. So thank you very much for your time. I hope that made sense.

Thank you so much, Darren. Has anyone got any questions for Darren?

Great talk, thank you so much. I was curious: since part of this pipeline relies on having a tidy CSV going into it, what sort of user training or outreach or coordination are you doing across your different programmes in order to make that a reality?

Absolutely, it's a good point.
There has to be some upfront cost, and that's what I was saying earlier on about changing the culture so you get away from the path of least resistance. There has to be some stuff done. What we actually find is that our statisticians are using tools like Python, or maybe macros in Excel, and using machines to create this data. And then they're outputting Excel spreadsheets from those machines, which we then have to take away and do something with. So we're actually trying to say: look, we can bypass that step and go straight from one machine to another. At the moment, I think we've published around 200-odd datasets in this format, and built some visualisations, as you saw. But we're consulting with the early adopters, the ones who want to change the organisation. Find out who your allies are, work with them, work through the kinks, because you can't get everything right first time. You have to work with the stakeholders to make it work. Understand where the gaps are, understand where the pain points are, compromise, work it through, test it, iterate, test it again.
Since so much depends on standards, can you talk a little bit, or give us an example, of developing a standard and what that process looked like?

So, we made a deliberate decision not to reinvent any standard or create a new standard, so we haven't had to worry too much about consulting on what a standard might look like; there are lots of good standards that already exist. We've done lots of user research with different data communities in the UK to understand what kind of information you need. Is it email address, title, description? What does the structural metadata look like? Then we've looked at what standards already exist: DCAT for catalogues, Dublin Core, SKOS for hierarchies, a specific one for statistical hierarchies called XKOS, RDF. So we've pinched them all, right? We build on the shoulders of giants; we don't want to do anything new. And then we've tested that out: we've built things and tested them with users. Does it make sense? Does it work? And the information profile is fully extensible, so you can add as much or as little as you want, depending on how much resource and how much information is available.

Thank you, very good talk. Besides being kind and encouraging people to use these new, incredible things, do you have any law or executive order for people to start using this and start analysing all the datasets? Is there any law or executive order to force people to start using this new standard, or are you just trying to do it kindly?

We want to work in a collaborative way, so we don't want to say: here's a new standard, please work to it. That's worked really well for us in the past... So now we take that collaborative approach. We try to say: you do this work; if we can do it slightly differently, it'll make your job easier in the long run, and it'll actually allow you to spend more time doing the analysis you want. So we try and dangle some
carrots out in front in order to make that work. Of course, there are certain restrictions or freedoms within the law, because we don't want a free-for-all; it can't be the Wild West. We need some sort of structure, so we do that consultation. And the idea of working with early adopters is that we can start showing it actually works. We can start with real evidence: how this person has done that, what the benefits are, what the impacts have been, what users have said. And that makes it more difficult for other people to say: oh, you got it wrong, we don't like this way of doing it. You provide the evidence-based use case. So we try to do it kindly.

Among the people producing the data that you're asking to adopt these standards, are there private contractors in the mix, and if so... No? OK.

No, this is purely for the Office for National Statistics, so it's our internal statisticians. However, what we would like to see is this being a kind of catalyst within other government departments, who can start using these tools; that's why we've made them open source. We're very interested, and we have a community called the Government Statistical Service, which brings together all the departments that produce statistics. Hopefully there will be the opportunity for commercial organisations that provide data to government, or do analysis, to join that kind of approach. But let's take one bite at a time; let's not try to boil the ocean.

All right, that's all we've got time for today. Thank you so much, Darren Barnes, and thank you for your questions.