OK. So, let's get started. My name is Darren Barnes, and I work at the Office for National Statistics in the UK. You heard Dave talk about a success story from a previous conference, and I'm going to show you the tool we used to transform the pretty, handcrafted, human-readable spreadsheets that we create on a daily basis into more machine-readable formats. Before I actually show you a demo of the tool, we're going to go through some background and one of the reasons why we had to solve this problem. So the ONS produces thousands of spreadsheets every year on the website, and we've become incredibly efficient at it: crunching the numbers and hand-crafting the outputs that come out of the sausage machine. But the actual underlying structures, in terms of the data, no longer exist, so I'm not able to get my hands on that data in a machine-readable way. We've become so good at making the Excel spreadsheets as good as they can be on the website that it's come absolutely at the cost of everything else that needs to be done with the systems we use to build those things. So I'm in a unique position. I can actually go to the desk of a colleague and say, any chance you can give me your source information, a flat file format of the data you produce each month? And more often than not they can't, because the systems behind those spreadsheets were never designed to provide it. I can't afford to go back to the IT department and say, can you re-engineer some of the tooling so I can get that information automatically? Publishing in Excel is cheaper, more cost-effective and quicker, but when you chuck it all into Excel, think of all the work I then need to do.
I'm getting the data out to you guys on the website, but it's absolutely at the cost of doing anything useful inside the ONS with the data that we have. Don't get me wrong: we'll continue to cross that rickety bridge and get the data out on the website for people to use, so they can cut and dice and slice it if they want to. The bridge isn't going to tumble or fall, but it is a bit risky, and we need to move on in this new age, away from the fifty years of expertise that we've built into some of those spreadsheets that we saw from the keynote speaker. The good thing for the ONS is that over the next few years we're actually looking to re-engineer our systems. We're not looking to rely on third-party tools to transform our data, or to deconstruct it and hack it back into machine-readable formats; we're actually looking to rebuild our entire technical estate. There's a whole piece of enterprise architecture going on at the moment to make sure that our data collection, our analysis and our dissemination platforms all work together. So we shore up the structure, we get the foundations right, and we can get that data flowing through the systems without relying on the manual workarounds that we used to have. But before we get to that environment, there's a lot of work to do. I can't simply expect the guys in the tech department to completely re-engineer legacy systems that are twelve years old in a matter of weeks; it's going to take a few years to do. And I can't stop doing the open data thing in the meantime. We can't just say, it's okay, it's fine, hold the fort until we've actually rebuilt our systems. We've still got a lot of work to do, and it's probably a much steeper climb than that as well, if we're quite truthful.
But actually, what we should be doing is using a tactical solution to understand what our data needs are, what people want and need from our data, and working in parallel with the guys who are going to deliver our platforms and services going forward. We can learn a lot of information that way, and that set me looking for the sort of tactical solution we wanted. Right, so, we'll go to the next slide. Two years ago, in this very room actually, I met this guy. You can't quite see him, but he's sitting with me right now. His name is Dave McKee, a.k.a. Dragon, and he works for a company called ScraperWiki. ScraperWiki are a great bunch of guys; they have a website with lots of lovely online tools. And he delivered a talk in this room on XYPath, which is a kind of transformation tool that those guys use. For the previous twelve months I'd been trying to fight with our IT department to get them to build something for me that would let me take those spreadsheets and make them more usable online. So when I listened to Dave do his talk, I got really excited. This guy was actually offering me the thing I'd been fighting so hard for, after being told by our IT department: what do you expect? You can't do this stuff. So I held this guy hostage for about an hour after his talk, and over the next couple of months we came together, designing and building and developing the tool we call Databaker. That was our tactical solution. We've had that tool for a good while now, and I'll give you a demonstration of it. It's based on a set of Python libraries; it's a very simple kind of programming style.
We use the command line to run things. There's no fancy UI to access this thing, and that was the point. We wanted to make it functional. We wanted to focus on deconstructing those spreadsheets, and that was the important part; it wasn't about making it look pretty. As I said, there's no front end, no web browser on this. So I'm going to give you a couple of examples of the different things that we have to work with. We've seen some really good examples of bad spreadsheets from the earlier keynote speaker, and this is the kind of stuff we have to work with, you know what I'm saying? You can see it's got merged cells. It's got bold. It's got rows, probably hidden columns and hidden rows. Multiple tabs. You've got contents tabs to tell you about the dataset. There are probably formulas sat in the back of it. The usual kind of stuff that we have to deal with on a day-to-day basis. Don't get me wrong: it's great for people who want to use those spreadsheets as they are. You can look at them, see them as logical tables, even print them out quite nicely and take them away to meetings. But obviously, in this day and age, that's not good enough. We need to adapt and evolve. So, what we do with the Databaker tool is set up recipes. Obviously we're not going to write the code out for you guys in this instance, but we set up recipes that basically instruct Databaker which file to go and look at, and where things are in the spreadsheet, so it can pick them up and actually do that transformation piece of work. It's literally saying: this is where my set of observations is, this is where my set of variables is, these are the variable IDs. It will strip out the things we don't want in the CSV. It will ignore the formulas and actually correct for those.
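To make that idea concrete, here's a minimal, hypothetical sketch of what a recipe is doing conceptually: given a human-formatted cross-tab (years across the top, regions down the side), pick out the observation cells and attach each one to its row and column headers to produce tidy records. This is illustrative standard-library Python, not Databaker's actual API; the sheet layout and field names are made up for the example.

```python
# A human-formatted sheet as a grid: a title row, a header row of
# years, then one row per region. None marks blank cells.
sheet = [
    ["Retail sales index", None, None],
    [None, "2014", "2015"],
    ["North East", 101.2, 103.4],
    ["Wales", 99.8, 100.6],
]

def bake(grid, header_row, first_data_row):
    """Flatten the cross-tab into tidy (region, year, value) records --
    the kind of shape a recipe asks the tool to produce."""
    years = grid[header_row][1:]
    records = []
    for row in grid[first_data_row:]:
        region, values = row[0], row[1:]
        for year, value in zip(years, values):
            if value is not None:  # skip blank observation cells
                records.append({"region": region, "year": year, "value": value})
    return records

tidy = bake(sheet, header_row=1, first_data_row=2)
print(tidy[0])  # {'region': 'North East', 'year': '2014', 'value': 101.2}
```

The point of writing this down as a recipe, rather than doing it by hand, is that the same selection logic can be re-run every month against the next edition of the spreadsheet.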
So we're actually getting the real numbers, rather than just the formulas that sit behind the scenes. Now, as I said, we didn't do anything particularly clever with the UI, but we do have a simple script to run it. It's a command-line thing. It basically says: there's a spreadsheet in my folder called this, and here's the recipe; go and run that recipe against the spreadsheet and you'll generate some files. We've done some jiggery-pokery just to make it run through very slowly for the demo; this thing normally takes about 0.1 seconds to run. It's going through and picking up the two spreadsheets that are in that folder. It's looking through all the different tabs, all the columns, all the rows, and where the observations are. And it's merging things together as we speak to provide us with a CSV that we can then load into our data repository. That data repository then allows us to transform that spreadsheet and make it available as JSON, XML, CSV and so on. So, now that's done, it gives us a number of different outputs showing the way the dataset will be presented online. And one of the neat little things it does, alongside a load of validation, is produce a set of preview spreadsheets so we can see and make sure we've instructed Databaker to pick the right things up. It's colour-coded so we can check we haven't made obvious mistakes, like where my time dimension is, or where my dimension items are that all live together. So we can quickly go through and make sure we've done it right. That's quite a neat little feature. Then if we go back in, these are the actual datasets that we've produced. We'll open that up.
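As a hedged sketch of that last step — writing the tidy records out as the flat CSV that gets loaded into the data repository — the following uses only Python's standard `csv` module; the column names are carried over from the earlier example and are purely illustrative:

```python
import csv
import io

records = [
    {"region": "North East", "year": "2014", "value": 101.2},
    {"region": "Wales", "year": "2014", "value": 99.8},
]

def write_tidy_csv(records, fileobj):
    """One row per observation: the flat shape a data repository
    or downstream API can load directly."""
    writer = csv.DictWriter(fileobj, fieldnames=["region", "year", "value"])
    writer.writeheader()
    writer.writerows(records)

buf = io.StringIO()
write_tidy_csv(records, buf)
print(buf.getvalue())
```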
And you can see now: basically it's taken all the data from those two spreadsheets, probably about 10,000 records or whatever it might be, and transformed it into a sensible-looking file. We can actually change the template of this so that extra columns are added in; for instance, for statistical purposes we have attributes like the measure — whether it's a count, whether it's a percentage, whether it's a ratio, and all those sorts of things. So we can change the way the thing is output, and make sure we've got sensible headings at the top; we can change the dimensions and so on. And that's the sort of thing we can put into machines and it will actually make sense to the computer. No longer does it have to try to consume the spreadsheets you've seen before. As the talks over the last two days have shown, the spreadsheets that we make are absolutely not in any shape recognisable to a computer. So having this sort of tool has really revolutionised the way that we'll start to take that data forward. And that's really all that Databaker is. It uses XYPath, which is the thing that underpins the libraries it's built on. Don't ask me about pandas and Python, because I'm not a technical person. I'm just the second-hand car salesman who goes out and works predominantly with the statisticians to make sure we speak to them in the right way, and there are some really clever people I work with who write all the recipes and create the things that do the clever work in the background. And for me, it's about the ONS now coming into a position whereby we're starting to get those skills into our operational teams. Before, we always had to rely on IT departments.
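As a hedged illustration of that templating idea, here's how attaching a measure-type attribute to each record might look. The attribute values (count, percentage, ratio) are the ones mentioned in the talk; the function and field names are invented for the example and are not Databaker's:

```python
records = [
    {"region": "North East", "year": "2014", "value": 101.2},
    {"region": "Wales", "year": "2014", "value": 99.8},
]

def add_measure(records, measure_type):
    """Attach a statistical attribute column (e.g. count, percentage,
    ratio) so every observation in the output is self-describing."""
    return [dict(rec, measure_type=measure_type) for rec in records]

indexed = add_measure(records, "ratio")
print(indexed[0]["measure_type"])  # ratio
```

Because the attribute travels with each row, a downstream system can interpret the value without ever seeing the original spreadsheet's footnotes or headings.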
It would take six months for them to actually look at your requirements, then twelve months to build it, and then they'd give you something that didn't work. Whereas now we're getting the sort of skills into our statisticians and our operational teams that allow us to create these scripts that do the transformation work for us. So we're starting to evolve as an organisation: not just statisticians and economists who put the numbers out there, but real data analysts bringing this together. So we've got to this point. It's all available, it's all open source, it's on GitHub. There are installers for Windows machines only, which doesn't mean you can't use it elsewhere; it's just that we use Windows in the ONS. All the documentation is available there. It's all open source, free of charge to use. So if you want to have a look at it, be my guest. We've also published another code base that we use in the ONS, which is starting to collect the ONS-specific enhancements, on GitHub. So let's go back to the slides; that's it for the show and tell. One of the things we're interested in — I did mention tactical solutions. We don't want these things to be there forever, relying on third-party mechanisms to transform our data. We need to get the upstream processing done correctly so that, as Rufus was saying, it opens up a frictionless movement of data from machine to machine to machine and onto the website. That's where we need to be and that's where we're going. But in the meantime I'd like to develop the Databaker tool further, in terms of things like outputting RDF, for instance, or outputting JSON. Also validation techniques, so we can find out where things go wrong, because the idea behind this is that we create the recipe once and then use it every time. It's not supposed to be something that has to be manually adjusted.
So: some validation rules for when the users who create the spreadsheets start a label with a capital letter where previously it was lower case, or when they move one of those sneaky blank columns for no reason at all. Those are the validation tools we'd like to build in. Also, at the moment we ignore all the metadata that's available in the spreadsheets, so we'd like to look at how a different part of the recipe could handle that metadata and pull it out as well. And I was inspired by Jeni Tennison's talk yesterday about CSV on the Web: we could have a sort of JSON format for the metadata and link it to the files, so maybe that's something I can go back and talk about introducing in the next version of this thing. Last but not least, as I said, ScraperWiki are a fantastic outfit with great web tools; check them out. I've got a link here about how they built Databaker and some of the things it uses under the hood, and I've stuck in the GitHub page. This slide will obviously be up on the website later on, and my details are on there as well. But we really encourage people to use this thing. We put it out as open source; we want it to be a community tool, not something that just the Office for National Statistics uses, because we all know how difficult these spreadsheets are to work with. If this can help anybody do some stuff, and also help us make improvements going forward — and we might improve the documentation around things — whatever suggestions or improvements you might have, feel free; we'd love to get that feedback. So thank you very much.
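To illustrate the kind of validation rules described above, here's a small, hypothetical sketch in plain Python: one check for a header whose capitalisation has changed since the last run, and one for unexpected blank columns. The function names and the grid representation are invented for this example; they are not part of Databaker.

```python
def changed_case_headers(old_headers, new_headers):
    """Flag headers that match a previous run's headers except for
    capitalisation, e.g. 'Region' appearing where 'region' was expected."""
    old_by_lower = {h.lower(): h for h in old_headers}
    return [
        (old_by_lower[h.lower()], h)
        for h in new_headers
        if h.lower() in old_by_lower and h != old_by_lower[h.lower()]
    ]

def blank_columns(grid):
    """Report the indexes of columns that are entirely empty, which
    usually means the sheet layout has shifted since the recipe was written."""
    width = max(len(row) for row in grid)
    return [
        i for i in range(width)
        if all(i >= len(row) or row[i] in (None, "") for row in grid)
    ]

print(changed_case_headers(["region", "year"], ["Region", "year"]))
# [('region', 'Region')]
print(blank_columns([["a", "", "b"], ["c", "", "d"]]))  # [1]
```

Checks like these are what let a recipe be written once and re-run every month: instead of silently producing wrong output when the spreadsheet drifts, the run fails early with a pointer to what changed.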