Hello, my name is Andy Teucher and I am a data scientist with the province of British Columbia. I want to tell you today about the newest package that we've developed, called bcdata, which is used to retrieve public data from British Columbia data repositories, especially spatial data. I've been working on this package for the last year or so with my collaborators Sam Albers and Stephanie Hazlitt, and I'm really excited to tell you about it today.

Like I said, I work in British Columbia, in the capital, Victoria. If you don't know where British Columbia is, it's a really big province, almost a million square kilometers, up in the northwest part of the continent, just above Washington State and southeast of Alaska. BC has a largely natural resource-based economy and we have lots and lots of data, not just about natural resources but on all the topics you could think of. I work in the Ministry of Environment and Climate Change Strategy, in a group called State of Environment Reporting, and we work on analysis and reporting on the status and trends of BC's environment. So we're big users of environmental data in the province, and we're also big proponents of open data and open science practices.

Most of you have seen a variant of this slide before, taken from the tidyverse cookbook. It outlines the ideal workflow for reproducible data science: starting with importing your data, tidying it into a format that works, visualizing it, modeling and transforming it, doing some analysis to get some meaning out of that data, and then communicating the results. The thing that ties this all together and makes it a reproducible workflow is that programming box around it. By writing all the steps of our analysis in code, we make it a reproducible product that can be run by collaborators and reviewers.

Unfortunately, all too often, accessing and importing the data breaks this reproducible workflow, because it's not always
readily available. You may have it on your hard drive where no one else has easy access to it, you might have to email it to a collaborator, or you have to download it from a website manually to use it. That breaks the scripted workflow, so the import step becomes a critical link. If we can do it in code, then we make the whole workflow reproducible.

So I want to do a quick introduction to the British Columbia data catalog. It is BC's online repository of all the public data that we host. There are thousands of data sets, close to three thousand at last count. Most of them are available under an open license, which means that anybody can download that data and use it for whatever they like with very few constraints. It's got both tabular and spatial data, and it's got great interfaces for downloading the data. If there's some spatial data that you're interested in, there's a great GUI where you can select your coordinate system, your output format and your area of interest, and you can download really, really big spatial data files from the catalog. It works great.

But as I alluded to earlier, there's a bit of a problem, in that not getting the data in your code breaks down reproducibility. It can't be scripted, and so it's not repeatable. So the manual process is not ideal.

Luckily, there are APIs that back all of the catalog, and they're publicly available. There are two key ones in particular. There's a catalog API, which allows you to access, search and retrieve metadata, as well as many tabular data sets, from the catalog.
There's also a web map library API that has both an image overlay web mapping service and, really powerfully, a web feature service (WFS), which is a service that allows you to download the actual features themselves, the points, lines and polygons, as well as all the attributes that go along with them, for some really, really big data sets. It has a really flexible query interface via a REST API, so that you can get just what you need.

As many open-source projects go, this started as a means to scratch our own itch. The three of us were sitting down one day and one of us said, hey, there's a great API for accessing spatial data from the catalog; I wrote a function and it works pretty well. Another said, hey, I just discovered that and rolled my own too the other day. And the third person said, hey, I need that. So obviously that was a catalyst for collaboration, to sit down and turn this thing into a package that could be a lot more widely used.

All of the functions in bcdata are preceded by a bcdc_ prefix to facilitate auto-completion. There's a search function and a get record function, which are about querying and searching for the metadata. Today I want to focus on getting data and how that can enable a reproducible workflow, and most specifically on spatial data.

So for a quick demo, I want to pull up a record that everyone can probably relate to, on BC schools and the programs that they offer. Here's an example of a catalog record. A record is the metadata around a data set, and it always comes with one or more resources, which are the data themselves. Here the record is this Programs Offered in Schools, and the resources are these two files here.
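
For readers following along in R, finding a record and looking at its resources might be sketched as below. This is a minimal illustration, not the code shown in the talk: the search terms are an assumption, and it needs the bcdata package and an internet connection.

```r
# Sketch only: requires the bcdata package and network access.
library(bcdata)

# Search the catalog by keyword. The search terms are an assumption,
# chosen to resemble the schools record described in the talk.
results <- bcdc_search("programs offered in schools", n = 5)
print(results)

# Fetch the full record (metadata plus its list of resources) for the
# first hit. Assumes the search returned at least one record.
record <- bcdc_get_record(results[[1]]$id)
print(record)
```

Printing the record lists its resources along with their identifiers, which is where the record and resource IDs mentioned in the talk come from.
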
There's a comma-delimited, or tab-delimited, text file and an Excel file, and we're going to get the text file because it's a more open standard. So we're going to use our bcdc_get_data() function. From the catalog page, or from the bcdc_get_record() function, you can get the unique ID for the record and the resource. You specify those in the get data function and we will hit that API, determine what kind of data you're downloading, most of the time find the correct import function, and bring in a nice table for you. In this case it brings in a data frame of schools across the province and the different services that they offer. So that's just a brief example of getting tabular data.

But really the most exciting piece, from my perspective, is the ability to get that spatial data, and not only to get it but to query it. The workhorse there is the bcdc_query_geodata() function. It hits the WFS, the web feature service API, to get fine-grained access to geographic information at the feature and property levels. It allows us to query that data and get just what we want.

Similar to dbplyr, if you've used it to interface with databases using dplyr syntax, when you run this function by itself it doesn't actually get the data right away. It gives you a promise, which is an object that tells you what you're going to get when you eventually finish building your query and ask for the data itself. So if you run that function by itself with the name of a record, and here we're looking at municipalities in the province, it gives you a message saying that using the collect function will return 161 features and 20 fields, and that at most six rows of the record are printed here. So it gives you a little preview of what you'll get if you run collect at the end of your pipeline.

Again, if you're familiar at all with the dbplyr framework and dplyr in general, we can run bcdc_query_geodata() and pipe that into select, so we can select just the subset of columns that we want. It works with all the tidyselect helpers: starts_with, matches, one_of, etc. Here we're just selecting two columns, the admin area abbreviation and the admin area group name of those municipalities in BC. Again, this isn't actually running the query and downloading data for you, but it's giving you a preview of what you'll get. We still have 161 features because we haven't used filter, but we're only getting five fields now, five columns. We get five even though we asked for two because there are always a few sticky columns that come along for the ride, and those are the important object IDs and similar identifiers for the rows. We can get rid of those later if we really don't want them.

After running select we can pipe that into filter, again just like dplyr and dbplyr. Here we're just using a standard logical predicate, the double equals, to say give me all of those rows, all of those features, that have an admin area group name of Capital Regional District. That is the regional district in which I live. Regional districts in BC are like counties in the United States.
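
The lazy query described above might look roughly like the following in R. Treat it as a hedged sketch: the record name, the column names and the filter value are assumptions inferred from the description, not confirmed by the talk, and running it needs bcdata, dplyr and an internet connection.

```r
# Sketch only: requires bcdata, dplyr and network access.
library(bcdata)
library(dplyr)

# Assumed record name for the municipalities data set.
muni_record <- "municipalities-legally-defined-administrative-areas-of-bc"

# Running the query function alone returns a promise, not the data.
# Printing it shows a short preview plus the number of features and
# fields that collect() would return.
munis <- bcdc_query_geodata(muni_record)
print(munis)

# Narrow the columns and rows. Column names and the filter value are
# assumptions. Nothing is downloaded in full at this point.
crd_query <- munis %>%
  select(ADMIN_AREA_ABBREVIATION, ADMIN_AREA_GROUP_NAME) %>%
  filter(ADMIN_AREA_GROUP_NAME == "Capital Regional District")
print(crd_query)
```

Both printed objects are still promises; the feature and field counts in the preview change as select and filter are added.
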
So, some provincial geographic explanations. Again, we haven't run collect, so we get another message saying this is now going to return 13 features, because there are 13 municipalities within the Capital Regional District, and those five fields: the two that we asked for and the three that we didn't but have no choice but to take.

Finally, we've finished building our pipeline and we say we want the data now. So we just tag collect on to the end of the pipeline, just like in dbplyr when interfacing with a database, and it will send that query to the web server, run it, download just those features and just those columns that we asked for, and return it as an sf object. So now we've got a real sf object. We've got 13 features and six fields, it's a multipolygon, and this is something that we know how to work with in our normal workflow.

Just to show that we've got what we asked for, this is the Capital Regional District with the boundaries of the 13 municipalities that are within it. I live down in one of those southern ones, in Victoria. So that's all awesome. We've offloaded a whole bunch of work to the server.
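
Put together, the full pipeline ending in collect might be sketched like this. As before, the record name, column names, filter value and the object name crd_munis are assumptions for illustration, and the code needs bcdata, dplyr, sf and an internet connection.

```r
# Sketch only: requires bcdata, dplyr, sf and network access.
library(bcdata)
library(dplyr)
library(sf)

# Assumed record name for the municipalities data set.
muni_record <- "municipalities-legally-defined-administrative-areas-of-bc"

crd_munis <- bcdc_query_geodata(muni_record) %>%
  select(ADMIN_AREA_ABBREVIATION, ADMIN_AREA_GROUP_NAME) %>%
  filter(ADMIN_AREA_GROUP_NAME == "Capital Regional District") %>%
  collect()  # the query is sent to the server only at this step

# The result is an sf object that works with the usual sf tooling.
print(crd_munis)
plot(st_geometry(crd_munis))
```

Only the filtered features and the selected columns, plus the sticky identifier columns, come back from the server.
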
We're getting the data that we want: we've filtered rows and we've selected columns. But we can also filter based on geometry. WFS allows doing geometric operations using the standard set of logical geometric predicates that many of us are used to using, such as intersects, equals, within, contains, overlaps, all of these geometric comparisons.

What I want to do now is show you how we can take that CRD municipalities object that we just created, which holds just the municipalities within the Capital Regional District, and use it to get all of the green spaces, the parks and protected areas, that are within that regional district, using these geometric predicates. I'm not going to show you the catalog record for it right now, but there's a record of local and regional green spaces. So we put that ID into our bcdc_query_geodata() function. We can pipe that into select to get just the columns we want: park name, park type and the primary use. Then we can pipe that again, and this is the great part, into filter, and instead of using a logical predicate we're using a geometric predicate. We're saying just give me those green spaces that intersect the CRD municipalities. And remember that the CRD municipalities object, CRD munis, is an sf object.
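
A spatial filter of this kind might be sketched as follows, using the INTERSECTS() predicate that bcdata exports. The record names, column names and object names are assumptions for illustration, and the code needs bcdata, dplyr, sf and an internet connection.

```r
# Sketch only: requires bcdata, dplyr, sf and network access.
library(bcdata)
library(dplyr)
library(sf)

# Rebuild the municipalities object so this example stands alone.
# Record name, column name and filter value are assumptions.
crd_munis <- bcdc_query_geodata(
  "municipalities-legally-defined-administrative-areas-of-bc"
) %>%
  filter(ADMIN_AREA_GROUP_NAME == "Capital Regional District") %>%
  collect()

# Ask the server for only the green spaces that intersect that geometry.
# The record name and column names are assumed. For a large geometry,
# bcdata may fall back to the object's bounding box and warn about it.
crd_parks <- bcdc_query_geodata("local-and-regional-greenspaces") %>%
  select(PARK_NAME, PARK_TYPE, PARK_PRIMARY_USE) %>%
  filter(INTERSECTS(crd_munis)) %>%
  collect()

# Draw the parks on top of the municipal boundaries.
plot(st_geometry(crd_munis))
plot(st_geometry(crd_parks), col = "darkgreen", border = NA, add = TRUE)
```

The other predicates mentioned above, such as WITHIN() and OVERLAPS(), are used in the same position inside filter.
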
So that goes along with the query. When we run collect on this, it sends the whole shebang to the server, which does the processing, finds the green spaces that intersect with that object, and downloads just the ones that we want. Now we can plot those on top of our CRD municipalities, and we can see that we just have those dark green places, those nice parks that intersect the regional district that I live in.

So just to recap: bcdata is a new package that provides access to thousands of data sets from the BC data catalog directly within R. It's a novel interface, as far as we know, to a WFS service using this familiar dplyr syntax, and it's a new dbplyr front end to a web service back end instead of a database back end. It allows you to perform spatial and non-spatial queries, to get just the data that you need, and to incorporate that into a reproducible spatial analysis workflow.

I just want to give a great big shout-out to Sam and Stephanie, who worked really hard on this with me; to the data catalog team, who have answered millions of questions for us about how the APIs and the data are structured and how they all work; and especially to our employers for encouraging us and allowing space for innovation and collaboration.

Finally, bcdata is on CRAN. You can use install.packages() to get it. We have a pkgdown site at bcgov.github.io/bcdata, and we've got some nice vignettes on there. File any issues or bugs you find on the GitHub page, and hit up myself, Sam or Steph Hazlitt on Twitter; those are our handles. Find us on GitHub as well. So thank you very much. I hope everybody's enjoying their virtual St. Louis useR! conference, and I hope I see you at a future one.