Alright, hopefully everybody is able to see my screen and the font size is big enough. As we go through this, if you have questions you can type them in the chat, and I'll try to monitor that periodically and stop. So if I'm going too fast, or if there's something you don't understand, go ahead and stop me. This tutorial is going to focus on downloading NEON Aquatic Instrument System data, or AIS data, using the neonUtilities R package. We're going to look at what comes in the download, provide some guidance on navigating all of the documentation and metadata that comes with it, separate data collected from different locations within a site using the horizontalPosition variable, and then interpret some basic quality flags. In order to complete this, hopefully everybody has downloaded R, and preferably RStudio. We will need to install two R packages if you haven't already: the neonUtilities package, which provides the basic functions used for accessing NEON data, and ggplot2, which we'll use to generate some plots of the data. These packages are available on CRAN, and you can install them with install.packages(), typing the name of the package in quotes inside the parentheses. So we'll go ahead and get started by downloading some data using neonUtilities. Within the neonUtilities package, probably the most used function is loadByProduct(), and what this function does is download data into your R environment from the NEON API. 
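As a sketch of the setup just described — this block isn't read verbatim in the talk; the object name waq is my shorthand, and the arguments shown are the loadByProduct() inputs walked through over the next few minutes:

```r
# One-time setup: install the two packages from CRAN if you haven't already.
# install.packages("neonUtilities")
# install.packages("ggplot2")

library(neonUtilities)
library(ggplot2)

# Pull one month of the water quality product straight into the R environment.
# (The object name "waq" is my shorthand, not anything the function requires.)
waq <- loadByProduct(dpID = "DP1.20288.001",   # water quality
                     site = "PRIN",            # Pringle Creek, TX
                     startdate = "2020-02",    # YYYY-MM; NEON stores data by month
                     enddate = "2020-02",
                     package = "expanded",     # include individual quality flags
                     release = "current",
                     check.size = FALSE)       # skip the size prompt for a small pull
```

Each of these arguments is discussed one by one in the talk; leaving check.size at its default of TRUE is the safer choice for multi-site, multi-year pulls.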
Something else cool that it will do automatically is stack all of the site-by-month files. If you were to go on the NEON web portal to download data, what you would get is a different CSV for each location within a site and each month, so if you were to download, say, a year's worth of data from only one location within one site, you would still get 12 different CSV files that you would then have to merge yourself. You can see how, if you were trying to do something with multiple sites and multiple years and multiple locations worth of data, that would get really hard to do — and this loadByProduct() function does that stacking automatically. Before we can do that, just make sure that you have installed those two packages, neonUtilities and ggplot2. I've already done that, so I'm going to skip this section of code here. For those of you who aren't familiar with working in R Markdown, this little green arrow here will run the grayed-out section of code, so we're just going to go through each section one by one; if you're following along with the tutorial in the web browser, these will be the code boxes. So let's go ahead and load the two packages that we're going to use, neonUtilities and ggplot2, from our library, since we've already installed them.

Okay, now that we've got them loaded, we can start to use the loadByProduct() function to pull data. This function has a couple of different inputs. The first one is the data product ID, or dpID. This is the code for the different data products that NEON has, and you can find the specific data product ID for each data product on the NEON webpage, data.neonscience.org. The next input is the site. Each NEON site has a four-letter site code — for example, the Arikaree River's four-letter site code is ARIK. It can also be a vector of multiple sites: say you want to download data from a subset of the NEON sites, you can create a list using just the sites you want. And if you put nothing, it defaults to "all", meaning all of the sites where that data product is available — so if you were to put in an AIS, or Aquatic Instrument System, data product number, you would get the data from all of the sites where that data product is collected. The next two inputs are the start date and the end date, and these are exactly what they sound like: the start and end dates for the period of data that you want to download, in the form of the four-digit year and then the two-digit month, for example 2017-06. All of NEON's data is stored by month, so if you want a larger timescale, like a year, you'll need to specify a start date at the beginning of the year and an end date at the end of the year. If you want an interval of time that's smaller than a month, what you'll need to do is download that whole month of data and then, once you have it, subset it to the time period that you want. If you don't put anything for the start and end date, they default to NA, which means that all of the available data will be downloaded. The next input is the package, and this specifies whether you want the basic or the expanded data package. Expanded data packages generally include additional information about data quality, such as individual quality flag test results. Note that not every data product has an expanded package; if you put expanded and one doesn't exist, it'll just give you the basic package. The next input is the release. This is a new input in the newer versions of neonUtilities, so you may not have seen it if you haven't used neonUtilities in the last year or so. This specifies the release version to use. At the beginning of each year, NEON releases a new static version of the data set with a DOI, so that results can be reproducible; then, at the end of the next year, any new data that was collected since that previous release, or anything in the previous release that has been fixed, will be included in the new release. And so to download the most recent
release, as well as any provisional data — that's data that has been collected since the previous release but hasn't been included in a new release yet; note that these data haven't always been QA/QC'd yet, which is why they're called provisional — you can set the release to "current". If you want to save your download somewhere other than the R environment, say on your computer, you can use the savepath input to direct it somewhere; it will default to whatever working directory you specify in R. The check.size input you can set to TRUE or FALSE, and what this does is check the size of the file that you're about to download before downloading it; it defaults to TRUE. This can become important if you're doing really big downloads — multiple years of data for multiple sites — just to give you a sense of how long that download might take, or whether you have enough storage space on your computer to save that file. And then lastly is the token, which allows you to input your unique NEON API token. You can learn more about that at the link here, but basically the token is a unique ID for each NEON data user. It allows us to track who is downloading and using NEON data so that we can report that to NSF — it's not actually tracking you, it's anonymous, it's just that you're a unique user with that token. The benefit to you of signing up for a token is that you get a faster download; if you don't have a token, that's fine, your download might just take a little bit longer. There are some additional inputs that you can learn about at the link here or in the function documentation.

Alright, so let's go ahead and download the data that we're going to use for this tutorial. As we just saw, the first thing we're going to need is the data product ID. This is in the form of "DP" and then the level, one through four, indicating how processed that data is, with level one being the most basic data, up to level four being products that are derived from lower-level data products. For this tutorial we're going to use what's probably the most commonly used AIS data product, which is water quality. This includes things like dissolved oxygen, pH, and turbidity. The data product ID for water quality is DP1.20288.001 — DP1 indicating that it's a level one data product, and 20288 with the leading 2 indicating that it's an aquatics data product; anything that's aquatics will start with a 2. Then we're going to need our four-letter NEON site code, and listed here are the four-letter site codes for all of NEON's 34 aquatic sites: we have 24 wadeable streams, three large rivers, and seven lakes. You can see a list of all of them here. In this exercise we're going to use data from just one month and one NEON field site: Pringle Creek in Texas, from February 2020. Because we want to examine some of the individual quality flags, we'll go ahead and download the expanded package that includes those individual quality flags. We're going to want the most current release, so this will be the most up-to-date data. And finally, since we're only going to download one site-month of data, we don't need to check the file size — but again, for larger downloads that's probably an advisable thing to do. So we're going to use the loadByProduct() function: we've set our dpID to the data product ID for water quality, DP1.20288.001; we've set our site to the four-letter code for Pringle Creek; we've put in a start date and end date of 2020-02, because we just want that one month of data; we've specified the expanded package and the current release; and we've told it not to worry about checking the file size. So we'll go ahead and run this. What you'll see is it's finding all of the available files, it's downloading them, it's unzipping them, and then — I guess you still see this even though it's only one month — here's where it's stacking; that's the step that matters if we had multiple months of data
— it's combining all of the relevant tables into one file per table, across all of the months. Alright, so let's go ahead and look at the files that came with that download. We can do this a couple of different ways. The first is to just look at the names of the files included in this download object — waterQuality; I forgot to mention, that's what we named it here. If we look at the names, we'll see all of these different files: maintenance records, readmes, issue logs, position files — and we'll get into what those are in just a second. Another way to do this is to take this list and load it into our global environment, and what we'll see over here in our environment is that now all of these files are showing.

Alright, so what exactly are these files, and why would you want to use them? The first one, the most important one, is the data file. There will always be one or more data files if that data product is available for that site and month, and this includes all of the primary data for the data product that you downloaded. Sometimes you will get multiple data frames for related data products. For water quality, you'll see it's called waq_instantaneous; that's because it's measured and reported as instantaneous values, collected either every one minute at the wadeable stream sites or every five minutes at the lake sites, but it's not averaged. A lot of our other data products are averaged — for example, five-minute, 15-minute, or 30-minute averages — and some data products will include multiple data files, for example both five-minute averages and 30-minute averages. But for water quality, since it's an instantaneous and not an averaged data product, there's just one data file. Next is the sensor positions file. You'll notice all of these are the name of the file, an underscore, and then a number, and that number is the same five-digit number that corresponds to the particular data product they're related to. What the sensor positions file contains is information about the coordinates of where each sensor was located. If we go ahead and open it, you'll see it shows all of the different locations within Pringle Creek where water quality was measured, over different time ranges; and if you scroll over you'll get lat/longs and elevations for those. So that can be really helpful for interpreting where in a site different data were collected. Next is the variables file. This file contains all of the information about the variables found in the other tables — for example, definitions, units, and other important information. If we look at it real quickly, you'll see the first column is which table the variable is found in, then the field name, the definition of that variable, the type, and then any units associated with it. For example, let's go down to one that's in the actual data file: in waq_instantaneous, the field dissolvedOxygen is exactly what it sounds like, the dissolved oxygen concentration, in units of milligrams per liter. There's also a readme that comes with the download; this provides important information about how the data were collected, processed, etc. And then many of the data products come with maintenance records — for example, this one here has maintenance plus cleaning and calibration — and these give you dates for when sensors were cleaned or calibrated, plus pre- and post-cleaning or calibration values, so that if there's an offset in the data you can correct it. That's one important thing to keep in mind: most NEON level one data products are not post-corrected for things like drift and calibration offsets. We provide that information in these maintenance files so that you're able to do those corrections yourself. Okay, we kind of already did this, but let's go ahead and run this here, and so those files that you just looked at — the
variables file, sensor positions file, and the data file — show up here. One thing that's important to do is check the versioning of the data that we downloaded: which release did it come from? Remember, we specified "current", so it gave us the most recent one; but if we were to use this data in a publication and cite it, we'd want to know which release it came from, because each release has its own unique DOI. So let's look within waq_instantaneous, the data file, at which values for release are present. Okay — all of the data that we just downloaded came from RELEASE-2023. That's the most recent release until, I believe, next week, when RELEASE-2024 comes out. And here's a helpful link if you want to learn more about our data versioning and how to appropriately reuse and cite the data.

Alright, so now we're going to look at data that was collected from different sensor locations within a site. Within most of our sites, we collect the same type of data from sensors located in a couple of different locations, and all of this data gets delivered together when you use the neonUtilities loadByProduct() function. If you were to just go on the web portal and start downloading CSVs, you would get those locations separately — but a lot of times you're going to want to analyze the data together, and if you use the loadByProduct() function you get all of those locations combined into one single data file. Within the data file, one of the ways that we indicate where within the site the data was collected is the horizontalPosition variable. Let's look real quickly in waq_instantaneous: you'll see right here it's the third column, horizontalPosition, and it has a numeric code in it. For aquatic sites, that HOR code, or horizontal position code, is always going to be a three-digit number. As of 2024, the possible codes are listed here. Generally, ones that end in 1 are collected from an upstream location within a stream site, and ones that end in 2 are collected from a downstream location in a stream site. The earlier digits indicate the type of infrastructure the sensor is mounted on — whether it's a monopod that's driven into the stream bed, versus an overhead cable system where it's hanging down from above. At lake and river sites, where data is collected from a buoy, it's going to end in 3, so 103. And then lastly, for our lake sites we have a few sensors that are mounted around the outside of the lake, in the littoral zone, and those have HOR codes of 130 through 190. Note that within NEON, data is also collected at different vertical positions. In aquatics this is true only at the lake and river sites: where data is collected from a buoy in the deepest part of the lake, we have sensors suspended at different depths, so that we can create vertical profiles of things like water temperature and water chemistry and look at stratification. In this tutorial we're just using a wadeable stream site where data is only measured at a single vertical position, so we don't really have to worry about that. But you'll frequently want to know which sensor locations are represented in the data file that you've just downloaded, and we can do this by using the unique() function to look at which unique horizontal positions are present. So we ask for the unique values present in our data table, waq_instantaneous, in the column horizontalPosition. We run it, and what we see is that at Pringle Creek in February of 2020 — the data we just downloaded — there are only two positions present, 101 and 102, corresponding to the upstream and downstream sensors. One thing to be careful about is that, even if a location keeps the same code, the actual
physical location can change over time if the site is redesigned. That's why you'll want to go back to the sensor positions file, find your HOR code — 101 here — and see whether the physical location changed over different date ranges. You'll see that Pringle Creek is a creek with a lot of aggradation and deposition of sediment, and as the channel meanders, the actual physical location of the downstream sensor had to move a little bit. The latitude stayed pretty much the same, but the longitude changed a little and the elevation changed a little as those sensors were moved around through time, so that they stayed in the thalweg of the stream.

Alright, now that we know which horizontal positions are present in the data we just downloaded, we can split them up into different data frames, so that we have one data frame for all of our upstream data and one for all of our downstream data. We'll create a data frame, waq_up, by subsetting waq_instantaneous to the rows where horizontalPosition equals 101, the upstream location; and then we'll create a second data frame, waq_down, which is the data from waq_instantaneous with a horizontalPosition of 102, the downstream location. We'll run this, and they should pop up over here in our environment. Within waq_instantaneous there was a total of 83,520 observations, and now we've split them exactly in half: half were collected at the downstream location and half at the upstream location.

Alright, now we can go ahead and plot some of this data and see what it looks like. Let's plot dissolved oxygen data from the downstream sensor. Something else that's really cool that NEON does with our data, that other agencies like USGS maybe don't do, is that we also include uncertainty estimates for most of our measurements. These come from our calibration and validation lab: each individual sensor probe that we have, they test it, figure out what the uncertainty of that probe is, and that gets published along with the data. So if you go into waq_instantaneous and scroll over to the dissolved oxygen data right here, you'll see it also has a column for the uncertainty, and we'll plot that uncertainty along with the data. The first thing we need to do is identify all of the columns that are going to be important for plotting: the timestamp that we want to use, and then the dissolved oxygen data. There are two ways to do this. One is to just look at the unique column names; the other is to look at — remember — the variables file, which tells you what all of the different variables present in the data file are. Let's go ahead and do both. We'll run this here first, and these are all of the variables present in the data table waq_instantaneous — 151 different variables. A lot of them correspond to dissolved oxygen: there's dissolvedOxygen, there's sea-level dissolved oxygen saturation, and there's local dissolved oxygen saturation. If we want to know what all of these different variable names mean, that's where we go to the variables file: we find the table that we want, which is waq_instantaneous, and then under the field names we can find each of those. So dissolvedOxygen is the dissolved oxygen concentration in milligrams per liter. If we go down, we can find, for example, localDissolvedOxygenSat, and this is the dissolved oxygen percent saturation relative to local conditions, meaning local atmospheric pressure. Gas saturation varies with temperature, obviously, but also with pressure, so at higher elevations you get less dissolved gas than at lower elevations. And then there's also this one here, seaLevelDissolvedOxygenSat — this
is referenced to sea-level pressure. For this plot, let's just use dissolvedOxygen, which is dissolved oxygen in milligrams per liter, and then we're going to plot the associated uncertainty, the dissolved oxygen expected uncertainty. Also, for our x variable we want to know which time column to use. Most data products have multiple time columns: if we go to waq_instantaneous, you'll see there's a start time and an end time, and for water quality you'll notice that these are exactly the same. The reason is that an instantaneous data product is only measured at one instant in time, so it doesn't really matter — the start time and the end time will be the same for this data product. For others that are, say, a 30-minute average, the start time will be the start of that averaging interval and the end time will be the end of it. That's an important distinction to keep in mind. Another thing to remember is that in all NEON data, the timestamps are always in UTC, regardless of the time zone where the data was collected. If you want to convert to local time, there are a lot of functions in R that will do that for you; just keep in mind that if you see the sun coming up and it says it's 1800 hours, that's because it's 1800 in UTC, not local time.

Alright, so we're going to create our plot of DO. We're going to use ggplot to make a line plot: we've specified that the data we're going to use is waq_down, with endDateTime as our x variable and dissolvedOxygen as our y variable, and we're going to make it blue. Then we're going to add a ribbon for the uncertainty — again using waq_down as our data and endDateTime as our x variable — where for our minimum we use dissolvedOxygen minus the expected uncertainty, and for our maximum we use dissolvedOxygen plus the expected uncertainty. We'll go ahead and run this to create our plot. Alright, so now we've got a pretty cool plot of our downstream DO data from Pringle Creek from February 2020, with the uncertainty band added. It looks pretty normal for a dissolved oxygen plot: dissolved oxygen goes up during the day because of photosynthesis, and it goes down during the night because of respiration. Here it's dampened, probably because something happened — maybe it was a cloudy day, or maybe there was a high-flow event that increased the turbidity or something. But if we look right here, we see some kind of weird data going on — there are some spikes and dips — so what's going on there? That's what we're going to do next: look at the quality flags of the data and try to figure out what's going on with these spikes. NEON data quality flags fall under two distinct types. Most of them are automated quality flags: these get applied automatically based on algorithms — range tests; spike or step tests, where the change from one timestamp to the next is quite large; null tests for missing data; and then some data products have data-product-specific automated flags, though there aren't any for water quality. The second type is manual science review quality flags. This is where somebody — probably me — at NEON has looked at the data, reviewed it, and said, hey, there's something odd going on here, and so I'm going to add a manual quality flag to alert users that they might want to take a closer look before using the data. For instantaneous data such as water quality, the flag columns are all denoted with QF, for quality flag. In time-averaged data, those quality flags get aggregated into quality metrics, which are just the percentage of the data within that average that got a quality
flag, and those are denoted with QM. So here, let's go ahead and look at the different quality flag names within water quality, and then we'll look at just the ones that correspond to dissolved oxygen — and remember that we need to remove the ones associated with dissolved oxygen saturation in percent, because we only want to look at dissolved oxygen in milligrams per liter. We'll go ahead and run this, and we'll see that within the expanded package of our water quality data product there are 120 fields associated with quality flags. Remember there were 151 variables, so the vast majority of them are related to quality flags, and that's because we downloaded the expanded data package that has all of the individual quality flags. If we had downloaded the basic package, it would only include the 31 variables that aren't associated with individual quality flags, plus the final quality flag, which aggregates all of those individual quality flags for each parameter. We'll see in a second why it's sometimes important to look at the individual quality flags rather than just the final quality flag: there might be times when a flag is triggered for a reason where the data isn't necessarily bad, and you want to see which individual quality flags were triggered before you decide whether or not to use the data. Whereas if you just downloaded the basic package and saw that the final quality flag was tripped, without knowing why, you might be more hesitant to use that data. A quality flag of 0 indicates that the data passed a particular test, 1 indicates that it failed that test, and occasionally you'll see a -1, which indicates that the particular test could not be performed. For example, if there were missing data, the data would get a 1 — indicating a fail — for the null quality flag, but a -1 for the range quality flag, because you can't run a range test on a missing value. Again, the detailed quality flags showing the individual results are available in the expanded package; if we had specified basic, we would only get the final quality flag. There are also these alpha and beta quality flags, and what these do is aggregate the results of the various quality flag tests: in most cases, if any of the individual quality flags returned a value of 1, indicating a failed test, the alpha quality flag is set to 1; and if any of the tests returned a -1, indicating the test couldn't be run, the beta quality flag is set to 1. So let's go ahead and consider which types of quality flags were thrown for the data we just looked at, the downstream dissolved oxygen at Pringle Creek for this month. We'll run this, and we'll see that for the range flag the unique values were 0 and -1 — some values passed, and for some the test couldn't be run, but there's no 1 here, so no values failed the range test for being outside the range. For the step flag, okay, there were a couple that failed. For the null flag, again, there were some that passed and some that failed; likewise for the gap flag. And you can go down through all of these other quality flags. We include some for calibrations: if the sensors haven't been calibrated within a specified amount of time — say field science wasn't able to make it out to the site in the last month for some reason to do the calibration — then these valid calibration flags get triggered. Again, that's not necessarily a reason to throw out data a priori just because you see this quality flag has been triggered. That's why it's important to go look at which individual flags were triggered, because some data that gets quality flagged may still be usable. And of course the opposite is true: sometimes bad data makes it through but for whatever reason
didn't get quality flagged. So that's really why it's on you as the user to review the data, understand why it is or isn't getting quality flagged, and then make a determination for yourself as to whether that's data you're comfortable using. Alright, so we just discussed this: a 0 value indicates that the data passed the quality test, 1 that it failed, and -1 that the test couldn't be run. We saw which types occurred for each quality flag; now let's look at the frequency — the number of these individual flags that were thrown for this period of data. Okay, so we can see that out of 41,760 total measurements within the downstream water quality dissolved oxygen, there were zero that failed the range test, 23 that failed the step test, 224 for the null test, and so on and so forth. For all of these different quality flags, the algorithms used to create them and the thresholds used to trigger them are available in what we call our ATBDs — Algorithm Theoretical Basis Documents. These are available for download: if you go to the NEON data portal webpage for each specific data product, you can download the ATBDs to get a better understanding of how the quality flags are generated. We can also look at whether there are any manual science review quality flags — that would mean somebody looked at the data and saw something that was wrong. If we look up here at the dissolved oxygen final quality flag science review, there were zero, so nobody looked at this data and saw anything that was obviously wrong. This is the one that, when I use NEON data, I focus on the most: if the science review quality flag has been set to 1, it probably means there was something seriously wrong with the data — something a person was able to visually recognize. But again, just because you see it set to 0 doesn't always mean there isn't something wrong. We collect hundreds of data products from 80-some sites, and within each data product there are dozens and dozens of variables, so it's pretty much impossible for a person to visually review all of the data we're collecting. That's why it's really on you as the user: the quality flag algorithms aren't perfect, suspect data sometimes passes through the quality tests, and other times potentially useful data might get quality flagged. Ultimately it's up to you to decide which data you're comfortable using. That's why we always recommend using the expanded data package and understanding why the individual quality flags are being triggered — and if you ever have questions about it, we encourage you to reach out to us, and we can help you interpret data that you might not be as familiar with as we are.

Alright, so let's go ahead and replot the data, now with any of the quality-flagged measurements set to a different color. We're going to create a new DO plot, exactly the same as before — waq_down with endDateTime as the x variable and dissolvedOxygen as the y variable — but now for the color we've set a factor on the dissolved oxygen final quality flag: when it's 0, indicating that the data passed all of the quality flag tests, it's going to be blue, and when it's 1, indicating that it failed, it's going to be red. We'll go ahead and run it, and we get a plot very similar to the one we saw before, except that now those weird spikes and dips that looked odd in the data before — yes, those did get quality flagged, and so they're shown in red.

Alright, we've got about 15 more minutes left of the tutorial before we open it up to questions, so what we'll do with this last section is apply what we've learned to a different data product. For the first part, we'll use another popular AIS data product, temperature of surface water, and it has its own unique data product ID: DP1, indicating it's a level one
data product, starting with 2, indicating it's an aquatics data product, then 0053. So knowing this other data product ID, can we download data for the same site and month that we just looked at for water quality? We're going to create this new object here using our loadByProduct function: we've specified the new data product ID, we're keeping the site the same, Pringle Creek (PRIN), and the month the same, and again we're going to download the expanded package so we can look at the flags, and we want the current release. So we'll go ahead and run this and load it into our environment, and if we look over here on the right, we'll see that this data product also came with sensor position files, variables files, and READMEs, followed by that same five-digit data product ID: we now have a README for 20288, water quality, and 20053, temperature of surface water, and likewise issue logs, variables files, etc. All right, so let's see what horizontal positions are present in this data; remember, we just did this for water quality to see the different locations where data was collected. Within our temperature of surface water we have the same horizontal positions we did for water quality, 102 and 103, indicating these measurements were co-located with water quality. Again we'll use just the downstream data, like we just did for water quality, so we're going to create a new data frame, temperature of surface water downstream, subsetting our temperature of surface water 30-minute table to a horizontal position of 102. Now, temperature of surface water is an averaging data product; remember, water quality was instantaneous. If we look at temperature of surface water over here, we got a five-minute average data table and a 30-minute average data table, and if we go and look at the start time and the end time, remember for water quality these were the same because it was instantaneous; now, for this 30-minute average data product, we have a start time and then an end time that's 30 minutes later, so
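The download step described above could be sketched like this; the start and end month are illustrative (substitute whatever month you've been using), and the argument details are worth checking against the neonUtilities documentation.

```r
library(neonUtilities)

# Temperature of surface water at Pringle Creek, expanded package
tsw <- loadByProduct(dpID = "DP1.20053.001",
                     site = "PRIN",
                     startdate = "2020-02",   # illustrative month
                     enddate = "2020-02",
                     package = "expanded",
                     release = "current",
                     check.size = FALSE)

# Unpack the list of tables into the global environment
list2env(tsw, .GlobalEnv)

# Which horizontal positions are present? Expecting "102" and "103"
unique(TSW_30min$horizontalPosition)
```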
something important to keep in mind. All right, so we'll go ahead and subset it to just the downstream data, with a horizontal position of 102; if we look over here in our environment, we now have this data frame, temperature of surface water downstream. We'll go ahead and remove the quality-flagged data; remember, I said you should always go and look at the flags individually, but for time's sake here we're just going to subset to the records where the final quality flag is set to zero, the ones where the final quality flag wasn't triggered. In general you usually don't want to do this, but for simplicity's sake and time's sake in the tutorial we're going to go ahead and do that. Then we'll plot the data: we're going to use our temperature of surface water downstream data, and we have to choose here whether we want to use the start time or the end time as our x variable; a lot of times you could use the average of the two, which is the middle of the averaging period. We're going to use our mean surface water temperature; we chose this value, and we know what units it's in, presumably because we've looked it up in the variables file. Then we can also add uncertainty, just like we did for dissolved oxygen: we're going to set our min equal to the mean minus the uncertainty and our max equal to the mean plus the uncertainty. We'll go ahead and run it and get this nice plot here, similar to what we had before for dissolved oxygen, but now it's for temperature. We'll go ahead and do another example; for this one let's look at our continuous discharge. This is a higher-level data product: you'll note from the number that it's DP4, indicating it's a Level 4 data product, which means it's derived from lower-level data products. In the case of continuous discharge, it's derived from our Level 1 surface water elevation data and also our Level 1 AOS, or Aquatic Observation System, data: the field-measured
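Those steps, subsetting to the downstream sensor, keeping only unflagged records, and plotting with uncertainty, could look roughly like the following; surfWaterTempMean, surfWaterTempExpUncert, and finalQF are my reading of the TSW_30min field names, so confirm them in the variables file before relying on them.

```r
library(ggplot2)

# Downstream sensor only
tsw_down <- TSW_30min[TSW_30min$horizontalPosition == "102", ]

# Keep only records where the final quality flag was not raised
# (a simplification, as noted in the tutorial)
tsw_down <- tsw_down[tsw_down$finalQF == 0, ]

# 30-minute mean temperature with an expanded-uncertainty ribbon
ggplot(tsw_down, aes(x = endDateTime, y = surfWaterTempMean)) +
  geom_ribbon(aes(ymin = surfWaterTempMean - surfWaterTempExpUncert,
                  ymax = surfWaterTempMean + surfWaterTempExpUncert),
              alpha = 0.3) +
  geom_line() +
  labs(x = "Date", y = "Mean surface water temperature (C)")
```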
discharge. So periodically technicians go out and collect manual measurements of discharge, and from those and the Level 1 water surface elevation we can make a rating curve that predicts what the discharge is for any particular water surface elevation; that's this higher-level, Level 4 data product. So we'll go ahead and download that data for the same site and month that we just looked at: we've changed our data product ID to discharge and kept everything else the same, Pringle Creek, dates, package, release, all the same. We'll go ahead and run it. Okay, it's now loaded into our environment, so over here we'll see this new table, csd_continuousDischarge. Something to note: discharge is only measured at a single location, so we don't have to worry about the horizontal position. If we go in here, discharge is always published at the horizontal position, or horizontal ID, of 110; there will never be a different number here, so you don't have to worry about separating the data by location. The reason for that is that our reaches are specifically chosen so that flow coming in approximately equals flow going out; there can be some small groundwater interactions, but we don't have large tributaries coming in, so discharge at any one location in the stream is more or less going to be equal to discharge at any other location; the reaches were specifically chosen for that. So let's go ahead and, for our continuous discharge, remove any of the values that have a final quality flag, or rather, choose the ones where the final quality flag wasn't triggered, and then we'll plot it. We're going to use our csd_continuousDischarge table, with the end date as our x variable and this term maxpostDischarge as our y variable; again, if you want to know exactly what that is, you can go look in the variables file that comes with continuous discharge, but basically it's the maximum posterior likelihood from the Bayesian model that we use to calculate discharge. It's
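A hedged sketch of the continuous discharge download, subsetting, and plot: the month is again illustrative, and maxpostDischarge, dischargeFinalQF, and the upper/lower uncertainty columns (here the 2-standard-deviation bounds that include remnant uncertainty) are my reading of the product's field names, so confirm all of them in the variables file that comes with the download.

```r
library(neonUtilities)
library(ggplot2)

# Continuous discharge (DP4.00130.001) for the same site and month
csd <- loadByProduct(dpID = "DP4.00130.001",
                     site = "PRIN",
                     startdate = "2020-02",   # illustrative month
                     enddate = "2020-02",
                     package = "expanded",
                     check.size = FALSE)
list2env(csd, .GlobalEnv)

# Single measurement location (horizontal position 110), so no
# splitting by location; keep only records where the final quality
# flag was not raised
csd_ok <- csd_continuousDischarge[
  csd_continuousDischarge$dischargeFinalQF == 0, ]

# Maximum-posterior discharge with upper/lower uncertainty bounds
ggplot(csd_ok, aes(x = endDate, y = maxpostDischarge)) +
  geom_ribbon(aes(ymin = withRemnUncQLower2Std,
                  ymax = withRemnUncQUpper2Std),
              alpha = 0.3) +
  geom_line() +
  labs(x = "Date", y = "Discharge (see variables file for units)")
```

Note that the uncertainty here is a ribbon between upper and lower columns rather than a symmetric value added to and subtracted from the estimate, which is the distinction discussed just below.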
essentially the mode, the most likely value. For continuous discharge we do also provide the mean, if you prefer to use that, but generally we'd recommend using the maximum posterior, because that's what the developers of the particular model we use recommend. Then we'll go ahead and also add our uncertainties here; for discharge it's a little bit different: the uncertainties are given as upper and lower values, not a standard uncertainty that you add to and subtract from the actual value, so when you want to use that data, that's a distinction that can be important. So we'll go ahead and run this, and here's our continuous discharge with uncertainty. You see there's a higher-flow event here, and remember when we looked at the dissolved oxygen data where that clear diel signal disappeared? That was probably right at this flow event, because it increased the turbidity and attenuated the light that was reaching the benthic surface and stimulating primary production. All right, so we've finished our tutorial, right on time with three minutes to spare, and now we can open up the floor for anybody that has questions. You can either unmute yourself and ask them or type them in the chat, and if you have questions that you don't want to ask in public, feel free to email me. Hi, I have a question. Thank you so much, that was a really useful seminar. I appreciated the depth you went into in talking about the flagging, because we've been using some NEON data and we just aren't totally sure what all the flags mean, so that was really useful. One question I had about the flagging and the data sets: do you ever remove data if you feel pretty certain that the data are bad, like if you had an instrument error or something like that, or are all data included and then flagged? So, not currently, and that was mandated by
NSF. That is changing, though: continuous discharge is now our first data product where we're going to work on both removing data that we identify as so bad that it can't even be used, and trying to clean up data using alternative data sources, if we can, to make it usable. So at the moment, no. Well, I guess that's not entirely true; there have been extreme circumstances where data have been redacted, but those have been few and far between. For the most part, what NSF has directed us to do is to manually flag data that's bad, but it still gets published, which, as you just alluded to, is not a perfect solution and can lead to people misinterpreting how to use that data, or whether to use it at all. That's really useful, and the discharge data is what we've been using the most and thinking hard about. I'm wondering what the time frame for this change to removing some data is, when should we expect that, and are you going to go back and do that with older data? Yes to both questions, but on different timelines. We have published data through Water Year 2022, and we are working on Water Year 2023, which just finished in October of 2023, so moving forward that data is going to be cleaned up, with any bad data removed and any data that can be fixed, fixed. Then we are planning to go back and do that for data collected previous to 2023; that will take a little bit longer, and I don't know exactly how long, it might be a year or two before that happens. That's really useful, thank you so much. Yeah, great, that's really useful information, thank you. Just to note that I dropped a link in the chat to a survey, so for those of you who have to drop off, we just ask that you fill that out so that we can continue to improve these offerings. I have a quick question, Bobby: how is the uncertainty determined? I'm sure that's in the ATBD, but is that based off of sensor limitations, or what? For which data product? It's
different by data product. For the Level 1 data products, where it's just sensor data being collected, that's derived by our calibration and validation lab: they test each sensor before it gets deployed by putting it in standards, so they know what the measurement uncertainty is. For the higher-level data products like discharge, the uncertainty comes from a couple of different sources. There's the actual uncertainty in the field measurements, so the water surface elevation and the manual measurements of discharge that they collect; those get incorporated. But then the Bayesian model also has its own two sources of uncertainty. It gets priors that are based off the channel geomorphology: they go out there with a theodolite and a level rod and they measure the shape of the channel, which, along with the slope of the channel, obviously controls how much water can flow through it. Those serve as priors for the model, and they have uncertainties based on the measurement accuracy of the survey and on how we interpret where the channel breaks are, like where it goes overbank. And then there's also posterior uncertainty, which is how well the model actually fits the observations of manual discharge, so it's like a remnant uncertainty. So discharge has multiple sources of uncertainty, and if you actually go, let me share my screen again, are you able to see my R console again? Yeah. So if you go into the continuous discharge data frame, it'll include both: there are columns with just the parametric uncertainty, which is the uncertainty in the priors, and then the remnant uncertainty, which combines both the prior and posterior uncertainty. Okay, and that's what was plotted in the lesson? Yeah, that's the combined one; I think that's what I plotted, I can't remember exactly, but it should have been. Thanks.