Okay, so we'll go ahead and get started here. The typical way to get started with NEON data is to go to the data portal page, and that's where you can find all the information about our data products. We have 182 different products being collected across 81 different field sites. If you want to follow along, you can go to this URL here, data.neonscience.org; I'll throw that into the chat. And you would go to this Explore Data button here, and that's where you can start searching through our 182 different data products. You can put any term you're interested in into the search; of course we've got not only this organismal mammal data but also lots of abiotic variables and other things that might be of interest for an analysis. In this case I've typed in "mammal," and it's going to filter down to all the data products we have that might involve mammals or rodents, and you can see there's a few of them. I'll be talking about the small mammal box trapping data today, so I'm going to click on that one. This is what we call our data product landing page, and it's really helpful for orienting yourself to any NEON data product you're interested in getting started with. It starts out with an abstract of basic details about how the data are collected. It talks a little bit about the latency, so how much time you might expect to wait until the data are available on our portal. And then of course if you're using our data we ask that you do a proper citation in any papers and presentations. The reason for that is just to spread the word about our data and also to help us with our reporting metrics to the NSF, who's funding the project. And then there's a little bit on documentation, which is super helpful if you're digging in for the first time and you want to know how the data are collected.
I'd recommend the user guide; that's a 10-to-15-page document that tells you the basics of the protocol without getting into every single possible detail, so that's good if you have a basic question about methodology. The quick start guides are wonderful when you just want to get started: you already kind of know what we're doing, and you want to download the data. So this is an example here of the quick start guide. It'll give you just the very basics about how the data are collected. And then it's this table joining section down here that I think a lot of people find quite useful. A lot of times NEON data end up being collected into multiple different tables that you then have to join together — for instance, to link the sampling effort with the trap collection results, the individual measurements and taxon IDs, and to link those up with the pathogen results from the samples that were analyzed. All those things often require joining tables, and that'll probably become more clear as we talk through the tutorial today. So that's a little bit on the documentation. Next is the issue log: any time protocols can't be followed exactly, or something comes up that interrupts sampling, things like that, we have an issue start and end date, and it all goes in here. So if you're looking at something specific that doesn't look quite the way you expected, you can come to this issue log and see if there's an entry there. And then last but not least on these data product details pages is the data availability and download information. You can see all the different sites down here as rows, and the years as columns, and the blue squares indicate when there's data available for that particular spot.
So, one way to download data, of course — and I apologize, I'm not sure if you can see my full screen — is there's a Download Data button here in blue. That's my quick intro. Next, before we launch into the R portion of the tutorial webinar, I wanted to talk real quickly about how we collect the mammal data, just to help everyone orient to what we're actually looking at in the data. We have 44 NEON terrestrial sites where we're collecting mammal data, with on average six mammal grids at each of those sites. There are always going to be three pathogen grids at every site, and those are the ones that get sampled three nights roughly in a row, sequentially, per bout. And then there's anywhere from zero to five of what we call diversity grids. Those get one night of trapping per bout, and they tend to be in the land cover classes that are less dominant at the site; they help us capture taxa that we wouldn't see in the dominant land cover class, so that's why those are in there. Each grid has 100 Sherman traps at 10-meter spacing, laid out in a 10-by-10 grid. For those of you that are familiar with NEON, we're separated into ecoclimatic domains; there are 20 of those, and each one has a core site. At those core sites we're doing six bouts per year, separated roughly by about a month, and they tend to occur over the growing season. And then there's anywhere from one to several gradient sites in each of those ecoclimatic domains, and those are sites where we only do four bouts per year — but that's still quite a bit of data. So that's what we're looking at for mammal data. I'm going to pop over to Safari again and show you: if you registered for this ahead of time, you're going to see in the email a link to this tutorial; I'm going to try to throw it in the chat as well.
I'd like everybody now to go ahead and get RStudio open on their computer; hopefully you got the email that indicates which packages you should install and things like that. I'm going to clear my working environment — you certainly don't have to do that if you've got things in there that you don't want to lose, that maybe took a long time to process. But I'm going to remove all my objects. I do again have that tutorial link here at the top of this page, and the first thing we're going to do is load the packages. Hopefully you've all already installed them; if you haven't, you would use the install.packages() function there. Since I already have them, I'm just going to call library() to load them and let my computer know that I need to use them. So dplyr is one package; neonUtilities is another fantastic package that helps with downloading the data; the neonOS package is wonderful for spot checking to make sure the data look the way we want them to; and then ggplot2 for plotting the data. So go ahead and get those loaded up. The next thing we're going to do is step one, which is the most fun step I think: getting all that data onto your computer. We've got a huge wealth of resources there, and it's as simple as using the neonUtilities function loadByProduct(). The way this works is, the first thing you put in is the dpID — the data product ID — which is found, like I showed you, on the data product landing page, and in various other places throughout the website. In the case of small mammal data it's DP1.10072.001. And then I just selected three sites that I knew had some fun data to look at: SCBI is in Virginia, KONZ is Konza Prairie in Kansas, and UNDE is up in the northern Great Lakes region.
Then we're going to just use the basic package; expanded has some extra data that we're not going to be using today. We don't need check.size because I've already made sure it's not going to take forever to download, but if you didn't know how big the download was you might want to include that, so you don't end up timing out on the download and things like that. I'm using a start date of January 2021 and an end date of December 2022, so we're going to have two years of data here. And we can go ahead and run that. You can see the computer's finding the files and downloading them relatively efficiently, which is very nice. In my opinion this is an easier way to download the data, because it all ends up already in the R environment where you need it for working with the data. So, great, that didn't take too long; it's already downloaded. I just wanted to show you what comes up here: you can see it ends up as a list with seven objects inside of it, so we can take a look at what those things are. The structure is similar for all the different data products, so I do find it useful to take a peek at what you get in that list. The first thing is categoricalCodes, and the number here corresponds to the data product ID. The categoricalCodes file is just the controlled lists that are used when we collect the data; we use a nice little app with drop-down menus so we can constrain the values, and those are the constrained values there. The issueLog is what you would find on the website on that data product details page. And then, as I mentioned, the data often get split up into different tables: the mam_perplotnight table is going to be all the information about the mammal trapping effort, so when a bout was completed.
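The download step just described can be sketched as below. The argument values mirror what was said in the session, but the loadByProduct() call itself needs an internet connection and the neonUtilities package, so it's shown commented out; the object names here are illustrative, not the exact tutorial code.

```r
# Arguments for the small mammal download, as described in the session.
dl_args <- list(
  dpID = "DP1.10072.001",           # small mammal box trapping
  site = c("SCBI", "KONZ", "UNDE"), # the three demo sites
  package = "basic",                # "expanded" adds tables we won't use
  startdate = "2021-01",
  enddate = "2022-12",
  check.size = FALSE                # skip the size prompt in a scripted run
)
# Requires internet; returns a named list of tables:
# mamdat <- do.call(neonUtilities::loadByProduct, dl_args)
# names(mamdat)  # categoricalCodes, issueLog, mam_perplotnight, ...
```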
What the dates and times were, things like that, as well as an eventID that groups that bout as one. So the next thing we want to do is take a look at duplicates and make sure we don't have duplicate records. That neonOS package that we loaded earlier has a fantastic duplicate checking function built in for all of the NEON data products, so we can take advantage of that for mammals and double check whether there are any duplicates in the data we just downloaded. Okay, so we're going to give it the data; we're going to check both of the data tables that we downloaded, and mam_perplotnight is the one we'll start with. You have to tell it what the variables file is, which got downloaded along with the data, and then you have to tell it which table we're looking at. I'll call the result mam.dupes. All right, I'm going to go ahead and run that, and lucky us, there are no duplicated key values found. If there were, you would want to go in and see if they're true duplicates and get rid of the extra records, and of course an extra step would be to let the NEON science staff know if you find something like that, so that we can fix it for everybody. We're also constantly monitoring and fixing that kind of thing on a rolling basis, so we do our best not to let that happen, but every now and then it does, if we haven't had a chance to quality check the data before you've downloaded them. So, the next thing we want to do is the mam_pertrapnight table. This one's a little trickier, because every now and then — and this is sort of unique to mammals — we'll end up getting a trap that has two individuals in it that aren't tagged individually, and so they would look like a duplicate but they're not.
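Here's a hedged sketch of that duplicate check. The commented removeDups() call is the real neonOS function, but it needs the downloaded variables file; the toy duplicated() check below it just illustrates what "no duplicated key values" means. The column names in the mock table are assumptions for illustration.

```r
# Real check (needs the downloaded list of tables; shown, not run):
# mam.plotnight <- neonOS::removeDups(
#   data = mamdat$mam_perplotnight,
#   variables = mamdat$variables_10072,
#   table = "mam_perplotnight"
# )

# Toy illustration: nightuid is the key, so no value should repeat.
mock <- data.frame(
  nightuid = c("n1", "n2", "n3"),
  plotID   = c("KONZ_001", "KONZ_002", "KONZ_001"),
  eventID  = c("b1", "b1", "b2")
)
dup_keys <- mock$nightuid[duplicated(mock$nightuid)]
length(dup_keys)  # 0 means no duplicated key values
```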
So this little code chunk that I'm putting in here is just checking to see if there are any untagged multiple individuals in a trap in this data set. You can see what we just created is a data frame with zero rows, so we don't need to worry about that particular issue when checking duplicates in the trap night table. We're going to do the same check for duplicates in mam_pertrapnight, since we've just confirmed we don't have to worry about multiple untagged individuals in a trap. So we go back and use that removeDups function again; this time we do the mam_pertrapnight table. Let's see. All right, so again no duplicates, which is wonderful; we're just going through the duplicate checking, which again is best practice. So we haven't quite gotten into the analysis yet, but the data set so far is looking pretty good. As I said, a lot of times we have the situation where all of the eventID information is in the perplotnight table — the trapping effort table — and what we want to do is join that with the pertrapnight table that has all the collection information in it, so that when we do the analysis of the minimum number known alive we have both those pieces of information together. So the next step is to create a data frame that I'm going to call mam.join; that's going to be the joined table. Again this is using that neonOS package, which has a handy function, joinTableNEON. It lets us join the perplotnight table with the pertrapnight table, and I'm going to use the duplicate-free versions.
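A toy sketch of the join idea, assuming small made-up tables. The commented joinTableNEON() call is the actual neonOS function mentioned above, which knows the linking keys for each pair of NEON tables; the merge() below just mimics what it does conceptually.

```r
# Real call (needs the downloaded tables; shown, not run):
# mam.jn <- neonOS::joinTableNEON(mam.plotnight, mam.trapnight,
#                                 name1 = "mam_perplotnight",
#                                 name2 = "mam_pertrapnight")

# Toy version: effort rows (one per trapping night) joined to the
# per-trap capture rows via the shared nightuid key.
effort <- data.frame(nightuid = c("n1", "n2"),
                     eventID  = c("KONZ.2021.27", "KONZ.2021.27"))
traps  <- data.frame(nightuid = c("n1", "n1", "n2"),
                     tagID    = c("R1001", "R1002", ""))
joined <- merge(effort, traps, by = "nightuid", all.x = TRUE)
nrow(joined)  # one row per trap record, effort columns carried along
```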
Hypothetically, if there had been duplicates, these would be the versions with the duplicates removed, so I'm using those; you could just use the straight-up data since there were no duplicates, but we're being purists here. Okay, so let's see. All right. Now we have this lovely joined data frame that has all the variables that were in the perplotnight table linked up with all the ones in the pertrapnight table. That's wonderful, and the next thing we want to do is double check that the tagIDs that should indicate a capture really do indicate a capture, and that the records where there were no captures don't have any tagIDs. The way we're going to do this is a trapStatus error check; this is just another best practice to make sure the trap statuses are accurate in the data. So again, using those no-duplicates files, we're going to filter to rows where the tagID is not blank, so there is a tagID in this filtered data set. And if there is a tagID, then we want to find the word "capture" in the trapStatus. The two trap statuses associated with a capture are trapStatus 5, which is a straight-up capture, and trapStatus 4, which means multiple captures in the trap, and both of those have the word "capture" in them, so that's what we're searching for here. And again we're finding there are no situations where there is a tagID but the trapStatus is not set to one of the capture statuses, so that's another error check saying the data are looking the way we expect them to. You can do the converse, too: we look for records that are labeled as a capture but don't have a tagID, because we don't want to miss those. We're literally doing the opposite of what we just did. And we're going to look now for, oops, capitalize that. Okay.
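The two-way trapStatus sanity check just described can be sketched with a toy table (column names and status strings follow the NEON conventions mentioned above; the data are invented):

```r
# Mock per-trap-night records: statuses 4 and 5 contain "capture".
trapnight <- data.frame(
  tagID = c("R1001", "", "R1002"),
  trapStatus = c("5 - capture",
                 "6 - trap set and empty",
                 "4 - more than 1 capture in one trap")
)

# Check 1: a non-blank tagID should always sit in a "capture" row.
tag_no_capture <- subset(trapnight,
                         tagID != "" & !grepl("capture", trapStatus))
# Check 2 (the converse): a "capture" row should always have a tagID.
capture_no_tag <- subset(trapnight,
                         tagID == "" & grepl("capture", trapStatus))
nrow(tag_no_capture)  # expect 0
nrow(capture_no_tag)  # expect 0
```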
Okay, and again this should be empty, and it is, so wonderful. All right, the next thing we want to do is filter down to just the target individuals. When we do this trapping, of course, it's nocturnal: we set the traps out in the evening and go back to collect them in the morning, and every now and then we'll get a stray squirrel or some other diurnal species showing up in a trap. But since they're not what we're targeting, those data are less reliable, I would say, than what we get for our target species. So for the purposes of this tutorial — and in general, since most people are going to want to filter those out — I wanted to show you how to go about doing that. The first thing we're going to do is download a taxonomy list. Any NEON data product that has taxonomic identifications is going to have an associated taxonID list with all the information about the various taxa in that product. So this is what we just downloaded for the mammals: we have a four-letter taxonID code that's associated with a scientific name, and then lots and lots of additional information, but the thing we're going to focus on here is this taxonProtocolCategory. Shrews and the like are considered opportunistic: we get them and we keep track of them, but we do less analysis on those than we would on our target species, which are these nocturnal rodents. We're going to use that field to filter our data set down to only the target species. So I'm going to create a data frame called mam.list that takes that taxonID list and filters it to the taxonProtocolCategory of "target". That's what we're looking at there, along with the scientific name. All right, wonderful.
So now we have a filtered list of all of our target taxa. Another thing that's sometimes nice to do: when we looked at the pertrapnight table, there were quite a few columns in there, which is wonderful — there's anything you could potentially need — but we're going to trim it down to just the columns we're using. That makes it easier to look at the data frame and know what we're looking at, especially if there's something we want to dig into a little more. So this next bit is just defining the core fields we're interested in for this analysis: the plotID where the data were collected, the collectDate, the eventID, which as I mentioned is the bout identifier, the tagID for each individual, and the taxonID, which tells us the species. Those are really the only things we need for the analysis we're doing today. And we already loaded dplyr earlier, so we can skip that step. The next thing I'm going to do is filter down that joined data set — mam.join, which we created up here by joining the perplotnight and pertrapnight tables. I'm going to filter it down to only the records that are captures and only the target taxa, and keep only the core fields we just identified; that's going to be the data set we work with for the rest of the meeting, and I'm going to call that data frame captures. So we take the mam.join data frame and filter it down, again looking for the word "capture" in the trapStatus, because there are two trap statuses that indicate something was captured.
A single capture in the trap, or multiple captures in the trap — and this is just a simple way of getting both of those without having to write both of them out specifically. Then we need the taxonID to match a taxonID in the target taxa list; again, the target taxa list is what we created here when we filtered down to just the nocturnal rodents. And then we select the core fields columns we just defined up here, so that we're not looking at hundreds of columns. And then, just for appearance's sake, we're going to rename the collect date: because it was in both of the tables we joined, it came out as collectDate.x, and that bothered me, so I renamed it to collectDate. Let me show you what that data frame looks like; this is what we're going to be working with for most of the rest of the tutorial. We've got a nightuid, which is unique for each individual night of sampling. There's a plotID, a collectDate, the tagID for each individual that was captured, and the taxonID — that four-letter code, which you can translate into actual names using the taxon table we downloaded earlier — and then the eventID, which again marks a single bout. So the next thing we want to do is calculate the minimum number known alive. This is an index of abundance, and the main assumption is this: let's say we trap for five nights, and on night one we capture an individual with a given tagID, and on night four we capture the individual with that same tagID, but we didn't capture it on nights two and three. We're assuming it was there on nights two and three and just didn't happen to go into a trap. That's really the basic assumption we're making here.
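The filter-and-trim step just described can be sketched with toy data. The variable names (joined, targetTaxa, coreFields, captures) mirror the session's intent but are assumptions for illustration; BLBR stands in for a non-target shrew.

```r
# Mock joined table: two target captures and one opportunistic shrew.
joined <- data.frame(
  nightuid = c("n1", "n1", "n2"),
  plotID = c("KONZ_001", "KONZ_001", "KONZ_002"),
  collectDate.x = as.Date(c("2021-07-01", "2021-07-01", "2021-07-02")),
  tagID = c("R1001", "R1002", "R1003"),
  taxonID = c("PELE", "BLBR", "PELE"),  # BLBR is non-target here
  trapStatus = c("5 - capture", "5 - capture", "5 - capture"),
  eventID = "KONZ.2021.27"
)
targetTaxa <- data.frame(taxonID = c("PELE", "PEMA"))
coreFields <- c("nightuid", "plotID", "collectDate",
                "tagID", "taxonID", "eventID")

# Keep capture rows for target taxa, rename collectDate.x, trim columns.
captures <- subset(joined,
                   grepl("capture", trapStatus) &
                     taxonID %in% targetTaxa$taxonID)
names(captures)[names(captures) == "collectDate.x"] <- "collectDate"
captures <- captures[, coreFields]
nrow(captures)  # 2: the shrew record is dropped
```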
In order to work with that assumption, we want to make it explicit in our data instead of implicit, and so the only thing this next section of code does is take those extra nights — the ones that don't actually show up as a row in our data frame — and add them in. That's how we're going to calculate the number of individuals on any given night: once we have that data frame, we can just count, because we've made the assumption that those individuals were there. This is one of those somewhat convoluted for loops I was mentioning, so I'm going to take a second and pop over to the NEON tutorial. We've already done quite a few of these sections, so I'm going to scroll down to the section we're at. We've already checked for duplicates, joined the tables, and checked for problems with the tagIDs, and now we're at this minimum number known alive section. I'm going to grab everything in the black box. Just to help folks orient: we are in section three, calculating minimum number known alive; we've jumped down past the first box, where we were filtering out the target taxa, and we are in the second box. I'm going to copy and paste the code from this black code chunk into my RStudio session here. I'll run those lines and then talk through what they're doing, because it's going to take a bit; last time I was a little surprised it took a few minutes to run. It used to be faster — my computer must be overwhelmed. So basically we start by collecting all of the unique tags present in our data set; that's what this distinct() is doing, so this is every unique tag present in the captures data set we just created. Then we create an empty data frame that we're going to populate.
As I mentioned, the goal here: this captures data set has 2385 rows, and ultimately this capsNew data set is going to have extra rows that fill in those implicit presences — the individuals that are not actually in our data because we didn't observe them, but that we treat as present, because that's the assumption we're making for this abundance index. In the for loop, all we're doing is looping through each of those unique tags. We create this indiv object, which is anything from the captures data set that had that tagID: there could be one record if we only caught it once, or ten records if we caught it ten times. Then all we want to do is identify the first date it was captured and the last date it was captured, so we take the minimum of that collectDate and the maximum of that collectDate. They could be the same day if it was only captured once, or dates that are months apart. For possible dates, we create a sequence of every possible day between the first capture and the last capture, when it could have been captured. But of course we didn't sample every single one of those nights; we only sample during our bouts. So we take this possible-dates sequence — every single day from the first capture to the last capture — and filter it down to the days we were actually out sampling. We may or may not have caught that individual on those nights, but we're filtering to the nights when we were out trapping and the individual may or may not have been present. So this next section is just filtering down to the list of dates when we were out collecting.
Then we fill in the values for that individual tagID. The allnights piece is where we join those potential night values with the individual's collection data for that unique tag number. So now we have a mini data set for this one particular tagID, with a first capture date, a last capture date, and all the other sampling dates in between filled in with the appropriate data. We've done that for one unique tag, and we loop through and rbind these together with all the other tags, so that eventually each unique tag has that full data set. I think it should be finishing up pretty soon; certainly an opportunity for folks to put something in the chat if they're not sure how things are working, or struggling to keep up, or have a question while we wait for that to run. The next thing we can do with this data set is see whether there were any untagged individuals and add them back in, but I don't think there are. Yeah, there are no untagged ones, so we're good to go on that front. And then the next thing is to start calculating the minimum number known alive, and I think again it's just going to be easier to pop over to the tutorial, especially since we only have 15 minutes left, and grab that code chunk. So we're scrolling down to the next code chunk, where we've created a function that calculates the minimum number known alive. We're basically just summarizing the totals in this new data frame we created — the number of individuals, grouped by eventID and then plotID, which you can think of as a mammal grid, basically. So I have copied that section of the code and I'm going to run it.
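Here's a minimal mock of the fill-in step just walked through: an individual caught on night 1 and night 3 of a bout is assumed present (but untrapped) on night 2, so a row for night 2 gets added. The dates and object names (sample_nights, capsNew) are invented for illustration.

```r
# Nights we actually trapped in this mock bout.
sample_nights <- as.Date(c("2021-07-01", "2021-07-02", "2021-07-03"))

# One individual, caught on the first and third night only.
captures <- data.frame(
  tagID = c("R1001", "R1001"),
  collectDate = as.Date(c("2021-07-01", "2021-07-03")),
  plotID = "KONZ_001", eventID = "KONZ.2021.27"
)

capsNew <- data.frame()
for (tag in unique(captures$tagID)) {
  indiv <- captures[captures$tagID == tag, ]
  # Every calendar day between first and last capture...
  possible <- seq(min(indiv$collectDate), max(indiv$collectDate), by = "day")
  # ...restricted to nights we were actually out trapping.
  possible <- possible[possible %in% sample_nights]
  filled <- data.frame(tagID = tag, collectDate = possible,
                       plotID = indiv$plotID[1], eventID = indiv$eventID[1])
  capsNew <- rbind(capsNew, filled)
}
nrow(capsNew)  # 3: the two real captures plus the inferred night 2
```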
This is just a function where we give it a capture data frame — that's going to be the capsNew data frame we created, with all those implicit presences made explicit. All it's doing is summarizing the number of distinct individuals captured within each eventID, so within each bout. It's literally just the number of individuals at this particular site for the bout we're looking at. And then, if we want to calculate the average across a site: if you recall, we have anywhere from three to eight sampling grids at a site. This function lets us calculate at the grid level, or it will give us the site level by summarizing with an average. In this particular case it's the number at that particular plot, and if we want to go across all the plots and summarize with the average, we can do that as well. So what I'm going to do is create a new object called mnka, feed the function that capsNew data frame we made, and take a peek at what we see there, which is the mean number of individuals for this eventID: at KONZ, this particular site in Kansas, in 2021, and the 27 is just the week of the sampling date. So that's how many we saw there. That's a basic abundance measure, but we might be more interested in a particular species — looking at just the minimum number known alive for that species — and we can do that as well. Again, I'm going to pop over to the tutorial and grab the code chunk from the next black coding box and paste it in here.
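A toy version of that two-level MNKA summary, using base R on a mock capsNew frame: count distinct tags per eventID × plotID, then average across plots for a site-level value per bout. The names are assumptions, not the tutorial's exact code.

```r
# Mock filled-in captures: 2 individuals on one grid, 1 on another.
capsNew <- data.frame(
  eventID = "KONZ.2021.27",
  plotID = c("KONZ_001", "KONZ_001", "KONZ_002"),
  tagID = c("R1001", "R1002", "R1003")
)

# Grid level: distinct individuals per bout x plot.
mnka_per_plot <- aggregate(tagID ~ eventID + plotID, data = capsNew,
                           FUN = function(x) length(unique(x)))
names(mnka_per_plot)[3] <- "n"

# Site level: average across the plots sampled in that bout.
mnka_site <- aggregate(n ~ eventID, data = mnka_per_plot, FUN = mean)
mnka_site$n  # (2 + 1) / 2 = 1.5 individuals per plot for this bout
```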
What I thought would be interesting to look at is Peromyscus leucopus. That's a taxon that's found at each of these sites we're looking at, it's going to be there every year, and it varies in abundance, so I thought it might be fun to take a peek at what that figure would look like for this species. The way I calculate the minimum number known alive by species is to first generate a unique species list — all the individual species found in our data set. You can see there are, what, 15 or 17 different species; we're going to loop through each one of those and run that minimum-number-known-alive function by site, using the subset of the data for that taxon. So this species list is each individual species: we take species i, filter the data down to that taxon, and calculate the minimum number known alive for that particular taxon. That's what we're doing there, so I'm going to run that for loop. Oh, I didn't run all of it, sorry. Great, that'll take a second. Now, if we want an abundance figure for just Peromyscus leucopus: this mnka.by.species object has the minimum number known alive by species and site, so we're going to filter that data frame down to just the Peromyscus leucopus rows. That's what we're looking at here. In order to plot it, we need a data frame with the dates of the sampling events, because ultimately what I wanted to look at was the abundance of Peromyscus leucopus across time. To do that we need the collect dates when we were sampling, so that we have an x axis to plot; that's what this dateDF data frame is doing. It's just generating the collect dates for each eventID.
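The per-species loop just described can be sketched with toy data. Here mnka_fn is a stand-in for the MNKA function defined earlier, and all names and values are invented for illustration.

```r
# Mock captures for one bout: two species.
caps <- data.frame(
  eventID = "KONZ.2021.27",
  taxonID = c("PELE", "PELE", "PEMA"),
  tagID   = c("R1001", "R1002", "R2001")
)

# Stand-in for the MNKA function: distinct individuals per bout.
mnka_fn <- function(x) {
  data.frame(eventID = x$eventID[1], n = length(unique(x$tagID)))
}

# Loop over each species, compute MNKA on that taxon's subset, stack.
species_list <- unique(caps$taxonID)
mnka_by_species <- data.frame()
for (sp in species_list) {
  one <- mnka_fn(caps[caps$taxonID == sp, ])
  one$taxonID <- sp
  mnka_by_species <- rbind(mnka_by_species, one)
}
mnka_by_species$n  # 2 for PELE, 1 for PEMA in this mock bout
```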
An eventID is a bout of sampling, so at those pathogen grids there are going to be three different nights, and we just want the minimum collectDate so we're distilling it down to a single date. Otherwise we would have three dates for a single eventID, and we just want one. So the date data frame is really just a list of eventIDs and dates. We then join those dates with the minimum number known alive for the Peromyscus. Let me show you what that looks like: we have an eventID, we have a date, these rows are only for Peromyscus leucopus, and we have the average minimum number known alive for that taxon. Then we can go ahead and plot that and look at how abundance varies from year to year and across different sites for this taxon. This is sort of the fun part, taking a peek at that figure. So this is what we end up getting: the dates are here on the x axis, and the mean minimum number known alive for Peromyscus leucopus at the three different sites is here. There's 2021 in the orange color and 2022 in the blue color, and it seems like across most of the sites, 2022 had a higher general abundance of this taxon than 2021. It's an interesting way to take a huge amount of data and distill it down into an abundance metric for one particular taxon that you can compare across all these different sites — that's one of the really neat things about NEON data, being able to make cross comparisons across major spatial and temporal scales. So that's the section on abundance. If anybody has any questions, feel free to pop in; otherwise I'll keep trucking along. All right, I'll keep scrolling down, and next I'm going to look at the maximum abundance of all the different taxa at these sites.
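Before moving on, the date-distilling and join steps just walked through look roughly like this with toy data (all names and values are assumptions; the ggplot call mirrors the session's figure and is shown commented):

```r
# Mock filled-in captures: one bout sampled over three nights, one over one.
capsNew <- data.frame(
  eventID = c("KONZ.2021.27", "KONZ.2021.27", "KONZ.2021.31"),
  collectDate = as.Date(c("2021-07-01", "2021-07-03", "2021-08-02"))
)
# Mock per-species MNKA values for Peromyscus leucopus.
mnka_pele <- data.frame(eventID = c("KONZ.2021.27", "KONZ.2021.31"),
                        meanMNKA = c(12, 15))

# One date per eventID: the earliest collectDate of the bout.
dateDF <- aggregate(collectDate ~ eventID, data = capsNew, FUN = min)
plotdat <- merge(mnka_pele, dateDF, by = "eventID")
# ggplot2::ggplot(plotdat, aes(collectDate, meanMNKA)) +
#   geom_line() + facet_wrap(~ siteID)
nrow(plotdat)  # one row per bout, ready to plot against time
```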
And my reasoning for doing this was just to get a sense of what the diversity might look like at a site. I'm creating a new data frame here called taxData, where I'm now grouping by the taxonID and the siteID and picking the maximum of this mean minimum number known alive. So that's that data frame: for a given taxonID at a given site, the maximum number we ever observed over the two-year time span of the data that we've downloaded. And that's just another thing you can plot to take a peek at and compare across sites, so that's going to be here, and again in ggplot we're putting the taxonID on the x axis and the maximum abundance metric that we're using on the y axis. Then this facet_wrap is making it create three separate plots for the three different siteIDs. So that's that plot, and you can take a look and compare the diversity across the different sites; it's relatively similar. The Maryland site looks like it maybe has slightly less diversity and is probably more dominated by these two particular species than the other two sites. For those of you who are interested in disease ecology, a lot of times the diversity of small mammals at a site can be linked with some of these tick-borne pathogens in really unique and interesting ways, and that was one of the things we were doing here with this tutorial: taking a peek at some of the diversity metrics and looking at them in relation to the pathogen data. So that is what I had for visualizing abundance metrics for the small mammal trapping data, and next up is taking a peek at the pathogen data.
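The max-abundance summary and faceted plot could look roughly like this; the input data frame and its column names carry over from the earlier sketch and are assumptions.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the per-species, per-site MNKA results (an assumption)
mnka_by_species <- data.frame(
  taxonID  = c("PELE", "PELE", "BLBR"),
  siteID   = c("SCBI", "SCBI", "HARV"),
  meanMNKA = c(4, 9, 2)
)

# Maximum observed mean MNKA for each taxon at each site
taxData <- mnka_by_species %>%
  group_by(taxonID, siteID) %>%
  summarize(maxMNKA = max(meanMNKA), .groups = "drop")

# One bar per taxon, one panel per site
p <- ggplot(taxData, aes(x = taxonID, y = maxMNKA)) +
  geom_col() +
  facet_wrap(~siteID)
```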
And I see, you know, we might go a little past one o'clock in terms of the actual coding part, but I should still be able to leave some time for questions at the end, so feel free again to let us know in the chat if you're struggling to keep up or have questions. But again, to save time I am going to just keep scrolling down in the tutorial and grabbing these code chunks and running them separately, since we are running low on time now. This was kind of a meaty tutorial, I guess, in terms of content. So the next thing we want to do is go back to that process of downloading data: the rodent pathogen data are a separate data product from the small mammal trapping data. So again we go back to the loadByProduct function and set in that data product ID; instead of that 10072, which is for the small mammal trapping data, this one is 10064. And interestingly, for observational data products a lot of times the IDs end in a .001, but this is actually the second iteration of this product, because we used to do pathogen data on hantaviruses up until 2020, and then we switched over to tick-borne disease to better match the tick vector data that we're collecting. So that's why it's a .002: it's the second iteration of this particular data product, where we're looking at tick-borne pathogens instead of the hantaviruses. Again, we're just going to use the same sites, looking at the basic data package, and that looks like it's already downloaded. So if we did that names() call, we're going to see the contents for this rpt object: again, it's a list of a bunch of items. We want to transfer that list of items into our environment as data frames so that we can access them more readily, and that's what this list2env code does. Again, we're going to follow the same best practices of removing duplicates and things like that, and then we'll run another remove-dupes function for this new data set.
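The download step described here looks roughly like the following (requires network access and the neonUtilities package). The site codes and date range are placeholders, an assumption; the tutorial uses whatever site vector was set up for the earlier trapping download.

```r
library(neonUtilities)

# Placeholder site codes -- an assumption, not necessarily the tutorial's sites
sites <- c("SCBI", "HARV", "BLAN")

# Rodent-borne pathogen status, tick-borne: note the .002 revision suffix
rpt <- loadByProduct(
  dpID       = "DP1.10064.002",
  site       = sites,
  startdate  = "2021-01",
  enddate    = "2022-12",
  package    = "basic",
  check.size = FALSE
)

names(rpt)  # a list of tables plus metadata

# Unpack the list of tables into the environment as data frames
list2env(rpt, envir = .GlobalEnv)
```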
And there are no duplicated values, so that's wonderful, and we can just proceed with the data. For the next section of this, I'll show you basically what the rodent pathogen results look like here. You've got your site, your domain, the plot, and the sample ID; this is the tag ID of the individual from which the sample came, and then the .B suffix here is for a blood sample and the .E is for an ear sample. Without going into too much gory detail, the ears are better targeted for Borrelia burgdorferi, which causes Lyme disease, and a lot of these other pathogens are better targeted from the blood, and that's why we send both for pathogen testing. We differentiate them with the .B or the .E, and that's going to become important later when I'm going through the code. And then these are the pathogens that we're testing for, with the test result here in this column. What you'll notice is conspicuously missing is any kind of taxonomic information about the individuals from which the samples came, and the reason for that is that it's all in the mam_pertrapnight table. So again we have to do that table joining, in this particular case with the mammal pathogen data. And because the sample IDs come from ear samples and blood samples, and those are in two different columns in the mammal trapping data, that lovely joinTableNEON function that works for so many of our data products doesn't work here. That's another reason we thought this tutorial would be helpful: so folks can combine the mammal trapping data with the rodent pathogen data by using these code chunks here. So I'm again just going to copy this code chunk and use it to join the data together. Since we can't use joinTableNEON, we're going to separate out the blood and ear samples, join them separately, and then bind them back together.
I'm seeing some stuff in the chat, so, great. All right. Oh, great, thanks Bridget for sending out the survey, wonderful. Okay. So we are taking the de-duplicated data that we created here, and we're selecting out the column names that we think are the most useful for this particular analysis: the plot ID, collect date, sample ID, the test pathogen name, and the test result, and then creating a variable for site. Then we're going to do the same thing with the mam_pertrapnight data, so again it's the de-duplicated data, and we're just selecting out the most important columns that we need when we're joining the data. All we're doing with this selection of columns is reducing the data set to something that's more manageable and has only the columns we need for our analysis. In this case we really only want to know the species the animal came from, and then, like I mentioned, there are two different columns for the blood sample ID and the ear sample ID, and we need both of those if we're going to join these two together. The next thing we need to do is split up the rodent pathogen data into the different sample types, and the way we do that is to take this rodent pathogen data frame and filter it. As I showed you, the sample ID is going to end in a .E if it's an ear sample, so I created a data frame for just the rodent pathogen ear data. So I'm going to run that. Oops, maybe I didn't run a whole bunch, sorry, getting ahead of myself. Oh, it should be there. Sorry, I'm just trying to figure out why this isn't working. Oh, I guess I forgot to add it. Okay, yeah, so it appears that when I was typing it in and not copying directly from the tutorial, I forgot an underscore, and then when I started copying in from the tutorial instead of typing directly, that underscore was creating a problem.
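The split by sample type described here can be sketched with a suffix filter; the object name and toy columns below are assumptions, not the tutorial's actual table.

```r
library(dplyr)

# Toy stand-in for the de-duplicated pathogen results (an assumption)
rodent_pathogen <- data.frame(
  sampleID         = c("A123.E", "A123.B", "B456.E"),
  testPathogenName = c("Borrelia sp.", "Anaplasma sp.", "Borrelia sp."),
  testResult       = c("Positive", "Negative", "Negative")
)

# Ear samples end in ".E", blood samples in ".B"
rodent_pathogen_ear   <- filter(rodent_pathogen, grepl("\\.E$", sampleID))
rodent_pathogen_blood <- filter(rodent_pathogen, grepl("\\.B$", sampleID))
```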
So, but yeah, this is basically just a smaller number of columns selected out from the mam_pertrapnight de-duplicated data set; that's what we're dealing with there. Then we're going to run those two, and now we've got the rodent pathogen data split into the ones that have sample IDs for ears and the ones that have sample IDs for blood. Then we're going to create a joined data frame for each of those, where we're left joining the ear data frame with the mammal trapping data by that ear sample ID column, and then we're going to do the same thing for the blood sample ID. So I'm going to run both of those. And then we bind them all back together, right, because now we have the taxonID added to the rodent pathogen ear data in this data frame, and the taxonID bound to the rodent pathogen blood data in this data frame, so now we just want to rbind those together; we're just stacking those two data sets when we run that. The last column names didn't match, and that's why we have this minus eight: it's getting rid of that eighth column there. Okay, so that gives us a lovely data set that now has taxonID. All that convoluted data management was basically just so that we can get this taxonID column here, so that we can start looking at some of the pathogen data across the different taxa. So, again, popping back over to the tutorial, I'm going to copy these so that we can take a look at the prevalences of the different pathogens across the different sites; just copying in the next code chunk here. A lot of times when you're looking at pathogen data, prevalence, which is the percent of positive results in the particular group that you're looking at, is a super common metric that you would want to calculate for these disease data, and I'm going to show you how to do that here.
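The two joins and the re-bind can be sketched as below. The ear and blood sample ID column names follow the NEON trapping table, but the object names and toy data are assumptions.

```r
library(dplyr)

# Toy stand-ins (assumptions): trapping records plus split pathogen results
trap <- data.frame(
  taxonID       = c("PELE", "BLBR"),
  earSampleID   = c("A123.E", "B456.E"),
  bloodSampleID = c("A123.B", "B456.B")
)
path_ear   <- data.frame(sampleID = "A123.E", testResult = "Positive")
path_blood <- data.frame(sampleID = "B456.B", testResult = "Negative")

# Join ear results via the ear sample ID column...
ear_joined <- left_join(path_ear, select(trap, taxonID, earSampleID),
                        by = c("sampleID" = "earSampleID"))
# ...and blood results via the blood sample ID column
blood_joined <- left_join(path_blood, select(trap, taxonID, bloodSampleID),
                          by = c("sampleID" = "bloodSampleID"))

# Stack the two joined tables back into one, now carrying taxonID
rodent_pathogen_tax <- bind_rows(ear_joined, blood_joined)
```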
We're taking that rodent pathogen data set that we just created, and we're grouping it by site, and then by the different pathogens we tested for, and by the different taxa of the individual mammals from which the samples came. And we are summarizing: we're creating summary columns where we have the total tested, which of course is the denominator, and the total positive, which is the numerator. Then with this mutate function we're creating the prevalence column, which is the total positive divided by the total tested. So this gives us a nice little prevalence data frame: for a given site, for a given test pathogen, for a given taxon. You can scroll down through that and look at all the different small mammals that were tested from all the different sites in our data set and the prevalence of the pathogens there. And then of course data frames are nice, but graphs are even nicer, so we're going to plot this prevalence data frame, where the x axis is the pathogen we tested for and the y axis is prevalence, and then we're going to separate out by the different sites. Okay, so this first particular plot is just looking at prevalence across the different sites, treating all the different mammal taxa that we tested samples from as one big group. But we can also split it out by taxon if we're interested in that, and I'll show you that as the last thing we talk about today. But you can take a peek: these are the different pathogens we're testing for. It looks like SCBI, which is that site in Maryland, has the highest diversity of pathogens, and Anaplasma phagocytophilum has the highest prevalence. Wow, over 60% of the mammal samples that we tested were positive for that pathogen.
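The prevalence calculation described above follows a standard group-summarize-mutate pattern; the input data frame and its column names carry over from the join sketch and are assumptions.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the joined pathogen results with taxonID (an assumption)
rodent_pathogen_tax <- data.frame(
  siteID           = c("SCBI", "SCBI", "HARV"),
  testPathogenName = c("Anaplasma phagocytophilum",
                       "Anaplasma phagocytophilum",
                       "Borrelia burgdorferi sensu lato"),
  taxonID          = c("PELE", "PELE", "BLBR"),
  testResult       = c("Positive", "Negative", "Negative")
)

# Total tested is the denominator, total positive the numerator
prevalence <- rodent_pathogen_tax %>%
  group_by(siteID, testPathogenName, taxonID) %>%
  summarize(totalTested   = n(),
            totalPositive = sum(testResult == "Positive"),
            .groups = "drop") %>%
  mutate(prevalence = totalPositive / totalTested)

# Prevalence per pathogen, one panel per site
p <- ggplot(prevalence, aes(x = testPathogenName, y = prevalence)) +
  geom_col() +
  facet_wrap(~siteID) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```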
Borrelia burgdorferi sensu lato is the pathogen group that causes Lyme disease, and nearly 20% of the rodents at that site are positive for it. And then again, there's this Borrelia miyamotoi at that site. So, given that SCBI is kind of an interesting-looking site, I thought we could drill down in a little more detail and look at the prevalence within each of the different mammal taxa that we tested samples from at that site; that's the last bit of this tutorial. So we're going to scroll down and pop this in here. All right, so we're going to take the prevalence data set and filter it down to just that one site that we're interested in, and now make a plot where, instead of creating different panels for the different sites, we're creating different panels for the different taxa of mammals that we looked at. Let me zoom in there. You can see that we've got two different taxonomic identities here, and it looks like most of the diversity of pathogens is coming from this Peromyscus leucopus, and then a lot of that Anaplasma is coming from this other species here. So yeah, that's kind of the nitty-gritty of working with the data. The take-homes from this are the code for downloading the data directly into your environment and understanding where the documentation comes from, and, as I sort of mentioned, the take-home from this particular analysis is just this idea that the diversity of hosts at a site might be linked with the pathogen prevalence that we see at that site.
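The last plot just filters the prevalence table to one site and swaps the faceting variable from site to taxon. A minimal sketch, where the `prevalence` data frame, its columns, and the second taxon code are all illustrative assumptions:

```r
library(dplyr)
library(ggplot2)

# Toy prevalence table (values and the "GLVO" taxon code are made up)
prevalence <- data.frame(
  siteID           = c("SCBI", "SCBI", "HARV"),
  testPathogenName = c("Anaplasma phagocytophilum",
                       "Borrelia miyamotoi",
                       "Borrelia miyamotoi"),
  taxonID          = c("PELE", "GLVO", "PELE"),
  prevalence       = c(0.6, 0.1, 0.05)
)

# Drill down to one site of interest
scbi_prev <- filter(prevalence, siteID == "SCBI")

# Facet by taxon rather than by site
p <- ggplot(scbi_prev, aes(x = testPathogenName, y = prevalence)) +
  geom_col() +
  facet_wrap(~taxonID)
```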
So there's kind of a way of looking at both of those things in one, and of course there are lots and lots of other abiotic and biotic variables that can mediate that interaction, so of course I'm not going to make any major conclusions about diversity and pathogens from this exercise; it's just an interesting subset of the data to take a peek at. So with that, we're just a little bit over, but I would love to take questions and answer anything folks might want to ask about.