Yeah, so I'm going to talk to you. I am a wildlife ecologist. I'm a data producer, and I created this data pipeline to move my data through in a way that I can use it at a later time. After we built the thing we said, damn, this is great. This should be used by other people. This can be a way to bring together other projects. And we've been working, I'd say, the last year and a half on trying to make this thing suitable to network different camera trapping projects together. So I'm going to explain to you why we built it, how we use it, and how we're now trying to evolve it into something else.
Okay, so let's get that going. I know there's a different background here: some people know everything there is to know about camera trapping, some people know everything there is about data. We're going to have to cover all those things in order to get through this talk.
Okay, so camera traps. This is what the camera traps look like. Camera traps are really not for scientists; camera traps are for the general public. There are now more camera traps in the US than there are binoculars. There's an amazing number of these things out there, bought by hunters and the hunting community to see where the buck is, see where the turkey is, see where their wildlife is. Camera traps are a relatively new development for researchers. I spend most of my time overseas doing surveys, and I don't know how much time we spend looking at some track or sign, trying to figure out what that thing is. It's relatively easy when it's a well-defined track, but with something like this, I have spent so many hours looking at crap on the ground, with four people arguing over what that crap belongs to, and it's just the most dominant voice in the group who gets to decide: that's a male civet who was heading west at the time that it dropped this thing. And then you write that down, because there's no other way to do this thing, and that is so frustrating.
Yes, you can use genetics, but it's not timely enough and it's too expensive for most of these reserves. Usually you're there looking at this thing trying to figure out what it is. In this case, it's a red fox dropping. But these camera traps can work 24/7. They can sit out there and give you photographic evidence that usually you don't argue about: this is what the animal is, this was the time it was there, this is the date it was there. With that scat, you have no idea whether that thing was dropped yesterday or dropped two years ago in some environments.
How these camera traps work, for those of you who don't know: it's a heat-motion sensor, in that it detects a hot body moving in front of that camera. So it has to be hot and it has to be moving. Grass doesn't do it; a stationary object doesn't do it. It has to be something that's moving. In this case, you'd be detecting this animal up in front. You wouldn't be detecting that animal in the corner, because it doesn't hit the rays, which are almost like fingers coming out from that camera sensor. Once the camera sensor detects something, it sends a message to the camera unit. It just has an autofocus, and then it usually keeps shooting pictures as long as something is moving in front of the camera.
What these cameras really replace: I work for the Smithsonian, and the Smithsonian is hall after hall after hall of these kinds of cabinets full of these kinds of animals. This is amazingly useful when you talk about biodiversity: what did biodiversity used to be in certain locations?
And it's because you have the specimens and you have the metadata that goes with them. You have the time, the date, the place, who collected it, all this information you need to make that data relevant. The problem is, as a member of the Smithsonian, I can tell you these are almost impossible to collect anymore. You can do it in California or Virginia, but try to go to Brazil and get a sample out of Brazil: you're going to be an old person before that happens. I have ticks I've been trying to get out of China for 18 months now. Ticks! Who cares about ticks? You can't get them out of the country because it's a biological resource that they're not going to let out.
That's not the case with these cameras. I can take camera pictures, I can put them in my pocket and bring them home, and nobody cares about it. You can take a picture like this and add the metadata to it, and it's no different than a museum specimen. Well, not no different: you get about 80% of what you need. You don't have the genetic material, you don't have the morphological measurements, but you do have: this animal was here, on this date, at this time. You can get behavior if you get a sequence of pictures. You can get a lot of information that you can then bring home with you and do your mammal distribution studies, your occupancy and detection work. You can do a lot of the science that you used to do with these museum specimens that you can't collect anymore, and you can usually do it over a much larger scale.
So these camera traps are the biodiversity data of the future for a certain suite of mammals. It doesn't work for all mammals, it doesn't work in all circumstances, but it has a much broader range of utility than a lot of the specimen collecting that we used to do.
I originally started working on this stuff in Virginia, but pretty quickly I was exported to China, where I was kind of required to do some biodiversity surveys.
So starting in 2000, we went to a lot of reserves and did a lot of biodiversity surveys, primarily for giant pandas. You can take a map of a section (this is a five- or six-county region in the giant panda range) and see where we detected giant pandas using those images, so that we don't have to worry about sign misidentification, and say this is where the pandas are and this is where the pandas are not. There's a lot of other species that come along. You just can't set the cameras to take pictures of only pandas; you get everything else out there that's with the pandas. With that you can do relative abundance estimates, you can do occupancy work, you can do a lot of science that goes along with those camera data, even for species that you weren't really looking at in the first place.
In the last few years we've gone to more corridor mapping, looking at corridors between two reserves. These boxes are camera arrays that are set up between two national parks that are supposed to connect for pandas: what animals are detected in that corridor area, and what are the covariates that they're detected with? We can use that to create a species distribution model for the pandas. That's what's in green. There are no corridors between those two parks. It's all messed up by roads and houses, even though it's there on paper. We can tell a computer to force there to be corridors between those two parks, and then look at what policy decisions could actually make a corridor. Should we be replanting bamboo? Should we remove the houses? Should we remove the roads? And this over here on the right-hand side is just testing the resistance surface under these different management policies.
I think there's one more of these things. Oh, this is just doing the same thing, not just for pandas. This one is across the whole range of the Qinling Mountains.
This is looking at corridors across that area. That's what the bottom map is. And then we can do that for all the other mammals that we capture in those pictures. So again, it gives you a much broader spectrum of animals you can look at.
Lastly, the statistical side has now been developed, so you don't only do species distribution models, you can do co-occurrence models. You can look at the probability of detecting one species based on the detection of another species. In this case, we were looking at the impact of these four mammals on giant pandas. The Chinese are very worried about takin. Takin is that large goat-like thing on the upper left. That animal eats bamboo, and these reserves are set up to save pandas, not takin. They don't like these takin because they feel they're competing for bamboo. But if you do the co-occurrence matrices, you can see here there's no difference. That's the probability of occupancy for pandas given the presence or absence of takin. There's no difference. They don't impact these giant pandas at all. Wild boar don't impact giant pandas at all. What does have an impact is domestic cattle: you find a much lower probability of occurrence if they allow cattle into those parks, and most of the parks do allow cattle. So they should be a lot more worried about cattle than they should be about takin. And a synergy was Asiatic black bears, which are probably the most hated species in that region of the world because they do a lot of crop damage. The only place you find Asiatic black bears is where somebody's protecting the pandas. Where there's good panda protection, you find black bears; where there's no panda protection, there are pretty much no black bears. So that's a positive relationship between the two.
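For the data people, here is a minimal sketch of the comparison behind those co-occurrence results: the share of camera sites with one species, split by whether a second species was detected there. The site records are made-up illustrative values, not survey data, and the function name is hypothetical. Real co-occurrence occupancy models also correct for imperfect detection, which this naive version ignores by treating "not detected" as "absent".

```python
from typing import Iterable, Tuple


def naive_conditional_occupancy(
    sites: Iterable[Tuple[bool, bool]],
) -> Tuple[float, float]:
    """Return (P(A | B detected), P(A | B not detected)) across sites.

    Each site is a pair (a_detected, b_detected). This is a plain
    proportion, not a fitted occupancy model.
    """
    a_with_b = n_with_b = a_without_b = n_without_b = 0
    for a_detected, b_detected in sites:
        if b_detected:
            n_with_b += 1
            a_with_b += int(a_detected)
        else:
            n_without_b += 1
            a_without_b += int(a_detected)
    if n_with_b == 0 or n_without_b == 0:
        raise ValueError("need sites both with and without species B")
    return a_with_b / n_with_b, a_without_b / n_without_b


# Made-up example: (panda_detected, cattle_detected) at eight camera sites.
example_sites = [
    (False, True), (False, True), (True, True), (False, True),
    (True, False), (True, False), (False, False), (True, False),
]
p_given_cattle, p_given_no_cattle = naive_conditional_occupancy(example_sites)
print(p_given_cattle, p_given_no_cattle)  # 0.25 0.75
```

A large gap between the two numbers is the kind of pattern described for cattle; two similar numbers is the kind of pattern described for takin and wild boar.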
So all this is just to say we can do good science with camera data. The data is very useful because you have presence and absence data, and with presence and absence data you can do a lot of occupancy modeling that allows you to get pretty far down the line.
While I was doing the China work, I said: if I can train reserve staff in China with a sixth-grade education, who don't even speak my language, to do this camera trapping, I can train a guy in Virginia, a woman in West Virginia, somebody in North Carolina. I can use citizen scientists and train them just as easily to do all the stuff I'm having these reserve staff do. So we tested that out on the Appalachian Trail. We took a thousand kilometers of the Appalachian Trail. There are trail clubs assigned to each section of trail, so we went to each of the trail clubs and said, would you put cameras out for us? We'll give you the coordinates and the camera, we'll give you half a day of training, and you move the cameras along the trail to these coordinates and feed the data back to us. Every trail club said yes. Every trail club had volunteers. We had about a 95 percent retention rate of those volunteers over the course of two or three years. They love this project, and they fed us all this data for free, which is great too.
In that case, what we had was a website where we put in the metadata, and they would make a folder of their images and put it in a drop box, and we would download those images. That part of the project was a nightmare. It was impossible to keep track of those thousands of images in those folders, connect them up with that metadata, and use it in any meaningful way. I said, if I'm going to do this again: the citizen science part is great, this data organization is crap. We've got to improve that part of the study, and not the citizen science part. So, the citizen science part.
We're really following a birdwatcher model. Look at everything that's being done with bird data right now. Any of you who've been to the eBird site have seen their ability to look at the distribution of birds, the change in bird distribution over time, the impacts of climate change on birds. They're doing that because they have a hundred thousand volunteers collecting bird data across North America and feeding that into a single site. It's the best example of citizen science data, hands down, that we have in this country. Can't we do that with mammals? I like bird people, but I like mammals better. We could do this with mammals. The one thing bird people had that mammal people didn't have is binoculars. Having a tool that allows you to verify what it is you're hearing really made that whole thing possible. These camera traps are the tool that allows the mammal watcher to record what they saw, and if they can attach metadata to it, it becomes just like the eBird model. We could build statewide efforts, countrywide efforts, where we're feeding all that data into a similar place. Right now we look at no mammals across their entire distribution, with the exception of something like giant pandas. That's it. For everything else, we're always looking at a little piece of their distribution, a little piece of their habitat, because nobody's able to bring together data across the entire range of a species the way we can with birds.
So I think this is all the stuff I've just said. What it needs, to be like birds: we need centralized data management. We need to be able to standardize that data across projects. We can't have one person calling something one thing and somebody else calling it something else. We can't have the files formatted differently. Everything has to be standardized. We have to be able to share between projects, and we have to have that data accessible to the public. I'm a big believer in that. I'm being paid by the federal government to collect data. That data
should be accessible to the public, and used by the public more than it's used by me.
So we created this thing called eMammal. This is the data flow pipeline. It starts out here with the researchers or the citizen scientists. They use the camera traps, and we have a desktop app that allows them to sort and tag that data relatively easily. They push send, and the data goes up to a cloud site. At the cloud site it undergoes expert review: some expert looks at it and says yes, that's a deer; yes, that's a squirrel; yes, that's a bobcat; yes, the dates look good; yes, it looks like the camera's set up right. Then they approve it. It goes into the Smithsonian digital repository, where it's curated, they make backup copies, they do all that kind of stuff with it. Then we feed it out to the eMammal website, where the volunteers can see their own data and everybody else's data, and in addition the researcher can download their data. So it allows complete feedback of that data. The website also allows the researcher to communicate with the citizen scientists, so we have a blog, a discussion board, and training videos, things that help those volunteers along and keep them connected.
You guys are data people, which I am not. I'm just showing you some more of the detail of how these images are moved along and how the metadata is moved along with them. It has to be packaged up and unpackaged for each one of these stages, and it ends up inside this Fedora repository here, and then goes back out to the Drupal site, which is eMammal.
We spent several days and several long discussions on this shared metadata structure, and we had to get buy-in from most of the major camera trapping organizations. So WCS and CI are also agreeing with the Smithsonian that this is going to be our metadata structure, so when we do projects we can share across the projects.
We have projects, sub-projects, and deployments. A deployment is when you set a camera someplace for an amount of time. Inside the deployment you have images, and the images are grouped into sequences: if an animal stands in front of the camera for a period of time, all those images get linked into a sequence. So the sequences are composed of the images, and there's observation metadata attached at each stage along the way. So the project could be California and each of the sub-projects could be different reserves, or the project could be an individual reserve and each of the sub-projects could be different projects inside the reserve: one arm was looking for bobcats, one arm was setting cameras for squirrels on trees, somebody was setting cameras on trails. All those different sub-projects can go into the same database, because you're capturing enough of the metadata to be able to do covariates on what the project was actually about, but they all fit inside the same project structure.
Okay. So we built this thing with NSF money, and we used it for a project on the East Coast. We gave the volunteers a three-hour training. We had more than 500 volunteers. We gave them all a camera and coordinates to go to. It's like geocaching in reverse: they have to go to the coordinates, collect the data, bring us the data, then go to the next coordinates on their list, and when they finish their list we give them a new list of coordinates to go to. Across six states, four million images and metadata. This was all organized with two technicians, who dealt with all the volunteers and all the data.
This is what the desktop app looks like. This is a deployment. This is Tavis's deployment; he was assigned this deployment. He downloads the data onto the computer and he has 16 sequences. He's now looking at sequence number two.
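For the data people, the project, sub-project, deployment, sequence, and image structure described above can be sketched roughly as follows. This is an illustration only, not the actual eMammal schema: every class, field, and function name is hypothetical, and the 60-second gap used to link images into a sequence is an assumed value, since the talk does not say what threshold the app uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Image:
    filename: str
    taken_at: datetime


@dataclass
class Sequence:
    images: List[Image] = field(default_factory=list)
    species: str = ""  # chosen by the volunteer, checked in expert review
    count: int = 0     # number of animals of that species


@dataclass
class Deployment:
    name: str
    latitude: float
    longitude: float
    sequences: List[Sequence] = field(default_factory=list)


@dataclass
class SubProject:
    name: str
    deployments: List[Deployment] = field(default_factory=list)


@dataclass
class Project:
    name: str
    sub_projects: List[SubProject] = field(default_factory=list)


def group_into_sequences(
    images: List[Image], max_gap: timedelta = timedelta(seconds=60)
) -> List[Sequence]:
    """Link images into sequences; a gap longer than max_gap starts a new one."""
    sequences: List[Sequence] = []
    for image in sorted(images, key=lambda im: im.taken_at):
        previous = sequences[-1].images[-1] if sequences else None
        if previous is not None and image.taken_at - previous.taken_at <= max_gap:
            sequences[-1].images.append(image)
        else:
            sequences.append(Sequence(images=[image]))
    return sequences


# Made-up timestamps: three shots ten seconds apart, then one ten minutes in.
start = datetime(2015, 5, 1, 6, 0, 0)
shots = [
    Image(f"img_{i}.jpg", start + timedelta(seconds=s))
    for i, s in enumerate([0, 10, 20, 600])
]
print([len(seq.images) for seq in group_into_sequences(shots)])  # [3, 1]
```

Because every level carries its own metadata, a bobcat sub-project and a squirrel sub-project can sit in the same structure and still be told apart later.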
You see the blue box? That's sequence two shown across the bottom, and he's highlighting this picture in sequence two so he can look at it. The app automatically groups all the data together and attaches a position, a date, and a time. Then he has a drop-down menu: he clicks on what the species is, so there's no misspelling a species, and then he adds the number for that species. When he pushes this upload, it goes from his computer up to the cloud site, where it can then get the expert review.
You're able to use this star on the bottom to mark a favorite photo. So out of six million images, 5.9 million of them are terrible, but there's like a hundred thousand gems in there. You want to know what the hundred thousand gems are, so that when someone asks, show me your best bobcat picture, you're not looking through the stack. You're just going to the favorite images and, bam, those are your best ones.
And then we have this bounding box. Some of those sequences up there are very dark, and maybe you can't see what the animal is. If you push the bounding box, it just looks for change detection in that sequence and says this is where the change is, so it puts a box around it. There's a squirrel inside that box. A volunteer may not see that squirrel, so if they say there's nothing in the image, we ask them to use the bounding box to have the computer look and see if it can find anything inside the image.
Then this goes up to the expert review tool. This is it here. It allows you to track where you are. This is what a PI would use, or the technician in charge of the volunteers. They have a board up on top that tells them how many deployments they set out, how many have been reviewed so far, how many are still left to be reviewed, and what deployments have been rejected.
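To make the bounding box idea concrete, here is a minimal sketch of change detection between two frames of a sequence. It is an assumption about the general technique (simple frame differencing), not the desktop app's actual code; the function name, the threshold of 30, and the tiny frames are all made up for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale pixel values, rows of equal length


def change_bounding_box(
    before: Frame, after: Frame, threshold: int = 30
) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, around changed pixels.

    A pixel counts as changed when its value differs by more than
    `threshold` between the two frames. Returns None if nothing changed.
    """
    changed = [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(before, after))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if abs(a - b) > threshold
    ]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return min(rows), min(cols), max(rows), max(cols)


# Tiny made-up 4x5 frames: a dark scene where something brighter appears.
frame_1 = [[10] * 5 for _ in range(4)]
frame_2 = [row[:] for row in frame_1]
frame_2[1][2] = 80
frame_2[2][3] = 90
print(change_bounding_box(frame_1, frame_2))  # (1, 2, 2, 3)
```

Even when the whole frame is too dark for a volunteer to spot the animal, the difference between consecutive frames still points at where to look.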
They can track all that stuff across the top. Here are their latest favorite images down below. They can unfavorite something if they don't think it should be a favorite. Like, why is this a favorite image? Who knows. This kid probably can't be a favorite image; we've got to get rid of that kid picture. So you can unfavorite those things.
And down on the side here: most of my experts are actually graduate students. I say, go do this for five hours each day or something. But if I have some species that are really hard to tell apart, I can assign an outside person as an expert, and they can come in and only look at a single species. So this person could look at the coyotes, the 67 coyote sequences, just ID those coyote sequences, and then get out of the system, and the graduate student can continue with deer, deer, deer, deer.
Then it goes into the repository. I know nothing about this repository except that it organizes all my data. Anybody who goes in there usually gets lost, but you come out at the end of the day.
These are the numbers of projects we have, both in country and out of country, that are in eMammal. The out-of-country projects have just been demonstration projects, so there's not really a lot of data there, but there's a lot of data inside the country: about three million images in December.
The eMammal website, this is what it looks like. It has a bunch of options for you. You can view photographs, you can explore projects, you can browse the data, you can volunteer for the projects. Each project has its own home page; you can go to its home page and find out about that project and volunteer for just that project. Or you can send a generic email, which we will probably never answer, about volunteering for anything in the world. There's a carousel that flips through the favorite photographs.
There are the latest blog post entries. It works to communicate with the volunteers, and with the public, what data is being collected for them.
This is an example of what the browse data page looks like. You have maps from an Esri server. We've zoomed in here on the Washington area, and each one of those numbers represents the number of deployments in that area, not the number of sequences or pictures. For example, in DC we have a hundred and fifty-one deployments, so those are separate camera settings in DC. If you zoom in on that image you can see the individual points, and eventually you can click on an individual point and see what the data is for that point.
Things that are important to people: if researchers are going to use this, a lot of them want an embargo. They don't want the public seeing their data as soon as they see it themselves. So you're allowed to embargo your project, so that the images come up to be seen but the metadata going with the images stays secret for a certain period of time, until you're done with your master's thesis or your PhD thesis or whatever. Eventually, though, this is a federal site, so all your data has to become public. You can't keep it embargoed forever.
The exception is endangered species. Right now, if it's an endangered species, you will never know the location of that species. You'll know the date it was captured and the time it was captured, but as the public you won't know the location. The location is assigned to a central point in each project. That's the way we do it right now, and you can add to that list. So, say we have some project.
That's very sensitive about white-tailed deer: they don't want to let other people know where the big bucks are. So all the bucks get assigned to the headquarters, because they've made white-tailed deer an endangered species for their project.
You have to sign a data provider agreement if you put data into the system, and you have to sign a data user agreement if you take data out. The data user agreement says: hey, you didn't collect this data, somebody else did. You can find the PIs listed for the project. If you're going to use this data in a scientific way, you should credit the PI. We can't force that to happen, but we can say, this is what you should be aware of before you download this data. Ninety percent of the downloads are sixth-grade kids, seventh-grade kids. They're not using the data for publications. Or the volunteers are downloading the data to see their best picture. That's what most of the use is, but they have to sign that data user agreement also.
Just to give you an example, here's the East Coast again.
We are East Coast people, so we have a lot of East Coast data. We can zoom in on this area, and what I've done is taken two data sets. I had to do this by hand, but I took a Virginia data set which has core forests in Virginia (that's what all the green is), and I took three different eMammal projects and did a data search that just blended them together. What I'm showing you are all the camera trap locations in the DC area. The ones in red are the ones that had bobcats detected at the site. The ones in blue are the ones that had coyotes detected at the site. And the yellow sites are sites that had neither species detected. You can see that bobcats are pretty much confined to some of the bigger forest blocks and don't get very far into the metro area, and coyotes can go all the way into downtown DC. You can find them in Rock Creek Park. I can only do that because I can pull different projects together by having a shared metadata structure, and having it sit in a place where it can all be used.
That's all the data use. We built this as individual scientists, and we think other scientists will use it, and citizen scientists find it helpful. But actually most of my money in the last year and a half has come through education tools. It seems to be most desired by school systems rather than by researchers. Well, we didn't open it up to researchers until this winter. But school systems are very interested in the data as a way to do STEM learning. Some school systems are actually designing their own projects and putting out their own cameras. I don't care who submits data to me. If you tell me what your protocols are, I will take your data and put it in this system. You have to have an expert reviewer, just like everybody has to have an expert reviewer. But you can be a sixth-grade class and you can contribute data to the Smithsonian website, if you have protocols that are set and you have an expert review of your data. And that's good, that really allows the schools to get
involved. Most of the schools are not collecting their own data. It seems to be almost impossible for a school system to go out in the woods and put up a camera right now. The legal issues are just too much for most school systems. They can put a camera in the schoolyard, but once you get out of the schoolyard it gets hairy. What we've come around to is, in Florida we have money for 25 schools, all to have a volunteer team. It's almost all made up of parents. The parents move the cameras around, and the schools compete against each other on who's getting what pictures, based on the parent activity. The schools use the data in their curriculum, so it's data from their county, collected by their parents, and they can compare it to other projects in the Smithsonian.
This is just an example of what they can do. They can pick a project, they can pick a sub-project, and they have a bunch of data output down at the bottom. I don't think too many researchers would use this; I think this is primarily a teaching tool for the kids. Essentially you parse your data and send it to a sandbox where there are R scripts. The R scripts generate a JSON that comes back with some graph of whatever your parsed data shows. So in this case we're looking at coyotes. We draw a box around an area: we captured 341 detections of coyotes. We can parse that down further by looking just at a specific time range out of all the data that's available. And then in this case we're graphing the activity of these coyotes versus the activity of bobcats, and it's just to show the students that in this case the bobcats, which are the solid line, are much more active during the daytime than the coyotes are. So these are simple analysis things. They can make hypotheses based on looking at the pictures, and then they can test them out with the data.
So eMammal is trying to sit in a spot where it's useful for scientists in their individual projects. It's useful as a citizen science tool. It's useful for warehousing and archiving biodiversity data which
can never be collected again. You can't go back 10 years and get data that wasn't saved 10 years ago. There has to be some way to put data in a place where people can access it a decade from now, or two decades from now. And lastly, it can be used by the public in whatever way they want. Do they want to use it as a teaching tool? Do they want to use it for their own little policy decision in their county? You want to see where the bobcats are in your county? You can drill down, if you're living in the right area, and you can just be Joe Q. Public and see where the bobcats are, based on data collections that were done by other people at other times. I'm perfectly fine with that use of the data. They're subject to the same caveats that all of us are. Someone will complain about it or not complain about it depending on how they use it. But the data is for them, and they should be able to see their data and access their data.
So I built this thing as a wildlife ecologist who needed some way to organize and use my camera trapping data. I am not nearly using the capacity of putting out this data, or assembling this data, or making use of the potential of this data. One reason for speaking to a group like you: earlier I spoke to the field station staff from the University of California about starting camera trapping at all the field stations and using eMammal so they can coordinate across field stations. I also need a center like yours to help me with how to use this data, how to display the data, how to use the outreach capacity of this data in a way that is beyond my capacity. Let's put it that way: I know this is a good idea, I just have no idea how to implement that good idea.
The rest of this is just telling you the great news, and I don't know if I have to spend a lot of time on this. We have management tools. We can capture the metadata. You can manage thousands of photographs in a project. We have standard data across the projects. We can long-term
store this data, and you can easily get access to this data.
The bad news is that right now this thing is costing me money. That data pipeline has to move through the Amazon cloud and be put in three different buckets as it moves through. Every time it moves from one bucket to the next, and every time somebody packs it or unpacks it, it costs me money. Right now the costs are about $3.20 per deployment month. A deployment month is if you leave a camera out for a month: you get about 400 or 500 images, so it's processing those 400 or 500 images through the pipeline. So right now I have to charge outside organizations for those Amazon costs, and that's a downer for some. I was talking to Kevin and Michelle: I'm willing to trade Amazon costs for programming skills. That's a trade I'll make, and maybe this is the right audience to do that in. But once you get through the pipeline, the data storage is free, and the dissemination through the website is free. It's the moving it through the pipeline that right now is costing me money.
We're trying to work this thing at the local level and at an organizational level. So we've signed agreements with WCS and CI about the metadata structure and the sharing of data. CI has kind of broken off by itself, but most of us are still together on this being the way we want to display our data.
Future things we're working on. All we have the capacity for right now is to work on those data downloads, and put them in packages that people want to use, like PRESENCE or the unmarked R package, so they can do this occupancy analysis. We are going to move this data through Zooniverse, or give a project the capacity to move it through Zooniverse. Zooniverse is a crowdsourcing platform where 10 people look at your image and say what it is. So rather than my graduate student going deer, deer, deer, deer, let 10 people across the country look at it and say deer. If they all say deer, it goes into the repository. If six say deer, three say raccoon, and one says possum, that's an image I should look
at. And that will save us a lot of time, by filtering down like that.
And lastly, probably by the end of this week (well, this is the end of the week, so let's say by the end of next week), there's eMammal Academy, where we've now created online videos for training on all the parts of the system. We have curriculum from Virginia and Florida, and we put that curriculum up there. It would be great to have California curriculum. It would be a place where you can put curriculum that's been developed to use eMammal, and school systems can take more advantage of it.
So that's the system we have right now. I think it has a lot of capacity. I definitely need help with it. But I think it's the best idea I ever had, and trying to find a way to make it sustainable is the main part. All right, thank you.
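As a footnote for the data people, the crowd-vote filtering described for Zooniverse can be sketched as a simple consensus rule: unanimous votes go straight through, and any disagreement flags the image for an expert. This follows the two examples given in the talk; the function name is hypothetical, and real Zooniverse projects may use different agreement thresholds.

```python
from collections import Counter
from typing import List, Optional, Tuple


def consensus(votes: List[str]) -> Tuple[Optional[str], bool]:
    """Return (most_common_species, needs_expert_review) for one image.

    Unanimous votes are accepted as-is. Any disagreement, or no votes at
    all, flags the image for review, with the most common answer passed
    along as a suggestion.
    """
    if not votes:
        return None, True
    top_species, top_count = Counter(votes).most_common(1)[0]
    return top_species, top_count != len(votes)


print(consensus(["deer"] * 10))
# ('deer', False): all ten agree, so it can go into the repository
print(consensus(["deer"] * 6 + ["raccoon"] * 3 + ["possum"]))
# ('deer', True): mixed votes, so an expert should look at it
```

Since most images are the easy deer, deer, deer cases, only the flagged remainder ever needs to reach the graduate student or an outside expert.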