 Good afternoon and welcome back to our committee on Power and the Way to Continental Scale Biology. My name is Ines Ibañez and I am a professor at the University of Michigan. Today we are going to have our panel on tools, technology, and research techniques: top-down approaches. We have four speakers: Dr. Charuleka Varadharajan from Lawrence Berkeley National Laboratory, Dr. John Bargar from Pacific Northwest National Laboratory, Dr. Rachel Buxton from Carleton University, and Dr. Sarah Huebner from the University of Minnesota. We will have the four presentations first and a question and answer session right after. Our first speaker is Dr. Charuleka Varadharajan. She is a biogeochemist and data scientist at Lawrence Berkeley National Laboratory. Her interdisciplinary research spans a broad range of topics including the impacts of human and natural disturbances on water resources, methane cycling, environmental impacts of carbon sequestration and fossil fuel production, bioremediation, and AI and machine learning. For all our panelists' full bios, please refer to our agenda. And with that, we would like to invite Dr. Varadharajan to begin her presentation. Great, thank you for that introduction. Are you able to see my screen? Yes. Great, thanks for that introduction and for inviting me to present. In today's talk I'll be describing our investigation of watershed response to disturbance using data-driven methods. This research is funded by the Department of Energy's Environmental System Science Program as part of a DOE Early Career Award and the Watershed Function Science Focus Area Program, and is also supported through a DOE graduate student fellowship. I want to acknowledge all my co-authors, the funding from DOE, and the supercomputing resources from DOE's NERSC facility and the University of Minnesota. 
Water is a fundamental substance needed for life on this planet to thrive, but unfortunately our water resources are being stressed by climate change and extreme events such as heatwaves and droughts, which have consequences for the biology that depends on them. Our research investigates the impact of climate disturbances such as extreme rainfall, droughts, and heatwaves on water resources. Watersheds, which are the basic units for managing water resources, have many functions. These include physical and chemical functions such as streamflow and water quality, which are the focus of this talk, as well as biological functions such as forest health and biodiversity. Our goal is to identify how diverse watersheds with different traits respond differently to disturbance. We're doing this across a range of spatial scales, from hillslopes to watersheds to hydrologic basins and the continental United States. Some of the questions we ask include: how do we quantify a watershed's response to disturbance, how do the properties of watersheds influence how they respond to disturbance, and how can we predict watershed functions when disturbances may occur anywhere at any time, particularly in unmonitored basins. A key aspect of our approach to studying watershed functions and their response to disturbance is our use of traits, a concept that originates from the biological sciences. Watershed traits are their properties such as their climate, topography, geology, and vegetation, as well as the extent of land cover, land use, water management, and other human activities. These traits interact with each other and with external climate forcing to influence how watersheds function at different scales. For example, at global scales, climate patterns and regional constraints such as geology and topography are major factors influencing watershed function. 
As we zoom into finer scales, we start to see the emergence of other traits that influence functions, from the extent of forest cover and human activities at landscape scales to plant, soil, and microbial traits at organism-to-plot scales. We are in an era of big data and can now observe not just climate variables but also many critical watershed functions and traits, with remote sensing at different spatial scales and with monitoring networks such as those maintained by the USGS and NOAA. In the past decade, data products that combine these different datasets have also become available, such as StreamCat, which provides information on hundreds of watershed traits that can be paired with streamflow and water quality data. To utilize these datasets in our research, we use a variety of methods, ranging from a software tool called BASIN-3D for integrating time series data across distributed sources, to statistical analysis and network science to infer patterns in the data, as well as information theory and machine learning to gain scientific understanding and make predictions. Next I'll present one example of how we use some of these methods to predict watershed functions at different scales. This example is focused on predicting stream temperatures across the continental United States. Stream temperature is an important water quality parameter that can be affected by global warming as well as heatwaves and drought. Our goal is to predict stream temperatures at unmonitored locations using data from monitored sites. To do this, we build and compare machine learning models that utilize data at multiple spatial scales. In the first approach, which we refer to as the continental model, we implemented a single machine learning model with trait information for all the monitored sites. In the second approach, we group sites in different ways, either regionally or by how similar their traits are, and build a machine learning model for each group with relevant trait information. 
In the last case, which we refer to as the local approach, we built a machine learning model for each monitored site individually and then used a transfer learning model to make predictions at unmonitored locations. This local approach is comparable to a bottom-up methodology that is commonly used in scientific studies, where models are built for representative sites that have some measurements and then generalized based on different measures of watershed similarity. We used different types of machine learning models for this study, which were implemented by Jared Willard as part of his PhD at the University of Minnesota through a DOE fellowship, and by Helen Weierbach, a research associate at Berkeley Lab. The first is a deep learning model called the Long Short-Term Memory network, or LSTM, which is quite popular for time series predictions due to its ability to capture past system states. We used a model architecture that was developed a few years ago by Kratzert et al. for streamflow predictions, where they adapted the LSTM to take time series inputs such as climate forcing along with trait information, which they refer to as static site attributes. In our implementation of the LSTM we used an approach from another study by Rahmani et al., with inputs of climate, streamflow, and 27 watershed traits that their domain experts selected as relevant for stream temperature predictions. In our setup we used 787 sites that had at least 5 years of data for training the model, and data from 580 sites for out-of-sample testing of the model results; we refer to these as the unmonitored sites in CONUS. We also compare the LSTM with a classical machine learning model called XGBoost, and with a transfer learning approach that Jared developed for his thesis, where the knowledge gained from source models at monitored sites is used to make predictions at unmonitored sites. An important step in this work is to determine how watersheds are similar to each other based on their traits. 
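As an editorial illustration of the static-attribute setup described here, the trait vector is typically repeated alongside every timestep of the dynamic climate forcings before being fed to the LSTM. The sketch below shows only that input layout; the function name and array shapes are illustrative, not the study's actual code.

```python
import numpy as np

def build_lstm_inputs(forcings, traits):
    """Tile static watershed traits across every timestep and concatenate
    them with the dynamic climate forcings -- the input layout used for
    LSTMs with static site attributes (shapes here are illustrative).

    forcings: (n_timesteps, n_dynamic) array of daily climate drivers
    traits:   (n_static,) array of watershed attributes
    returns:  (n_timesteps, n_dynamic + n_static) model input
    """
    n_timesteps = forcings.shape[0]
    tiled = np.tile(traits, (n_timesteps, 1))  # same traits at every step
    return np.concatenate([forcings, tiled], axis=1)

# e.g. one year of 3 daily forcings plus 27 static traits -> (365, 30)
X = build_lstm_inputs(np.zeros((365, 3)), np.arange(27.0))
```

The point of this layout is that a single network can then learn across many sites at once, with the repeated trait columns telling it which kind of watershed each series belongs to.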
For this we used the USGS GAGES-II dataset, which has data on over 300 traits for over 9,000 catchments. However, with such a large dataset we can't blindly apply typical classification approaches, because the data are high dimensional and there are many redundancies in the trait information. So my postdoc Fabio Ciulla developed a new methodology using network science, where we built two parallel networks. The first is a network of traits, where we condensed these hundreds of traits into a few interpretable categories, such as those related to agriculture or human development. We then built a parallel network that classifies the catchments according to their trait categories. Remarkably, even though no geographic information was used in our analysis, we find that watersheds with similar traits are generally located next to each other. The advantage of using this method is that we can identify both the classes of watersheds that are similar and the traits that cause them to be grouped together. For example, the watersheds in the Rocky and Cascade Mountains of the western US generally have lower temperatures, higher elevations, and more evergreen forests compared to other watersheds in the US, which makes intuitive sense. When we compare the continental, grouped, and local models, our initial results indicate that the single continental model outperforms the other approaches at predicting extreme temperatures for most places in the unmonitored test set, based on a few metrics that include model accuracy and computational expense. Some of these results are published in Jared's PhD thesis and in a recent conference poster, but this is still work in progress, and we're trying out a few model configurations to verify these results and using information theory methods to help select the most relevant traits for extreme temperature predictions. 
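To make the trait-condensation idea concrete, here is a deliberately simplified toy: a greedy grouping of highly correlated trait columns into categories. This is an editorial stand-in for intuition only, not the network-science method the speaker describes; the data and the correlation threshold are made up.

```python
import numpy as np

def group_redundant_traits(X, threshold=0.9):
    """Greedily merge highly correlated trait columns into categories.
    X: (n_catchments, n_traits). Returns a category id per trait."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n_traits = X.shape[1]
    category = np.full(n_traits, -1)
    next_cat = 0
    for i in range(n_traits):
        if category[i] == -1:            # start a new category at trait i
            category[i] = next_cat
            for j in range(i + 1, n_traits):
                if category[j] == -1 and corr[i, j] >= threshold:
                    category[j] = next_cat
            next_cat += 1
    return category

# Four synthetic traits: columns 0-1 are redundant, as are columns 2-3.
a = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.0, -1.0, 2.0, -2.0, 3.0, -3.0])
X = np.column_stack([a, 2 * a, b, -b])
cats = group_redundant_traits(X)  # -> [0, 0, 1, 1]
```

Once traits are condensed into a few categories like this, catchments can be compared by their category profiles rather than by hundreds of redundant raw columns, which is the spirit of the parallel-network classification described above.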
What these results show is that the top-down continental machine learning model is remarkably able to learn from the diversity of data available across all 787 sites when the inputs provided include the 27 different traits, and this is consistent with other published hydrological literature showing the ability of these machine learning models to learn from diverse data. I want to conclude with a key takeaway: our results so far indicate that building top-down machine learning models that include as much data as is available is a potential approach for predicting watershed functions and their response to disturbance at large scales. However, there are many open research questions that need to be addressed when using this approach, which are outlined in further detail in a paper that's in review and will shortly be available on arXiv as a preprint. Some of these questions are: what traits have the most predictive power, how do we configure these traits across spatial scales, how do we link these traits mechanistically to function in process models, and how do we incorporate dynamic traits in these models. With that, I want to thank you all for listening, and I'm happy to take any questions in the Q&A session. Thank you. We are now moving to our second presenter, Dr. John Bargar, who is an environmental transformations and interactions science area leader at the Pacific Northwest National Laboratory. Dr. Bargar's research focuses on molecular processes across various scales, and for more than 25 years he has led projects pertaining to both molecular-structure and system-scale research on the behavior of essential metal micronutrients and metal contaminants in soils and natural waters. It is with a warm welcome that we invite Dr. Bargar to speak. Thank you. Can you hear me okay? 
Great. All right, I'd like to share with you a project that's connecting microscale biological and biogeochemical data to macroscale or system behavior across the continental United States, or CONUS. EMSL is a scientific user facility funded by DOE's Office of Biological and Environmental Research. We support user science focused on molecular- and genomic-scale controls on processes and how they drive Earth system behavior at the largest scales. While it is a top-down data generation project, MONet also embraces hypothesis-driven research to support community leadership in cutting-edge science. I'll give a brief overview of the project, which was launched just in February of this year, and then talk about key challenges and innovations. One last point before I go on, but a very important one: open science provides the opportunity to democratize access to world-class data and research facilities like we have at EMSL, and doing so is a core value of the MONet project. Current and next generation soil and climate models need high-quality, standardized molecular and microscale soil property data, but no such data currently exist, that is, standardized and at large scales. To meet this need, MONet is building an open database of molecular and microscale soil properties to advance Earth system modeling at continental scales. But why soils? The answer is that soils are profoundly important in the context of climate change. They hold more carbon than the atmosphere and all aboveground terrestrial biomass combined. They're vulnerable to climate change, and they're in intimate contact with the atmosphere. MONet data will be critical for understanding how soil organic matter is responding to, and either mitigating or enhancing, climate change at regional scales. Because soils are a nexus between the solid earth, its microbiome, and the atmosphere, MONet necessarily brings together an array of scientific disciplines while also connecting diverse participants. 
MONet provides data types needed by soil organic matter models, as exemplified by the MEMS and PROMIS frameworks, and by many other soil organic matter models as well. Molecular data types include high-resolution soil organic matter composition, which is unique to MONet, and metagenome sequences through a partnership with the Joint Genome Institute, or JGI. MONet is also unique in providing 3D XCT images of soil pore and root structures in this large database format, along with hydraulic properties. We provide an additional 13 biogeochemical parameters in common with other soil ecological networks. These common parameters enable transferability between MONet data and other networks' sites. This map shows the locations of MONet sampling sites in FY23. We're using a network of long-term research sites operated by partners such as NEON, shown in green, BER SFAs, LTERs, and agricultural research stations, as well as crowd-sourced sites across the U.S., all of which are shown in blue. This approach was inspired by the WHONDRS open science project led by James Stegen at PNNL. We launched the soil function call just this February, and we had to close it in mid-June, earlier than anticipated, when the number of applications shot past our capacity of 200 sampling events. We were more successful than we expected. We've approved 27 research proposals, which translates to more than 400 cores, 600 replicates, and thousands of individual samples and splits. The panel asked for a list of key challenges, which I'm providing here, but then I'm going to move on to talk a little bit about open science. To achieve our scientific vision, we're taking an open science approach, which embraces large numbers of participants, providing the ability to solve truly large problems. This is well known, but there are a lot of additional benefits. These include reducing barriers to participating in research, expanding access to premier capabilities, and democratizing access to data. 
For example, MONet's sample contributors automatically gain access to premier methods such as EMSL's FTICR mass spectrometry and XCT capabilities, both illustrated on this slide. Soil cores are collected locally at pedon and landscape scales using standardized collection kits. Standardized analysis workflows provide consistency and reproducibility and enable interoperability. Raw and processed data will be available in a searchable and FAIR database with visualization and processing tools and containerized modeling applications. Last but definitely not least, we need to engage participants to contribute samples, otherwise we won't have anything. But we also need this community to provide scientific leadership and drive impact. Our data types and approaches, while powerful, are relatively new and unfamiliar to some, and this necessitates robust training and engagement. Our partnerships provide numerous benefits to the scientific community. The National Ecological Observatory Network, or NEON, provides cores from high-value sites with long-term monitoring of soil and atmospheric parameters. JGI provides metagenome sequences from soil cores processed through JGI annotation pipelines; these will be accessible from both MONet and JGI databases. Our partnership with the National Microbiome Data Collaborative, or NMDC, provides yet another path for people to discover MONet metadata. Examples of practices to scale our operations include the use of standardized core collection kits as shown here, automated processing of high-resolution mass spectrometry data, and installing and commissioning an automated soil analysis system this coming year. Please join us on November 7th and 8th for the MONet Community Science Meeting, and scan the QR code here if you'd like more information. This is another set of information the panel requested, about our approaches and innovation practices. 
I'll leave it there for your consideration and move on to my final slide to once again thank our sponsors, the DOE Office of Biological and Environmental Research, to acknowledge the many, many contributions of the MONet team, and to thank you for the opportunity to participate. And with that I'll go ahead and stop sharing my slides. Thank you, and especially thank you for keeping the presentation on time. Our next speaker is Dr. Rachel Buxton. She is an assistant professor at the Institute of Environmental and Interdisciplinary Science and Department of Biology at Carleton University. Dr. Buxton leads a team that aims to generate knowledge to support and mobilize equitable conservation solutions. Her main research interests include soundscapes, seabird ecology, ecological restoration, and systematic conservation planning. We would like to invite Dr. Buxton to speak. Great, thank you, and thanks for including me in this really exciting conversation. I use acoustic monitoring at really large scales, and something I've been asking myself more and more is how we can use these sorts of technologies to tackle some of, oh, am I still sharing my screen? Hi Rachel, sorry to interrupt. We will be sharing your slides on our end, so please just let us know when to advance. Okay. Sorry, we had a bit of technical issues. Sure. There we go. It should be up now. Okay, is it okay that I'm seeing the notes as well? I'm seeing just the slides on my end. I think you have to use the presenter view. Okay, that looks good. That should be better. Sorry about that, Rachel. Go ahead. No worries. So anyway, something I've been asking myself more and more is how we can use these sorts of technologies to answer some of the big conservation questions that we're facing today. Okay, next slide. For a long time, biologists have used sound to look at species distributions and abundance. And this makes sense. 
Because in most ecosystems, a large variety of species produce sound, and these sounds have a variety of different functions, from navigation to foraging to mate attraction. So by passively monitoring these sounds, we can get a lot of information about these vocalizing species as well as the ecosystems they inhabit. Next slide. Passive acoustic monitoring is particularly powerful, first of all, because it's continuous; it can record for up to 24 hours a day. It is also synchronous: you can record at many different sites at the same time. Next. What this allows for is monitoring of spatiotemporal patterns in species' vocal activity. And it captures everything, every sound that is made, from bats, birds, anurans, cicadas, crickets. Next. And unlike counts with an observer present, it provides a permanent record, so different observers can go in and analyze the data for different purposes and answer different questions. It's also noninvasive, so we don't have to account for the presence of an observer or for the presence of a device on an animal. Next. Also, because of huge advancements in recording technology over the past few decades, acoustic recorders can now be placed on a landscape and record for long periods of time over enormous spatial scales. Just to give you an example, the National Park Service's Natural Sounds and Night Skies Division has been collecting recordings for the past two decades at over 490 sites across the United States. Next. Now, the real challenge with this volume of acoustic data is trying to extract biologically relevant data to perform some sort of rapid biodiversity assessment. There's no way that a human observer could extract information from this breadth of data. And so we played around with a few different methods. This is one of them, where we use bioacoustic indices. 
This is where we look at the variation in sound pressure level over frequency and time within these recordings and compare that with the diversity of species that are vocalizing within a recording. Next. So when we combine these different bioacoustic indices in a predictive model, we actually find that they have really strong predictive power in pulling out the number of species vocalizing within a recording. Next. Another method of extracting biological information is through recognizer algorithms, and this sort of technology is advancing at a blistering pace. This is the BirdNET algorithm produced by Cornell University; it can now recognize thousands of North American species with very high accuracy. Next. So given these advancements in not only recording technology, but also algorithms and different analytical methods of going through recordings, we can answer a whole bunch of really exciting questions at a large scale. Not only questions about species and their biology, but also about conservation. So things like studying species behavior, song dialects, discovering new species, but also looking at the effects of things like anthropogenic noise and land use change, looking at the outcomes of management decisions, and more interdisciplinary questions like the relationships between soundscapes and human health. Next. Now I've used acoustic monitoring in a number of different ways in my own research to look at conservation questions, but I wanted to walk you through a couple of examples. Next. The first is looking at the phenology of songbirds in Alaska. Next. We know that climate change is shifting when songbirds migrate in the spring and when they arrive at their breeding grounds, and these effects are most pronounced in colder regions, where it happens to be more difficult to monitor. So what we did was put out a number of acoustic recorders in a very hard-to-reach park, Glacier Bay National Park. It was very remote. 
These sites are really hard to get to, by boat and through hiking. We put out an array of acoustic monitors, and we looked at how well acoustic indices, as a way of analyzing these recordings, could pick up the arrival of migratory songbirds. Next. We found that a particular index, the acoustic complexity index, had prominent spikes at the start of the spring in April, which corresponded with the arrival of common migratory songbird species. So this was a really affordable and easy way of analyzing huge amounts of acoustic recordings to get at biological information. Next. And this was only a three- or four-year study, so take from this what you will, but over the course of our study, birds were arriving about five days earlier in the spring. Next. Another example is looking at the impact of aircraft noise on birds at a large scale. Next. To get at this question, we used the National Park Service's large database of acoustic recordings. In this case, we had students go through and analyze the first 10 seconds of every two minutes, where they looked at the detection of birds and human-caused sounds. They did this for eight days at 103 sites across 40 national parks, so again, an enormous scale at which to answer this sort of question. Next. And what they found was that right after an aircraft pass, there was a much higher probability of detecting birds, which transitioned over the course of about two hours into a lower probability of detecting birds. And this lower detection of birds persisted for about three hours after an aircraft pass. So this really raises the question of what the ecological consequences of these changes in behavior are. Next. Especially since aircraft noise is pervasive within national parks, especially at some of these more remote sites. Next. So where do we go from here? What our research has done a really good job of is outlining the problem. 
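For readers who want the mechanics behind the acoustic complexity index mentioned above, it reduces to a simple computation on a spectrogram. The sketch below is an editorial simplification of the Pieretti et al. formulation, operating on a precomputed spectrogram array; it is illustrative, not the study's actual code.

```python
import numpy as np

def acoustic_complexity_index(spectrogram):
    """Simplified acoustic complexity index: per frequency bin, sum the
    absolute intensity change between adjacent time frames, normalize by
    that bin's total intensity, then sum over bins. Biotic sounds such as
    birdsong are strongly modulated in time, so they score high; steady
    broadband noise scores low. spectrogram: (n_freq_bins, n_frames)."""
    diffs = np.abs(np.diff(spectrogram, axis=1)).sum(axis=1)
    totals = spectrogram.sum(axis=1)
    totals = np.where(totals == 0, 1.0, totals)  # guard silent bins
    return float((diffs / totals).sum())

# A strongly modulated bin scores higher than a constant one.
chirpy = np.array([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0]])
steady = np.array([[1.0, 1.0, 1.0, 1.0, 1.0, 1.0]])
```

This modulation-sensitivity is why the index spikes when migratory songbirds arrive and start singing, while steady background sound barely moves it.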
We know that noise is widespread, for example, and we know that it has some sort of ecological and/or behavioral consequences. So our research is moving from measuring the problem to starting to explore some of the solutions. Next. So I have some students in some local parks around Ottawa, which is where I'm located, looking at the outcomes of road closures for wildlife, both their behavior and their survival, and also looking at the impacts of noise reduction. Next. We are also taking a more interdisciplinary and equity-grounded approach. Next. We have a project in Detroit, where we're looking at community-led restoration initiatives. This is where local communities restore vacant lots from turf grass and cement to more native vegetation, and we do capacity building with communities so that they can put out acoustic recorders, know where to put them, and know how to listen for different birds. And we're looking at the outcomes of these restoration projects over time, working with communities and for communities. So I think I'll stop there and save the rest for the question period. Thanks very much. Thank you. Now we are going to proceed with our last presenter, Dr. Sarah Huebner, who is a postdoctoral researcher within the University of Minnesota College of Science and Engineering. Her research focuses on the practical aspects of conserving and restoring wild mammal populations, including long-term continuous monitoring. She's also an active proponent of employing citizen science and machine learning for conservation and ecological research. And with this, I would like to invite Dr. Huebner to begin her presentation. Thank you for that introduction, Ines. I'd like to begin by acknowledging that the University of Minnesota, a land grant institution, is built within the homelands of the Dakota and Anishinaabe people. Sorry, my slide doesn't want to advance. Yeah, so, we are seeing alarming rates of wildlife decline throughout the world. 
Ecological monitoring programs have become a critical component of evidence-based conservation planning. Continuous landscape- or continent-level monitoring programs are crucial to provide insights into population trends and to aid in understanding factors associated with altering population dynamics at various temporal and spatial scales. Constant monitoring is important not only for tracking rare or threatened species, but also for detecting increases in potentially invasive species and trends in the populations of common species, which in some regions are declining even more rapidly than rare species. Monitoring of ecosystems and the species that inhabit them is especially relevant in Africa. It's a highly biodiverse continent with numerous iconic mammal species threatened by human activities like poaching and human-induced climate and land use change. Camera traps have emerged as one of the best tools to inexpensively and unobtrusively monitor wildlife, providing valuable information on how vulnerable populations respond to environmental and anthropogenic disturbances. Because cameras are inexpensive and relatively non-invasive for wild species, many camera trap projects exist, but they are conducted at such a variety of scales that they are frequently incompatible with other datasets. To address this issue and to systematically map biodiversity, monitor the dynamics of African mammals, and evaluate outcomes of existing conservation programs, I founded and lead a collaborative network called Snapshot Safari. This network, which I'll hereafter call Snapshot, comprises a multinational collective of ecologists, wildlife managers, citizen scientists, data scientists, local community members, and interdisciplinary academics working together to conserve and restore African mammals. 
Since 2017, Snapshot teams have deployed more than 50 camera trap grids in protected areas in Botswana, Kenya, Mozambique, South Africa, Tanzania, Zimbabwe, and now Namibia. You can see that there's been a fantastic uptake in South Africa, where about two-thirds of our grids are located. This map is a little bit out of date, as we are now approaching 60 grids; it also does not show all of the grids, since some are grouped into clustered monitoring sites. Research teams at every reserve within the network deploy camera traps and collect data in exactly the same way, allowing for cross-site comparisons and analyses of metapopulations. There are many benefits to joining a collaborative network such as Snapshot. First and foremost, even though we collect data and metadata in the same way, each research team continues to set its own priorities. Some teams are most interested in evaluating certain species or guilds, while others want to monitor the health of the entire system, and those research goals can change over time. Our cameras capture on average 75 mammal species in every protected area, so the amount of by-catch, images not relevant to a particular study, is massive, but there's no shortage of researchers and graduate students eager to develop their own projects using it. Further, many teams take on additional data collection in the form of vegetative or acoustic monitoring. Upon joining Snapshot, researchers agree to share their data with one another under a few stipulations. Researchers can submit abstracts and request access to other sites' data. The abstracts are sent to all of the sites from whom data are requested, and research teams have several weeks to opt in or out. If they opt in, they are to be included in all publications that result from that work. If they opt out, their data are not shared, and the same goes for researchers not already part of the network seeking to access data. 
On a much shorter timescale than publications, we can identify and share best practices. For instance, one team in Tanzania has pioneered a fantastic program for engaging community members in the work of conservation, and many other teams have been implementing some of those practices at their own sites. Reserve managers preparing to undertake translocations may reach out to other managers to learn from their experiences. And the best part about bringing together so many people working in the same space, with the same overarching goal of preventing wildlife extinctions, is that it is easy to strike up new collaborations and be inspired by one another. So you may be asking yourself how much data we are talking about here. With nearly 2,000 cameras in the field running continuously, we have collected and classified nearly 18 million images of animals in the wild. We currently average three to four million images per year, all of which are sent to me in Minnesota. To return annotated data quickly, I employ two types of classifications: those provided by citizen scientists on the platform Zooniverse.org, and those produced by custom machine learning algorithms. This figure provides an overview of all of those pieces as they come together. I presently use two convolutional neural networks. The first is an object detector that looks for animals and reports an image as empty when it finds none. It is very similar to Microsoft's MegaDetector, but was trained solely on Snapshot data. The second is a classifier, also trained on Snapshot data, that makes predictions of the species in an image, how many there are, and observable behaviors like eating or moving. Once I have the machine learning predictions, I bring them together and create a manifest that is uploaded to Zooniverse along with the images. From there, we ask volunteers to first confirm whether they agree with the computer that an image is empty. 
If two people agree with the computer, the image is pulled from circulation and labeled empty. If not, it is circulated longer to reach a consensus among the human classifiers. We found that consensus is the key to returning accurate data. Images with animals in them are sent to a larger workflow where they are labeled with the species, count, behaviors, and demographics, such as whether young are present or how many of the animals have horns if a dimorphic species is identified. There are many benefits to including the public in participatory science, including educating them on the systems we work in and sharing information about ongoing research and conservation efforts. Over the past decade, the Snapshot network has attracted more than 200,000 volunteers worldwide, hailing from 77 countries last I checked. The involvement of citizen scientists has led to some interesting findings, including observations of brown hyenas, Egyptian mongoose, and leopards outside of their known ranges. More recently, a melanistic serval was spotted on a cluster of cameras in the Serengeti, an interesting morphology that volunteers immediately recognized as anomalous. Finally, you can see in this GIF that our cameras in South Africa picked up an observation of meerkats and a yellow mongoose foraging together and sharing vigilance, a behavior that had not been documented previously. All of these findings were spotted and reported to me by volunteers on Zooniverse, many of whom have stayed with the project for years. I have one Serengeti moderator who takes days off work whenever a new season comes out, so they're very committed. Speaking of which, season 16 is dropping soon if you're interested. So can we trust the accuracy of labels generated by computers and volunteers? Is it safe to publish using these data? 
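The retirement and consensus rules just described might be sketched as follows. The two-agreement rule for empty images comes from the talk; the function names, vote format, and the 0.66 consensus threshold are illustrative assumptions, not Snapshot's actual parameters:

```python
from collections import Counter

def retire_as_empty(votes, agreements_needed=2):
    """True once enough volunteers have confirmed the computer's
    "empty" prediction (two agreements, per the talk)."""
    return sum(1 for v in votes if v == "empty") >= agreements_needed

def consensus_label(votes, threshold=0.66):
    """Majority-style consensus over volunteer species labels.

    Returns None (meaning keep circulating the image) until one
    label clears the hypothetical agreement threshold.
    """
    if not votes:
        return None
    label, n = Counter(votes).most_common(1)[0]
    return label if n / len(votes) >= threshold else None
```

For example, `consensus_label(["lion", "lion", "leopard"])` converges on "lion", while a 1-to-1 split returns None and the image keeps circulating.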
This is a figure from a paper I'm working on, assessing the accuracy of volunteers and machine learning, alone and in tandem, measured against labels provided by Snapshot Safari researchers. So this figure shows the error rates of our hybrid pipeline compared to the error rates of AI without humans in the loop. Across the board, machine learning labels contain far more errors without human supervision. This becomes even more pronounced when a species is rare within a data set, as is the case for all of the species with orange bars here. The dark green bars show AI labels for common species, and the light green are the error rates of our hybrid pipeline. For zorillas, the computer missed the mark every time, while the volunteers correctly converged on the label 100 percent of the time. So by introducing AI, we've halved the amount of time it takes to classify a data set, and by keeping humans in the loop, we have not seen a drop in the accuracy of the volunteer labels, which agree with experts 98 percent of the time. So the takeaway here is you absolutely can rely on the accuracy of volunteer labels with the proper guardrails in place. More than 25 peer-reviewed publications have been based on Snapshot data, with many more in the works and accelerating in pace as more people learn of our data. Since I completed my degree earlier this year, I've joined Zooniverse as their ecology lead, which has been a tremendous amount of fun, and I'm helping to build new tools and infrastructure to make life easier for ecologists grappling with how to process and use big data on timescales meaningful to our commitment to conserving the natural world. So I'm working on building connecting infrastructure with other platforms like CitSci.org, iNaturalist, and Wildlife Insights, with the hope of creating a digital ecosystem where we can collectively access tools across platforms by making connections rather than continually needing to reinvent tools. 
This will also facilitate incorporating citizen science at both the collection and the classification levels, simply by pressing a button to share to another platform. But the thing I'm most excited about and committed to is democratizing access to AI for people all over the world who care about the environment and want to use camera traps to monitor local wildlife. To that end, Zooniverse will shortly be announcing the availability of Microsoft's MegaDetector object detection model right on the Zooniverse platform, in our new Subject Assistant portal. Project owners can upload camera trap images, run them through MegaDetector, get computer-generated predictions as to whether images are empty or not, and create new subject sets right on the platform based on their unique requirements. If anyone has questions about using any of these tools, they're welcome to email me at the address on the slide. For now, I just want to say there are a lot of complex relationships to manage in a network like Snapshot Safari. This is by no means a complete list, but the benefits have been tremendous and only continue to grow. Thank you. Thank you, Sarah, and thank you to all the presenters. We are going to move now to the question and answer session. I would like to remind the audience that we have a question and answer function below the video player, and you can pose your questions right there. I'm going to start with a question for all the panelists. What are the key challenges and opportunities you see in the current applications of technologies for continental-scale research? How do you think they can be addressed or leveraged for better outcomes? Can we just go ahead and speak up? Yes, please. One of the challenges that we run into is the diversity of data types. Just to give an example from the data I showed, we have on the one hand 3D data that's produced by XCT, and we also have lots of 2D data. 
We have traditional mass spec data, and we need to make all of those data equally searchable and equally available, and eventually we want to be able to register these data. That is a challenge because they're so different: the details associated with each voxel, the amount of information stored in each voxel, the file configuration. It just takes time to figure out how to do that. So dealing with diverse data types is really a very important challenge within projects and certainly for interoperability. Yeah, I'd like to add to that and say I absolutely agree with John on that. That's one of the reasons that we ask people to collect data in the same way. But another challenge is sharing data. Sometimes when people have collected data, they have a lot of ownership over it. They're not sure about letting other folks have access to it before they've published. So I think it helps to have really strong rules in place and to communicate to people that your data is not going to be shared unless and until you're ready. But we do hope that in the next few years this data will become available publicly. I agree with all of those points raised. I want to add that one of our challenges is the availability of co-located information. As we're trying to derive scientific insights, we want many variables that are measured together at similar spatial and temporal scales and resolutions, in order to be able to pull all of these together into models. And often many of the data sets we use are not intended for that particular science question. They're just broadly available data collected through existing monitoring networks that were perhaps designed for a different purpose. So we start to see spatial biases in the data that's available. There are QA/QC issues, and it takes time to clean the data. Sometimes with the trait data sets I referred to, you may not have all the traits available in one data product. 
So you're going to have to pull together that information from different remote sensing data sets, and that takes time. So it's just this availability of co-located information that really presents a huge challenge for us. As far as acoustic data, I think our biggest challenge is that acoustic data are enormous. Sharing analyzed data is fairly simple in some sort of data format, but if you're actually talking about sharing raw WAV files, it's prohibitive. So even sharing and keeping audio data open is very, very challenging. Thank you. We have a couple of questions from panelists. Stephanie, please. Yeah, thank you. All of these presentations are so fascinating. Thanks to all of you. The question that I have is, I think any time that we're dealing with the kind of large data sets that we use in continental-scale or global-scale research, we have to grapple at some level with ethics and privacy and security. And I think at least two of you, I bet all of you, have some thoughts on this. But for those of you who are working with these passive monitoring techniques especially, these are potentially special issues. So I'm wondering if some of you might want to share some thoughts on the state of ethics and standards around privacy and security relative to the pace of advancement in these fields. I'm thinking maybe Rachel and Sarah have some thoughts, but I would throw it out to all of you, of course. Yeah, that's a great question. So having so many camera traps out there, occasionally people are going to walk past them. And so we have to have solid methods of extracting all of the human images from our data sets before they ever go online. We have a couple of special scripts we run offline just to make sure that there are no humans appearing in what we present. Even if it is researchers, we would just rather not have anyone's face up there. Secondly, with our particular system, we have a lot of threatened animals. 
And so we have to be very careful about revealing the locations of those animals. So I actually strip all of the geolocation metadata before I post images online, just to avoid, you know, giving away the locations of rhinos or leopards. Yeah, that is a really good question, especially as a lot of our research moves into the urban space. I touched on that really briefly at the very end. But for us, it's just been working with communities, working closely to analyze the data with them and for them. In that sense, they can sort of be the guide of, you know, what makes it into the open. But that's on a case-by-case basis right now. We don't have a method for doing that at a larger scale, and I think that's sort of the next step. We have another question from the panel. Luis? Okay, so thanks, John. I really appreciated your discussion of the diverse data types and the challenges in terms of making those data interoperable across different sets. But I also had a question about spatial scales. And Charuleka, this is also probably relevant to you. If you're able to make those data accessible, what do you think are the challenges in terms of spatial scales? Because you're talking about projects that run from remote sensing down to metagenomics. What do you see the challenges are there? I guess I'll go ahead and answer first, Charu, and I'll try to be quick. So for the molecular and microstructural data that we're collecting, the best way to connect across scales and through time is to use modeling. And that's one reason why I really emphasized in my talk the seminal role that models are playing as we spin up this project and as we continue to think about how we're interacting and what we're going to invest our limited resources in developing next year. 
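On the earlier point about stripping sensitive metadata before images are released publicly, a minimal sketch of the idea might look like this. The field names are hypothetical, not Snapshot Safari's actual schema:

```python
# Minimal sketch: drop location-revealing fields from an image record
# before public release. The field names here are hypothetical.
SENSITIVE_FIELDS = {"latitude", "longitude", "site_name", "camera_id"}

def strip_sensitive(record):
    """Return a copy of a metadata record with sensitive fields removed,
    keeping only fields considered safe to publish."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
```

An allow-list of safe fields is arguably even more robust than a block-list like this, since new sensitive fields cannot slip through unnoticed.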
We received guidance from our Science Advisory Committee that we really should focus on providing MONet data in a model-ready format for informing all models, because that's so crucial. Beyond that, there is interest and there are plans in the out years to develop more sensor installations so that we're connecting through time a little better. But fundamentally, we need to work very closely with modelers in order to connect across time and scale. Great, thank you. Charu, do you have any comments? Great question. So I think one of the biggest challenges with spatial scales is just the spatial heterogeneity of the terrestrial ecosystems we're trying to measure and model. And we just really don't have the resources to be able to measure everything everywhere. This is really where remote sensing can start to become powerful as a top-down approach. But even then, you start to have trade-offs between the spatial resolution and the temporal resolution that you need to be able to measure and model at. One example is, you know, the use of traits. Well, the set of traits that are relevant for different functions will be different across spatial scales, as I showed in one of my slides. So how do we start to measure everything that's relevant for the different scales of interest? When we're trying to make predictions that are relevant for decision makers, you know, they are sometimes interested in different scales, between watershed scales and basin scales. So you need to be able to model these things at different spatial scales as well. I think scaling is a grand challenge for us, and it's a consistent problem that we've been trying to tackle across decades of research. Would any other panelist like to add anything? If not, I'll move to the next question. Okay. So again, this is a question for the whole group. Top-down research often requires collaboration across multiple disciplines. 
So we would like to know what kind of strategies can facilitate effective communication and collaboration among experts from different domains, since all of you have this kind of experience. Thank you. Charu, I'm resisting going first. You guys have a lot of experience with this. I have a few thoughts, but yeah, I mean, as part of a national laboratory, having been in the national laboratory for, you know, many years, team science is fundamental, and we do work in large interdisciplinary teams. So one thing we've realized is that even terminology can be really different between different scientific disciplines. And, you know, as we're starting to flesh out proposals, it's really important to articulate what we mean by a particular term, because different disciplines just define the same thing differently. I also wear a hat as a PI for an open data repository called ESS-DIVE. It's a DOE repository. And so the other thing that we try to facilitate when we work across interdisciplinary sciences is the use of interoperable data standards, very descriptive metadata, as well as, you know, metadata that is machine extractable, that can be used to automate some of these searches and discovery across disciplines. Often a researcher may know where the data is available for their specific discipline, but when you're trying to combine diverse types of data across disciplines, you need to be able to, you know, search outside of the typical resources that you've gone to. So there are many different techniques that we can start to use to enhance that search and discovery process, and the integration process as well. So for the MONet project, this is really important. As I showed in the talk, there are a lot of different disciplines that we connect to and a lot of different investigators. It can be overwhelming. So one of the things that we do is try to focus on one area at a time right now. 
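As one illustration of the machine-extractable metadata mentioned above, a dataset record could be expressed in the spirit of the schema.org Dataset vocabulary, so that an automated harvester can index variables and keywords without a human reading the landing page. All values in this sketch are invented for illustration; it is not an actual ESS-DIVE record:

```python
import json

# Hypothetical dataset metadata record, loosely following the
# schema.org Dataset vocabulary; all field values are invented.
record = {
    "@type": "Dataset",
    "name": "Example watershed stream chemistry",
    "keywords": ["water quality", "disturbance", "watershed"],
    "variableMeasured": [
        {"name": "nitrate", "unitText": "mg/L"},
        {"name": "discharge", "unitText": "m3/s"},
    ],
}

# Because the record is structured, a cross-disciplinary search
# service can extract variables and units programmatically.
print(json.dumps(record, indent=2))
```

The point is the structure rather than the particular vocabulary: any machine-readable standard shared across repositories enables the automated cross-discipline discovery described here.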
This year, we're spending a lot of time understanding how we can begin to engage in atmospheric sampling, which connects in many ways very naturally to the soil sampling that we're doing in terms of its synoptic qualities. So the outreach is very important. We have postdocs who are part of the project. Their job is engagement, forming collaborations with external researchers so they can catalyze that external-researcher-driven science. And in addition to that, we lean deeply into the outreach and training activities, which really provide an opportunity for people in different research disciplines to understand what we're doing and reach out to us, and then we can form those collaborations. So I would say those are three different mechanisms. The first one that I mentioned is where we actually have scientists on staff who are experts in their areas and are really driving a coordinated activity to reach out to the community. Then there are our postdocs, and then there are the outreach workshops like the one on November 7 and 8. Thank you. I see Stephanie has another question. Yeah, I think, so Sarah used a term that I love, bycatch, in your data, right? So you've got your target observations, and then you know that there is all this other stuff there that would address questions that you yourself have not ever thought of, you know, you wouldn't think of, and other people can use it, and 10 years from now they'll be able to use it for things nobody's even imagining. And these are huge data files, so we have to make, you know, decisions based on storage. This is not a new problem. 
This is what museums and collections and long-term research programs have been wrestling with forever: what do you keep and what do you throw away, just based on the chance that the bycatch from your program could be something that really revolutionizes some area of science in the future. And I wonder if you could address a little bit this challenge of how do you decide what to keep, how long to keep it, and what you throw away when you're dealing with some of these newer data types. Well, I'll start with Snapshot Safari. Essentially, we keep everything. I don't throw anything away. We focus on mammals, so we do tend to categorize things like birds into just one large bin. And then if people are interested in using that down the road, they can have a go and take a look at all of those images that are out there. But there have just been so many grassroots projects that have sprung up asking, can I use your leopard data? I want to look at hyenas. And so I keep all of the classified data, and we also share it online. There's the Labeled Information Library of Alexandria, which was co-founded by myself and some other folks at Microsoft and in Minnesota. So all of our labeled images, stripped of the metadata, are available there publicly, freely available any time someone wants to get them. And that, I think, has generated a lot of new species classifiers using that existing data. Thank you, Sarah. I think we have run out of time. So I will add.