All right, I'll go ahead and get started. Greetings, and thank you for attending this month's Science Seminar presented by the NSF's National Ecological Observatory Network, which is operated by Battelle. Our goal with this monthly series of talks is to build community among researchers at the intersection of ecology, environmental science, and NEON. This month, we're very excited to have Pete Chukrin here to present his work with us, but before we turn it over to the speaker, a few logistics. We have enabled optional automated closed captioning for today's talk. If you would like to use it, find the CC button in your Zoom menu bar. The webinar will consist of a presentation followed by a Q&A. As you think of questions, please add them to the Q&A box. We also have a meeting chat, which you can use to share links and other items of interest with the larger group, but try to add your questions for the speaker to the Q&A. We'll facilitate discussion at the end, and there will also be an opportunity to unmute yourself and ask a question over audio. NEON welcomes contributions from everyone who shares our values of unity, creativity, collaboration, excellence, and appreciation, as outlined in our NEON Code of Conduct. This code applies to NEON staff as well as anyone participating in a NEON event. The full code of conduct is available via a link that I will share in the chat in a moment and also on our Science Seminars webpage. I will just show you that here. I'm sharing my screen to show the page, and our code of conduct is down here, embedded right in the middle of the page. This talk will be recorded and made available for later viewing, and the recording will be linked on this Science Seminars webpage. And to complement our monthly Science Seminars, we host related data skills webinars on how to access and use NEON data. Registration for those is available at the same Science Seminars webpage, in the bottom section on data skills webinars.
And I've recently learned that, although the webpage is not reflecting it currently, we are doing a NEON microbial metagenome-oriented data skills webinar in partnership with the National Microbiome Data Collaborative, NMDC, and that's going to be in December. So check back at the Science Seminars and Data Skills webpage very soon for the links where you can register for that data skills event. And then lastly, if you have ideas for a talk for this Science Seminars series, nominate yourself or a colleague today by filling out the form right here near the top, where you can nominate a Science Seminars speaker for a future version of this series. Now I will turn it over to Hugh Croft to introduce our speaker.
Oh, thank you, Samantha. Yes, we are really excited to have Peter Chukrin as our seminar speaker today. Peter is a postdoctoral researcher at UC Berkeley, where he is researching soil metagenomics and metatranscriptomics. He has a broad interest in soil microbial ecology, and he's no stranger to NEON. After completing his bachelor's degree at Syracuse, he worked as a lead field technician at a NEON site in Utah in 2017. And he's also done field and laboratory work for the USGS and Yellowstone National Park. His combination of field, laboratory, and data analysis skills served him well in his graduate work at Northern Arizona University, from which he received a PhD last year before moving to Berkeley. Today he will discuss his research using NEON and other field data. So I'll turn it over to you, Pete.
Great. Thank you so much for the introduction, Hugh. As you said, my name is Pete Chukrin. I am a postdoc here at UC Berkeley. And today I'm going to be talking about some of my dissertation work, which used NEON microbial data to look at selective pressures on soil microbial communities across scales, as well as some of the work that I'm doing here at UC Berkeley, the analyses of which are informed by some of that work.
So I'm going to do a general introduction here, because I realize not everybody here might be already invested in the world of soil microbial ecology. But I really want to stress how important soil microbes are for terrestrial ecosystems. And we can think about this on a number of levels, whether it's the symbiotic relationships that they form with plants or their crucial role in decomposition, carbon cycling, and the stabilization of soil organic carbon. It's really hard to overstate the importance of soil microbes for the formation of soil, and how different our world would look without soil microbes. But the problem is that they're really complex. Part of that is that in a gram of soil, we can have as many as one billion microbes. And so if you have a handful or a bucket full of soil, you might have more individuals in that sample than there are people on the planet. And so starting to understand those community interactions at that scale is very daunting and very complicated. And then there's also the part where they're small, very small. So that's the micro part. We can't really do transects on soil microbes, and we can't lay down a quadrat and start studying them like we would plants. And so that makes understanding these communities really difficult. But luckily, we're living in a really unique time where sequencing has never been cheaper. A lot of people who do genetics work are probably familiar with this graph. Here in the gray is Moore's law, which is a predictable reduction in the cost of computing power over time. And then in the yellow here is the cost of sequencing a genome. And so this is to say that the drop in the cost of sequencing has outpaced the reduction in the cost of computing power enormously, which, when we think about what a computer looked like in 2001 versus what one looks like now, is really incredible.
So the advancement in sequencing technology has really expanded our ability to study microbial communities at scale. And so this is where omics comes in. We all know the central dogma of biology: we have DNA that gets transcribed into RNA, which then gets translated into proteins. If we were to study the DNA of an individual, that would be a genome, like the human genome, or if we were to study an individual plant, its genome as well. But when we look at all of the DNA from a microbial community, we call that a metagenome. All of the RNA transcripts would be the metatranscriptome. And all of the proteins would be called the metaproteome. And so I just wanted to put these definitions out there. I'm going to be focusing pretty much primarily on the first two here, and with the NEON data, really just metagenomes. And a lot of my work focuses on genomic traits. I consider a genomic trait to be really anything from a genome that does not tell us function or identity. And so this could be the size of a genome, so how many base pairs in length the genome is. It could be the GC content. We know that the genetic code is made up of AT and GC base pairs. And so if we look at a sequence, we can total the GC base pairs and compare that to the total length of a gene or a genome. In this case, we have four GC base pairs, the length of the gene is about 16 base pairs, and so that would be a GC content of 25%. And this might not seem super important, but it actually tells us something about how genes are written, and that can really tell us something about life strategy. We can also look at codon frequency. So we can go through a coding sequence and start to tally up the codon frequencies, and this can tell us something about how that gene is written. And it becomes important for rates of transcription and translation.
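As a rough illustration of the two tallies just described (GC content and codon frequency), here is a minimal Python sketch. The function names `gc_content` and `codon_counts` and both example sequences are made up for illustration and are not taken from the study; the first sequence simply reproduces the 4-out-of-16 arithmetic from the slide.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons in a coding sequence.

    Any trailing partial codon (1 or 2 leftover bases) is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Hypothetical 16-base sequence containing 4 G/C bases.
print(gc_content("ATGCATTAACGTTATA"))  # 0.25

# Hypothetical short coding sequence: ATG AAC AAC AAT TAA.
print(codon_counts("ATGAACAACAATTAA"))
```

Dividing each codon count by the total number of codons would turn these tallies into the frequencies mentioned in the talk.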
And I'll talk about this a lot more later, but it is fundamentally tied to some life strategies. And then we can also look at things like amino acid content. So we can go back to those codons, go to a codon usage table, and start to decode them and look at how microbes are actually writing their proteins: what amino acids are they using? And this can also tell us something about their interaction with their environment. I'd like to make the comparison to the study of human communities to show why traits might be important. If we're studying human communities, we can get a lot of information from everyone's name and everyone's occupation: who's there and what they're doing. And that's often the focus of a lot of microbial work too, because similarly, if we're looking at a microbial community, we can get a lot of information from taxa and functional genes: who's there and what are they doing. But we know that if we're studying a human community, let's say you're comparing Santa Fe and Boston, you might have the same proportion of guys named Ricky who do HVAC, but you know that those two communities are very different, even though there's some functional redundancy in name and occupation. And so there are other traits that we can use to inform these comparisons between communities. For human communities, this could be rent or income, size of buildings, cars per capita. There are a lot of things that we can use to help contextualize the community information. And in the same way, looking at microbial communities, things like the size of the genome, GC content, codon frequency, and amino acid content tell us something about how these genes or genomes are written, the actual building blocks of genomes and proteins.
And I think that there's a lot of value in that information, which then raises the question: okay, well, what influences these traits? A lot of times we think about it in terms of nutrient limitation. And genomic streamlining is, I think, what comes up most often, which is that in low nutrient environments, we tend to find smaller genomes with lower GC content. And both of these traits are thought to reduce the cost of reproduction. Now for the smaller genome, there are two reasons for that. One is that a smaller genome means fewer nucleotides need to be synthesized in order to reproduce. And then also, in low nutrient environments, there tend to be smaller cells. And this is kind of a fun life strategy, because smaller cells mean a higher surface area to volume ratio, so a greater chance of interacting with nutrients per unit volume. Usually that comes at the expense of things like motility or metabolic diversity. But being small does have its advantages. And small cells also tend to have smaller genomes. And so there's this simplified structure that is a fundamental life strategy and that is reflected in genome size. Genomic streamlining is highly prevalent in oceans, and a lot of the literature on genomic streamlining is founded in marine environments. Okay, so like I said, we also see low GC content in low nutrient environments. There are two reasons for this. One is that the GC base pair is slightly more costly to synthesize. And then also the AT base pair has a carbon to nitrogen ratio of 10 to 7, whereas the GC base pair has a carbon to nitrogen ratio of 9 to 8. So that AT base pair actually saves on nitrogen. And so in nitrogen limited systems, we tend to see a much lower GC content. And there have been a lot of studies that have looked at these trends in isolates.
And we know a lot about the distribution of genomic traits already. So we know that in soils, we tend to see much larger genomes than in oceans or host-associated systems. But I think the problem with that is that 99% of bacteria cannot be cultured. And I chose the high end of this estimate; it could be more like 95. But it's to say that a vast majority of what we know is based on bacteria that we can culture. There's this huge proportion of the community that we can't access, and we don't know how these traits are distributed there. So there could be some collection bias. I also like to study these things on a community level, because I think that there is some value to understanding how traits are distributed, not only within isolates or within a community, but between communities. And we can think of this by going back to a human community. We know that education and literacy rates are related, right? And we could show that within a human community. But it's not until we start to compare different communities, and start to compare statistics across communities, that maybe we can see that, oh, it's related to public funding and public education or school lunches. And so I think that there is some value in being able to look at the distribution of these traits between communities across scales as well. And so I'm going to introduce a study that we did that I think lays out the story for the NEON analysis that I'll present later quite well. We went to the IMG database, which, for those who are not familiar, is run by the DOE's JGI, the Joint Genome Institute. They do a lot of sequencing and make these datasets public. And so we took public metagenomes from host-associated, marine, soil, and thermophilic environments.
And I just wanted to see, okay, if we look at community average traits between these environments, how are they distributed? And what does that tell us about selection pressures for these environments? Now, I want to quickly talk about how we estimate genome size, because in a perfect world, we would get all of our sequencing data as whole genomes, but anyone who does genomic work knows that that's not the reality yet. Right now, a lot of times, especially for microbial data, we get short reads. So 150 base pair short reads, which makes estimating genome size really difficult. And so we could either try to reassemble all of these short reads back into a genome, or we could use a different approach, which is single copy genes. So we have all of our short reads from a metagenome, and we can use genes that only occur once per genome to get an idea of how many individuals there are in that sample. And this is kind of like if we had a number of jigsaw puzzles, and for some reason, we threw them all into a bag. And at the end of it, you wanted to know, okay, how many puzzles are in that bag? One thing that you could do is look for corner pieces, right? You could look for pieces that only occur a fixed number of times per puzzle. And so that is kind of what we're doing here: we're looking at genes that only occur once per genome. Based off of that, we know how many individuals are in the community. And then based off of the number of base pairs and the number of individuals, we can generate an estimate of average genome size. And so each point here represents a community average. Here in blue are metagenomes from marine systems. And we can see that there's this positive relationship between genome size and GC content. And that's what we would expect, right? It supports genomic streamlining, which we know is highly prevalent in marine systems.
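The corner-piece idea can be sketched in a few lines of Python. This is a simplified illustration only, not the tool or pipeline used in the study: it assumes we already know, for each single-copy marker gene, how many sequenced bases mapped to it and how long the gene is. The function name `average_genome_size`, the gene names, and every number below are hypothetical.

```python
from statistics import mean


def average_genome_size(total_bases, marker_hits):
    """Estimate the community-average genome size in base pairs.

    total_bases: total number of sequenced bases in the metagenome.
    marker_hits: mapping of gene name -> (bases mapped to gene, gene length).

    Each single-copy gene yields one estimate of how many genome
    equivalents were sequenced (mapped bases / gene length); the
    estimates are averaged across genes.
    """
    if not marker_hits:
        raise ValueError("need at least one single-copy marker gene")
    genome_equivalents = mean(
        mapped / length for mapped, length in marker_hits.values()
    )
    return total_bases / genome_equivalents


# Hypothetical metagenome: 5 Gbp of reads and three marker genes.
hits = {
    "markerA": (4_200_000, 4000),  # about 1050 genome equivalents
    "markerB": (1_050_000, 1050),  # about 1000 genome equivalents
    "markerC": (2_280_000, 2400),  # about 950 genome equivalents
}
print(average_genome_size(5_000_000_000, hits))
```

Averaging over several marker genes is a design choice made here to dampen the noise from any single gene; real tools also have to deal with read classification and gene length variation, which this sketch ignores.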
What we found that was surprising, though, is that we have a negative relationship in soils. It's not incredibly strong, but I think it's there. And the natural tendency is for this relationship to be positive, because there's a mutation bias from GC to AT that requires extra genes to correct, so smaller genomes tend to have a lower GC content. And so there's this natural tendency for genome size and GC content to be positively related. So to have this relationship be negative really indicates that there's some type of unique selection pressure that is going on in soils. And so this is where we start to think about how soils are different from oceans. A huge part of this is that soil microbes are generally carbon limited. Even though a huge proportion of the soil is made of carbon, these compounds tend to be harder to break down. They tend to be things like cellulose, hemicellulose, and lignin. And so despite a large portion of the soil being carbon, soil microbes are limited by labile carbon. And that got us thinking that, even though the AT base pair has a C to N of 10 to 7, the GC base pair, because it has a carbon to nitrogen ratio of 9 to 8, might be more efficient in carbon limited systems, such as many soils. And that's why we might be observing this negative relationship, where smaller genomes tend to have more of this GC base pair that saves on carbon. And when we look at where these samples are coming from, we see that smaller genomes with a higher GC content tend to come from places like polar deserts, deserts, grasslands, and agricultural systems, and then the further right and down we move, with larger genomes, we tend to see samples from forests, the rhizosphere of grasslands, and tropical forests. And so this kind of supports this idea, right? But the difficulty here is that it's very speculative, because we can't actually link this data to carbon and nitrogen data.
And the metadata from all of these samples is very limited, because what we get from IMG are metagenomes from a number of different studies where, you know, the sequencing might be the same, but the sample collection might not be. And the data that's available for each sample is not consistent. And so it really highlights how important standardized collections are. There are a lot of large microbial data sets for oceans. I have the Tara Oceans data set here highlighted on the right. But there's far less for soils, which is really difficult because, even though there are so many metagenomic studies that exist out there, soil is so highly heterogeneous that standardized sample collection is really crucial. Whether that's how the soil is actually collected or the timing, having standardization is really important, as well as standardized metadata, standardized sequencing, and also ease of access. And so this is where, now 20 minutes in, I introduce our study with the NEON data, because NEON has that. And so this was really exciting. I knew that these samples existed because, as Hugh said, I worked as a field technician in 2017. So I actually got to do the collection and then, five years later, use it as part of my PhD work. And so I knew these samples were out there, and I was really excited because, okay, can we use this data to link below ground carbon to genomic traits? And I want to thank Cody Flag, who used to be at NEON. He helped so much with orienting me to these data and helping me access them. And so part of this analysis actually involved assembling contiguous sequences. And we were primarily interested in bacteria. Most of what I've discussed has been bacterial focused, even though I keep saying microbial community. I really should change it to just saying bacterial community.
And so for this analysis, I wanted to really home in on the bacteria and look at the traits and how they relate to the environmental data. What we get from NEON are these sequenced short reads, the 150 base pair reads. And then we assembled them into contiguous sequences, which are called contigs. What this allows us to do is actually identify the open reading frames of genes and go through and look at codon usage and amino acid usage. And it also gives us some type of taxonomic identity. So we can isolate the bacteria, look at just the bacterial population, and see how these traits map back onto bacteria. And so what I have here are genome size and GC content averaged by site across the NEON sites. And you can see some geographic patterns start to jump out. But before we start to speculate too much, I want to move forward with looking at these trends and how they relate to each other and to some of these environmental characteristics. So again, each point on here represents a community average. And similar to what we found with the IMG data, we found a negative relationship between average genome size and GC content in just the bacteria. And it's not an incredibly strong relationship. I'll admit that usually I would not go to bat for an R squared of 0.12. But it is significant. And it does, as I said earlier, work against the natural trend that we find, which is that genome size and GC content are positively correlated. So the fact that we find a negative correlation at all, I think, is very promising and really supports that there's something unique going on here. We ended up using, I think, 398 metagenomes. I'm not sure why we didn't just add two on there to make it 400. But when we look at how these traits are related to soil characteristics, we find that the idea that soil carbon might be dictating the distribution of these traits is well supported.
So again, each point here represents a community average. Right here on the X axis, I have the ratio of extractable carbon to extractable nitrogen as it relates to GC content. And we find that the GC base pair, which as I said requires less carbon, is more abundant when soil carbon is low. Another way of looking at this is that we could transform this data into the carbon to nitrogen ratio of the DNA of a soil microbial community. So we have the GC base pair fraction, which we multiply by its C to N, so 9 to 8, and the AT fraction by 10 to 7. And that gives us an estimate of the C to N of the DNA. And what we find is that they're positively correlated. This is just a transformation of this data, but I think it shows in maybe a little bit more direct way that the carbon to nitrogen ratio of the DNA is related to the carbon to nitrogen ratio of the soil. And we start to see a plateau around 20, which I think is a really cool result, because previous studies, and I have a paper here from Mooshammer et al. 2014, have shown that around a C to N of 20, we see a shift from carbon limitation to nitrogen limitation. And so as carbon limitation is alleviated, we no longer see this positive relationship between the soil carbon to nitrogen ratio and the carbon to nitrogen ratio of the DNA. And so I think this is a really cool demonstration of how nutrient limitation is quite literally written into the genetic code of these bacteria. Now, we also looked at the amino acids and the carbon to nitrogen ratio of the amino acids. So for a coding sequence, we can go through and link up each codon with the corresponding amino acid. And then we can total all of those for a genome and get, not necessarily a carbon to nitrogen ratio of all of the amino acids that are expressed, but of all of the potential amino acids in a genome. And what we find is that they're positively correlated.
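The transformation from GC content to a DNA carbon to nitrogen ratio can be written out as a short Python sketch. It follows the calculation as described in the talk (the GC fraction weighted by 9/8 plus the AT fraction weighted by 10/7) and counts only the atoms in the bases themselves. The function name `dna_c_to_n` is made up for illustration, and whether the study applied exactly this weighting is an assumption.

```python
# Carbon and nitrogen atoms per base pair, counting only the bases:
# A + T contain 10 C and 7 N; G + C contain 9 C and 8 N.
AT_C_TO_N = 10 / 7
GC_C_TO_N = 9 / 8


def dna_c_to_n(gc_fraction: float) -> float:
    """Estimate the C:N ratio of DNA from its GC content (0 to 1).

    Assumption: a weighted average of the two base-pair ratios, as
    described in the talk, rather than total C divided by total N.
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    return gc_fraction * GC_C_TO_N + (1.0 - gc_fraction) * AT_C_TO_N


# Higher GC content gives a lower (more carbon-saving) DNA C:N.
for gc in (0.5, 0.6, 0.7):
    print(gc, round(dna_c_to_n(gc), 3))
```

Because the result is a linear function of GC content, it is only a rescaling of the same data, which matches the remark in the talk that this is just a transformation.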
So soils with higher levels of carbon tend to have microbial communities where the amino acid carbon to nitrogen ratio is higher. And so that resource conservation isn't just in the nucleotides, but also in how they are writing their proteins. Now, there is an interesting relationship where, if you take the carbon to nitrogen ratio of codons and you plot it against the carbon to nitrogen ratio of their amino acids, you'll find that they are positively correlated. So an amino acid that requires more carbon tends to be coded for by codons that require more carbon. And when I plotted this, I was so excited, because I thought, oh man, I have a Science paper, my dad's going to be so proud. But it turns out that somebody found this, of course, like 50 years ago. But I think it's still important, because it shows that this resource conservation is aligned in both the nucleotide composition and the amino acid composition, and that we were finding evidence for both relationships. But it did suggest to me, okay, well, what if what we're finding with our GC content is really just an effect of changes in amino acid composition, right? And so we looked at synonymous substitution sites. First, we looked at fourfold degenerate sites. So if we look at our codon table, highlighted here are nucleotides where any change in that nucleotide will encode for the same amino acid. Now, if GC content was only driven by amino acid usage, then these would all be even, and there would be no selective preference for cytosine or guanine. But what we find is that both G and C are in higher frequency at these sites. And we actually see that cytosine is in the highest abundance, which I think is also kind of a fun result, because cytosine is cheaper to make than guanine. So there's also this other relationship where really the cheapest nucleotide is selected for at these synonymous sites.
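To make the fourfold degenerate site tally concrete, here is a small Python sketch. It relies on the standard genetic code, in which eight two-base codon prefixes (for Leu, Val, Ser, Pro, Thr, Ala, Arg, and Gly) specify the same amino acid regardless of the third base. The function name `fourfold_site_frequencies` and the example sequence are hypothetical and not from the study.

```python
from collections import Counter

# First two bases of the eight fourfold degenerate codon families
# in the standard genetic code (DNA alphabet).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_site_frequencies(cds: str) -> dict:
    """Return base frequencies at fourfold degenerate third positions.

    Codons are read in frame from the start of cds; a trailing
    partial codon and any ambiguous third bases are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(
        cds[i + 2]
        for i in range(0, usable, 3)
        if cds[i:i + 2] in FOURFOLD_PREFIXES and cds[i + 2] in "ACGT"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical sequence: ATG GCC GTC CTG GGA AAA TAA.
# The fourfold sites are the third bases of GCC, GTC, CTG, and GGA.
print(fourfold_site_frequencies("ATGGCCGTCCTGGGAAAATAA"))
```

If amino acid usage alone drove GC content, the four frequencies would be expected to be roughly equal; an excess of G and C at these sites is the kind of signal described in the talk.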
We can also look at these synonymous substitutions as they relate to soil carbon. So up in the top right panel here, I have aspartic acid, which can be encoded by either GAU or GAC. And GAC, the one that has two GC bases, tends to be in higher abundance when soil carbon is low. So what this provides evidence for is that, independent of amino acid usage, there is some type of selection for GC base pairs at synonymous sites when soil carbon is low, which we suggest could be due to the fact that the GC base pair saves on carbon. And we find this for every amino acid. There is not a synonymous substitution where the GC content is not significantly related to the total level of soil carbon. And so again, this suggests that it is both nucleotide cost and selection for amino acids that could be driving this relationship. Of course, we'd be remiss not to look at all of the different factors that might influence traits. And there's so much more NEON data than just carbon to nitrogen ratios. And so this was done with the help of Jeff Propster at NAU, who worked up the PLFA data, and Austin Rutherford, who helped with the machine learning. Well, I should say not helped, because what I wrote might as well have been written in crayon; he did most of the machine learning approach that I'm showing here. Using a random forest model, we found that pH was one of the strongest drivers of genomic traits in soil. And so here on the right panel, you can see that soils with a higher pH tend to have a smaller genome size, and it's a pretty pronounced relationship. And this is good. It was kind of a bummer, because I really liked my hypothesis that it was all driven by carbon. But it suggests a more complex story. I do think that soil carbon does have something to do with it.
But soil pH is kind of this master parameter, right, or this indicator parameter that has a lot of different effects tied into it. And so we found that soil carbon was related to soil pH: at low soil pH, we tend to find higher soil C to N ratios. We also know that the substrate in low pH soils tends to be more complex. We have more above ground biodiversity; we know tropical soils tend to be very diverse. And so we might have more diverse carbon compounds that require greater metabolic diversity, meaning that a larger genome size might be really advantageous. We also know that low pH requires more stress response genes, and so a larger genome is required to live in low pH soils. And then finally, it's been shown in a couple of studies in the past year that increased aridity selects for smaller genomes. I have the two citations here, Lou 2022 and Simmons 2022. And we find a similar relationship here as well, where in high pH soils, which tend to be in arid environments, we tend to see smaller genomes. And so we then used these results from the NEON data to inform our analyses in a laboratory study that we're working on right now. So for the last part of this, I want to talk about this lab study that we're doing and how some of these traits that we've been looking at with the NEON data map back onto a laboratory incubation. In this study, we looked at the Birch effect, which is the pulse of carbon mineralization after the rewetting of dry soil. In terrestrial systems characterized by dry and wet seasons, the first rain event after the dry season can account for a large proportion of the annual CO2 respiration and is a really important time for carbon cycling and for microbial growth. And there are a couple of factors that drive this. One is that we're just reconnecting the soil matrix with water.
And so there's more flow of nutrients around, and it alleviates water limitation. And then there's a lot of accumulation of biomass, either from desiccation during the dry season or from osmotic shock. As we add water to dry soil, we're rapidly changing that diffusion gradient, and some of the cells explode in a process called plasmoptysis. We also find increased viral lysis, so there's more necromass just from viral infections. And so this is a critical period of increased metabolic activity. Our laboratory incubation soils were collected from the Hopland Research Center, which is about a two hour drive north of Berkeley. It is a Mediterranean grassland system characterized by warm dry summers and cool wet winters. Our soils were collected after the growing season, but before the first rain in fall, so at the tail end of the dry season. And then we wet them up in a laboratory incubation, with collections at 0, 3, 24, 48, 72, and 168 hours post rewetting. So we have this time series of the Birch effect. And we used a couple of different omics approaches here. We used quantitative stable isotope probing, where, really briefly, we have one sample where we add 16O water and one where we add 18O water. After one of our collection times, we extract the DNA and centrifuge it in a cesium chloride solution, and for taxa that grew and incorporated the 18O water into their DNA, that DNA moves further along the tube in that cesium chloride solution. And so when we fractionate it and sequence those fractions, we can actually get an estimate of growth rate based on how much of that 18O isotope was incorporated into the DNA. And using that DNA, Ella Sieradzki, and there's a link to her preprint here, also assembled metagenome-assembled genomes, so we can kind of Frankenstein some of those genomes back together. And so what this gives us is whole genomes, growth rates, and transcription.
And so how do some of these genomic traits that we have been looking at in the NEON analysis relate to growth and transcription after the dry season? One of the big findings is that bacteria with smaller genomes grew more after rewetting. So here on the x-axis, I have estimated genome size. I say estimated because they're reassembled, so it's our best guess. Each point here represents an individual taxon. And then on the y-axis, we have atom fraction excess, so the amount of that 18O isotope that was incorporated into their DNA. And we find a negative relationship, indicating that smaller genomes grew more after rewetting, which fits when we go back and look at the NEON result and see that in high pH systems, which tend to be more arid, we tend to see smaller genomes. And so genome size could therefore be part of a life strategy for rapid response in pulse driven systems, systems where you have a long period of drought and just short bursts of nutrient availability. And so finally, I want to present some results on codon usage. And let's see, how much time do I have? Not a ton. Okay. So the codon in the ribosome matches up with the corresponding anticodon on the tRNA. Right here, we have AAC, which meets up with UUG; this encodes asparagine, which gets added onto the amino acid chain. And that's how a protein is born. Now, there are synonymous codons and synonymous tRNA anticodons, so there are multiple tRNAs that exist within a cell with different anticodons. The synonymous codon here, instead of AAC, would be AAT. We can see it links up with this asparagine highlighted in blue. And these tRNAs are found in different abundances in the cell. And so we can see that there are more of the red UUG anticodons than UUA.
And so this codon here is said to be optimized, as opposed to AAT. The alignment of codons to the more abundant tRNAs is often called codon optimization, and it is extremely important for both translation and transcription. The alignment between the tRNA pool and the actual codon usage of a gene can greatly influence the rate of elongation when it comes to translation into a protein. It has also been shown that higher levels of optimization result in higher levels of transcription. Those promoter regions are more readily identified and tend to be transcribed at a greater rate. Now, this effect is so strong that codon bias, that is, how skewed codon usage is toward a single codon per amino acid, in ribosomal genes tends to be a good predictor of growth rate. Here I have results from Weissman et al. 2021, where codon usage bias is pretty well associated with doubling time: higher levels of codon usage bias tend to be associated with faster growth. And so I wanted to see, okay, do we find the same thing in ribosomal protein genes in response to this natural phenomenon of the Birch effect? Here I have codon usage bias represented as the effective number of codons, so the further left we move on this graph, the higher the codon bias. And we find that higher levels of codon bias are associated with higher atom fraction excess, which, as I said earlier, is an index of growth rate. So we can see that codon usage bias was most strongly correlated with growth at the early time points. Now, when we go back, oh, I forgot I have one more part. Okay, I'm just going to fly through this so we still have time for questions. We were also able to map metatranscriptomes back onto these metagenome-assembled genomes. We separated them into different response types based on their time of maximum transcription.
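The effective number of codons mentioned above can be computed in a few lines. What follows is a minimal sketch of Wright's (1990) formulation, not the speaker's analysis code; the function and variable names are hypothetical, it assumes the standard genetic code and an in-frame coding sequence, and it simply raises an error if a degeneracy class has no usable data rather than applying any correction. Values run from 20 (one codon per amino acid, maximal bias) to 61 (all synonymous codons used evenly).

```python
from collections import Counter, defaultdict

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group synonymous codons by amino acid, skipping stop codons.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def effective_number_of_codons(seq):
    """Effective number of codons (Nc) for an in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    # Homozygosity F for each amino acid, grouped by degeneracy class.
    f_by_class = defaultdict(list)
    for aa, syns in SYNONYMS.items():
        k = len(syns)
        n = sum(counts[c] for c in syns)
        if k == 1 or n < 2:
            continue  # Met and Trp carry no information; too few counts are skipped
        sum_p2 = sum((counts[c] / n) ** 2 for c in syns)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    # Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, using the mean F of each class.
    nc = 2.0
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if not f_by_class[k]:
            raise ValueError(f"no usable amino acids with {k} synonymous codons")
        mean_f = sum(f_by_class[k]) / len(f_by_class[k])
        nc += n_amino_acids / mean_f
    return min(nc, 61.0)


# Toy sequences: one codon per amino acid versus perfectly even usage.
biased_gene = "".join(syns[0] * 5 for syns in SYNONYMS.values())
uniform_gene = "".join(c * 10 for syns in SYNONYMS.values() for c in syns)
nc_biased = effective_number_of_codons(biased_gene)
nc_uniform = effective_number_of_codons(uniform_gene)
```

Because a lower Nc means stronger bias, a plot with Nc on the x-axis reads the way the speaker describes: further left is more biased.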
So here on the y-axis, I have the net proportion of genes upregulated. I categorized those genomes, those MAGs, as early, middle, or late responders, or, if they decreased, as sensitive. And we find that ribosomal gene expression in these early responders, predictably, increases early. Similar to our growth rate result, we also find a higher level of transcription with higher levels of codon bias. And so this is just to say that codon usage, how genes are actually written, influences both transcription and growth rate. I think this is really an important finding because, once again, looking back at the NEON data, in carbon-poor soils we found very high levels of codon bias. We found that those genomes tend to be highly dominated by one set of codons for each amino acid. And so I think this is maybe representative of a fundamental life strategy that's written into some of these genes. This is a really fun result, and it shows how NEON data can inform some of these experimental analyses, and how the two combined can paint a much fuller picture of the selective pressures on bacteria and microbial communities. And so just to wrap up the conclusions here: standardized observations like those from NEON are essential for linking microbial community structure and traits to environmental data, because that standardization is crucial. Using NEON data, we were able to link these traits and find that soil carbon and pH drive genomic traits in soil. And then finally, we used the results of the NEON analysis to inform and guide experimental analyses. And so with that, thank you so much for having me. I want to thank the people at NAU, here at UC Berkeley, and at Lawrence Livermore National Lab. This work was funded by the DOE Science Focus Area "Microbes Persist" as well as the USDA.
And my email is down here on the left if anyone has any questions that they want to ask me after. But yeah, with that, I will take any questions.
Great. Thank you very much, Pete. That was awesome. I think people are thinking through some questions, so I'll start, and please add your questions to the Q&A tab on your Zoom bar. Thinking back to the beginning of your talk, you talked about the relationship between genome size and habitat, and you showed some habitats. Then, digging more into the data, it was about pH and aridity. Thinking back to that first figure, does it all make sense? Is it all just about pH? Are there certain soil types or habitats that don't conform to that or have a different story to tell?
That's a good question. I think pH is a very good indicator because it incorporates a couple of different factors that could be driving these traits, whether it's the actual stress of pH requiring more genes or whether it's its relationship to aridity. And so I think there really does need to be more intentional experimental work trying to pick apart some of these effects, looking at how additions of carbon or changes in pH alone might influence some of these traits, because it's really hard to say right now which one drives certain traits. It is probably a combination of all of them, but I think it requires more experimental analysis.
Thank you. We have a first question here in the Q&A. How can you relate this study to global loss of soil carbon and desertification of land? And how can this tie into the larger soil crisis at hand?
Yeah, I've thought about that, because we're starting to see carbon loss from soil. Based on these results, and I should have highlighted some other papers that came out this year that have shown the importance of soil carbon for some of these traits.
But with soil carbon loss, it could mean that we lose some of the functional diversity in those microbial communities because they require fewer genes. It's also been shown that higher temperatures result in smaller genomes too. It's hard to say what the effect of that really would be, and what the effect of reduced levels of carbon would be. But I think the idea of starting to see our soil microbes shrink in genome size because they require fewer genes is kind of scary, also in terms of thinking about resilience. At the same time, understanding these traits is promising for being able to look at microbial communities and start to understand what they're trying to say about their environment. And so I think that part of it is promising, being able to use these traits as indicators of soil health.
Awesome, thanks.
So thank you. I can ask a follow-up on that while we're waiting for some more to come through in the Q&A, because I was thinking about plasticity, or time scales of response, under global change. We're increasing CO2, so is the C-to-N ratio slowly but surely getting wider, and are microbial communities able to adapt on annual to decadal time scales? Are they responding? Do you know if there's anyone, maybe in FACE experiments, or is there evidence that these genomic traits are plastic and the community can evolve to match the resource?
That's a really good question. I actually don't know. I'm not 100% sure.
And I think it'll be pretty interesting. I feel like we're still in the early phases of metagenomics, well, now it's been more than 10 years, but as we start to have a reference data set that goes far enough back in history, it will be interesting to see. The NEON data, I think, goes back to, I'm not sure when the earliest metagenomic collections were, like 2013 or 2014?
2013.
Yeah. To be able to have those as a reference data set and to see how environmental change influences those traits is going to be, I mean, let's be honest, scary, but interesting.
Excellent. Yeah, you talked about the transcriptome. Oh, sorry, we have another question from our colleague here. "Thanks for the great talk. As a NEON scientist, I'm happy to see you contributing ecological insights with NEON data. Can you reflect a bit more on your journey from collecting NEON field data to early career scientist? Did you know you wanted to look into these relationships before working with NEON?"
Thanks, Dave. That's a great question. Even as an undergrad, I remember reading the early metagenomics papers and thinking, whoa, this is so cool, we can actually look at genes of soil microbes, and they're important, and they can be related to the environment. I was trying to figure out what I wanted to do for a while after my undergrad, and I really enjoyed field work. And it was just kind of serendipitous that at one point I was collecting soil microbes for NEON that were then used in my dissertation.
So you've actually analyzed some of the samples you collected?
Yeah, I actually did.
Okay, well, that's awesome.
Five years ago.
That's great. Just briefly on the transcriptome data: when you looked at the carbon-to-nitrogen ratio of codons, that was with DNA, so it was all the codons.
Has there been work, or are you thinking of looking at transcriptome data, to look at the codons that are actually expressed and see if there's a stronger or weaker relationship?
So you're asking, does the GC content, or the elemental ratios of those codons, relate to the short-term response?
Yeah.
It actually ends up being more about the redundancy of codon usage. How simply those genes are written ends up influencing transcription and growth rate more than the stoichiometry of those base pairs.
Excellent. Okay, we've got a couple more questions from an anonymous attendee. "Some microbes in the soil are likely to be feeding off of other bacteria; myxobacteria spring to mind. If you pull out MAGs that may be predatory, do you see, or what do you predict, in terms of shifts in carbon-to-nitrogen ratio preference or genome sizes that differ from microbes that are more likely to feed off plant matter?"
Yeah, that's a really good question. There's been some more work recently, both out of Bruce Hungate's group at NAU and a little bit from Jennifer Pett-Ridge here at UC Berkeley, looking at predation, both viral and bacterial predators. We have looked a little bit at predation based on the phylogeny of some of these MAGs. We tend to see that they have larger genomes, but the problem is that we end up assembling maybe a couple hundred genomes. That feels really impressive, but it is such a small proportion compared to what's actually there. There have been some 16S analyses that I think are maybe more informative, but using the MAG approach, sometimes these community dynamics are hard to get at.
Excellent. Okay, I think we have time for one more question. And I'll just throw out a reminder that we have a data skills webinar on metagenomics coming up on December 5th. It is now live on the seminar page.
Our colleague Claire has posted that in the chat, but check our webpage for the links; that's coming up on December 5th. And from our colleague Luciana at UC Irvine: "Thank you for the excellent talk. Do you have some thoughts on how your results influence how microbial processes are represented in current models? I use ecosystem models."
Yeah, I think one nice thing about traits is that they can provide a very simple, easy-to-get-at metric that might tell us something meaningful about carbon or nutrient demand, and that is a little bit easier than analyzing a whole community structure and trying to get a metric that way. I'm not sure what that would be yet, because it's hard to actually link these to biogeochemical cycles at this point, but I do think there's a promising avenue there in having these simplified metrics that could help get at some of these questions.
Awesome. Well, thanks so much for all that and a great talk. I'm going to turn it over to Samantha, who will wrap it up.
Yeah, just echoing that. What a fantastic talk, Peter. Excellent to see how you're leveraging NEON data. Please head over to our Science Seminars webpage; I'll throw the link in the chat once more. The recording of this talk should be up by the end of the week, along with all the talks that we've had this year and last year's seminars. You can watch those at your leisure. The data skills webinar is coming up on December 5th, and then we are not having a Science Seminar in December, but we'll be back in January. The first talk in the new year will be on the Society for Freshwater Science Emerge program, to think about how we can increase the diversity of people who are interested in freshwater science. We hope to catch you for that. Thanks, everyone, for attending, and thanks again, Peter, for a great talk.