Good morning, good afternoon, and for others, good evening. I'm Max Häggblom, Editor-in-Chief of FEMS Microbiology Ecology, and it is my pleasure to welcome you to this webinar on approaches, methods, and challenges in microbiome research. After a short break in our webinars, it's great to get back to our series, which enables us to highlight different topics in microbiology. FEMS, the Federation of European Microbiological Societies, invests in science, using the income from our journals to fund charitable activities and support our community. I've said this before, but I think it is an important message to repeat: societies and their journals provide grants to scientists, organize and support conferences, and sponsor a range of events, such as this webinar series, which provides a forum for the presentation and discussion of ideas and enables the flow of ideas to continue despite the cancellation of many in-person events and conferences as the pandemic continues. On the positive side, we have many interesting topics to cover. If you missed some of our earlier webinars, they are available via the FEMS and OUP websites. Before we get started, I want to thank the staff of FEMS and Oxford University Press for all their work behind the scenes in making these webinars happen. Today, we focus on the topic of approaches, methods, and challenges in microbiome research. The continued development of DNA sequencing technologies is providing exciting new capabilities to characterize complex microbial communities in various habitats. This webinar explores some of these developments as well as the challenges in microbiome research methods, and takes a critical look at what we can learn about the inhabitants of diverse microbiomes and their possible functions.
Key topics include concerns with reproducibility and controls in microbiome research, the exciting improvements in Oxford Nanopore long-read sequencing technology as well as other technologies, and the role of the microbial rare biosphere. Our speakers today will highlight some of these questions with a critical look at the approaches, methodological advances, and key ecological questions. Bastian Hornung will talk about issues and current standards of controls in microbiome research. He comes from the Center for Microbiome Analyses and Therapeutics at Leiden University Medical Center in the Netherlands. Lee Kerkhof, from the Department of Marine and Coastal Sciences at Rutgers University, will discuss Oxford Nanopore sequencing for analyzing complex microbial communities. And finally, Francisco Pascoal, from the lab of Catarina Magalhães at CIIMAR, the Interdisciplinary Centre of Marine and Environmental Research of the University of Porto, will talk about the rare microbial biosphere. After the talks, we will open the session to questions and discussion, so please submit your questions via the question link if you're in the audience; we'll come back at the end to some of the questions that you may have. With that, I'm pleased to introduce our first speaker, Bastian Hornung, who will discuss issues and current standards and controls in microbiome research. Bastian, the floor is yours. Thank you very much, Max. Thank you very much for inviting me, and thank you very much for organizing this. Indeed, I am going to talk about, as I titled my presentation, quality control in microbiome research. I will be talking mostly about controls and a tiny bit about batch effects, but in general, this is what is important.
This presentation is mostly based on this publication, which I have in FEMS Microbiology Ecology, on two other publications from my last labs, and on some comments which were left on PubPeer. And with this, we are going to get started. I have to say, I'm talking about quality control, and quality control is normally never a story of glory, because I don't think anyone has ever said, "Oh my God, my positive control worked out so great, it saved the day." No, this normally does not happen. In general, this is more a story of failure, because the only time you talk about quality control is when stuff has actually gone wrong. But in the end, it's important for everyone. I will be talking about some of the things where we need to pay attention, where things have gone wrong in the past, and where things can go wrong for everyone who is attending this. In the course of this presentation, I'm going to point out a few publications where things have gone wrong, and I want to say: I don't mean to be mean; I hope nobody sees me as the bad guy. When I point out that things have gone wrong in a publication, it's because when you read something, and you see the error, and you realize it could have been prevented, it's not nice. By education, I'm actually a bioinformatician, and in computer science, the examples given for quality control are often kind of funny. One example often given is the F-16 jet fighter: in a simulation, it turned out it would fly upside down after crossing the equator, because of an error in the software. Or the Mars Climate Orbiter, which never reached Mars and was lost in the darkness of the universe due to a conversion error between metric and imperial units. From these examples we can see that quality failures can be potentially dangerous: in the case of the F-16, if the error hadn't been found in simulation, it could have killed the pilot.
And the Mars Climate Orbiter was very costly. Obviously, in microbiome research, we don't often have the problem that something is really dangerous, because we are mostly doing basic rather than applied research. But it can definitely be very costly. And importantly, none of us wants that: everyone wants to do good science, and nobody wants to have their name attached to bad science. This is why all of this matters. I'm going to start with the very first example: this publication, very well known, in a good journal, Science Translational Medicine, about the placenta microbiome. The title says that the placenta harbors a unique microbiome. The publication has, as of a few days ago, been cited more than 700 times. So it's a publication every one of us would love to have. The problem with it is that it might just not be true. Two years ago, this other publication came out and actually refutes the whole thing. Already for the original publication, some people said there could be issues, that there is maybe nothing there, but it has kept going, with, as you saw, more than 700 publications citing it. So people have been working on this, and now, two years ago, it turned out it might just not be true. The original publication didn't have sufficient controls. All the research based on it might have been complete ghost research. Think about how much time has gone into it, how much money has gone into it. People have done their PhD on something which may not exist. There are professors who might have built research lines on something which doesn't exist. Money-wise, there might have been thousands, tens of thousands, probably millions of euros and dollars and other currencies that went into this research, and it might just not exist.
And when there are issues like this, you don't easily get them out of the world; it's hard to spread the news that things actually might be wrong. This is similar to other issues you have with publications, like fraud or retracted papers, where the information also doesn't spread. In this case, we have four publications, also in good journals like Gut and Cell, coming after the original: after 2019, in 2020 and 2021. And for this last one in Cell, there is also a big discussion going on, with people pointing out that it might not be real, and the answers to that. So this is a complicated area and not easy to deal with. Now, if we think about what causes this: in older times, when this field started, there was just no awareness of the problem. Also, in the beginning, it was mostly feces, and if you ask how you contaminate feces, well, feces are pretty much the definition of dirt. Another thing is that controls can be complex, and positive controls are not always suitable or available. To get at the actual numbers for my review, I went through the Microbiome journal and the ISME Journal in 2018; across all publications, I counted that 30% used a negative control and 10% used a positive control. And in many cases, it's not really clear whether what they did was actually suitable, because if you just write "we used controls" without giving details, I cannot judge. In the end, this is not good scientific practice; this is not how it should be. Just think about it: in any other experiment, if you ran a PCR or did a cloning without controls, you wouldn't get it published. And here, because people are not aware of this, it still happens. It shouldn't happen, and that is why I am talking about this.
If you now think about the first part, contamination and using negative controls: many causes are known, many are expected, and you know it yourself; if you're in the lab and you sneeze into your Eppendorf tube, you know clearly there'll be stuff in there which shouldn't be. Some things are not widely known, and I'm going to talk about those. And for some microbiomes, it's really hard to say what contamination even is. If you do the skin microbiome of the hand, what is contamination? If you do the mucus microbiome of the nose, what is contamination in these places? The most important publication in this area is this one from Susannah Salter, published in BMC Biology, about the "kitome", the reagent microbiome. Your reagents, your extraction kits, your polymerases, everything might have its own microbiome. There is not a lot in there, but there is some. They termed it the kitome because it comes from the kit. And it's not constant: it is batch dependent; it depends on the time and the day and what kit you have. And it's mostly an issue for low-biomass samples. What they did in this publication is they grew Salmonella bongori; this is the black bar here. They took a pure culture and diluted it, and diluted it, and diluted it, and diluted it, and diluted it, extracted DNA from each dilution, and sequenced it. And you see that after five dilutions, you wouldn't guess that this is a pure culture; all sorts of different organisms come up, because at the fifth dilution you have very little biomass left, and you see what else is in your kit. And this is not an artifact: this is one lab, another lab, another lab. In all cases, you see that the control has a microbiome, and if you dilute hard enough, you find that microbiome. In the end, in your control, you might barely still find your original organism. So this shows that a lot of things which should be clean are dirty.
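As a rough illustration of how negative controls like these get used downstream, here is a toy sketch (not any specific published pipeline) that flags taxa which are at least as prevalent in negative controls as in real samples. Real analyses typically use dedicated tools such as the R package decontam; the taxon names and the prevalence-ratio threshold below are invented for the example.

```python
# Toy kitome check: a taxon seen at least as often in negative controls as in
# real samples is a contamination candidate. Profiles are {taxon: read count}.

def prevalence(counts, taxon):
    """Fraction of profiles in `counts` where `taxon` has nonzero reads."""
    hits = sum(1 for profile in counts if profile.get(taxon, 0) > 0)
    return hits / len(counts)

def flag_contaminants(samples, negative_controls, ratio=1.0):
    """Taxa at least `ratio` times as prevalent in controls as in samples."""
    taxa = set()
    for profile in samples + negative_controls:
        taxa.update(profile)
    return sorted(
        t for t in taxa
        if prevalence(negative_controls, t) > 0
        and prevalence(negative_controls, t) >= ratio * prevalence(samples, t)
    )

samples = [{"Bacteroides": 900, "Ralstonia": 5},
           {"Bacteroides": 700},
           {"Bacteroides": 800, "Ralstonia": 3}]
controls = [{"Ralstonia": 40}, {"Ralstonia": 55}]
print(flag_contaminants(samples, controls))  # ['Ralstonia']
```

Prevalence-based flagging is deliberately crude; it only works if you sequenced negative controls in the first place, which is exactly the speaker's point.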
You can't rely on what you have being clean. And there are more points of contamination. Like I said, the researcher: if any of you do mass spectrometry, you know that whenever you do MS, you have keratin as a control in there, because you can't work without skin getting in somewhere. You have the environment. This is from our publication; we have three different negative controls here: one kit, another kit, another kit. As I told you, the negative controls all look different, because the kits are different. But what we're pretty sure comes from the environment is this block here. This is Clostridioides difficile, formerly known as Clostridium difficile. The lab where we did this is a Clostridium difficile lab, and C. difficile is a spore former that can hang around in the environment. In this case, it apparently just dropped into one of our negative controls. It's the only time we saw it; it didn't come up again. But we're pretty sure it came from the environment. And as a last point, it can be your tools: in this case, sterile paper points turned out to have DNA in them. So "sterile" was not sterile. And it goes on and on. When you do the sequencing, you can have the same issues. You can have cross-contamination there: you send your amplicons to a sequencing provider, or you put your samples on a plate, a 96-well plate, yourself, and just by pipetting from one well to another, material will splash over. So you would expect that you have a control, a low-biomass sample, and a high-biomass sample, and that you would only find certain things in each, but no: your control organisms might end up in other samples, and your high-biomass sample might end up in other samples all around. The sequencing reagents might also be an issue, and you might get index hopping: index hopping happens as you put in your sequencing primers for this sample, for this sample, for this sample.
And because you don't have much DNA in some samples, remaining free primers can end up in the wrong reads. So you have lots and lots of ways in which things can end up going from one sample into another, and you might just not know. In the end, you can't prevent all causes. Think about what else you could do: for anything special, you would need access to the 96-well plate where it gets sequenced. In our case, we could walk over to our sequencing provider, but if you send your samples to China or Korea or just anywhere far away, you can't. Your samples might cross-contaminate even if you follow best practices, and in the end, it might be difficult to find the exact cause. Despite all of these problems, you still need to use something: sampling controls, like a swab which you handle exactly like a real sample but without sampling any material. You need to use extraction controls, just to see that the kit which you actually used doesn't contribute what you find. When you're doing the sequencing, just send water to your sequencing provider, without anything in it, and see what ends up in there. Also, don't mix biomes: don't send gut samples together with saliva samples or something like that, because the gut samples have far more material, which might end up in the saliva samples, and you might not be able to figure this out afterwards. In the end, you might book the whole lane on the machine so that nothing else gets sequenced with your samples; otherwise you might end up like one of my colleagues, who had a very low-biomass CSF virome sample and ended up purely with cucumber. So you might need to say "only our samples, please" and pay more. Or, in the end, maybe you need to get your own machine. That is really the worst case, if you're dealing with really, really, really low-biomass samples where really nothing can be allowed in, like, for example, ancient DNA. Probably not relevant for most of us, but it could be.
That is it for the negative controls, but quality control is not only about negative controls; there are also the positive ones, and here too you need to think. Did you sequence everything you wanted? What about the DNA extraction efficiency? Did you use the right primers? Did the sequencing process go without any errors? As a negative example, I have this publication here. For their 16S sequencing, they used the wrong primers, which made them miss one of the major members of the microbiota. They did skin samples, and they didn't get Cutibacterium acnes. They didn't use a correct positive control, so they didn't notice that they were actually missing a major member of their microbiota until it got pointed out during peer review. This is scientifically bad, it's bad for yourself, and it's embarrassing in the end, because you should have seen this. I don't know much about my skin microbiome, but this is one bug of which I'd say: this one should be in there. And you think about it, and you deal with it, and you don't notice. You don't notice that you used the wrong primer, you don't have a control; everything has gone wrong there. On the other hand, here is an example where things have gone really right. In this example, I think it was some rare microbiome; it really doesn't matter. This group always runs positive controls, which they make themselves. In one of their runs, a taxon was missing: they saw in the positive control that something should be there, but it wasn't. They couldn't figure out why. So they took the same sample, the same extracted DNA, and ran it again, and the taxon again didn't appear.
The cause is completely unknown, but it shows that it's important to always run the controls, because even if you say "we tested this before, this works," something could be happening, something could be missing, and you don't know why. Positive controls also come with some issues. One thing is that you need to know your community: if you want to see that a bacterium is missing, you need to know that it should be in there. With a new community, you get a new microbiome, and you don't know. You also don't necessarily find suitable controls, because they might not be available. Then the rest of the sample can also influence things. This publication here is not about controls, but about how other components in your sample can affect how the DNA is extracted. So even if you know roughly which bugs are in there, the other things that are in there, proteins, metals, whatever, could have an influence. Then it might turn out that the controls themselves do not work as you expect. These are two graphs from one of our publications. We used two controls, one working well and one not working so well. If you look at this one here: this control has eight bugs in it. This is how it should look, and these are all the samples that we have. One is purely pre-extracted DNA from the provider, and these are the protocols which you can see. There is some variation here, also in the green, but overall this looks pretty similar. If you now go over here, this is a different control. We bought them both commercially, this one from ZymoBIOMICS, this one from ATCC; we bought them and didn't do anything else to them. And here, okay, you see one sample where we apparently swapped one of our samples. That is, sorry, also a thing you need to consider: make sure that you know where your samples are. But what you can see here is that this is how it should look.
The actual samples do not really look like that, and you wonder why they don't. One thing: if you look a little deeper, this green bar here is missing from our samples. It is, again, the Cutibacterium. As I mentioned before, it turned out we had used the wrong primer. Yeah, these things happen. We didn't notice before; we had a control, and so we noticed. Independent of that, we did some other experiments, and our technician at the time thought: okay, this control has some nasty bugs in it. Let's have a look, because after DNA extraction, we can take the samples out of the biosafety cabinet, since they're all dead. So what did she do? She took the samples after DNA extraction and cultured them. You would assume that after DNA extraction, everything is dead. Nope, it's not. The Cutibacterium, which we were missing in the sequencing, we actually recovered, and it grew well. So it was in this control after all; you would otherwise not even have known, since the provider might have forgotten to put it in the control, but no, it was in there. And now you maybe also think: okay, maybe a more or less efficient extraction has an impact on why this distribution is different from that one. And, well, you will laugh: the only other bug which we recovered growing is this one, the blue one. So it was not the case that the bacterium with very little DNA here grew a lot; no, that one was dead. But the blue one, with a lot of DNA, grew. So who knows what makes the difference between these two controls; it might be other things. It's hard to say. Here also, despite all these problems we talked about, the recommendations: use extraction positive controls; don't get embarrassed by a reviewer telling you that you messed up. Also use sequencing positive controls where available.
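The "know your positive control" recommendation can be reduced to a simple automated check: compare the taxa the vendor declares for the mock community against what came back from sequencing, and flag anything absent. This is a minimal sketch with invented taxon names and an assumed detection threshold, not the authors' actual analysis.

```python
# Mock-community sanity check: report declared taxa that are missing (or
# nearly missing) from the observed sequencing profile of a positive control.

def check_mock(expected, observed, min_fraction=0.001):
    """Return expected taxa below `min_fraction` of total reads in `observed`."""
    total = sum(observed.values()) or 1
    return sorted(t for t in expected
                  if observed.get(t, 0) / total < min_fraction)

# Declared composition (illustrative names) vs. observed read counts.
expected = {"Cutibacterium acnes", "Staphylococcus epidermidis",
            "Escherichia coli"}
observed = {"Staphylococcus epidermidis": 5200, "Escherichia coli": 4700}

missing = check_mock(expected, observed)
print(missing)  # ['Cutibacterium acnes'] -> wrong primers? failed extraction?
```

A non-empty result is exactly the kind of red flag described above: a declared mock member that never shows up points at primers, extraction, or the control itself.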
Also, in case you're really doing something new, use multiple mocks; I think you might need to consider making your own mock community. This could be really tough if you're working on something really exotic, and I'd say there is no publication out there on how to make your own mock community, so if someone were to work on that, I think it would be very useful to have. Then some more general recommendations: use common sense. Are you missing something critical? Have other people recovered what you haven't? Why? Is the Cutibacterium from your skin missing? Why? Or did you find something unexpected? Recently, in some gut microbiota publication, they found Oxyphotobacteria in the gut. I'm not an expert on this, but if I find Oxyphotobacteria in a place where there is no oxygen and no photons, I need to ask questions. Both of these things can indicate issues. And a last point: there are open questions here, besides all of these problems and things which I haven't even talked about. What about a sampling positive control? Do the bacteria in my mouth actually stick to my swab? You don't know that, and there's no way of figuring it out yourself right now. What about a negative sampling control without material? If you normally have poop, for example, what do I do? What is my negative control if I don't have material? Do I take the poop and sterilize it? I just don't know. What about a positive environmental control? How do the other things in my material affect my positive control? How do other matrices behave? You don't know. What do you do about it? And the last point here: pay attention to old publications. Old publications might not have paid any attention to any of this, so if you see something strange there, think about it. And this leads me to the acknowledgments.
I would like to acknowledge my boss, Ed Kuijper, who started all of this research in this lab, and whom I'm very grateful to for being a very, very good boss who gave us all the freedom to do whatever we wanted and was very supportive of everything. My colleague Romy Zwittink, with whom I worked all the time, and who is very pleasant and very competent. Our technician, Anu Chelen, who did all the DNA extractions and who is also a very proactive and very good technician, and Quinten Ducarmon, who did all the work for the other publications and figured all of these things out, and all the rest of the team. And a short advertisement: our technician is moving to Norway to fulfill her dream and is looking for a job. She really knows her stuff, so if someone from Norway is listening and needs a technician, please get in touch. Now, on to the questions; I hope someone from the audience will have some questions, some thoughts, if someone has problems, anything on their mind. Let me know; I'm happy to discuss. Thank you, Bastian. Very, very interesting. You left us with a lot of things to think about, and I'm sure there's going to be discussion later today, but also, importantly, continued discussion as we move on with new technologies, new types of samples, and so on. So thank you. One of your comments was about getting your own sequencer, and indeed our next speaker, Lee Kerkhof, will very much tell you how you can do that, with handheld Oxford Nanopore sequencing. So Lee, you have the floor; let's hear what Oxford Nanopore has to give us. Thank you so much. First I'd like to thank Max for the invite and all of the people at FEMS and OUP who have been organizing these webinars. It's delightful having the opportunity to talk science during the pandemic.
And one of the other things I would like to do is point out some of the co-authors and collaborators who have been working with us, as well as the funding agencies and the different organizations without whom the data I'm going to be showing you just wouldn't be possible. Now, most of us know that we need to use molecular tools to profile microbiomes, because cell shapes are not sufficient and there's trouble culturing everything that's out there. There's a whole variety of different biomolecules we could be using, but clearly the bulk of all the work that's been done has looked at short segments of the 16S ribosomal RNA gene. What I want to do is spend my time talking to you about a different sequencing approach, something that's somewhat different. This is a method that is fast, and it can be cheap. Very importantly, it directly provides the sequence information, and that means that you can actually read the DNA sequence off of the molecule without making or synthesizing a labeled copy. All the methodologies prior to nanopore sequencing made some kind of labeled copy, or captured a signal when the copy was made. Here it's done directly, so a lot of the issues that Bastian highlighted get minimized; I can show you some of that, maybe afterwards at the panel discussion. What's nice about nanopore sequencing and our approach is that it doesn't really require an extensive amount of training: it's DNA extractions, amplifications, and BLAST searches, which a lot of us can do. If you're careful, you can make it quantitative, and if you're interested, I can point you in directions where we demonstrated that a number of years ago. But the really important thing is that this technology is transportable, and that means that a lot more microbiome research can be done in the field. What I'll be talking to you about today is this Oxford Nanopore MinION. It's the world's first handheld DNA sequencer. Here's how it works.
A series of hundreds or thousands of protein channels is embedded in a non-conducting membrane. These protein pores can interact with unwinding proteins that you attach to your DNA when you're creating a sequencing library. When voltage is applied across this membrane, the unwinding enzyme begins to thread the DNA through the protein channel, and this generates a current. The important thing here is that each of the nucleotides has a slightly different electrical profile, that is, they produce a different current, so it's possible to take this electrical signal and convert it into base calls. What makes this machine so small and inexpensive is that you're just measuring the current across the pore, and that's actually done very, very easily. There's software that the company provides that does the data capture and the base calling, with ways of analyzing it afterwards that are really easy. In my lab, we prefer to use a commercial DNA analysis software package; most laboratories have this. The MinION became commercially available back in 2014, and at that time the sequencing accuracy was, how should we say, less than optimal for all of our needs. But very quickly the chemistry and the base-calling algorithms changed, and by 2018 raw reads of DNA on the MinION were 85 to 90 percent accurate. That has changed pretty significantly just within the last couple of years. This is data from last week's Nanopore Community Meeting that Oxford runs, and you can see that the sequence accuracy has gone from 90 to 99 percent, and with some of the new kits and approaches we should be at 99.9 percent shortly, just within the next few weeks. And that's for the raw read accuracy.
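To make the current-to-bases idea concrete, here is a deliberately oversimplified toy: assume each base shifts the pore current to its own distinct level and decode a noisy trace by nearest-level matching. Real basecallers model k-mers (several bases influence the current at once) with neural networks, and the current levels below are made up for the sketch.

```python
# Toy basecalling: map each current measurement (picoamps) to the base whose
# model level is closest. Levels are invented; real pores use k-mer models.

LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}

def decode(trace):
    """Nearest-level decoding of a list of current measurements."""
    return "".join(min(LEVELS, key=lambda b: abs(LEVELS[b] - pA))
                   for pA in trace)

print(decode([81.2, 94.0, 126.1, 109.5]))  # "ACTG"
```

The gap between this toy and a real basecaller (signal segmentation, k-mer context, neural decoding) is exactly where the accuracy gains described above came from.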
Okay, the other thing the MinION is capable of doing, because you're not synthesizing a copy, you're just threading the molecule through a pore, is generating very long sequences. This opens up the possibility of not just sequencing a small portion of the ribosomal RNA gene: you can sequence the full operon, and there are advantages to doing this. One is that, with essentially the same amount of effort, you generate ten times the sequence information; these operons are usually about 4,000 base pairs, versus 400 for the traditional approaches. Also, because you're doing the entire operon, you're getting both of the ribosomal RNA genes, but also the ITS region, and that region contains strain-specific information. And finally, it's relatively easy to characterize all of the amplicons you generate by doing BLAST, or with the software that Oxford provides; they have a little package called What's In My Pot (WIMP) that you can use to characterize things. Of course, we're not the first ones to do this; that honor goes to Benítez-Páez and Sanz, who published this approach initially in GigaScience, but there are a number of laboratories that are closely following along because of the power of this methodology. There are a number of questions, though, that we'd like to get at using operon profiling. Number one: can we easily achieve species- or strain-level resolution using the MinION and this long-read approach? Can we develop some kind of screening method that would provide this level of resolution in the field, where we don't have access to the internet, where we might be trying to sample exotic microbiomes? And finally, is it possible to somehow combine this information with other data or methodologies to truly link phylogeny and function and increase our ecological understanding? I'll spend the rest of the time talking about our approach to doing this. But before I get into it, I just want to explain why we are interested in strain-level resolution.
I mean, aren't strains highly similar? It turns out that all the pan-genome studies indicate that that's not truly the case. I just want to point out one here, where they isolated, or looked at data from, 3,100 pneumococcal strains collected at four different geographic locations, which you can see at the top. They sequenced the genomes of all of these strains, or pulled them out of the database, and they found a total of 37,000 different genes just within this species, across all the different strains. But the important thing is that when they looked at the core genome, those genes that are shared by every member of the species, it turns out to be relatively small: roughly about 15 percent of the genes they could detect were part of the core genome. That means that 80 to 85 percent of the genetic potential and the functionality resides at the strain level, not at the species level, and not at the genus or family level. So it seems critical, if we really want to understand microbial ecology, that we begin to generate methods that provide this level of resolution. So how do we go about doing this? One of the things we did with our partners at the U.S. DOD was to begin to assemble a ribosomal operon database. Once again, we weren't the first to do this; that honor again goes to Benítez-Páez and Sanz. What we did that was a little bit different was that we included the strain information in this operon database, and we did a level of quality control that was really pretty stringent. If you just try to retrieve operons from the genomes in RefSeq at NCBI, you get about a million of them. But it turns out that most of them are either too short or have very long strings of ambiguous bases, because ribosomal operons are often multi-copy, highly conserved, and really difficult to assemble using short-read approaches. This is where all the genomes break apart.
But there are about 300,000 operons, probably more now, that meet a criterion of about 4,000 to 6,000 base pairs and don't have long strings of Ns. And this operon database is what we'd like to use to see if we can detect strains from MinION data. So here's how we would do something like this. We pull a couple of sequences out of this operon database and mutate them in silico to try and mimic what a MinION read would look like. The idea is that we could use the historic accuracy, roughly 60 to 90 percent sequence accuracy, or something else, and work out some BLAST parameters that provide the level of resolution we want. Then we would go back and test it with multiple examples from the database representing different phyla, things like this, and even some environmental samples, to see if we can achieve the resolution that we would like. So here's one of the parameters that we're trying to constrain our search with. We took a set of operons that we generated from the human respiratory tract, about 150,000 of them, BLASTed them against this database, and just got a sense of what the data would look like. What you can see here is that the bulk of the data has alignment lengths between 1,000 and 5,000 base pairs against this database, which is what we want; ideally you match the full entry within the database at about 5,000 base pairs. You can also see that the percent identity for these ranges from about 70% to about 95%. That's the bulk of the data, so this is the range we want to work with. The idea is shown here. As a marine scientist, I picked two different strains of a Vibrio species, Vibrio coralliilyticus: strain 58 and strain snuty. And then we did this in silico mutation. I know you can barely read this, they're all stacked up together, but it gives you a sense of the gaps or mismatches that are sometimes generated during sequencing.
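The operon quality-control step just described (keep only entries of roughly 4,000 to 6,000 base pairs with no long runs of ambiguous bases) can be sketched in a few lines. This is a minimal illustration, not the speakers' actual pipeline; the function name, thresholds, and toy sequences are assumptions for the example.

```python
import re

def passes_operon_qc(seq, min_len=4000, max_len=6000, max_n_run=10):
    """Keep a ribosomal operon only if its length falls in the expected
    4-6 kb window and it contains no long run of ambiguous bases (N),
    which usually marks an assembly gap in short-read genomes."""
    if not (min_len <= len(seq) <= max_len):
        return False
    # reject sequences containing a run of more than max_n_run Ns
    return re.search("N{%d,}" % (max_n_run + 1), seq.upper()) is None

# toy examples: a clean 4.5 kb operon vs. one broken by an assembly gap
clean = "ACGT" * 1125                              # 4,500 bp, no Ns
gapped = "ACGT" * 500 + "N" * 50 + "ACGT" * 625    # in-range length, 50-bp N gap
short = "ACGT" * 100                               # only 400 bp

print(passes_operon_qc(clean), passes_operon_qc(gapped), passes_operon_qc(short))
# → True False False
```

Applied over the million RefSeq-derived operons, a filter of this shape is what reduces the set to the roughly 300,000 usable entries mentioned above.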
The important thing is shown here with the red boxes: we can generate, in triplicate, essentially mock reads of a database entry that range in identity from roughly 70% to 95%. We can then screen these back against the database to see if we actually get a match to the original entry. What we're trying to do is separate the signal from the noise. You would get back a hit table that looks something like this. The red oval on the right shows the exact strain that we started with being recovered, and the green box on the left shows the identity of the different entries. Roughly from 80% to 95% identity, you always retrieve the right strain from the database, pulling the signal out of the noise, if you have the BLAST settings right. Now, the results of this type of analysis for a bunch of different examples are shown in this heat map. If there's a green box, all three mutated files were correctly identified at the strain level; if it's red, we didn't get a correct identification for anything. What you can see is that roughly at 84 to 85% identity, almost every one of the mutated files still picks up the proper strain when doing a BLAST search. And it doesn't matter whether there are only 11 competing entries within this 300,000-entry database or 400 competing entries, with multiple copies in multiple strains; you still pick up the exact match to the strain that you started with and mutated. So this becomes potentially very, very powerful. What we want to do then is go back to natural samples and see whether, by doing a BLAST search and applying some kind of quality control to the results, we can determine whether or not we can pick up strains. If we go back to this human respiratory microbiome and do a BLAST search with the right settings, almost everything returns a response.
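The in-silico mutation of a database entry down to a target identity can be sketched as below. This is a simplified assumption-laden version (substitutions only; real nanopore reads also contain insertions and deletions), and the function name and mock operon are illustrative.

```python
import random

def mutate(seq, identity, seed=0):
    """Substitute bases at random positions so that roughly `identity`
    of positions keep the reference base, mimicking a noisy long read.
    Substitution-only for simplicity; real reads also have indels."""
    rng = random.Random(seed)
    out = []
    for b in seq:
        if rng.random() < identity:
            out.append(b)                                   # keep reference base
        else:
            out.append(rng.choice([x for x in "ACGT" if x != b]))  # substitute
    return "".join(out)

ref = "ACGT" * 1000                      # 4 kb mock operon entry
read = mutate(ref, identity=0.85)        # mock read at ~85% identity
observed = sum(a == b for a, b in zip(ref, read)) / len(ref)
print(round(observed, 2))
```

BLASTing such mock reads back against the full database, in triplicate and at a range of target identities, is the signal-to-noise test described above.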
We get a BLAST hit for something like 95% of all the reads that we put in. And if we apply the criteria — are we getting long alignment lengths, and are we getting at least 84 or 85% identity? — the bulk of the returns actually fall within this category. So that means that, as a quick and dirty approach, you can identify strains within complex environmental samples, as long as you look at the quality of the BLAST hit and the match. But if you go to other environments, like here, where Max and I had the opportunity to do some marine sponge microbiome samples from the Great Barrier Reef, once again we got BLAST hits for virtually all of the reads that we loaded up, but when you look at the strain-level criteria, it drops off precipitously. That seems to be a reflection of the difference between the biomedical world and the environmental microbiological world. Of the entries in the operon database, most are from biomedical samples; very few truly come from environmental sites such as marine sponges. So hopefully things will get better as the database grows. In the last few minutes that I have, let me spend a little bit of time talking about a study that we've done with Minna Männistö at the Natural Resources Institute of Finland, and with Max, in Kilpisjärvi in Finland. We were interested in using this technology to see whether we could discern differences in samples from sites in very close proximity but with very different physical conditions: places where there's a blanket of deep snow during most of the winter versus adjacent sites that are essentially windswept.
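The two quality criteria applied to the hit table — a long alignment and at least roughly 84-85% identity — amount to a simple filter over BLAST tabular output. A minimal sketch, with hypothetical query and strain names:

```python
def strain_level_hits(hits, min_len=1000, min_ident=84.0):
    """Keep only BLAST hits that satisfy the two quality criteria
    discussed above: a long alignment and >= ~84-85% identity."""
    return [h for h in hits if h["length"] >= min_len and h["pident"] >= min_ident]

# hypothetical hit table in BLAST outfmt-6 style fields
hits = [
    {"query": "read1", "subject": "strain_A", "pident": 91.2, "length": 4800},
    {"query": "read2", "subject": "strain_A", "pident": 78.0, "length": 4500},  # identity too low
    {"query": "read3", "subject": "strain_B", "pident": 88.5, "length": 600},   # alignment too short
]
kept = strain_level_hits(hits)
print([h["query"] for h in kept])  # → ['read1']
```

Reads failing the filter still have a best hit, but it should not be trusted at the strain level; that is the pattern seen in the sponge samples.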
Temperature loggers had shown that areas covered in a thick blanket of snow essentially experienced very mild temperatures during the course of the Arctic winter, only dropping down to minus one or minus two degrees, whereas the windswept sites got down to close to minus 15 or minus 20 during the winter and cycled back up again during the summer. What I want to show you quickly are the results from screening the Acidobacteria that we can detect at this site. This is a bubble plot comparing the windswept sample to a sample that was collected in the Tibetan highlands. But I want to draw your attention down here to an Edaphobacter lichenicola strain. In these Finnish samples, we can see both of these strains existing at the site, and we'd like to understand how it is that these organisms can coexist. So we have M8 UP30 as the dominant one, and M8 UP22. I'd like to remind you that this operon database is based on genomes collected in RefSeq. So if we can detect a ribosomal operon, we also have access to that microorganism's genome; it exists in the database. And we're not really predicting function based on ribosomal RNA phylogeny: the best BLAST hit that you get is directly linked to those genes and their presumed functions after annotation. So essentially there's a wealth of information buried in this. I just want to put out a caveat here that ultimately you'd be very prudent to try and verify whether your best BLAST hit really represents detection of that particular strain, or whether it represents a novel species. That could be done using consensus building and phylogenetic analysis, which we could talk about later. What I want to show is that there were six different strains in the database, and we can stack up their genomes very quickly in Geneious; it only takes a few minutes.
These areas in green are places where M8 UP30 has unique genes that are not present in any other strain. Here's just an example of a few of these: there are genes involved in the breakdown of arabinose or rhamnose, utilizing catechol, or polymers like pullulan, and even things like urea. So this provides a more focused approach for trying to understand the mechanisms that allow two different strains to coexist, or to help in terms of isolating new organisms. So I'm done. Let me just conclude by saying that the MinION and this operon database can achieve both species- and strain-level resolution for the microbiome. Any hit can be directly linked to a complete or partial genome in RefSeq, but you can also generate MAGs from your samples and verify that link. The low cost of the MinION, about $1,000, its portability, and its standalone nature can allow for microbiome studies in remote sites with very little infrastructure. I certainly second the notion that Bastian mentioned: yes, you can get your own machine. And finally, as more genomes and ribosomal operons get deposited in GenBank, our ability to rapidly identify strains in the field will only improve. So with that, I thank you and look forward to any discussions. Thank you, Lee. Indeed, there are already some questions regarding the methodology; we'll get back to them later and move on to our last speaker, Francisco Pascoal. He's going to discuss the rare biosphere, expanding from not just the core microbiome but, of course, to organisms that in many habitats are in low abundance, and what they might be doing or waiting to be doing. So, Francisco. Thank you. My name is Francisco Pascoal. I am a PhD student at the University of Porto in Portugal, and my research is focused on marine microbiomes, with a particular emphasis on the microbial rare biosphere.
This webinar will be centered around the article "The microbial rare biosphere: current concepts, methods and ecological principles". The presentation was prepared by myself and by Catarina Magalhães, who is a co-author and the corresponding author of this article; she will also be available in the discussion section to answer questions. First of all, I also want to thank FEMS Microbiology Ecology for the invitation to participate in this webinar, which we appreciate very much. In this presentation, I'll start by explaining what the microbial rare biosphere is. Then I'll give you some reasons why we think it is important to study the microbial rare biosphere, and I will try to provide you with the big picture of the methodologies used and the main shifts in the field. Then I will go a bit deeper into some methodological aspects, specifically on the definition of the microbial rare biosphere, and then other aspects regarding its taxonomy and community composition. Concerning these two points, I will also provide some real data from our own research, some of it not published yet, but it can be useful to illustrate some of these ideas. So my first point is what is rare and what is not, in the context of environmental microbiology. Usually, when we talk about the rare biosphere, we are referring to the taxa with low relative abundance. You can illustrate this using a rank abundance curve: if we order all species from the most abundant to the least abundant, we see that we have a few species with very high abundance, and then many species with low relative abundance in the so-called long tail of the rank abundance curve. This is a visual representation of the rare biosphere. But first of all, rarity is a statistical concept. We are saying that one species is rare in comparison with another, and specifically rare because it has low relative abundance.
Usually it is calculated per sample, but not always. Maybe more importantly, rarity is not frequency of occurrence, at least not in this particular context. In other contexts of ecology, people sometimes do look at rarity from the perspective of frequency of occurrence, but not in this particular case. Rarity is also not an intrinsic biological property of the species. When we say that one species is rare, we are saying it is rare in comparison with the others; it is not a property of the species itself. Of course, if one species is rare in a given environment, there might be some biological process explaining why it is rare at that specific moment. And finally, rarity is not defined as singletons in the context of molecular surveys. Some studies simply remove their singletons, others do not, but even if you remove the singletons, you will still have several taxa with very low relative abundance in comparison with the others. The main idea that I want to convey is that one species is rare in comparison to another, abundant one, and not rare by itself. To explain this a bit better, I will use an oversimplified conceptual exercise. Imagine that you have three samples, and in those three samples you have different microbial communities. You can see very easily that the star is clearly rare; it has the lowest abundance of all. You can also see that it is present in all samples. So although it is rare, if you look at it from the perspective of frequency of occurrence, it is actually very frequent. The same goes for the circle: the circle is present in all samples, but contrary to the star, the circle is very abundant. So it is abundant and frequent. And finally, you can look at, for example, the triangle. The triangle is the most abundant one in sample one, so it is abundant, but from the point of view of frequency of occurrence it is quite infrequent; it was only found in one single sample.
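The distinction between rarity (low per-sample relative abundance) and frequency of occurrence (presence across samples) can be made concrete with a small sketch. The counts and the 1% threshold here are toy assumptions chosen to mirror the star/circle/triangle exercise.

```python
def relative_abundance(sample):
    """Per-sample relative abundance from raw counts."""
    total = sum(sample.values())
    return {taxon: n / total for taxon, n in sample.items()}

def rare_taxa(sample, threshold=0.01):
    """Rarity in this sense: relative abundance below a threshold, per sample."""
    return {t for t, f in relative_abundance(sample).items() if f < threshold}

def occurrence(samples, taxon):
    """Frequency of occurrence: fraction of samples in which the taxon appears."""
    return sum(taxon in s for s in samples) / len(samples)

# three toy samples mirroring the conceptual exercise
samples = [
    {"circle": 500, "triangle": 498, "star": 2},
    {"circle": 700, "square": 298, "star": 2},
    {"circle": 400, "square": 598, "star": 2},
]
print(rare_taxa(samples[0]))            # → {'star'}  (rare in the sample, yet...)
print(occurrence(samples, "star"))      # → 1.0       (...present in every sample)
print(occurrence(samples, "triangle"))  # abundant once, but infrequent (1 of 3 samples)
```

The star is rare everywhere but maximally frequent; the triangle is abundant where it occurs but infrequent, which is exactly why the two concepts must not be conflated.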
Now that I have explained this difference, we can move on and try to understand why it is important to study the microbial rare biosphere. I will provide three simple examples of its potential relevance in the ecosystem. First, it can be relevant in the response to perturbations. Imagine that you have some rare species in the environment, and that rare species has the ability to degrade a specific pollutant. As that pollutant enters the environment, it might be degraded by that species; for example, it might degrade the pollutant to produce energy or something else. By doing so, it can grow and become abundant and degrade the pollutant completely, and then it might go back to being rare. I provide one example here, and there is another review which focuses specifically on these sorts of situations, if you want to know more about that. Second, you can also have rare biosphere members that, despite being rare, have high activity for very specific reactions, and this might be relevant for some specific biogeochemical cycles. Again, I provide one specific example. Finally, the third example is host-symbiont interactions. Imagine that you have, for example, a sessile host in the ocean. It might have one symbiont that is very abundant in that host. That symbiont might become detached and appear as rare in the nearby waters, and then be transported, while rare, to a distant location, for example by a mobile animal like a fish. In the new location it will still be rare, but it might find a new host, grow abundant in that new host, and then play a specific role. For the example of the sponge microbiome, you have this article, and if you want to look deeper into this particular case, we recommend this review. So overall, the rare biosphere, despite its low abundance, might still play different sorts of roles in different contexts.
That is usually why we think it is actually relevant. Now, in terms of the topics addressed by the studies focusing on the microbial rare biosphere, what we see is that these studies are very much dependent on the methodologies used. That's why I say that historical tendencies follow the methodological advances. Most studies focus on taxonomy and community composition questions of the rare biosphere. If you look at it from the perspective of the rank abundance curve — again, this is just a conceptual exercise — if you use molecular methods like 16S rRNA amplicon sequencing, you can easily see that the higher the output of your sequencing, the better the coverage of the lowest-abundance species, of the long tail of the rank abundance curve. But importantly, despite the fact that we focus so much on molecular methods, you should bear in mind that with cultivation-dependent methods you can also identify either rare or abundant taxa from the environment. You simply cannot say that they are rare or abundant without profiling the entire community, but you can identify them with cultivation-dependent methods. And you can even have the reverse: sometimes there are species you cannot identify with molecular methods, but you can identify them with cultivation-dependent methods. I provide one example. But in recent years, there has been a shift in the literature from the community composition aspects of the microbial rare biosphere towards its functions. We also want to know if it is active or inactive, what genes are there, whether they are being expressed, and so on. This has been mainly supported by the rise of omics approaches: now we have metagenomics, metatranscriptomics, and so on. I put metagenomics and metatranscriptomics in bold here, because those two approaches have already been used in the context of the rare biosphere.
I also put in here one note, which is that all of these different functional questions, whether about whole microbial communities or single species, can also be addressed with traditional, cultivation-dependent methods. That should also be taken into consideration, and we should not restrict ourselves solely to the omics approaches. I also highlight here the possibility of having metagenome-assembled genomes; they can be particularly useful if you are interested in candidate phyla. In any case, there are many studies approaching all of these functional questions. The point I want to make here is that the challenge is to make a connection between this long tail of the rank abundance curve, this rare biosphere, and its functions. And this challenge is not trivial at all; it's quite complicated. In any case, some studies have already attempted to make this shift from the community composition description to the actual functions, and we provide some examples in the review on which this webinar is based. In our own research group, we have so far focused mostly on taxonomy, community composition, and methodological questions, and not so much on the functional aspects, but we hope, at least I hope, to be able to participate in that shift during my PhD. In any case, we are going to focus the rest of this presentation more on this side, with some case studies. So the first question I bring up is the inconsistency of the definitions of the microbial rare biosphere itself. After reviewing several articles that focus specifically on the microbial rare biosphere, I found that around 45% did not specify how they defined the rare biosphere. And of those that did, the vast majority used relative abundance thresholds. So, basically what I was addressing before: the rare biosphere is understood from the point of view of relative abundance.
It might be calculated in different ways, and some studies even use several definitions instead of a single one; some studies distinguish rare taxa, very rare taxa, and so on. Now, if you look specifically at the studies that use relative-abundance-based definitions, we see that most of them calculate the relative abundance per sample. But more importantly, we see that several different specific thresholds are being used. The most common ones are 0.1% and 0.01%. What this means is that those studies consider a taxon rare in one specific sample if it has less than, say, 0.1% relative abundance. It simply means that. Now, why is the inconsistency of definitions of the rare biosphere a problem? Because the rare biosphere is a statistical concept, it is very much dependent on its own definition. So we wanted to address this issue a bit more deeply. To do that, we used three different datasets that are publicly available; they were produced by independent research groups, and they are from three different biomes: a marine, a symbiotic, and a terrestrial biome. One question that we asked was: what is the difference in the distribution of prokaryotic rare biosphere species richness? In this case, note that we are using specifically 16S data, for prokaryotes, but you could also do this for other biological groups. In any case, if we compare the original community, the total community, with the rare biosphere defined as 1% relative abundance per sample, you see that there is not any real difference in the distribution of species richness. The values are a bit lower, as expected, but essentially the distribution is the same. Then, if you look specifically at the two most commonly used definitions, 0.1% and 0.01%, we see the distribution of species richness is different, and the values are also quite different.
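How strongly the measured "rare biosphere" depends on the chosen threshold can be shown with a toy community; the counts below are invented for illustration, not data from the study.

```python
def rare_biosphere(sample, threshold):
    """Taxa below the given per-sample relative-abundance threshold."""
    total = sum(sample.values())
    return {t for t, n in sample.items() if n / total < threshold}

# one mock sample: a few dominant taxa plus a long tail of low counts
counts = [9000, 500, 300, 100] + [20] * 10 + [2] * 30
sample = {"taxon%d" % i: c for i, c in enumerate(counts)}  # total = 10,160 reads

# the "rare biosphere" richness shrinks as the threshold tightens
for thr in (0.01, 0.001, 0.0001):
    print(thr, len(rare_biosphere(sample, thr)))
# → 0.01 41
# → 0.001 30
# → 0.0001 0
```

The same sample yields 41, 30, or 0 "rare" taxa depending only on the definition, which is why two studies with different thresholds are not directly comparable.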
So if we have two studies using two different definitions of the rare biosphere, a direct comparison between them is actually complicated, because if they use different definitions, they are actually referring to different things as the rare biosphere. Generally, what we advise is that studies should explain objectively how they define rarity and why. All of this data is available, and I provide the information here. Besides the distribution of species richness, we also took a look at the community composition, and when we calculated the Procrustes errors between the ordinations of the rare biosphere defined as either 0.1% or 0.01%, we also found a shift in community composition. Now, another question is on the actual methodologies used to survey the taxonomy and community composition. The most commonly used approach is amplicon sequencing of a marker gene, for example 16S, and this approach has some advantages and disadvantages. On the downside, it can be biased by the primers that are used, but on the upside, it has a very high throughput. So it is the best approach if your objective is to identify low-abundance sequences. That is the reason why most studies on the microbial rare biosphere were based on amplicon sequencing of marker genes. Another approach is to sequence all the DNA without amplification. This approach is not so good if you want to identify low-abundance sequences, because you have lower throughput for the marker gene of taxonomic interest, but from the point of view of taxonomic diversity, you have a higher chance of identifying candidate phyla, for example. We saw precisely this in our own data from the NIS dataset, where we compared ecological replicates using either approach: 16S rRNA amplicon sequencing or total DNA sequencing.
When we did the taxonomic profiling of this environment using either approach, we clearly identified more candidate phyla with total DNA sequencing than with the amplicon-based approach. More specifically, we found 26 prokaryotic rare biosphere candidate phyla using total DNA sequencing and only five using 16S amplicon sequencing from the same initial environmental samples. Another question that has arisen in recent times is the use of either amplicon sequence variants (ASVs) or operational taxonomic units (OTUs). I will explain them very briefly with a simplified example. Imagine that you have six sequences, and some of them are quite similar but not 100% identical to each other. They would be considered six different ASVs. But if you cluster them based on a percent similarity, for example 97% or another threshold, for the same number of sequences you might actually get, for example, three OTUs. We also wanted to address this issue in terms of the microbial rare biosphere. When we processed the same raw sequences into ASVs using DADA2, or into OTUs using a clustering-based platform, we found a higher number of sequences attributed to the prokaryotic rare biosphere using the ASV approach than using the OTU approach, and the same for the species richness of the rare biosphere. So, using either ASVs or OTUs, we were able to assess the microbial rare biosphere, and in this particular case, we actually had higher alpha diversity with the ASVs than with the OTUs. Another maybe interesting question is on the quantity of material that you sample from your biome. If you are talking about marine microbiomes, which is the particular case of our research group, we can ask what happens to the microbial rare biosphere if we increase the volume. You can think of it as wanting to describe the microbial community at one point in a marine environment.
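The ASV-versus-OTU contrast can be sketched with a toy greedy clustering; real pipelines such as DADA2 (denoising into ASVs) or centroid-based OTU clusterers are far more sophisticated, so treat the sequences, the 97% threshold, and this greedy scheme purely as an illustration of why the two unit types give different counts.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def cluster_otus(seqs, threshold=0.97):
    """Greedy centroid clustering: a sequence joins the first existing
    centroid it matches at >= threshold identity, else founds a new OTU."""
    otus = []
    for s in seqs:
        for otu in otus:
            if identity(s, otu["seed"]) >= threshold:
                otu["members"].append(s)
                break
        else:
            otus.append({"seed": s, "members": [s]})
    return otus

base = "ACGT" * 25                 # 100 bp mock reference sequence
near = "TT" + base[2:]             # 2 mismatches -> 98% identity to base
far = "T" * 10 + base[10:]         # 8 mismatches -> 92% identity to base

seqs = [base, near, far]
print(len(set(seqs)), len(cluster_otus(seqs)))  # → 3 2
```

Three distinct sequences (three ASVs) collapse into two OTUs at 97%, so the ASV route retains more low-abundance units, consistent with the higher rare-biosphere richness reported above.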
What is the difference between using one liter, 10 liters, and so on, up to 1,000 liters? To test this, we used the NMOS dataset, which was tailored specifically to compare methodological variables. A bit surprisingly, from as little as one liter up to 1,000 liters, we did not identify a very clear pattern of species richness at the level of the microbial rare biosphere. We were more or less expecting a steep increase in species richness because of the drastic change in volume. Even when we divide the volumes according to size fractions, because the cells were separated by size, we also do not see any clear pattern, although we do see that some size fractions have higher species richness for the microbial rare biosphere. More interesting is that if you look at the proportion of rare taxa to the total community, that proportion did not change at all from one liter up to 1,000 liters. In practice this means that in this dataset — and bear in mind that we used OTUs in this particular dataset — we found that around 89% of OTUs have less than 0.1% relative abundance per sample. But besides that, we can ask: why is it that the proportion did not change from one liter to 1,000 liters, if the difference is so drastic? Well, that is because, like I said in the beginning, the rare biosphere is essentially a statistical concept. Of course, with more volume we have more cells and more sequences, but the proportion between them does not change. That's why the representativeness of the rare biosphere did not change in this dataset. You can also see this plot from the point of view of the rank abundance curve: the rare biosphere occupies almost the totality of the x-axis of ranked OTUs. So, very briefly, and to conclude: the microbial rare biosphere has already been addressed in every sort of environment and for most biological groups.
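The statistical argument — that scaling up the sampled volume multiplies every count but leaves relative abundances, and therefore the rare fraction, unchanged — can be verified with a toy community. The counts below are invented so that the rare fraction happens to be 89%, echoing the figure mentioned above; they are not the NMOS data.

```python
def rare_fraction(counts, threshold=0.001):
    """Fraction of taxa below the per-sample relative-abundance threshold (0.1%)."""
    total = sum(counts.values())
    return sum(1 for n in counts.values() if n / total < threshold) / len(counts)

# mock community: 1 dominant, 10 mid-abundance, 89 rare taxa (100 total)
counts = {"dominant": 9000}
counts.update({"mid%d" % i: 50 for i in range(10)})
counts.update({"rare%d" % i: 3 for i in range(89)})

# sampling 1,000x the volume multiplies every count, but relative
# abundances, and hence the rare fraction, are unchanged
scaled = {t: n * 1000 for t, n in counts.items()}
print(rare_fraction(counts), rare_fraction(scaled))  # → 0.89 0.89
```

This is the same constant-proportion behavior seen in the dataset: more volume means more cells and more sequences, but not a different proportion of rare taxa.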
We focused more on prokaryotes during this presentation, but it has already been addressed for other biological groups. There is a shift happening in the literature from taxonomic to functional profiles of the microbial rare biosphere. So far, we have not participated in this shift, but we hope to be able to have our own results regarding the functions of the rare biosphere. In any case, this is not trivial at all; it's quite complex. We believe that consensus on definitions and methodologies is important, especially if you want, for example, to compare different studies. If different studies use different methodologies and different definitions for the same ideas, then comparison becomes complicated. And finally, also very important, although we did not address this issue in this presentation, is that bioinformatic and other methodological improvements are still necessary to distinguish real rare taxa from molecular artifacts. Because the rare biosphere comes from the sequences with low relative abundance, if there is any problem — any contamination, any bias, any error in any of the previous steps of the molecular approaches — then there is a high chance that the rare biosphere will be the first group to be impacted. That's why this is actually quite relevant. The previous presentations already addressed some of those issues regarding the molecular methods themselves, and of course those issues will have a particularly high impact at the level of the low-abundance taxa. And finally, we also did not address this in the presentation, but we did so in the article: we believe it is important to have a synthesis of the ecological principles of the microbial rare biosphere, also to compare studies. Because different studies focus on different research questions, they might use different ecological principles and different ideas, and that is perfectly normal, because they have different research questions.
But when we want to gather the collective knowledge, it becomes very complicated to make bridges between those concepts. So that was it. Thank you very much for your attention, and I'll be happy to answer your questions. Thank you. Well, thank you, Francisco. And again, we'll open up everything for discussion; there are a few questions coming in, and we'll ask the other speakers to join us as well. One immediate point that came in to me — and Francisco, you mentioned this question of definitions — is that I think the term rare biosphere has a problem here. As you mentioned, some of these organisms are actually very frequent. So low relative abundance versus rare are two different things, and I think that's important to realize. Saying rare biosphere versus low-relative-abundance biosphere — of course, it's easier to just say rare — but I think we have a problem there in terms of going back to the ecological questions of what these organisms in low abundance might be doing. Is there any comment that you want to add on this? Sure. Actually, different schools of thought might use similar terms and similar concepts that actually represent different ideas. And I think that in more general ecology, it was more common to think of rarity from that point of view. I can give you an example from outside microbiology. Oftentimes we talk of endemic species; sometimes we have a bird species that exists only on one island. We can say that species is rare because it exists only on one island, but if you go to that place, it might be the most dominant species on that particular island. So it is very important to consider the point of view of the observer. We could also take the point of view of the entire world; from that perspective, maybe everything will be rare. So it is always very much dependent on the observation — in the case of microbiology, on your sample.
Yeah, there's a comment here from Jeff Shields that the idea that the rare biosphere doesn't change in proportion is reminiscent of the principle of constant proportions in oceanography. Okay, there's another question here for Bastian regarding controls, from Jeff Berthes: you mostly discussed qualitative controls in your presentation; have you looked into quantitative considerations regarding statistical design? I think with this question I would need more details; I'm not sure what is being referred to here. So, quantitative measures. Yeah, unfortunately, it is not clear from the question either; there aren't any more details on this. Well, actually, in your controls, qualitatively, maybe all of the members of the community are lysed, extracted, and analyzed, but quantitatively they can look very, very different. Yeah, well, you would hope they don't. But is there then a way you can deal with that? I guess the consideration we had in the processing was the common concern that your gram-positive organisms are getting extracted less efficiently than your gram-negative ones, because they're harder to lyse, et cetera. If it were like this, which it isn't, you could probably think about applying some correction factor, but it isn't that simple; we've seen that we are potentially missing the gram-positives a bit, but it's not that constant. So it wouldn't be possible to apply one factor and say it's fixed now. Okay, no, but it might be possible to spike a sample, and this has been done historically, when people were doing transcriptomics, with something that could be detected to show loss during the processing. It's harder with whole cells; like Bastian said, working with something like fecal samples or soils, it's hard to think of a really good thing that you can add.
But one thing that I'll point out, mostly because of my hair color, is that older methodology winds up working very well, while the kits, although they're very, very convenient, might have a level of variability that's hard to control. Early on in my work, we developed quantitative extraction methods, maybe not for everything within the sample, but certainly for subsets of the samples. And it would be really good for the community to check this periodically. If you take a soil sample and just mix it 50-50 with sand that you have baked out, all of your signal should drop, right? By half. There are habits like this that I don't think the community is adopting, because of the convenience. The flip side is that, in investigating the rare biosphere, the more samples we can process, the better a handle we'll get on the ecology of these organisms that might be at very low relative abundance in one place but high in another. This is really important for understanding some of this microbial ecology, and it means that we have to be able to process hundreds and thousands of samples, ultimately.

Okay, just one comment here in between. This is to Dries Berth and others in the audience: you're raising your hand, but unfortunately I can't do anything about that, so please type in your follow-up question as well, and we can look at it later. We're a bit restricted in what I can do about your hand waving in the background, I'm sorry. And again, there are questions that I know we won't have time to get into, but we will share these with all the speakers, and they can get back to you separately after the session. So, a question for Lee: are there major challenges for nanopore sequencing to replace the current standard short-read amplicon sequencing for microbial community profiling?
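The 50-50 sand dilution check described here can be made routine: after mixing with a baked-out diluent, every taxon's absolute signal should fall by roughly the dilution factor, and taxa that deviate flag a non-quantitative step. A toy sketch of that check, with invented names, signals, and tolerance:

```python
def check_dilution(undiluted, diluted, factor=2.0, tolerance=0.25):
    """Flag taxa whose signal does not drop by ~`factor` after dilution.

    undiluted/diluted: dicts of taxon -> absolute signal (e.g. copies/g).
    Returns the taxa whose observed before/after ratio deviates from
    `factor` by more than `tolerance` (as a fraction of `factor`).
    """
    flagged = []
    for taxon, before in undiluted.items():
        after = diluted.get(taxon, 0.0)
        if after == 0.0:
            flagged.append(taxon)   # signal vanished entirely
            continue
        ratio = before / after      # expected to be close to `factor`
        if abs(ratio - factor) / factor > tolerance:
            flagged.append(taxon)
    return flagged

# A 50-50 soil/sand mix: TaxonA drops by about half as expected, but
# TaxonB drops far more than 2x, suggesting a non-quantitative step.
before = {"TaxonA": 1000.0, "TaxonB": 800.0}
after = {"TaxonA": 520.0, "TaxonB": 150.0}
print(check_dilution(before, after))  # ['TaxonB']
```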
So what do you think is going to be the challenge for replacing Illumina or other methods?

That's a really good question. Partly, I think it's this preconceived notion that it's just not that accurate, or that if you have noisy data, you can't do anything with it. What I hope I showed is that you can pull the signal out of the noise; people that have worked in marine environments, physicists and so on, have been doing this for decades. So that's one thing, just this perception: "oh, it's a toy, it's not real, we need to send it out somewhere". No, you can actually do all this in-house. The biggest thing is probably trying to generate the longest amplicons that you can. Most of the methods that we've used give us very high quality DNA, and we test it by running gels. Hardly any lab does that anymore, right? They stick it in a NanoDrop, they say it's good, they do an amplification, and it amps or it doesn't. But you can't get big amplicons if you don't have big DNA to start with. If you use kits and get everything sheared, you can get small reads, you just can't get long reads. So the other big thing is changing a little bit the way that you do things, and doing some of the QA and QC best practices that Bastian talked about.

There's actually another QC question here, again in terms of the error rate: how would you screen out false positive strains caused by sequencing error?

That's a really good question. I don't know if that's ultimately possible, right? If you've got a strain where the sequencing error actually makes it a best match to something else, I don't know how you could avoid that, because ultimately, if you build a consensus from those reads, it will still match something different. But what I can say is that by doing a longer amplicon, we're not looking at single nucleotide changes, right? And it turns out the ITS region is highly variable; we can't even align the thing, for goodness' sake.
Yes, but actually, your initial BLAST is not on the consensus; that's on the initial alignment of all the reads.

Yeah. And you can use those to build a consensus and do a phylogeny. I think our argument, or the thing that Max and I have delightful discussions about, is the case of two novel species that are, let's say, 15% different from each other, but both match something in the database at exactly the same distance, at 10%. We've shown in our publications how you can pull out those reads, build a consensus just by aligning them to each other, and then separate out novel strains and novel species.

Pierre Ramon has a very interesting comment on this. This was initially thought to be very rare or uncommon, but there's a paper out, and I don't remember the author now, showing that unlinked rRNA genes are widespread among bacteria and archaea. Can your nanopore-based method account for that? If not, a considerable number of taxa and strains would be missing, right?

Very true, if you're doing just a single amplicon and looking at a linked operon. So I think the important aspect to look at is: okay, they've been found, but what does "widespread" mean, and how many taxa actually have unlinked operons? That's something that could be looked at. You could conceivably do separate amplicons, and you could pick any loci to characterize a community. It doesn't have to be ribosomal; it just has to be a common gene, a polymerase, for example. That was all done in the early days too; it's just that the ribosomal RNA genes became very easy for all of us to adopt. But it would be worthwhile looking at that. And as Francisco said, you can use total DNA as a way of getting at this, and the upside of doing long reads is that you don't have the assembly issues. So that's something else.
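The core of the "pull signal out of noise" argument is that independent random errors average out across reads, while a true difference between strains shows up in every read. A minimal sketch of the per-column majority vote behind a consensus; real pipelines align reads and handle indels first, and the reads here are invented toy data:

```python
from collections import Counter

def majority_consensus(aligned_reads):
    """Return the per-position majority base across equal-length reads."""
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)

# Each read carries a different random error, but the majority vote at
# each position recovers the underlying sequence.
reads = ["ACGTACGT",
         "ACGAACGT",   # error at position 3
         "ACGTACCT",   # error at position 6
         "TCGTACGT"]   # error at position 0
print(majority_consensus(reads))  # ACGTACGT
```

As the speakers note, this cannot rescue a read set whose errors systematically push it toward a different database entry; it only suppresses errors that are random from read to read.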
So just combining some amplicon work with some total DNA work in certain environments should at least shed light on this.

And Lee, I have a follow-up question on what you just answered: do you think that low DNA yield will limit the use of the MinION for metagenomic analysis?

That's a really interesting question. One of the things I did right beforehand was to pull up some data we have on the kitome work that we did with the MinION. It's a little bit different, because you're basically taking a negative PCR reaction and a positive PCR reaction and comparing them, and we see four orders of magnitude difference. The problem you have is that the mass of DNA that you're trying to sequence determines how many reads you get. Now, you could try to preamplify it with something like GenomiPhi, and there are kits for very low input that should help. But that's very different from, let's say, extracting 1,000 liters of seawater and jamming the whole thing into a flow cell, which would be really interesting to do at some point. In our hands, we like to have a pretty high mass of DNA going into the flow cell. With amplicons, you'd have no limitation: we can take one or two micrograms of DNA, barcode it 36 times, jam it through a flow cell and get 2 million reads. With low biomass samples, like when we were doing the bronchoalveolar lavage, there's very little there, so we didn't try to sequence that directly; we did it the amplicon way of characterizing the DNA. But it would be interesting to do, and there are others that are trying to do these things. The number of reads that you get is really dependent upon how much mass you put in the flow cell.
Yeah, following up on another comment from Jeff Shields here: with respect to rare members, in parasite ecology, frequency, or prevalence, is a distinct variable from density, or intensity, or abundance. So I think, Francisco, you were alluding to this: we really need to be looking at both variables when determining how rare or abundant an organism is. Should we get away from that terminology and just call it what it is, that we are actually looking at two different things, abundance versus frequency across samples?

Well, I guess that depends a little bit on the consensus of the different peers that work on this sort of question. I think the most important thing is for studies to be very explicit about what they mean by each term. And I don't think there should be a forced effort to reach consensus on all these concepts; some studies actually do use quite broad terms and then specify everything, so we can say that we have a commonly rare taxon because it was found everywhere but always rare, or an uncommon one because it was found only sparsely, and so on and so forth. But that's not up to me, of course. It's always dependent on the community itself, because these are the terms that the different research groups use. Again, if they are explicit, then that's best. But yes, maybe it would be useful to refer simply to low abundance instead of the rare biosphere. I find the term "rare" very subjective, and that can be a problem in scientific projects: what is rare to me might not be rare to another person, so there is a subjectivity intrinsic to the concept. In any case, this is the group of concepts that has been used by the different studies focused on this specific question, and that's why we use them.
Yeah, there were a number of comments about the rather arbitrary cutoffs, whether it's 0.1 percent or 0.01 percent and so on. I think those are indeed arbitrary if you don't link them to the ecological question that you're asking, and then factor that in. And as Bastian also said earlier, how do you know that these taxa are not just from the kit, and can you demonstrate that this is actually a true biological signal that you are monitoring?

Actually, a colleague of mine, who might be in the audience, was working on his own pipeline for OTU calling, and in the process he benchmarked this pipeline with many positive controls. It turned out that if something dropped below 0.1 percent, it mostly seemed to be an artifact, and it would be filtered out. So in this case, it was possible to have an objective criterion for what you would consider real and what not. But then again, in this positive control nothing was really rare, so it's also a bit hard to say. I believe everything there was right, but how far down the sensitivity goes is hard to say.

And yes, the thresholds are arbitrary, at least from my point of view. When I reviewed the literature, I did not find a biological reason to use, for example, 0.1 percent, and thresholds like that are the majority; you can also find more exotic sorts of thresholds. I'm not saying they are wrong, because sometimes, for specific research questions, they make complete sense. For example, you might be studying a specific species that you know well, and then by molecular profiling you also identify it in the environment, and you can see that it has, for example, 0.000-something relative abundance, so it is clearly rare from the point of view of abundance.
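Whatever cutoff a study chooses, the discussion suggests it should at least be an explicit, reported parameter rather than an implicit filter. A toy sketch of the common (and, as discussed, arbitrary) 0.1 percent relative-abundance split; taxon names and counts are invented:

```python
def classify_by_abundance(counts, rare_cutoff=0.001):
    """Split taxa into 'abundant' and 'rare' by relative abundance.

    counts: dict of taxon -> read count for one sample.
    rare_cutoff: fraction below which a taxon is called rare
                 (0.001 = 0.1%, the common but arbitrary choice).
    """
    total = sum(counts.values())
    result = {"abundant": [], "rare": []}
    for taxon, n in counts.items():
        label = "rare" if n / total < rare_cutoff else "abundant"
        result[label].append(taxon)
    return result

sample = {"TaxonA": 9000, "TaxonB": 995, "TaxonC": 5}
print(classify_by_abundance(sample))
# {'abundant': ['TaxonA', 'TaxonB'], 'rare': ['TaxonC']}
```

Making `rare_cutoff` a named parameter makes it trivial to report, vary, and justify against the specific research question, which is exactly the point raised about arbitrary thresholds.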
And then there are, I think, many examples where you have low abundance taxa that are doing very critical processes in the environment, biogeochemical cycling and so on, processes that are completely dependent on that organism even though it never becomes one of the abundant members of the community. So I think it's important, whatever you're looking at, to go back to what the question is, not just what data you have from the different methodologies.

Do you know what? We've been going on now for an hour and a half, so I think we're going to have to wrap this up. Any final comments or questions from you?

It was nice being here.

Yeah, well, definitely. Thank you all; it was wonderful hosting you. And I want to thank everybody here in the audience. We actually had a record audience for one of the FEMS Microbiology Ecology webinars. And of course, if you missed this, well, then you're not hearing this, but the webinar will be available on the FEMS YouTube channel in a couple of weeks as well, so you can go back over any details that you want to go into, and of course read more in the papers that were highlighted here. We hope to see you in a couple of months for another webinar on another topic; nothing is announced yet. Also feel free to send your suggestions to me, and we'll look into what things we should be discussing. So again, thank you, Lee, Francisco, Bastian, Katharina. It was great. And thank you to everybody here in the wings, the staff who have helped us put this on. So thank you all, stay safe, and see you soon again on Zoom and hopefully in person. Thank you. Yes. Happy holidays, all.