Thank you so much for the introduction and for the opportunity to present. Let's see if my slides start sharing here. I would have loved to be in Boston for the meeting, but given the circumstances this really is a great alternative, and I've prepped with lots of fancy snacks, so I'm ready for a nice symposium evening here in Uppsala.

So the title of my talk today is genomics of AM fungi. In my career, I started out working with ectomycorrhizal fungi and got increasingly interested in the large unknown groups of fungi, like what Tim will be talking about later today, and like the Archaeorhizomycetes that we worked on, where we were able to culture some of this unknown diversity, which is challenging in some ways. But I was always convinced I would never work with AM fungi, because they seemed way too difficult to work with: you have to grow them with the plant. Honestly, when you work with metabarcoding, like I've done a lot, you kind of believe they don't even exist, because using standard protocols and standard primers you can hardly detect AM fungi in many systems.

But about 10 years ago I ended up working in Bloomington, Indiana, for two years, and with the deciduous forests there my interest in AM fungi grew. I also started collaborating with Jim Bever and learned more about the high levels of variability in the rDNA genes in AM fungi, and I found that really intriguing. This example here is from Teresa's work. They showed, based on large subunit sequences, that within this clade outlined in red there, Claroideoglomus, all the species had two highly diverged types of ribosomal genes. They exist in all the strains, and if you go into one of them, like the tree to the right, the S variant, there's still variation within that variant as well. So much so that you cannot distinguish species using this marker the way we would with a lot of other fungi.
This high level of polymorphism in the rDNA genes, as well as in other markers, in combination with the multinucleate asexual spores of AM fungi, has led to a long and somewhat overly heated debate about the genome organization of AM fungi. One question is whether they are heterokaryotic, that is, have lots of different nuclei within the same organism, given that there's no visible stage in their life cycle where they have only one nucleus. This led people to think they could have accumulated variation in this way. The opposite is the homokaryotic model, where all the observed variation would be found within a single nucleus, through ploidy or as multiple copies. The debate has been intense, but it has sort of settled down now, because figuring out this polymorphism was really hard before we had genome data.

It was only in 2013 that the first genome of an AM fungus was published. It's Rhizophagus irregularis, the model fungus that's used for a lot of experiments as well as genome work. This figure here illustrates the papers published on AM fungal genomes. There's a gray line that increases, which is the number of species sequenced, and a yellow line showing the number of genera represented by these species.

We really learned a lot from these genomes. The genomes are fairly large and have a high number of predicted genes, in particular when compared to saprotrophic species like Mucor and Mortierella in the sister lineages of the AM fungi. There's no evidence that this is an effect of genome duplication; rather, the sizes seem to be driven by expansion of repeats, and you see that the repeat content varies a lot. The numbers range from 20% up to 86% there in Gigaspora margarita. It depends a lot on how you define and identify repeats, but it's a big part of most of the genomes.
On the other hand, when people estimate the variants as SNPs per kilobase, the numbers are fairly low, so the intragenomic variation in that sense is fairly low. For Rhizophagus irregularis, genome data are available from several strains, and this has allowed a lot of comparisons. In a paper from 2018, they estimated that around 50% of the genes were accessory genes, meaning that they're not found in all strains. They also found that these accessory genes could belong to any predicted functional category. So between strains, or isolates, there's really a lot of variation in genome content. They ascribe this high variability to the high number of active TEs in the genome, suggesting that TEs move things around and copy things around, and that this has been a reason for the observed genetic polymorphism previously interpreted as heterokaryosis.

My favorite among the AM genome papers is really this one by Maeda and coworkers from 2018, because they demonstrate why we have this high level of rDNA polymorphism in AM fungi. We know that in virtually all eukaryotes the rDNA gene is organized in an often extensive tandem repeat. A mutation occurring in one copy will be effectively removed during recombination. So in the end the within-species variation is very low, but between species the operon diverges, particularly in non-coding regions like the ITS that we use as markers for species in barcoding studies. This is exactly what makes the rDNA gene an excellent marker for barcoding, but obviously not for AM fungi. What Maeda shows is that in Rhizophagus irregularis there are no tandem repeats of the ribosomal gene; instead there are 10 or 11 non-repeated copies that are spread over different contigs in the genome. These copies evolve independently. That's why, even within a single spore, there can be many different variants of the ribosomal genes.
Of course, we have no evidence yet that this is the case in all AM fungi, but given the high variability, it's fairly likely that that's what we're going to be seeing.

Another really exciting finding in AM genomics is the work by Ropars and her colleagues that came out in 2016, where they identified the mating-type locus in AM fungi. Again, they used several strains of Rhizophagus. If you look at the allele frequency, two strains had a biallelic distribution with a peak at 0.5. In these two strains they also found a dip in coverage in one contig, could locate the MAT locus there, and found that within those strains there were two alleles. The other four strains that they examined had the opposite allele frequency distribution and only one mating-type allele in each strain.

Based on these observations, they outlined a possible life cycle for AM fungi, with a clonal cycle where strains having only one mating type would propagate clonally. If two strains with different mating types met, they could fuse and form a dikaryon-like stage with two mating types, maintaining those nuclei at a 50-50 ratio within the mycelia. We don't know how that would be organized, but that's what the pattern of the allele distribution looked like for those strains. It's likely that the cycle would also include karyogamy at some point, with the possibility for meiotic recombination, but this stage has not yet been seen. The evidence that we have for recombination is still thin, because it's really difficult to identify recombination events in these organisms, but it's a likely part of a cycle like this. So I don't think anyone really doubts anymore that AM fungi have a sexual cycle; the question really is how frequently it happens in relation to the asexual reproduction of these fungi. So like I said, we really learned a lot from the genomes that have been published.
It's important to remember one thing: almost all of these genomes come from axenic cultures, where the fungi can be grown with transformed roots. The exception is Diversispora epigaea, which has this little fruit body structure above ground, at the red arrow, and that was used for getting enough DNA. The genome of another sort of AM fungus was also published. It's Geosiphon pyriformis, but instead of plants it's associated with nitrogen-fixing cyanobacteria, Nostoc. It belongs to the same phylum or subphylum, depending on how you like it. Given its host, I think many of us expected that Geosiphon would be a basal sister to the AM fungi, but that's not what it looks like based on the phylogenomic analysis using the transcriptome data that's available. To me, the best part of this paper was that the authors went back to the location where this species was known from and managed to re-isolate a strain. That's really impressive. They could culture it and get enough biomass for genome sequencing. And that really is the key problem with AM fungi: it's hard to get enough biomass for DNA extraction.

That's where our work comes in. We recently released a preprint on bioRxiv, and the paper is still in review, where we present de novo genome assemblies and phylogenomic analysis, including 18 previously unsequenced taxa representing eight new genera of AM fungi. We generated these without axenic cultures. Instead, we started with whole inoculum that we obtained from INVAM. We extracted spores from the soil with sieving and sucrose gradients, picked the spores, crushed them, stained the content for DNA, and then put it on a FACS cell sorter. At the facility, we could sort individual particles into wells. Then we amplified the DNA and could score whether the DNA was fungal, bacterial, or both. We picked the ones that were only fungal and sent them for sequencing on an Illumina HiSeq X.
All this was possible thanks to Claudia at the microbial single cell facility, and Barbie Ellis, who was a postdoc with us, as well as, of course, Mercè Montoliu-Nerin, the PhD student who led the work.

But let's take a step back, because this is part of a large ERC project that I started in 2016 with the aim of studying the evolutionary stability of AM fungi. And you realize that's an impossibility, it's a huge task, but it's very interesting to dive into. I built the idea on a temporally dynamic heterokaryosis model, where different nucleotypes exist, segregate, and re-merge through anastomosis. That was the main idea of the project. The thinking was that we would be able to determine if nuclei were different by sequencing them individually and comparing them. Easy enough, that's what I thought. I started with Claroideoglomus because we knew, like I was saying, that there was high variation in the rDNA genes, so we figured there'd be other variation as well. We wanted to generate reference genomes and then map reads from the individual nuclei to them.

When I started, we had already done pilot experiments showing that we could sort the nuclei and amplify them, and I kind of thought that would be the hard part. It turns out that when you get the sequences, it's very easy to assemble them into something that looks like a genome, but to actually make sure that the assembly is a good representation of the genome, that's a whole different thing. So we spent a lot of time figuring out how to handle this kind of data. In our case, we had 24 individually amplified and sequenced nuclei that we wanted to combine into an assembly of the genome. We used MDA for amplification, and supposedly it's unbiased, meaning that each nucleus will not represent the whole genome, but together everything should be represented. But using amplified DNA, you have very variable coverage.
So you can't really use the coverage for information, and it causes problems with a lot of assemblers. It's also difficult or impossible to distinguish a repeated region from a highly amplified region. So there were several challenges with that. We ended up spending a lot of time evaluating assemblies: looking at how single-copy genes were assembled, estimating completeness, the size of the assemblies, repeat content, looking for the rDNA variants, and so on. We worked on this to the extent that we started to call it assemblomics, because I think we had over 300 different assemblies of these same nuclei that we published in this work. It does sound a bit crazy when I say it, but I think it's an important lesson for trying to explore new types of data.

So I'm going to walk you through the three types of workflows that we developed and the conclusions we drew from them. We had all the nuclei, and two of the workflows started by assembling single nuclei and then putting the single-nucleus assemblies together. In number one, we used MaSuRCA and assembled all the reads for each single nucleus, whereas in the second one we used SPAdes, but first normalized the reads, then assembled single nuclei and put them together. For that we used Lingon, a new assembler of assemblies that Manfred developed for this purpose. In this way we got whole genome assemblies. The third option was that we just put all the data together, normalized it, and then assembled it with SPAdes, so we went straight to a whole genome assembly.

This is one of the ways we evaluated it. We looked at BUSCO genes that we expect to find as one copy, and we expect to find 290. So we estimate how complete the genomes are, and we have increasing numbers of nuclei, so for example one, two, three and so on.
What we see is that the single copies, in dark gray, increase, but then there's a light gray with duplications that start to accumulate as we add more and more data. With the other method, assembling with SPAdes, you can see that for the single nuclei we get bigger assemblies, but as soon as we start to put them together we accumulate a lot of duplications, artificial duplications that should not be there. Putting all the data together, we get a higher estimated completeness: already at six or seven nuclei we had over 80% completeness of the genomes.

So, putting this together, we saw that using MaSuRCA gave us a very accurate estimate of repeats, whereas the single-nucleus assemblies were very fragmented, and the complete assemblies were also very fragmented. With SPAdes for single nuclei, we get better assemblies of individual nuclei, but we really cannot use them for putting together a larger assembly. The third method, using SPAdes and all the data, really gave us quite nice and long contigs and accurate gene structure, but the repeat content was severely collapsed.

So we felt pretty happy with how the assemblies were looking, and I felt confident that we would actually be able to use this on a lot of different strains: we would be able to sequence a whole bunch of AM fungi and extract single-copy orthologs. It looked like we were getting genes represented in a good way. So we decided to go for a broader sampling. We talked to Jim Bever about taxon sampling and got isolates from INVAM, so that we would get at least two representatives from every genus in the collection, and we started this massive sorting effort, using pools of spores instead to get more nuclei. We were getting everything from seven to 24 nuclei per isolate. In the end, there are always some that didn't work, where we couldn't separate out particles corresponding to nuclei or couldn't get a good amplification.
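The BUSCO-based evaluation described above, where each expected marker gene is scored as found once, found more than once, or not found, can be sketched roughly as follows. This is a minimal illustration and not the BUSCO tool itself: the function name `busco_summary` and the dictionary input are assumptions made for the example, and only the expected total of 290 genes comes from the talk.

```python
def busco_summary(copy_counts, expected=290):
    """Summarise completeness from per-gene copy counts.

    copy_counts: dict mapping a marker gene ID to the number of complete
    copies found in one assembly (hypothetical input, not BUSCO's own
    output format).
    expected: total number of marker genes searched for.
    """
    single = sum(1 for n in copy_counts.values() if n == 1)
    duplicated = sum(1 for n in copy_counts.values() if n > 1)
    # Genes with zero copies, or absent from the dict, count as missing.
    missing = expected - single - duplicated
    return {
        "single_pct": 100 * single / expected,
        "duplicated_pct": 100 * duplicated / expected,
        "missing_pct": 100 * missing / expected,
    }


if __name__ == "__main__":
    # Toy data: ten expected genes, two found once, one found twice.
    toy = {"gene1": 1, "gene2": 1, "gene3": 2, "gene4": 0}
    print(busco_summary(toy, expected=10))
```

Running a summary like this on assemblies built from one, two, three, and more nuclei would show the pattern discussed in the talk: a rising duplicated fraction signals artificial duplications rather than real gains in completeness.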
In the end, we had 20 or 21 isolates that we could generate assemblies for, and we used the SPAdes assembly with all the data put together.

So this is our final analysis. We were able to use the genomes we generated as well as previously published genomes and get a very good taxon representation throughout AM fungi. It's based on 371 single-copy orthologs represented in at least 50% of the included taxa. The outgroup here was chosen based on a previous analysis where we had Dikarya included as well. We recovered the same topology with RAxML, IQ-TREE, and Bayesian analysis, and all the nodes are fully supported, so it's well supported. We also colored it based on the consensus taxonomy from Redecker and coworkers, using the INVAM colors from the home page, because it's so nice and colorful.

So with a comprehensive phylogenomic analysis, we could demonstrate that all the family-level classifications are well supported and are monophyletic lineages. We found that, in contrast to work based on ribosomal genes, the order Glomerales is polyphyletic. What happens is that, as you see in the purple color, Claroideoglomeraceae comes out as a basal sister to Diversisporales and the rest of Glomerales. In the paper we go into some analysis of conflicts around this placement, but overall the picture is a well-supported phylogeny with broad taxon sampling. The stars indicate new genomes from our work.

One reason that we can feel confident about these assemblies is that we also included Rhizophagus irregularis, the same strain for which there is a very good reference genome, version 2.0, published by Chen and coworkers in 2018. So we took the two assemblies, ours and the published reference, and compared them. First, mapping reads, we find that 99% of our reads map to both of the assemblies, and individual nuclei cover on average 50% of the assemblies.
Based on this we concluded that the whole genome is represented among the reads that we have generated. Then we took all the BUSCO genes that were present in both genome assemblies, did pairwise alignments, compared the similarity, and found that there was on average 99.7% similarity. The background there is that 260 of the genes were identical. Then there were a couple that were at 96, 98, or 99%, and one of the genes had only 60% similarity. Our conclusion is that any random errors that are introduced in the MDA are not retained to a large extent, so most of the genes really are assembled correctly and well represented using our method. So we felt that this is a very solid method to generate single-copy orthologs suitable for phylogenomic analysis.

But we're always interested in learning more about what happens with the different assembly methods and what kinds of analysis different assemblies are suitable for. Marisol, who was a postdoc in the project then, has now moved on to a tenure-track position at the Agricultural University here in Uppsala. She has developed additional modified assemblies for the same data set and gone into an extensive comparison. It includes gene copy number, how they are represented, as well as the repeat landscape. She will be presenting this work at CanFunNet later in May, so you can go and listen there if you want to hear more.

I don't have so much time, so I will speak quickly. You may be wondering about the single-nucleus assemblies. Very shortly, looking at Claroideoglomus claroideum single nuclei assembled with SPAdes, what we see is that the total size is of course smaller than the complete assembly. The completeness based on BUSCO ranges from maybe 20% up to even 80% for a single nucleus.
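The gene-by-gene comparison between the two Rhizophagus irregularis assemblies mentioned earlier (pairwise alignments of shared BUSCO genes, then an average similarity) could be computed along these lines. This is a sketch under assumptions: it takes sequences that are already aligned, treats a gap against a base as a mismatch, and the function names `percent_identity` and `mean_identity` are made up for the example. The talk does not say which aligner or which similarity definition was used.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two already-aligned sequences.

    Columns where both sequences have a gap ('-') are skipped; a gap
    opposite a base counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100 * matches / compared


def mean_identity(aligned_pairs):
    """Average percent identity over a dict of gene ID -> (seq_a, seq_b)."""
    values = [percent_identity(a, b) for a, b in aligned_pairs.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy alignments standing in for shared marker genes.
    pairs = {"gene1": ("ACGT", "ACGT"), "gene2": ("ACGT", "ACGA")}
    print(mean_identity(pairs))
```

Looking at the per-gene values as well as the mean matters here, because a single badly assembled gene, like the one at 60% in the talk, is hidden by an average near 100%.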
We detect only one MAT allele, but we detect it in most of the nuclei, so we conclude that this is similar to the homokaryotic stage described by Ropars. There are two ribosomal variants, and we detect one of them, the L variant, in all of the nuclei, whereas the S variant we only detect in 15. Our conclusion from this is that the S variant probably occurs in fewer copies, so we are more likely to miss it in the assembly. We don't think it is actually absent; it's more that not all of the genome is represented in each nucleus.

We're looking at intra-organismal variants. We map the reads from individual nuclei back to the biggest assembly and see that it supports the idea that these are haploid nuclei. But we also have a portion of variants within each nucleus that is intermediate, which we ascribe to noise due to the MDA, and we want to filter that out. So we keep only alternate alleles at everything above 0.9, and reference alleles if it's smaller than 0.1, and then we look at variants across the whole organism. We find again low polymorphism, 0.18 in the coding and 0.35 in just the non-repeat content of the genome. These are mostly rare variants, but, really interesting, we find a strong signal for negative selection when we estimate dN/dS across the well-annotated coding part of the genome.

This is very exciting, I think, because it's also very much in line with other work from Teresa's group, where they did this beautiful study using visualization to monitor the fate of nuclei in the mycelia and in the spores. What they saw was that nuclei populate the spores: as the spore matures, nuclei move in and out, so it's really not founded by one nucleus. They also saw that in the mycelia there's a sort of programmed nuclear death, where some nuclei are degraded.
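The per-nucleus filter described above, where intermediate allele frequencies are treated as MDA noise, can be illustrated with a small sketch. The thresholds 0.9 and 0.1 are from the talk; the function name `call_allele`, the read-count inputs, and the choice to return `None` for filtered sites are assumptions for the example, not the actual pipeline.

```python
def call_allele(ref_reads, alt_reads, low=0.1, high=0.9):
    """Call one site in one nucleus from its read counts.

    Returns "alt" if the alternate allele frequency is above `high`,
    "ref" if it is below `low`, and None for intermediate frequencies
    (treated as amplification noise) or sites without coverage.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return None  # no reads, nothing to call
    alt_freq = alt_reads / total
    if alt_freq > high:
        return "alt"
    if alt_freq < low:
        return "ref"
    return None  # intermediate frequency, filtered out


if __name__ == "__main__":
    # Toy read counts (ref, alt) for one site in three nuclei.
    for ref, alt in [(1, 19), (19, 1), (10, 10)]:
        print(ref, alt, call_allele(ref, alt))
```

In a haploid nucleus a real variant should sit near 0 or 1, which is why a strict two-sided cutoff like this is a reasonable way to discard intermediate calls before comparing nuclei across the organism.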
So we're thinking, maybe we're seeing the effect of this negative selection on the level of nuclei, but this is really very early work and we're going to look into it further in some of the work we have ahead.

I hope I convinced you that single-nucleus sequencing can be used to generate good AM genome assemblies, after careful analysis of the assemblies. I would say that now is when the fun begins: we can start looking for biologically important aspects of the genome content and determine features that are specific to AM fungi compared to their sister lineages, and, maybe even more interesting, dive into within-AM variation. We're interested in phosphorus transporters, for instance, and the diversity of those within these lineages. I'm really excited to start diving into this data and see what we will learn. I want to thank everyone: my funding, a lot of great resources here in Uppsala, and what great people to work with. Thank you very much.