Okay Anna, we're good to go. Welcome to today's open research webinar hosted by eLife. This series aims to give early-career researchers an online platform to continue sharing their research, as an alternative to in-person meetings, which have now become impossible. My name is Anna Akhmanova. I'm a deputy editor of eLife. My own research is focused on the cytoskeleton, and particularly on microtubules. But today we will hear talks from three very different young researchers: Ralitsa Madsen, our first speaker, working on stemness; Kevin Ng, who will be talking about an evolutionarily conserved truncated PD-L1 variant; and last but not least Nikos Vakirlis, working on orphan genes. So before I tell you a bit more about how to ask questions, I would just like to say very briefly how important it is that we do these kinds of webinars, for two reasons. First, to provide a platform for young researchers to share research broadly now that meetings have become impossible, and second, because in the news we now read every day about the importance of science. We're in a unique period of time where everybody turns to scientists for answers and for ways to move forward, and so I'm particularly excited to be able to participate in this eLife webinar so that we can share the newest research with you. So of course you will listen, but you will also be able to ask questions. After each 10-minute talk we'll have five minutes for questions, and then we will move to the next talk. To ask a question, you can either type it into the Zoom chat or directly into the Google document, the link to which is also shared in the chat. We are joined today by Miranda, Anya and Naomi from eLife, who are working in the background to support you. They will help to line up the questions, and if possible we will invite you to ask your question in person and then simply unmute you so that you can do that. Otherwise, if that's not possible, I will read the questions and include your name.
The open notes document is also a place for you to contribute shared public notes. We welcome you to do so and to list yourself as a contributor in the list above the speakers of today's webinar. Thank you. Okay, then about YouTube: this webinar is being recorded. There's also live streaming on YouTube, and since it's a live webinar we ask you please to be respectful, honest, inclusive, accommodating, appreciative and open to learning from everyone else. So please behave yourself: don't attack, demean, disrupt, harass or threaten others, or encourage this kind of behavior. If you feel uncomfortable or unwelcome in any of these webinars, please contact eLife by email. The inbox for this is watched by Miranda at eLife. Organizers reserve the right to ask anyone to leave and to deny access to subsequent webinars on Zoom. If you need any help at any point, please send Miranda, Naomi or Anya a chat message directly using Zoom. Okay, so now that we are through the introduction, we will start with the first talk, which is from Ralitsa Madsen, and she will tell us about the regulation of stemness. Welcome Ralitsa. Thank you. I'll just try and share my screen now. You'll see this. All right, thank you everyone for joining in, and particularly eLife for what I think is a really great opportunity given the current situation. So I've never given an online talk before, and in case something goes completely wrong I want to make sure I thank the right people at the start, particularly my previous supervisor Robert Semple. Sorry, well, I can't go back, but I want to thank my previous supervisor Robert Semple, with whom I did my PhD at the University of Cambridge and subsequently a postdoc at the University of Edinburgh. Several other people have been involved in this project, and also my current mentor Bart Vanhaesebroeck at the UCL Cancer Institute.
Now I'm going to focus on the big picture here, looking at stemness and how it's regulated by a PI3K activity threshold, but for those of you who want more detail and an actual look at the data, you can come back to this slide when you find the slides online, and you will be able to find all the data upon which this talk is built. So going forward, the focus of my research really is the PI3K signalling pathway, and for those of you who are not familiar with this pathway, all you need to know for the next 10 minutes is that when it's activated in a cell, that cell is told to grow, to proliferate and to resist apoptosis. So quite unsurprisingly, this pathway is commonly hyperactivated in cancers, and often this is because of an activating mutation in the PI3Kα enzyme. This enzyme is a lipid kinase composed of a catalytic and a regulatory subunit. The catalytic subunit is known as p110α, and that subunit is encoded by the PIK3CA gene. Cancer biologists will be familiar with the PIK3CA gene: it is the second most commonly mutated gene in human carcinomas, and the mutations in this gene are activating. They're considered cancer drivers, and therefore a lot of focus has been placed on the development of PI3Kα inhibitors for use in oncology. Now, having said this, it is also very important to stress that activating PIK3CA mutations should not automatically be equated with cancer. We know this from a group of rare overgrowth disorders commonly referred to as the PIK3CA-related overgrowth spectrum, or PROS. So this lady here has extreme lower-body overgrowth caused by an activating heterozygous PIK3CA mutation that was acquired during development, and in this particular case this mutation, although it also occurs in cancer, does not lead to malignant overgrowth.
So you've got a situation here where you can have the exact same activating PIK3CA mutations, but in different contexts and leading to different outcomes. So when I joined the Semple group to do my PhD, my aim really was to study activating PIK3CA mutations in a developmental context, because we already knew a lot about them in cancer but relatively little about their consequences in development. So to do this I chose to use pluripotent stem cells, and of course, as typically happens in research, you end up studying what you least planned to, so ultimately I ended up learning a lot about PI3Kα in cancer and contributed perhaps very little to the understanding of PROS. The reason for this is that pluripotent stem cells share many similarities with cancer cells, and in fact you can view cancer as featuring aberrant activation of embryonic programs. This is now supported by big datasets, where for instance a recent study derived a molecular stemness score from pluripotent stem cells and used that stemness score to classify human cancers as either differentiated or undifferentiated, and the researchers were able to show that the higher the pluripotent stemness score in these cancers, the more aggressive the disease and the worse the outcome. This duality, if you like, came in very handy in my research, because I was able to study the effects of oncogenic PI3Kα activation on stemness. I've put a question mark here because this is typically not a phenotype that is widely studied when we think about PI3K signalling, and that is perhaps something that we should start paying more attention to, as I'll come back to later. So as I said, initially I wanted to develop human pluripotent stem cell models of PROS, so I used CRISPR-Cas9 to knock the strongly activating PIK3CA H1047R variant into human induced pluripotent stem cells.
So in hindsight I turned out to be very lucky, because my CRISPR was quite efficient and I ended up getting homozygous as well as heterozygous mutants. I say lucky because this is an oncogenic mutation, and the general view at the time was that you only really need a single copy for its full-blown effects to be observed, so in other words an extra copy was likely to be redundant and, if anything, not particularly relevant for what we wanted to understand. We now know that this was a wrong assumption. Even when just looking at pathway activation, and again this is all published data, we were able to discern greater activation of the PI3K signalling pathway going from wild-type cells to heterozygous H1047R mutants and then to their homozygous counterparts. But the most striking difference between homozygous PIK3CA mutants and the heterozygous and wild-type cells was really the effect that this extra copy of PIK3CA H1047R had on stemness. We found that homozygosity, but not heterozygosity, for PIK3CA H1047R resulted in a phenotypic switch characterized by self-sustained stemness, so in other words these cells are now resistant to differentiation. We think this is quite important also in the context of cancer, as I'll return to later on, because we also demonstrated that you can have multiple PIK3CA mutant copies in human tumors, and therefore this dose-dependent regulation of stemness, and perhaps of other phenotypes downstream of these mutations, could potentially also be relevant to look into in cancer. On the mechanistic side of things, we of course were very interested in trying to understand what maintains this switch in the on position: what is maintaining the self-sustained stemness, and importantly, is this an irreversible switch or can we switch it back off again, and is it simply a matter of using one of the PI3Kα inhibitors that are being developed for use in oncology?
So we took an unbiased approach here, in collaboration with the Linding lab in Copenhagen, where we profiled our cell lines using RNA sequencing and also proteomics, and consistently our computational predictions suggested that activation of the TGF-beta signalling pathway plays a key role in the self-sustained stemness phenotype, the phenotypic switch that we observed in homozygous mutants. This caught our attention for several reasons. Very early on in my work I had observed that homozygous H1047R mutants exhibited a strong upregulation of this gene here, known as NODAL. Developmental biologists will know NODAL as a critical morphogen during embryogenesis. It's a TGF-beta-like ligand, and like TGF-beta it instructs cells to undergo epithelial-to-mesenchymal transition. During embryogenesis this is very important for gastrulation and organ development, and, again exemplifying this duality between cancer and development, TGF-beta in cancer promotes metastasis and disease progression. An important feature of NODAL is that it can actually promote its own expression, so you've got this positive autoregulation, which again is very important considering the self-sustained phenotype that we observed in these cells. The reason why it's important is that you also need to supplement NODAL or TGF-beta into the maintenance medium of human pluripotent stem cells to keep them in the undifferentiated state, and we knew that homozygous mutants were capable of maintaining the undifferentiated state even when you switched them to a medium that wasn't promoting this. So we hypothesized that self-sustained TGF-beta/NODAL signalling could be the mechanism for this stemness phenotype in homozygous iPSCs. So we went on to test this, and I apologize in advance for what may look like a very complicated slide, so I'll walk you through it. It's actually fairly simple.
On the left-hand side I've got results for wild-type induced pluripotent stem cells, looking at NODAL expression and also NANOG. NANOG is another very important pluripotency factor; it's a downstream target of the TGF-beta signalling pathway. If you look at the two bars here, the left one is the gene expression in cells cultured in complete medium, and then when you remove NODAL you see decreased expression of NODAL itself, as expected, and it also translates into lower NANOG expression. The same is not true in homozygous mutants: they remain high in NODAL expression, and even when you add PI3Kα inhibitors, here BYL719, although you see a drop in NODAL expression, this is not sufficient to trigger downstream down-regulation of NANOG and other factors that I'm not showing here. What we needed to suppress this stemness signature was actually inhibition of the TGF-beta signalling pathway, consistent with our hypothesis. So just to summarize what I've told you: homozygous H1047R mutants exhibit this self-sustained stemness, and it seems to be triggered by constitutive TGF-beta signalling. We can inhibit this phenotype with a TGF-beta inhibitor, but very importantly we cannot reverse this stemness phenotype using a PI3Kα inhibitor. So for cancer biologists, does this have any relevance? We think it does, and here is some unpublished data which we're quite excited about. We looked at breast cancer transcriptomics, and we computed a PI3K activity score and a stemness score based on gene expression data. We saw a remarkably strong positive relationship between the stemness score and the PI3K activity score, and more importantly this was also related to tumor progression, with grade 3 tumors, the most undifferentiated tumors, having the highest stemness score and the highest PI3K activity score.
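As a rough illustration of the kind of analysis just described, the sketch below computes two signature scores per sample and their rank correlation. Everything in it is an assumption made for illustration: the talk does not say how the scores were computed, so the scoring rule (mean per-gene z-score), the placeholder gene names and the toy expression values are all hypothetical and are not the speaker's data or method.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression across samples."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def signature_score(expr, genes):
    """Assumed scoring rule: mean z-score over a gene set, one score per sample."""
    z = [zscores(expr[g]) for g in genes]
    return [mean(col[i] for col in z) for i in range(len(z[0]))]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: four placeholder genes measured in five tumour samples.
expr = {
    "pi3k_gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pi3k_gene_b": [2.0, 2.5, 3.5, 5.0, 6.0],
    "stem_gene_a": [0.5, 1.0, 1.2, 2.0, 3.0],
    "stem_gene_b": [1.0, 1.1, 2.0, 2.2, 4.0],
}
pi3k_score = signature_score(expr, ["pi3k_gene_a", "pi3k_gene_b"])
stemness_score = signature_score(expr, ["stem_gene_a", "stem_gene_b"])
rho = spearman(pi3k_score, stemness_score)
print(f"Spearman rho between the two scores: {rho:.2f}")
```

With real data the gene sets would come from published signatures, and the correlation would typically be examined within tumour grades as well as across them.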
So I'll finish here by just highlighting two of the questions that arise from this. One very important question is whether PI3K inhibitors are active against cancer stemness. This is not something that we typically evaluate when we use these inhibitors. We tend to look at proliferation and growth, but we think that it is also important to look at stemness, because you need only a single cancer stem cell to survive for it to regenerate the entire tumor. So even if the inhibitor suppresses growth, it might not be effective in the long run. And finally, PIK3CA H1047R and other PIK3CA mutations are heterozygous in PROS, and maybe this is one of the reasons why the overgrowth in these patients doesn't progress to malignancy. I think I might have run over time, so I think I'll stop here and I'll be happy to take questions. Do I need to stop sharing? Naomi, do we have any questions from the audience? The questions are just coming in on the document, and I've got a first question in the chat for you if you like. Okay, so a question, a pretty obvious one: could the lack of a wild-type allele, rather than the presence of an extra mutated allele of PIK3CA, be the driver of the observed phenotype? Yeah, so that's a question that has come up sometimes when I present this work. We haven't actually looked at this directly, but we don't think it is the case, partly because there is a negative regulator of the PI3K pathway, PTEN, and when you lose that homozygously you also trigger self-sustained stemness. Also, downstream of PI3K, if AKT is hyperactivated in human pluripotent stem cells, you again see self-sustained stemness. So we really think this relates to a PI3K activity threshold, which you can reach by means other than just the PIK3CA H1047R mutation. Okay, the next question is from Michael, who may be watching you on YouTube: have you checked the canonical TGF-beta SMADs, such as SMAD2/3 levels, in the PIK3CA mutants?
Yes, we've done a reverse-phase protein array and we've shown that SMAD2/3 phosphorylation is increased in the homozygous mutants, and that you cannot reverse that with a PI3Kα inhibitor even though you reverse the canonical PI3K signalling pathway. So yeah, the canonical SMAD2/3 pathway is on. I see. Okay, maybe a question from me: how well do you differentiate stemness from proliferation and these different effects? So how well can you define these different phenotypes? Yes, so we actually don't have any evidence that these cells proliferate more than the wild types and the heterozygous mutants, so we don't think it is related to proliferation in this case. What we know is that the mutants, and this is seen in both heterozygous and homozygous mutants, have increased resistance to apoptosis when you remove the growth factors. I see. Another question is: which PI3K effectors do you think would be involved in the pathway that you are looking at? So in a preprint that we've posted recently, we've looked at this with a computational network analysis, and we think that MYC may be playing an important role. MYC is being stabilized downstream of the heterozygous mutants' activation, and we see quite substantial evidence for it playing a role in this, also partly based on prior literature, but it's a work in progress. I see. And maybe the last question: do we have more questions from the audience? Not at the moment. We have one more question, so: this stemness switch, how do you envisage it mechanistically?
So this boils down to somewhat complex signalling dynamics, but basically what we envisage is that you've got a bistable state, if you like. At the transcriptional level this could for instance be caused by epigenetic modification and opening of chromatin, and then you're in a state where you have more open chromatin, for instance at the stemness genes, and continuous binding of activating factors whilst inhibiting the repressors of the phenotype. So when I talk about the switch, it's basically at the signalling level: if you imagine a steep sigmoidal curve, above a certain threshold you then stabilize into this new state, and this is how we imagine that this is happening. But as I say, this is something that needs further mechanistic studies; we haven't addressed it at the moment. Thank you very much, this is a very exciting study, so thanks a lot. We'll now move to our second speaker, who is Kevin Ng, and he will talk about PD-L1 variant functions. Welcome Tim.
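The idea of a steep sigmoidal response combined with positive autoregulation can be sketched with a one-variable toy model. This is only an illustration of bistability under stated assumptions, not the speaker's model: the equation dx/dt = signal + hill(x) - x, the Hill parameters, and the three input levels are all hypothetical choices made so that the switch is easy to see.

```python
def hill(x, k=0.5, n=4):
    """Steep sigmoidal self-activation term (assumed form and parameters)."""
    return x**n / (k**n + x**n)


def simulate(x0, signal, t_end=50.0, dt=0.01):
    """Euler-integrate dx/dt = signal + hill(x) - x and return the final x.

    x stands in for a self-reinforcing stemness signal and `signal` for the
    upstream pathway input; both are abstract, unitless quantities.
    """
    x = x0
    for _ in range(int(t_end / dt)):
        x += dt * (signal + hill(x) - x)
    return x


# Hypothetical input levels: baseline, a modest increase, a larger increase.
baseline, subthreshold, suprathreshold = 0.05, 0.15, 0.30

low = simulate(0.0, baseline)        # settles in the low state
sub = simulate(low, subthreshold)    # modest input: still in the low state
sub_back = simulate(sub, baseline)   # returns to the original low state
high = simulate(low, suprathreshold) # input above threshold: jumps to the high state
high_back = simulate(high, baseline) # input removed: remains in the high state

print(f"baseline {low:.2f}, sub-threshold {sub:.2f}, back to baseline {sub_back:.2f}")
print(f"supra-threshold {high:.2f}, back to baseline {high_back:.2f}")
```

In this toy system the high state persists after the input returns to baseline, which is the hysteresis one would expect from a self-sustained switch; whether the biological system behaves this way is exactly what the speaker says still needs mechanistic study.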
Great, thanks so much. Let me just start sharing my screen. Great, is that working? Great. Okay, so thank you very much for having me today. I hope everyone is staying well in these rather unprecedented times, and I think this is a really fantastic initiative to continue to share science in these times. So our lab is an immunology lab, and we are primarily interested in T cells, which are in many ways central in orchestrating immune responses to cancer and infection, but can also be responsible for autoimmunity and autoinflammation. A T cell response in its most generalized form begins upon recognition of a T cell's cognate antigen by the T cell receptor, the TCR, and this results in clonal proliferation of that particular T cell and release of various soluble mediators, cytokines and other signalling factors. Of course this can result in all sorts of collateral damage from an overly exuberant immune response, and so in order to restrict this response and to prevent this collateral damage, T cells express PD-1 upon activation. What happens upon binding to its primary ligand, PD-L1, is that you get a decrease in the magnitude of the T cell activation response. The importance of this pathway is highlighted by the phenotype of PD-L1-deficient mice, which develop spontaneous autoimmunity. Conversely, PD-L1 overexpression is a very well-studied mechanism of tumor immune evasion, and there's been remarkable success in the clinic by targeting this pathway. So these two dichotomous phenotypes, to me, really highlight the importance of tightly regulating the expression and activity of PD-L1. What I'm showing here is the gene structure of CD274, which encodes PD-L1, so you have the IgV and IgC domains on exons 3 and 4 and a transmembrane domain on exon 5. A few years ago our lab established a pipeline to do de novo transcriptome assembly of RNA-seq data from both healthy donors and cancer patients, and we assembled this truncated variant
and we saw that this truncation occurs due to exonization of a retroelement, a LINE element within intron 4, which donates its poly(A) tail to the transcript, and hence we refer to this transcript as CD274-L2A. Biologically, why this is quite interesting is that, as I've mentioned, the transmembrane domain is located on exon 5, and when we model the protein that would be translated from either full-length PD-L1 on the left or truncated PD-L1 on the right, we see that the truncated transcript retains the IgV and IgC domains required to interact with PD-1 but lacks a transmembrane domain. So we hypothesized that expression of this transcript would produce a soluble form of PD-L1, which has previously been described in the serum of cancer patients and in other pathologies as well. What was interesting to us as well is that this intronic region was evolutionarily conserved among all hominid species, which is shown here in this red box. When we compared the sequence divergence between all mammals, we observed that introns within the gene, which are in gray, and the non-coding exons, which are in white, show far greater divergence than the coding exons, which are shown in dark blue. This is of course expected, but what was quite remarkable to us is that this LINE element, which is shown here in orange, shows divergence comparably low to that of the coding exons. In terms of expression, what I'm showing here is expression of full-length PD-L1, split by tissue type and by either cancer, in orange, or non-malignant tissue, in gray. Generally we see higher expression in cancer, but we also see expression in non-malignant tissue, and this is primarily in immune tissues and barrier sites such as the lung. When we then compare expression of the truncated transcript, this is on average 10-fold lower (I've expanded the y-axis just for clarity), and as with the full-length transcript we see the highest expression in immune and barrier sites such as the lung and the skin. We
didn't quite see the same extensive upregulation in cancer, and surprisingly we saw quite widespread expression among non-malignant tissues. So the conservation in hominids, as well as the expression in non-malignant tissue, suggested to us a potential conserved biological function. In order to test our hypothesis that this truncated transcript would produce soluble PD-L1, we cloned it into an expression vector and transfected 293T cells, which are deficient in PD-L1, and when we ran an ELISA on the supernatant of these cells we saw a dose-dependent increase in soluble PD-L1. Importantly, when we analyzed these cells by flow cytometry, we saw no increase whatsoever in surface or transmembrane PD-L1, and this demonstrates that this truncated transcript produces only the soluble form of PD-L1. So of course the obvious question is: what does it do? Our first assumption was that soluble PD-L1, like its transmembrane counterpart, would be immunosuppressive. We can assay the immunosuppressive capacity of PD-L1 by taking primary CD8 T cells from humans and co-culturing them with 293T cells transfected with full-length PD-L1. We then activate these T cells and analyze their proliferative and cytotoxic capacity. What we've done here is stain the cells with a proliferation dye, and the more divisions the cells go through, the more the dye dilutes, which is what we see in these distinct peaks. As we can see, stimulation induces robust proliferation, which is decreased in the presence of transmembrane PD-L1. However, and to our surprise, no amount of truncated PD-L1 transcript was able to induce any suppression, and this is true even at DNA concentrations equivalent to the ones used for the transmembrane form. Given that we found that the truncated transcript was expressed 10-fold lower in our RNA data, we reasoned that these were already supraphysiological conditions. We've also tested for granzyme, other activation markers, CD4
T cells and NK cells, all of which were sensitive to transmembrane PD-L1 but not soluble PD-L1. The one phenotype we did observe was a decrease in PD-1 staining on the T cells with soluble PD-L1, and this could be reversed with the addition of a blocking antibody to PD-L1. However, we didn't see any decrease by qPCR in the mRNA for PD-1, ruling out the possibility that soluble PD-L1 was somehow decreasing the gene expression. So what could be happening instead is that the soluble PD-L1 was binding PD-1 and thus blocking the fluorophore-labelled antibody from binding, resulting in this decrease in PD-1-positive cells that we see. This is quite interesting, as it suggested that soluble PD-L1 was able to bind PD-1 but with insufficient affinity to cause suppression, and thus biologically could be acting as a receptor antagonist. To test this directly, we did a co-transfection experiment in which we keep the concentration of the transmembrane PD-L1 constant and titrate the soluble PD-L1, again in 293T cells, and we generated PD-1-overexpressing Jurkat cells. This is a T cell line that's a more tractable system in which to study PD-L1-mediated suppression, and again we stimulate and stain for activation markers. As before, co-culture of these T cells in the presence of transmembrane PD-L1 decreased activation, and what we noticed was a dose-dependent rescue of activation as we titrate up the soluble PD-L1. Once again, when we tested this in primary T cells, we saw that the presence of soluble PD-L1 together with transmembrane PD-L1 was able to rescue proliferation and cytotoxicity, and that this phenocopied the effect of having a blocking antibody. So of course we wanted to test this in vivo, which is slightly complicated by the fact that the analogous transcript and protein do not exist in the mouse genome. But given that human PD-L1 has been reported to bind mouse PD-1, we decided to test it anyway, but we also designed
a construct for the orthologous mouse gene, in which we truncate mouse PD-L1 at the same position. We stably transduced these two constructs into MC38s, a colon adenocarcinoma cell line, and we subcutaneously injected these into immunocompetent mice. What we know is that these cell lines, in this subcutaneous model, are sensitive to checkpoint blockade, as seen when we inject the therapeutic blocking antibody to PD-L1. But what we saw as well is that cells transduced with either the human or the mouse soluble PD-L1 phenocopy the kinetics of exogenous PD-L1 blockade, suggesting that soluble PD-L1 has the same receptor antagonist effect in vivo. This work has all been published in eLife last year, and so very quickly, this is some of the work that we've done since then. We were interested in the cell type and in the biological relevance, and what we found through analysis of RNA-seq data was that this transcript was specifically expressed in activated CD4 T cells, and we could again confirm this using a time-course PCR and ELISA of primary CD4 T cells. In terms of relevance to disease, we are collaborating with Marina Botto's group at Imperial, which has RNA sequencing data from CD4 T cells from healthy donors and lupus patients, and what we see is that full-length PD-L1 is upregulated in lupus compared to healthy donor CD4 T cells, with no real difference between active and inactive disease. In contrast, the truncated transcript is selectively upregulated in patients with active disease, and if we express it as a ratio between the full-length and truncated transcripts, we can differentiate quite nicely between active and inactive disease, suggesting that this may be a useful biomarker in the lupus setting. So just to quickly conclude: we detect this truncated PD-L1 transcript in both healthy and cancerous tissue, and we know that this truncation occurs due to exonization of an
evolutionarily conserved retroelement. The expression of this transcript produces a soluble form of PD-L1 that's not suppressive by itself but antagonizes the bioactivity of full-length transmembrane PD-L1, and finally, we observe that the truncated but not the full-length PD-L1 is expressed in CD4 T cells upon TCR engagement, and that this phenomenon can be observed in patients with active lupus. And with that I'd like to thank my lab, all of the platforms of the Crick Institute and so on, and I'm happy to take any questions. Okay, thank you very much Kevin. So we are ready to take questions. Do we have any from the audience? You've got some questions in your chat, Anna. I see. The first question is: what do you think regulates this switch between the two PD-L1 isoforms? Right, so we've been pursuing that question quite actively. We don't really know yet. What we suspect is that the 3' UTRs of these two isoforms are very different, and what we know from the literature is that the 3' UTR of full-length PD-L1 contains many regulatory elements, and so we suspect that it might be some sort of RNA-binding protein, which we're actively looking into. Are you planning to pursue that, to look at which RNA-binding proteins bind there, or microRNAs or something like that? Yeah, so I'm currently doing some experiments to screen a panel of RNA-binding proteins that we know are expressed in CD4 T cells with similar kinetics. We have a question from Jonas Dutra: thank you for the very interesting talk; when did this soluble PD-L1 appear during the evolution of the hominid lineage? Right, that's a really interesting question, because as you've seen, this intron is very well conserved among the hominids but not other species. But what puzzled us a bit is that this class of LINE elements, these L2 elements, are extremely ancient and would likely have integrated millions of years before the divergence of the hominid lineage, and so what
was interesting to us is that we were able to reassemble, for instance, the mouse genome and the rat genome, and we were able to see traces, remnants of this LINE element that had inserted there but has since mutated away. So what we suspect is that this insertion itself is very, very ancient but has only been conserved in the hominids, and we see evidence in some other species, such as the pig for instance, where we can't detect this LINE element but we see integrations from other retroelements that seem to do the same thing. Okay, and then the last question: what can we do with this molecule in terms of cancer therapy? Yeah, so again that's something we've been pursuing. Given our rather preliminary in vivo data showing that it manages to delay the growth of the tumors, we think that it might be promising, given that this is a molecule that's much smaller than the blocking antibody and might have better penetrance within very dense tumor tissues. We're also exploring whether we can use this system expressed in CAR T cells, for instance, so that you can have very localized and temporally regulated expression, and this might go some way towards ameliorating some of the side effects of systemic... I see, very interesting. Thank you very much Kevin, and now we'll move to the last speaker, Nikos, who will talk about orphan genes. Welcome Nikos. Hi, thank you very much. Let me go ahead and share my screen here. Is this okay? Perfect. So I'd like to thank the organizers for putting this together. This is a really nice idea to keep science communication alive during these weird times. The title of my talk is "Synteny-based analyses indicate that sequence divergence is not the main source of orphan genes", and this is work that I did with Aoife McLysaght at the Department of Genetics of Trinity College Dublin. So let's dive right in. What is an orphan gene? An orphan gene is quite simply a gene that we cannot assign to any known gene family. So in this toy example that I have here, the blue,
black and red genes are shared between all these species, so these gene families have members in all the different genomes that I'm showing here. However, this green gene that we see in the focal species here is an orphan, because we only find it in this species, so we cannot assign it to any known gene family. In this simple example, therefore, it is an orphan gene. In fact, if we include the phylogenetic relationships between the species that are represented by these toy genomes here, we can easily infer that the most likely scenario is that this orphan gene originated, so it appeared, along the branch leading to that species over there, and it was missing from the ancestor of all these species. Therefore it is an evolutionarily novel, species-specific gene. The concept of the orphan gene can be generalized to include taxonomically restricted genes, which are genes that are found not only in one single species but in a small group of species, such as the yellow gene that I'm showing here, which is found in the group of species B. These taxonomically restricted genes, which we also call TRGs, and orphan genes are found in any clade that we look at across the tree of life, so they are ubiquitous. What's interesting to think about is how this richness of patterns of taxonomically restricted genes and species-specific genes evolved from an early ancestor that likely had a small genome with a limited gene repertoire. So this leads us to ask the question: how do novel genes originate in general during evolution? There are two main ways. The first is through sequence divergence. So what does that mean? Quite simply, if we imagine an ancestral gene, then a speciation event is going to produce two homologous genes, and with the passage of time these two homologous genes can find themselves within the twilight zone of sequence similarity, right here, where sequence similarity between these two homologous genes is no longer
detectable. So with the passage of time we can find two homologous genes in species A and B here that are actually invisible; we can call these invisible homologs, because no detectable similarity is present between them. Another way that orphan genes, and novel genes in general, can result is de novo, out of non-genic DNA. This is when an entirely new gene emerges from scratch from a part of the genome that was previously non-genic, such as an intergenic region. For a long time this was thought to be so improbable that it was virtually impossible, but we now know that this isn't the case, because we have found de novo genes, so de novo emerged genes, in any species that we look at. So to sum up, we have two drastically different mechanisms by which novel genes originate. We have sequence divergence, where we have an ancestral gene that diverges and produces two genes or gene families where no detectable similarity can be found between them, and then we have de novo emergence, where we have an ancestral non-genic sequence that evolves into a functional gene. I'm going to focus on the first one here, sequence divergence, and the question that we wanted to ask is: can we study sequence divergence beyond recognition directly? Can we study this mechanism of new gene origination directly, which is something that has not been done before? So that amounts to basically identifying cases like what I showed you before, homologous genes that have entered the twilight zone of sequence similarity, where no similarity can be detected between them. And the second question that we wanted to ask was: can we estimate the contribution of this mechanism, sequence divergence beyond detection, to the total pool of genes without similarity, so the total pool of novel genes, which includes orphan genes and taxonomically restricted genes? And why this is important is because for a long time, as I told you, de
novo emergence was thought to be extremely rare, so the assumed picture was that sequence divergence is going to account for the majority of novel genes and de novo emergence is going to be very rare. So I wanted to ask: is there a way that we can actually test this and find out whether this is true? What we used to do this is that we took advantage of the concept, the property, of synteny. Synteny is basically the gene order along the chromosome, and conserved synteny means that there is conserved gene order between two genomes of two different species that we're comparing. So in the example that I'm showing here, we have a block of four genes that are in conserved synteny, which we have identified using sequence similarity as we normally do. But then in the middle we have the yellow and the green gene here, okay, for which we don't know that there's any sequence similarity between them. And we're going to go ahead and assume that in most cases the yellow and the green gene are going to be homologous, because they're found within this micro-syntenic block of conserved gene order between two genomes, so they're found opposite each other within this block. Now, based on this, we can then calculate the proportion of times where we have this arrangement but no detectable sequence similarity can be found between these two genes, the yellow and the green gene. And then we need to take into account that this proportion that we have calculated, based on genes that are within this arrangement, is going to be an upper bound. So this is an estimate, and of course there are going to be times where the genes that are found opposite each other in this scenario are not actually going to be homologous, so we need to take this into account. And finally we can use this proportion, which we have calculated within genes that are found in conserved micro-synteny, and extrapolate it to all genes in
the genome, and we can do this if we know that the genes that are found within conserved syntenic blocks and outside conserved syntenic blocks evolve at more or less the same rate. So this is the idea, to use the concept of conserved synteny. We built a pipeline using this concept and we applied it to three separate data sets using three focal species: fly, yeast (budding yeast) and human. We compared each to a number of species that are found at a progressively increasing evolutionary distance from each focal species. So we have a number of pairwise comparisons using these three focal species, comparing them to each target species separately. And what we can do with this is that we can see how this proportion, which again, as I'm showing here on the left, is the proportion of genes that have entered the twilight zone of sequence similarity, how this proportion scales with time. We see that there's a very good linear relationship with time since divergence between the two species that we're comparing. Each point here represents a pairwise comparison between the focal species and one target species, and we can see that there's a good linear relationship. So this tells us that divergence beyond recognition occurs at a steady rate. We can see that actually in human we seem to have a much slower rate than in the other two data sets. So this is the one takeaway that we get from this. And the second one, as I told you, is that we wanted to estimate the contribution of this mechanism to the total pool of orphan genes: what percentage of the total pool of orphan genes does this account for? We can do this in the following way. First, again, we have one rate, one proportion, that we have calculated within genes that are found in conserved synteny. So in this toy example here, this proportion would be one out of six, because in one out of six cases there is no detectable sequence similarity between these two genes. And then we have another proportion, which is calculated
over all genes in the genome, and this is basically the total number of orphan genes if we take a simple pairwise scenario into account. And now we can extrapolate the first rate that I showed you, the rate in the bottom, to the entire genome. So this is an estimate that we calculate over a subset of genes in the genome, but we can calculate it over the entire genome if genes inside and outside syntenic blocks evolve at more or less the same rate, and this is what we found. So I'm going to go ahead and speed up a bit here. We can extrapolate this rate, and what matters in the end is to compare these two rates. We have the two rates that we're comparing here: in the solid bars we have the proportion of genes that have diverged beyond recognition, and in the transparent bars the proportion without similarity in general between the pair of genomes. Okay, and if we take the ratio of these proportions, we can see that the estimated proportion that is explained by sequence divergence is actually much, much lower than we would have expected. On average this is about a third, so a third of the total orphan genes in these pairwise comparisons, of the genes that lack similarity, can be explained by divergence beyond recognition. So this is how much this mechanism explains: on average about a third. And I'm going to skip this last part because I think that I'm running out of time. Yeah, so if you can just wrap up, please. Yes, so I'm going to skip this last part. Just to sum up really quickly: we have a novel approach to detect pairs of invisible homologs based on conserved synteny between genomes. We found that divergence beyond recognition occurs at a steady rate within a given phylum. And then the main takeaway is that, surprisingly, we find that divergence accounts for approximately a third of genes without similarity, on average, across the different comparisons that we have. And then the last one I didn't have time to talk about, but if
you're interested in more about this work, it was published very recently in eLife, and this is the paper right here at the bottom. I'd like to end by thanking Aoife McLysaght, who was the group leader where I did this postdoc, and Anne-Ruxandra Carvunis from the University of Pittsburgh, our collaborator in this work. Thank you very much. Thank you very much for a very exciting insight into the evolution of the genome. The first question is: how sure are you about the evolutionary origins of these genes? I mean, for each particular type of origin, what is the error margin that you have? And can there also be a different type of origin, let's say horizontal gene transfer or something like that? Yeah, that's a good question. So we don't really know what the false positive rate, let's say, for one particular case would be. We know that we for sure overestimate the number of times that, for the genes found in this arrangement, we have two homologous genes opposite each other but no sequence similarity. We know that a lot of times these might not be homologous, so the actual proportions are likely to be lower than the ones that I'm showing here. So this average proportion of a third might actually be lower; we don't really know how much lower. But in terms of individual pairwise comparisons of genomes, it depends on how far away the genomes are that we're comparing. If there's a big evolutionary distance between the species, then the number of genes in conserved synteny that we can find is lower, so then this percentage has a higher error margin, if that makes sense. So the higher the number of genes in conserved synteny that we have between two genomes, the higher the confidence that we can have in this proportion. I see. We have a question from Jonas don't try again. So, thank you for the nice talk. He has a question about giant DNA viruses, which are well known to have lots of orphan genes: could this approach be applied to study these viral orphan
genes? That's a great question, actually. Yeah, I've read about these huge viruses that have, like, crazy percentages of orphan genes. It depends on whether there is synteny conserved between viruses, and I would suspect that there might not be that much, but I don't really know, because I am not really well versed in viral genomics. But it is a very, very intriguing question where these orphan viral genes come from. Because viruses evolve so fast, they mutate so fast, it's a valid hypothesis that they might be the result of divergence. It'd be really interesting to try to apply that to giant viral genomes. Yeah, I see. We have one last question. Okay, so you have these genes which are homologous, like distant cousins, and they don't have any similarity, but do they still bear some properties in common? Yeah, so I didn't have time to talk about that, but I also looked at the secondary structure properties and composition properties between these invisible homologs, and we found that in a very small subset of cases there is correlation of these properties between the different pairs. This is the case when these invisible homologous genes share a protein domain, so the proteins that they encode share a protein domain. There's no pairwise similarity there, but if you identify protein domains on both of them, they share a common one. And when that happens, you find that other properties, such as protein secondary structure, tend to resemble each other, again in the absence of sequence similarity at the protein or DNA level between these genes. So yeah, there's a small percentage of cases where this happens, and presumably this might mean that they have a common function as well, so the function is retained. So you have one more question, from Neil Pram: are there any other studies that attempt to quantify the contribution of divergence in the evolution of novel genes? Yes, so
there's a number of studies that have taken a simulation approach. There's a difference with the simulation approach, in that when you do a simulation and you see that you end up with a certain percentage of orphan genes that are due to sequence divergence, that doesn't really prove that they are due to sequence divergence, but it proves that they could be, which is also important. So there are these studies that have taken a simulation approach, which have more or less concluded similar things as our study. And then there's a recent paper, a recent preprint I should say, that has taken a more analytical approach to model how similarity decays over evolutionary time, and in that work they find that divergence accounts for slightly more than the majority, so slightly more than 50% in their data set. They give a range, of course, and our ranges and theirs overlap significantly, but their estimates are slightly higher, which is interesting. Thank you very much, Nikos. I think we've run out of time, so thanks to all our three speakers for very different and very exciting talks. Thank you very much for sharing your science. Before we finish we will run a poll, as we'd like to hear from all the participants about how we could improve these online talks for you as audience members. We are now launching a poll. Our question is: if we change anything, which two ideas below do you think we should prioritize? Please pick the two items you would like us to prioritize. You can also select nothing. We welcome other ideas in the Zoom chat as well, and you can also email us. So we will give you a few seconds to do this. And I just want to say at this point as well, if there are any attendees who would like to stay on to chat to the speakers today, we can keep Zoom open. So if you just say in the chat, or put your hand up or something, we can just get an indication
of how many people would like to do that. I think all speakers are available to stay on for a little bit longer, so we're happy to help you do that. Okay, so I hope that you have filled in the poll, which will be really useful for us to improve these online experiences. Thanks again, Rallitza, Kevin and Nikos, for sharing your work. Our online research talks series will continue on Thursday at 5 p.m. British time and will be chaired by another deputy editor of eLife, Detlef Weigel. We will hear from the speakers who are now shown on screen, so please join us on Thursday if you have time. For updates about this series, please follow eLife Community, where you can find the schedule and the registration information. And finally, I would like to thank all of you, the speakers but also all the participants, and once again, you are very welcome to stay online to have an informal chat with the speakers. Thank you very much. Thank you very much, Anna, and all the speakers. That was fantastic. I'm going to stop the live stream now, but we'll keep Zoom going just a little bit to see if anyone wants to stay.