We're going to switch gears a little bit here, and as Colin mentioned, we're going to talk a little bit about some pilot work planned in KOMP to phenotype mice, I guess, without making adult mice. Steve and I are going to be talking about a couple of connected items related to the development and piloting of a pipeline to analyze haploessential genes that we're coming across through the KOMP work and the IMPC. I'm going to talk mostly about the plans that we have in place, since we're right in the midst of beginning the process of testing this out, and then Steve is going to talk a little bit about his experience outside of KOMP actually using founder screening as a mechanism for phenotyping. Everything I'm going to be talking about today was a collaboration amongst the three KOMP consortia sites to get this project started. So as most people in this room know, about 25% of the IMPC targeted genes have been identified as being recessive lethal, and we've obviously created robust production and embryo phenotyping pipelines to capture gene essentiality and the phenotypes associated with it. For the sake of time, to summarize what we've learned, there are a couple of key points. One is that the human disease genes are enriched among essential genes, so we can see that association between essentiality and human disease. And more recently, in the submitted paper from Pilar, Violeta, Terry, and Damian, these recessive lethal essential genes can be divided into two broad categories, cellular lethal and developmental lethal genes. The cellular lethal genes map to genes that have been deemed essential for cell fitness in the human cancer cell line screens that have been performed with CRISPR-Cas9, and the developmental lethal genes fall outside that category, really don't map to cellular essentiality, and are important for more organismal-level developmental functions.
What we are not capturing, however, is another subset of genes, the dominant lethal haploessential genes. In humans it's estimated that up to 3,000 genes cannot tolerate loss of function of one of the two alleles, which we all know as haploinsufficiency, and these can be categorized as those that either compromise viability, so they're embryonic lethal in humans, or cause profound loss of fitness, so disease-causing mutations. There are hundreds of de novo rare dominant diseases and syndromes that have been identified that are caused by loss of function of one allele in humans. And there is emerging evidence of a contribution of these types of mutations to miscarriages associated with euploid pregnancies. So they actually represent very important types of mutations that we should be collecting phenotypic information on. As with the human disease and recessive lethal association we've seen in the mice, we've hypothesized that dominant lethal haploessential genes in mice will also be enriched for, in this case, human haploinsufficient genes. So in this case, obviously, those genes that are also lethal in humans, and also those genes for which loss of one allele causes a severe disease phenotype, where in our best-case scenario we cannot get these mice to go beyond the founder stage. Currently our IMPC pipelines are not designed to produce or phenotype haploessential genes because we can't propagate the line forward: either we can't make a founder, or the founder can't breed, or the founder dies. We've seen evidence of this within our production information. So this has really been an interesting project to develop because it has bridged the gap between problems that we've seen in production and actually having to devise mechanisms for phenotyping these types of things. The embryo people have had to learn some production things, and production people like me have had to learn way too much about embryogenesis.
So yeah, scary for me. So this is one of the earliest examples of seeing this. This is data from Laurel in our CRISPR-Cas9 production paper. And all the data that I'm going to be talking about will actually be from CRISPR-Cas9 work. The ES cell work is a little bit trickier because we have a lot of other technical issues running around, like the germline transmission rate from the ES cell library itself and appropriate targeting of the allele. So we've kind of scrubbed that data out when looking at this type of analysis. The genes that we've attempted we broke down into what we refer to as non-essential genes and essential genes. This is basically using the data from the paper currently under review from Pilar and Violeta, where they used some algorithms to identify genes whose human orthologs are very likely to be cell essential or non-cell essential. And we divided our attempts into those two categories. There are two pieces of information here. One is the scatter plot and the box plot that goes along with it, which shows founder rate, and the other is these two red lines shown here, which are the germline transmission rates of these particular classes of genes. And you can see that in these mouse orthologs of human cell essential genes we see a very dramatic reduction in founder production rate in general and also in germline transmission rate. So there's something going on here that is making it much less efficient to generate these lines for essential genes. One could say, well, maybe they're homozygous lethal and we're just creating founders that have both copies deleted. As most of you know, the mechanism that we use to generate knockout alleles for the KOMP project is based on some flavor of an exon deletion involving at least two guides. And we know from our data that in general our founder animals are typically heterozygous and are often mosaic.
So although it could be that in some cases we are just editing at very high efficiency and creating homozygous animals that are dying on us, there may actually be some haploessentiality going on within the data. If we categorize what the germline transmission failures fall into, as far as what we see when we can't actually generate a line, the first is that we don't produce the founders at all. And like I said, that could be due to a couple of things: either technical issues where our guides are not cutting, or we could actually be over-editing and generating these homozygous knockouts. The other possibility is that we are seeing haploessentiality from the genes, so we're not able to produce the founders. Associated with that are X-linked lethal or disease-associated genes, where we're also not able to produce founders. So it's a flavor of, I guess, haploessentiality. The founders can die before breeding. Of course, that could be due to bad luck. It could also be due to haploessentiality, or to mosaicism for the knockout of a haploessential gene in the founder animals that allows them to survive for some period of time. This next one is an interesting category, and it's really where we started thinking about haploessentiality: founders are produced but transmit only wild-type offspring. Again, this could be something as simple as a genotyping error, or it could be mosaicism for a gene essential for germ cell development. But it could also indicate mosaicism for a haploessential gene that allowed the founder to survive, while the offspring, who are obligate heterozygotes, are not able to survive, leading to only wild-type offspring. And then, obviously, founders that do not produce offspring at all. Again, this could just be what sometimes happens when animals just won't breed, or in this particular case we could be dealing with haploessentiality for a reproductive function type of gene.
Although we are mostly focusing on lethality, this is also an interesting class of genes that we can start working on capturing. However, from the data that I've seen, this actually represents a very small amount of the issues we have with germline transmission. It's the other three categories that are the main problem. So at Baylor, one of the things that got us thinking about this, especially because of the potential association between essentiality and human disease, is our data from, as we've talked about, the human discovery partners that we have. A lot of the sites have seen this as well: when we started trying to generate knockout lines for genes that had been nominated by our discovery partners, we started to observe more of an issue with getting germline transmission. These are some of the discovery groups that we work with at Baylor, some of them through KOMP in general, some of them specific to the site at Baylor. And this is our data on germline transmission for what we call non-discovery partner genes and for those nominated by our discovery partners, which are more disease associated. You can see that we get a substantial drop-off, from 74% germline transmission to 65% germline transmission of our lines, which is statistically significant, and which again suggests that there is some disease association there. So to ask whether there was evidence for haploessentiality, and whether there were constraints on loss of function of these genes within the genome, we went back and took our assigned genes in total. So all the genes that we've been assigned, whether we've started production or they're awaiting production, just basically what our gene lists are in total.
And we looked at the pLI score, so the probability of being loss-of-function intolerant, for those that are nominated by discovery partners and are likely disease associated, versus our general set of genes. And we can see that we do get a statistically significant increase in the number of genes that have a very high pLI score, so highly likely to be intolerant to loss of function. On the right is just shown a distribution of these. So we can see that in this high-likelihood intolerance group, we do see this increase. We also took a look at this from the standpoint of just our germline transmission success versus failure lists, independent of disease partner association. And again, what we saw is that for those lines where we have no germline transmission, we see an enrichment for higher pLI scores that are indicative of constraint on loss of function. And again, if you look at the group of genes that fall in the very high likelihood of intolerance, we see this pretty dramatic increase within that gene list. So we think, again looking at the Baylor data, that we see evidence for genes that have constraint on loss of function. If we go to the KOMP level and look at two particular paradigms, one being whether we produce founders or not, which is one category of germline transmission failure, and the other being overall germline transmission success based on all four of those categories, we again see the same trends. For those where we are unable to produce any founders, we have a higher average pLI score. And for those where we do not see germline transmission, we see a higher pLI score, which again indicates that we are potentially capturing genes that are haploessential in our data set and gene list. So from that preliminary data, and because we do not otherwise have a way of phenotyping these lines, we submitted a supplement request to actually develop a pipeline to assess these.
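As a rough illustration of the kind of comparison described here (is the germline transmission failure group enriched for high-pLI genes?), the following is a minimal Python sketch. The talk does not say which statistical test or pLI cutoff was used, so the one-sided Fisher's exact test, the 0.9 cutoff, and the function names are all assumptions made for illustration. The records in the usage lines are invented toy values, not KOMP data.

```python
from math import comb


def high_pli_table(genes, threshold=0.9):
    """Build a 2x2 table from (pli, glt_failed) records.

    Returns (a, b, c, d):
      a = GLT failure, high pLI     b = GLT failure, lower pLI
      c = GLT success, high pLI     d = GLT success, lower pLI
    The 0.9 cutoff is an assumed convention, not a value from the talk.
    """
    a = b = c = d = 0
    for pli, glt_failed in genes:
        high = pli >= threshold
        if glt_failed and high:
            a += 1
        elif glt_failed:
            b += 1
        elif high:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the table [[a, b], [c, d]].

    Returns the hypergeometric probability of seeing at least `a`
    high-pLI genes in the failure group if pLI class and GLT outcome
    were independent.
    """
    row1 = a + b          # size of the GLT failure group
    col1 = a + c          # number of high-pLI genes overall
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy records (pLI, GLT failed?) invented purely to exercise the code.
toy = [(0.99, True), (0.95, True), (0.92, True), (0.10, True),
       (0.97, False), (0.05, False), (0.30, False), (0.00, False)]
table = high_pli_table(toy)
print(table, fisher_exact_greater(*table))
```

A rank-based comparison of the raw pLI distributions would be an equally reasonable choice; the point of the sketch is only that the enrichment claim reduces to a simple two-group comparison once each gene carries a pLI score and a germline transmission outcome.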
So the overarching goal of these supplements at the three sites is to develop a robust yet rapid null phenotyping pipeline to characterize dominant and recessive lethal genes. I'm framing this around the importance of using this pipeline to capture haploessential dominant lethal genes that otherwise can't be phenotyped, but this pipeline could also potentially be used for genes where we have bioinformatic evidence of even recessive lethality, by just going to a screen that does not require generation of a line. So the idea here is to develop a list of prioritized genes that are likely to be haploessential or potentially recessive lethal, and to develop founder-animal-based screening platforms that can look for pre-implantation lethality phenotypes, which we believe are probably going to be more tightly associated with human cell-level essential genes, and also founder screening platforms that will allow us to look for post-implantation lethality. So again, those are more likely to involve developmental lethal genes that are involved in organogenesis, patterning, and other things like that. As usual, all three sites have a framework of the same type of aims, with slightly different flavors. These are the three Baylor aims. One is to generate the list of prioritized genes to look at. The second is to develop a platform for pre-implantation phenotyping, and I'll show examples of this, where we're going to be using an EmbryoScope to look at development from the two-cell to the blastocyst stage for developmental defects, with annotation of the imaging and then quantitative genotyping. And the third is post-implantation phenotyping, where we will introduce the CRISPR reagents, generate the embryos, transfer them, and then directly at embryonic day 9.5 look for developmental abnormalities. So again, this is all founder-based phenotyping.
And all three sites have some flavor of this, and I'll show their examples as well. So just as an example of what we're doing, this is the EmbryoScope that will hopefully be showing up in the next couple of weeks. That will allow live imaging of cultured embryos. It will allow us to do 240 embryos in one go. And as you can see in the video that's being shown on the right, we can get some pretty good detailed imaging of the embryos as they develop from, in this case, the single-cell stage to the blastocyst stage. So we're going to be looking at developmental milestones, such as cleavage timing, cavitation, and other types of phenotypic events, to see if we have an association with developmental abnormalities. These will have to be pulled for genotyping, which is an issue that we'll talk about in a bit. The E9.5 pipeline will take advantage of the existing embryo screening pipelines that we have in place at all three KOMP sites. This is just the Baylor version of the workflow. Everyone's targeting embryonic day 9.5 to look for abnormalities. I think all sites have in mind that if we don't see effects at embryonic day 9.5, we might, if we have time and money, walk it further down the line and see where we can find developmental abnormalities. These are the aims and workflow for DTCC. So this is work being done at UC Davis and TCP. Again, they have the same type of scheme: identifying genes, then screening for pre-implantation embryonic lethality. Those where they do not see embryonic lethality at those time points move on to post-implantation screening at, as in the example I gave, E9.5. DTCC is also looking at doing this on different genetic backgrounds, to look at modifiers in the genome that might influence the timing of lethality, or lethality at all.
And then finally, TCP will be doing tetraploid complementation assays for some of these lines that have later developmental abnormalities, to see if they can rescue embryonic lethality by complementing the embryos with extraembryonic cells that are derived from wild-type embryos. So again, they're looking for extraembryonic placental defects that might lead to lethality. The Jackson Lab is also doing this. They have a slightly different aim one, where they're going to be using ES cell-based screening in ES cells of different genetic backgrounds to identify genes that are essential more at the cellular level, kind of like the human cell-based screens. Their aim two involves looking at both the blastocyst stage and the embryonic day 9.5 stage, like the other sites. And again, they'll also be looking at genetic context in their aim three, again looking for genetic modifiers that might influence either the phenotypes that are observed or the stages of embryonic lethality. So just some nuts and bolts about what we're working on right now as we're starting to get these pipelines going. The first part was to work on a target gene list for the haploessential pipeline. And thankfully we've started to work together on developing a unified list for us all to work from. The first goal was to identify genes with null allele GLT failure across the IMPC. We focused on the ones from the CRISPR-Cas9 work and on genes that had one-to-one mapping to a human orthologue. We then looked for other annotations within these sets, which was work done with Pilar, to pull in some evidence from the human orthologues for likelihood of constraint on loss of function.
This includes pLI scores, other mechanisms of prediction of autosomal dominant or recessive loss-of-function phenotypes, disease associations from sources like OMIM, human cell essential screening data, and also the studies in human populations on tolerance of knockouts or complete loss of function. From this we were able to devise a current list of 426 genes that had germline transmission failure when we attempted null alleles. 94 of those fall within the category of human cell essential, so likely to have very early embryonic phenotypes, and 302 of those fall within the human cell non-essential category that was developed in Pilar and Violeta's paper. And in this case, 26 X-linked genes were on the list. This pretty much represents, if you do the math, the normal distribution of X-linked genes within the genome. So we are not necessarily seeing an enrichment for X-linked genes. One of the interesting things that I did, because this was done in Pilar and Violeta's paper, is that I was curious about looking at categories of genes and gene function in our list. So I ran the entire list. I did not filter it based on likelihood of loss-of-function intolerance, just to see what types of gene categories we had involved. And we observed something quite striking. We get quite a strong enrichment for what I call RNA biology. So mRNA processing, RNA splicing, translation, ribosome assembly, and splicing. I think splicing was on there twice. This is from GO biological processes and KEGG pathways. And interestingly, and I don't have this on the slide, if you separate the data out, most of these are associated with the cell-essential genes. The ones associated more with proteasome, oxidative phosphorylation, metabolism, and transport are more associated with genes that fall within the non-essential category.
So there might be some stories there, but it's interesting to think, from an evolutionary standpoint, about why we'd have such an intolerance to loss of function within a category of genes involved in RNA biology. So just as an example of what we're working with in this list, this is data from Steve Murray, as a representation of filtering this list down to the most likely candidates to enter into our phenotype screening. He started with the list that I talked about, with these GLT fails, and went through a process of filtering them: high pLI scores, and DOMINO prediction of likely or very likely dominant effects on function. For cell essentiality status, he considered both, because we're interested in both cellular-level and developmental-level essentiality. And then one of the filters that he added on was whether or not there's an existing mouse knockout, which makes sense since we're interested in haploessential genes, and if a knockout exists already, it's probably not one of those. He did this filtering for the sets of genes that had been worked on at JAX. And this is the filtering down that he got on the JAX-specific genes, where you can see that as you start building these filters in, we start whittling the gene list down to a smaller number of genes. So we have a handful of genes at each site that fall within these essential and non-essential categories and are deemed likely to have high constraint that we're working on. The list is unfortunately probably not as long as we want it to be. So we're starting to think about how we want to annotate the list moving forward. The first thing, as we start thinking about it, is incorporating other existing data, such as the mouse knockout data from MGI that Steve worked with, and maybe also starting to look at the IMPC ES cell library for genes that we were not able to target.
And also maybe looking in the ENU libraries and the ENU screens that have been done, at whether or not there are genes that seem to be intolerant to ENU mutagenesis. Also integrating embryonic gene expression profiles, to start looking at genes that might be important at particular time points, and considering other metrics of human intolerance to loss of function that exist in the data. And really what we want to do is take these other filters and add them in, so we can move beyond the genes that we already know have germline transmission failure and can actually explore the entire genome, using this type of information to narrow down to those that are likely to be haploessential or haploinsufficient, at least in humans. And we're really hoping to generate a target list of 200 genes to start with. So the last thing that I'll talk about briefly is the actual approach to making the null alleles. The reason I bring this up is that we are planning on deviating from the normal IMPC/KOMP allele structure. So I wanted to mention this, and it can be a point of discussion, and go over why we decided to do this. As I've stated, and as most people here know, we do an exon deletion approach to generate the knockout alleles. For the haploessential screen, we've actually agreed to move towards an indel approach to generate the knockout alleles. The primary reasons for this are shown here. First, we had some concerns about the efficiency of generating null alleles by introducing two guides or four guides to generate an exon deletion. While that is sufficient for us to generate enough founders to move a line forward, when we're going to be screening individual embryos we wanted to maximize our mutagenic capacity so that we have enough embryos to screen. Second is the ease of detecting technical failures. There will be in-frame deletions created from this, some of which may not be pathogenic.
So if we see them in otherwise normal embryos, we know that our guides worked and it was not a technical issue. Without that information, it would be very hard to tell whether we were just not seeing editing at all or whether it was actually lethal at some point. And then there is compatibility with different quantitative genotyping approaches, where basically you only need one PCR assay as a basis for identifying all these events. One of the cons of taking this approach is understanding the phenotypes associated with in-frame indels. While some of them might not be pathogenic, some of them might be. They might have a different phenotypic spectrum from loss-of-function alleles. So that is a complication in understanding what those mean if we see phenotypes. And these are the basic allele quantification mechanisms that we're using. I know Steve will talk a little bit about some of these in his presentation: either using deconvolution of Sanger sequencing data or doing a multiplex targeted deep sequencing approach to get an understanding of this. Because really, in the end, what we need to know when we start screening for these phenotypes is whether or not we're having an issue with mosaicism, or a disease spectrum or phenotype spectrum associated with mosaicism, in these embryos. So just a list of challenges, the last set of slides before the summary. I know Steve's going to talk about this at some level in his presentation as well. There are going to be challenges for genotyping and allele quantification based on the amount of material we'll have, and on whether or not we need to consider regional mosaicism, which I know Steve will talk about.
There are phenotyping considerations that we still need to address, including controlling for the frequency and the types of phenotypes we see just from embryo manipulation, understanding the phenotypic impact of the in-frame deletions, mosaicism and the spectrum of phenotypes, and then finally sample size, where we really need to get more empirical information about our editing frequency to understand how many embryos we'll actually have to look at. And then finally, coordinating with the DCC about data collection and the ins and outs of actually having to annotate individual embryos that will have different genotypes and potentially different phenotypes. So to summarize, up to 15% of human genes may not be tolerant to loss of one of two alleles. Our existing pipelines really can't capture that, and we're really hoping to use this founder-based, CRISPR-generated, null-allele screening platform of pre-implantation embryos to start annotating phenotypic information on these genes, which might have very important human disease correlations. There are challenges that we still need to address with genotyping and phenotyping during the pilot. And then I've added a question mark, following Lydia's comment on this: maybe these are the set of genes for which we really do need to consider conditional allele production. If we're really going to think about that, and I know Neil's not here, he would probably love this, this is one set of genes where we should maybe consider that type of approach. And then just to thank the three KOMP sites that have contributed their slides, and some individuals who helped with the data analysis parts of this. So, I think it might be best, and Colin, it's up to you really, to wait until after Steve to do discussion and questions on this.
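To make the sample-size challenge concrete, here is a minimal Python sketch of one way to turn an editing frequency into a number of embryos to screen. This is not the consortium's method. It assumes each embryo is edited independently with the same probability, treats an embryo as simply edited or unedited (so it ignores mosaicism and the in-frame versus null distinction), and the function names and example values are hypothetical.

```python
from math import comb


def prob_at_least(n, k, p):
    """P(at least k edited embryos among n), assuming each embryo is
    edited independently with probability p (simple binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def embryos_needed(k, p, confidence=0.95, max_n=10_000):
    """Smallest number of embryos to screen so that, with probability of at
    least `confidence`, k or more of them carry an edit.

    k, p, and confidence are illustrative inputs, not values from the talk.
    """
    if not 0 < p <= 1:
        raise ValueError("editing frequency p must be in (0, 1]")
    for n in range(k, max_n + 1):
        if prob_at_least(n, k, p) >= confidence:
            return n
    raise ValueError("no sample size up to max_n reaches the requested confidence")


# Example with made-up numbers: if half of embryos were edited, how many
# must be screened to be 95% sure of seeing at least one edited embryo?
print(embryos_needed(k=1, p=0.5))
```

The empirical editing frequency mentioned in the talk is exactly the `p` this calculation needs, which is why the number of embryos cannot be fixed until that frequency has been measured in the pilot.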
So, this actually brings in Jason as well. In terms of this gene list and what might be haploessential, I'm wondering if the duplicated regions in the HAP1 cells that you know of might be enriched for haploessential genes, because the reason they're duplicated is that they need two copies or something like that. And I'm just wondering, if you know what those genes are by and large, whether that might provide some additional information. I mean, obviously there's a difference between organism-level haploessential and cell-level haploessential, but I'm just wondering if that might give us some ideas or some pointers.
Yeah, so the interesting thing is there's only two regions that are duplicated in the HAP1 cells, and they've actually been shuffled out, but every time we make a clonal knockout of a given gene and then amplify that clone out and have multiple different clones, this is where it starts getting more interesting, because we find different regions of the genome that seem to be amplified or duplicated. And we don't know whether or not it's because the query that we're going after, when you knock it out, is selecting for that, and that's what we're trying to find out. So, it's a good question. I'm not sure there's a systematic and easy way to figure it out, though.
Yeah, so my question actually relates to Jesse's presentation earlier, and that is, do we need to consider the outgrowth assay since we know now that we can't realize directly on digital aging?
Yeah, and I meant to mention that at the end of going over the three workflows. I would say yes, we do need to explore that, because what I'm more worried about is the frequency, in that that category in general was a very large one, like in the 15 to 20% range. Yeah, I think it's something that we're definitely going to have to consider doing, because I think we've got a hole there. And I 100% agree with you on that. So, it's something we need to consider.
Actually, unrelated comment, but certainly we can talk about that. I was thinking, if you're going to go through this effort, you want to make sure you have the right genes, obviously. Make a few conditionals, make sure they are. Or even quicker and a little dirtier, do some RNAi in embryos, make sure the null phenotype is lethal. If it's not, that would tell you these aren't haploinsufficient genes, right?
I don't know if RNAi would be quicker than what we're doing with CRISPR, but it might be, yes.
And you can easily test to make sure it's working, right?
Yeah.
To make sure you have the phenotype.
Yeah, I would agree with you that doing conditionals first might be a way to do that, but the timeframe that we have to deal with for this supplement kind of makes that a non-starter at this point. But in the future, if this is something that we continue to explore, that's something that we could do.
Are we going to, before we leave the meeting, are we going to finalize the list for the pilot?
I'm sorry, what was that?
Before we leave the meeting, well, and we're all right here. Can't we just decide on the list before we leave?
I think right now we have a working list, at least from my view. I mean, the piece that we do not have in the existing working list is what Steve did, which is the annotation against MGI across the entire list. That's something that we've been doing manually for ourselves, but I think the metrics that Steve has on his slide are the ones that we've been going with, and they have provided a significant number of genes that we can definitely get this started with. So, high pLI, DOMINO prediction of a dominant mode of inheritance, both cell-essential and non-essential, and no existing mouse knockout. I think that these can be enough to get us to a starting list.
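The filters recapped in that answer can be sketched in a few lines of Python. This is only an illustration of the logic as described in the talk: the field names, the 0.9 pLI cutoff, the DOMINO class labels, and the gene records are all hypothetical placeholders, not the actual list or the real DOMINO output format.

```python
from dataclasses import dataclass

# Hypothetical labels standing in for "likely or very likely dominant";
# the real DOMINO output may be encoded differently.
DOMINANT_CLASSES = {"likely dominant", "very likely dominant"}


@dataclass(frozen=True)
class Gene:
    symbol: str
    pli: float                # probability of loss-of-function intolerance
    domino_class: str         # predicted mode of inheritance (assumed labels)
    cell_essential: bool      # human cell essential vs non-essential
    has_mouse_knockout: bool  # an existing knockout is already reported


def prioritize(genes, pli_threshold=0.9):
    """Keep genes with high pLI, a predicted dominant mode of inheritance,
    and no existing mouse knockout. Cell essentiality is not used to
    exclude genes; it only splits the result into two bins."""
    kept = [
        g for g in genes
        if g.pli >= pli_threshold
        and g.domino_class in DOMINANT_CLASSES
        and not g.has_mouse_knockout
    ]
    return {
        "cell_essential": [g.symbol for g in kept if g.cell_essential],
        "non_essential": [g.symbol for g in kept if not g.cell_essential],
    }


# Placeholder records, invented only to exercise each filter.
toy_genes = [
    Gene("GeneA", 0.99, "very likely dominant", True, False),   # kept
    Gene("GeneB", 0.95, "likely dominant", False, False),       # kept
    Gene("GeneC", 0.97, "likely dominant", False, True),        # knockout exists
    Gene("GeneD", 0.20, "very likely dominant", True, False),   # low pLI
    Gene("GeneE", 0.93, "likely recessive", False, False),      # not dominant
]
print(prioritize(toy_genes))
```

Because nothing in the function depends on germline transmission failure, the same filter could be run over a genome-wide annotation table rather than only over the production failures.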
So, I think the idea was to repeat that genome-wide, not just focus on our production failures, which I think made a lot of sense. But to get more genes that are similarly constrained in the genome and likely to be in that category, I think it would be valuable to repeat the analysis in the broader set. Pilar and Violeta provided that full list, so now we can mine that and compare it to MGI.
Yep. So I think the plans are all there. It's just a matter of regenerating that new list. And Pilar's in here someplace, I believe she was here hiding. Oh, there she is, she's hiding in the corner. Yeah, so those conversations have happened, so I think we have the plans in place. So, yep, go ahead.
Do I have to escape now?
Yeah.