All right, thanks Colin for the invitation, and also for setting people's expectations for an interesting talk. Hopefully I can meet those. So today I'm going to talk to you about identification of fitness genes and genetic interactions in cell lines, mostly human, with a bit of a surprise at the end for you. I'm not sure if you can see my pointer, can you?

So when I first started my lab, we were doing a lot of genome-wide pooled RNAi screens in cancer cell lines. RNAi was a useful technology, but we were getting really frustrated trying to validate things from our screens, and even to QC the screens that we were doing. So we came up with this idea of building a list of essential genes and non-essential reference genes that people could use to QC their genome-wide screens. And here's just an example of how that type of thing worked. In the middle graph is a precision-recall curve where you can see a decent-performing screen in black, a mediocre screen in dark gray, and a relatively crappy screen in light gray.

So we had a large set of screens, about 100 at the time, and we used about half of those screens to come up with what we called a set of core human cell line essential genes. We simply defined that set by whether or not a gene was essential, a fitness gene, in half of the cell lines or more. And you can see here this was 291 genes if you added up those essential genes, which is a pretty small list for humans and pretty pathetic. This was back in 2014, but at least it gave us a start, a place where we could start to QC screens and sort of put everyone on the same map and against the same standards.

So then along came CRISPR, and we tried hard to keep up with the Broad, but we couldn't. They beat us to the first large library, but we were not too far behind.
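As a rough illustration of the reference-set QC idea described above, the sketch below walks down a ranked gene list and computes precision-recall points against an essential and a non-essential reference set. The gene names, the ranking and the function name `precision_recall` are all hypothetical and chosen only for illustration; this is a minimal sketch, not the scoring pipeline used for the screens in the talk.

```python
def precision_recall(ranked_genes, essentials, nonessentials):
    """Return (precision, recall) points for a ranked gene list.

    ranked_genes: genes ordered from most to least depleted in the screen.
    Only genes in one of the two reference sets move the curve.
    """
    tp = fp = 0
    points = []
    for gene in ranked_genes:
        if gene in essentials:
            tp += 1          # reference essential recovered
        elif gene in nonessentials:
            fp += 1          # reference non-essential called as a hit
        else:
            continue         # gene outside both reference sets is ignored
        points.append((tp / (tp + fp), tp / len(essentials)))
    return points


# Toy data (hypothetical gene names and ranking)
ranked = ["ESS1", "ESS2", "OTHER1", "NON1", "ESS3", "NON2"]
essentials = {"ESS1", "ESS2", "ESS3"}
nonessentials = {"NON1", "NON2"}

curve = precision_recall(ranked, essentials, nonessentials)
for precision, recall in curve:
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

A well-performing screen keeps precision near 1 until most reference essentials are recovered; a poor screen loses precision early as reference non-essentials creep up the ranking.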
We built a large Cas9-based library and started to do screens in human cell lines to see if these screens could outperform our RNAi screens and if they were better at predicting essential genes. The way these screens work is essentially you build a library into lentivirus. This particular library had about 176,000 guides in it, targeting over 17,000 human genes. We package that into lentivirus and then we infect cells so that each cell on average gets a single sgRNA-expressing cassette, and then we track those guide RNA cassettes over time and under different conditions or in different genetic backgrounds. We read everything out by sequencing, so the gRNAs act as a barcode. We can further barcode those with a cell-specific barcode, but we don't find that that actually gives us a lot more information for the cost.

And so the first set of screens that we did was in five different human cancer cell lines, shown here, and we also included a screen that was done in Feng Zhang's lab at the Broad by Ophir Shalem. What you can see right away is that the PR curves actually look pretty good for all of these screens, but we were shocked when we looked at the numbers of genes that we would identify at the same FDR with our CRISPR screens relative to the shRNA screens. We were identifying about 2,000 genes on average, as opposed to fewer than 500 with shRNA. So we were pretty happy with this and realized we were probably more in the ballpark for the number of essential genes that a cell line needs for growth, division and proliferation, at least based on information from model systems. And so we came up with this idea of a daisy model for gene essentiality and sort of conjectured that if we screened enough cell lines, then we could define a list of core fitness genes, just as we had tried to do with shRNA data.
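The "essential in half or more of the screened cell lines" rule for a core set can be sketched in a few lines. The cell line names, gene names and the function `core_fitness_genes` below are hypothetical placeholders, and the per-line hit lists are assumed to have been called already at some fixed FDR; this is only a sketch of the counting rule, not the published analysis.

```python
def core_fitness_genes(hits_by_cell_line, min_fraction=0.5):
    """Return genes called as fitness genes in at least `min_fraction`
    of the screened cell lines.

    hits_by_cell_line: dict mapping cell line name -> set of hit genes.
    """
    n_lines = len(hits_by_cell_line)
    counts = {}
    for hits in hits_by_cell_line.values():
        for gene in hits:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n / n_lines >= min_fraction}


# Toy data (hypothetical cell lines and genes)
hits = {
    "line_1": {"GENE_A", "GENE_B", "GENE_C"},
    "line_2": {"GENE_A", "GENE_B"},
    "line_3": {"GENE_A"},
    "line_4": {"GENE_A"},
}

core = core_fitness_genes(hits)
print(sorted(core))
```

Here `GENE_A` (4 of 4 lines) and `GENE_B` (2 of 4) pass the threshold, while `GENE_C` (1 of 4) is treated as context-specific.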
And so if you actually do that, if you take a fairly simple definition and say, okay, is the gene essential in half or more of the cell lines that I screen, you can look at that list and then associate it with various properties. For example, core fitness genes are correlated with high levels of gene expression. They're anti-correlated with variation in gene expression. They're correlated with whether or not your gene encodes a protein in a complex. They're negatively correlated with mouse dN/dS ratios and deleterious mutation frequencies. And they're positively correlated with the number of exons in a gene. So these features are general, and some of them you would easily have predicted, but now we had numbers to be able to actually calculate these out.

So we used that data, and additional screens that came online from other labs at the time, to empirically come up with sets of rules to improve our Cas9 guide designs, and we got to version three of our library, which includes about 71,000 guides targeting over 18,000 protein-coding genes. When performing screens with this library, we do better than with our original library, and the properties of the core fitness genes from this library also hold true. So we named this second set of core human cell line essential genes V2, and we use that to QC our screens now. Most, but not all, of the genes in the version one set overlapped with the version two set, but if you go beyond 684, you start to capture the rest of the V1 set.

So if you fast forward a few years from that, we're now at many, many genome-wide screens that have been done in human cell lines, and even some in mouse cell lines and other species. So here's just a very short list of some of the screens that have been published and some that are ongoing.
So currently we probably have over a thousand different human cell line models screened at the genome-wide level with one pooled CRISPR library or another. And this represents a lot of information, right? At the bare bones, say there were a thousand different cell lines and you've got 18,000 genes in the human genome, that's a fairly complex matrix of information to deal with. So how do you start to actually use this information to move forward with, say, better functional categorization of genes in the human genome?

One way to do so is to do targeted selection of cell lines that have certain features, so that you can understand genetic interactions with those features. This is an example of a case where we did some screens in pancreatic cancer cell lines that had RNF43 mutations. RNF43 mutations were originally discovered by the Clevers lab in the Netherlands, and they were predictive of whether or not patients had certain forms of colorectal cancer. It turns out that in pancreatic cancer about 8% of PDAC patients have RNF43 mutations or ZNRF3 mutations, ZNRF3 being a paralog of RNF43. And what these RNF43 loss-of-function mutations do is render pancreatic cancer cells completely dependent on the Wnt pathway. So we put the RNF43-mutant pancreatic cancer cell line screens in one bucket and compared them with all the other screens that we had that were wild type for RNF43. And you can see here that we read out a bunch of Wnt-related genes that were important in the RNF43 mutant, but not in the wild-type cell lines.

What was interesting from this set of screens was that Frizzled-5 was the top gene by far. And what's further interesting about Frizzled-5 is that it's one of 10 Frizzleds encoded in the human and mouse genomes. So people always debated, okay, well, you know, there are 10 Frizzleds.
They must jump in for each other: when one's not there, the other ones can take over. In this case, that is not the case at all. Frizzled-5 was very specifically required in these PDAC-related RNF43-mutant lines. We went on to confirm this not only with genetics, but we also built antibodies that were highly specific for Frizzled-5 and not the other Frizzled receptors in the human genome. We were able to show that by knocking out Frizzled-5 in these lines, or by treating them with antibodies against Frizzled-5, we could actually impact the proliferation of tumors and cell lines that were RNF43 mutant.

So that's one way to start to distill the data from all of these genome-wide screens. But there's also a lot more data that you can look at. And so one of the questions is, is there a more unbiased approach that one could take to organize this information in a better way? You can think of this pretty simply. We've got a library that's targeting all genes in the human genome, and then a number of different cell lines. This generates a matrix of data, and when you cluster that matrix on the gene side, you start to find things that are co-essential or co-functional, right? There are a number of papers that describe this approach, and you can build a network once you start clustering these things together. So you have a big dendrogram on the gene side. You can also cluster the cell line side, but the challenge there is that you don't really know what the genotype is that differs between, say, cell lines C and D and that is driving a specific pattern.

And so you get really interesting data just on the gene side, and I'll show you a snapshot of some of the DepMap data that's available online at the DepMap portal. So this is a large matrix of 485 different cell lines, with 18,000 genes on the vertical axis. Most of that matrix is dark, it's black. If you're blue, you're more essential.
And so you start seeing stripes of blue across this big matrix, and I'm just showing you a slice of that large matrix. About 25% of the matrix or so is blue in some way or another. But you can see these very prominent stripes, and on the right-hand side you can barely make out some of the gene names right now, but if you zoom in just to this little rectangle, you can see that basically this cluster of stripes is all the histone genes. So it's pretty amazing that you have a matrix of 485 cell lines by 18,000 genes, you cluster the gene side just based on fitness profiles, and all the histone genes fall out right next to each other. That's telling you something about how those genes are functioning in a common way. I don't think I have to convince anyone that all the histone genes are probably doing something very similar in the cell. So like I mentioned, there are lots of really interesting clusters on the gene side, and that's one way you could imagine cutting down, say, a family of genes and just honing in on specific genes that could represent a specific phenotype. But then again, one of the challenges is that we don't really know what the genotypes across all those cancer cell lines are that really drive the clustering.

So one of the projects that we started a few years ago was to think about how we could do systematic double mutations in a model human cell line to start to understand double-mutant interactions. This is exemplified by this schematic here. If we knew what the differences in gene mutations were on the cell line side, we could start to cluster things on both axes and then generate a good understanding of which genes interact with each other and how they're functioning in a cell. So in order to do this, we started to collect and make our own HAP1 knockout cell lines. And our approach was essentially to take a knockout cell line and do a genome-wide screen on top of that.
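The gene-side clustering of fitness profiles described above rests on a simple idea: genes whose fitness scores rise and fall together across cell lines are likely co-functional. The sketch below uses Pearson correlation as the similarity measure, which is an assumption for illustration (the talk does not specify the metric), and the profiles and gene names are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_partner(gene, profiles):
    """Return the other gene whose profile correlates best with `gene`."""
    others = (g for g in profiles if g != gene)
    return max(others, key=lambda g: pearson(profiles[gene], profiles[g]))


# Toy fitness profiles across four cell lines (hypothetical values;
# more negative means more essential in that line)
profiles = {
    "GENE_A": [-1.0, -0.2, -0.9, 0.0],
    "GENE_B": [-0.9, -0.1, -1.0, 0.1],
    "GENE_C": [0.0, -1.0, 0.1, -0.8],
}

print(best_partner("GENE_A", profiles))
```

In this toy matrix `GENE_A` and `GENE_B` share a dependency pattern and pair up, while `GENE_C` shows the opposite pattern. A full analysis would compute all pairwise similarities and build a dendrogram or network from them.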
And so we have about 30 or so wild-type screens that we've done over the last three years. These are just HAP1 cells, where we can track all the essential genes or fitness genes in HAP1 cells. But then we have a number of mutants that we've screened too, and we're up to about 200 now. This is a collaboration with Charlie Boone and Brenda Andrews, who are yeast geneticists and built the first double-mutant genetic interaction network in the yeast Saccharomyces cerevisiae, so they were very keen to move their concepts forward into human cells to see what we could find, and with Chad Myers, who's a computational biologist at the University of Minnesota.

And so here's just a single screen that I'm showing as an example of the data that we get. This was an ARID1A knockout cell line where we did a genome-wide screen on top of it. You can see here that the best negative interaction, shown as this red dot, is ARID1B. ARID1A and ARID1B are paralogs of each other, so we weren't surprised to find this interaction, and it suggested things were real. So we kept on screening, and this is one of our early matrices, where there are about 100 or so different mutants along the top. And then this is just a stripe of an interesting cluster where I've boxed in red a series of negative genetic interactions, shown in blue, that are driven by a FASN knockout. So if you just look at the FASN screen alone, you can see here the FASN screen on the vertical axis and the wild-type, or parental, HAP1 screen on the horizontal axis. The things that are off the diagonal are either negative interactions, so they interact with the FASN knockout, or positive interactions. So in the presence of a FASN knockout, when you knock out, for example, Tata1, the cells grow a little bit better.
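The mutant-versus-wild-type comparison described above can be sketched as a per-gene difference in log fold change (LFC). This is a deliberately simplified illustration: the gene names, LFC values, the threshold of 1.0 and the function names are all hypothetical, and real scoring has to deal with replicate variance and time-matching of screens, which this sketch ignores.

```python
def interaction_scores(lfc_mutant, lfc_wildtype):
    """Differential LFC per gene: mutant-background LFC minus wild-type LFC.
    Only genes measured in both screens are scored."""
    return {
        gene: lfc_mutant[gene] - lfc_wildtype[gene]
        for gene in lfc_mutant
        if gene in lfc_wildtype
    }


def classify(score, threshold=1.0):
    """Label a score; the threshold is an arbitrary illustrative cutoff."""
    if score <= -threshold:
        return "negative"   # knockout is worse in the mutant background
    if score >= threshold:
        return "positive"   # knockout is better tolerated in the mutant
    return "none"


# Toy LFC values (hypothetical)
wildtype = {"GENE_A": -0.1, "GENE_B": -0.2, "GENE_C": -1.5}
mutant = {"GENE_A": -2.1, "GENE_B": -0.3, "GENE_C": -0.2}

scores = interaction_scores(mutant, wildtype)
calls = {gene: classify(s) for gene, s in scores.items()}
print(calls)
```

In the toy data `GENE_A` drops out only in the mutant background (a negative interaction), `GENE_B` sits near the diagonal, and `GENE_C` is less depleted in the mutant than in the wild type (a positive interaction).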
So what you can see from the FASN screen are a number of things involved in lipid uptake and lipid regulation or cholesterol regulation, and that makes sense. When you knock out FASN, which encodes fatty acid synthase, you can't make lipids de novo, so you rely on exogenous lipids for lipid metabolism. And so, for example, LDLR is synthetic lethal with FASN, and so are some other genes. One gene we uncovered here, which I'll just mention briefly, is this unknown ORF called C12orf49. There were no annotations for this gene, so we followed it up and found that it was an important gene for LDLR uptake. So we called it LUR1, for lipid uptake regulator 1, and it turns out that two other groups also discovered this gene and found evidence that it's involved in the SREBP pathway. So this is a great way to discover genes and their functions.

So how do we keep track of all this data? We run a large database, just like I guess you guys do. We track everything that we sequence. We track all the cell lines, with as much information as we possibly can. It's easy for people to go in and look stuff up and even do de novo scoring within the database. All the scoring metrics, and even different scoring metrics for the same data, are available. And we also have a part of the web page where you can go in and look at different clusters of genes. So here's an interesting cluster where essentially all the genes in the heme biosynthetic pathway fall out beside each other, and you can see on the right-hand side I've mapped out the heme biosynthetic pathway. The only genes we don't get in this particular cluster are two paralogs, ALAS1 and ALAS2, and also this gene UROD. We're not sure why, but what's cool is that when you do this type of clustering, you often find genes like this red dot that have no annotation to them.
But because they fall in that cluster, it provides an immediate hypothesis to go and test whether or not that gene is involved in heme biosynthesis. So I think this is a good way to start to fill in some of the functional unknowns in the genome. But remember, we're only knocking out two genes in this context. So if there are more than two genes that contribute to a phenotype, you're not going to find it. And we're only looking at fitness. So this is a single phenotype, but we're getting tons of information about how genes are talking to each other.

We can also do other cool stuff, which we realized when we started looking at the data a little bit deeper. If you plot out the genetic interaction scores by position on the genome, we often found these little clusters of genes, denoted by this arrow, that all pop up and are all in the same region. If you zoom in, you can see them, they're all here. So these are genes where, when you knock them out, it looks like the cells are growing better in the mutant. But it turns out that the wild-type cells actually have a duplicated region in that spot, and the mutant cells don't. So the screens are so sensitive that they can differentiate between two cuts and one, right? We're making a single cut in the genome, and that impairs your ability to grow just enough that we can see whether making two cuts is worse than one. So that's something to think about when you do CRISPR screens.

So right now, we're basically thinking about how to fill in this double-mutant interaction space, probably not so different from how you guys are thinking about how to choose genes to make knockouts, right? So if you think about yeast, yeast has 6,000 genes. There are 36 million potential digenic interactions. These have all been mapped. There's a beautiful website.
You can go and look at all the connections between two genes. In human, it becomes a scaling issue, a major scaling issue. There are 20,000 genes. If we were to map everything, that's 400 million different interactions. So where are we today? There's one group, Jonathan Weissman's lab, that published a paper where they looked at a very, very small portion of this matrix, about 55,000 interactions, 235 by 235 genes, so you can see how that lays out in the global landscape. Some of those are essential genes. Essential genes tend to give a lot of genetic interactions when you can screen them, but you can only screen hypomorphs because you've got to keep the cells growing.

We're doing something a little bit different. We're laying these long stripes down. Right now this shows 150, but we're up to 200 now. So we have about 18 million tested digenic interactions, and we hope to keep going. But we also have lots of other data sets, like this Broad data set and the Sanger data set, that represent many, many different cell lines of different genotypes. They're laying stripes down too, but we just don't know where to put those stripes in the context of double- or multi-mutant genetic interactions. And we also have drug screens that can fit into this landscape as well and give us an idea of how the mechanism of a certain drug is working, and that works quite well.

Now remember, we're just doing this by fitness. There are lots of other phenotypes. You can make a genetic interaction map for any phenotype you want. I would suggest that fitness is at least a good start. And then for some of the genes that don't have any interaction or show any signal in fitness screens, that's when you move to a different phenotype. I'll describe one later. So just like in the movie, unfortunately, no one can be told what the matrix is. You have to see it for yourself. So I think this is a project that hopefully, you know, we're doing this in Canada.
And funding there is OK, not great. So we want to try to build this out and hopefully generate a digenic interaction map for a model human cell line that can contribute to and complement some of these other larger projects.

Now having said all that, there are lots of limitations to the way that we're currently screening with CRISPR, and here are just a few issues that we've run into. So a couple of years ago now we started thinking about, OK, well, how can we multiplex this system to make it more efficient? Many labs have been trying to do this. Our attempts at trying to multiplex SpCas9 were largely failures: highly variable, uninformative and frustrating. The problem with most of these approaches right now is that they're all using SpCas9 or variations of SpCas9, except for one called the Big Papi system out of the Broad, which uses SpCas9 and SaCas9. But basically the libraries are built by random combinations of guides, and that restricts you big time in terms of how you can actually program libraries.

And so the Zhang lab found this other enzyme called Cas12a. This is pretty cool because you can multiplex guides, and Cas12a has two intrinsic enzymatic activities. It's got an RNase and a DNase. The RNase can cleave off all the guides if they're expressed as a single transcript and then load them into Cas12a, which is also known as Cpf1. And the DNase part of that enzyme can then go and, in a sequence-specific way, find its target and cleave it in a double-stranded manner. Now the challenge is that no one's been able to figure out the rules for this or how to make it work in an efficient way. So the approach that we took was that we co-expressed SpCas9 and Cpf1, or Cas12a. And we came up with this thing called a hybrid guide, a guide that contained both a Cas9 guide and a Cas12a guide. So we can express that.
And the way it works is that Cpf1 will cleave the Cpf1 guide off and load it, and then you'll have Cas9 going to find its sequence target and Cas12a going to find its sequence target. The cool thing is that you can multiplex this too, so you can express a bunch of different Cas12a guides. This project was a collaboration with the Blencowe lab at the Donnelly Centre, and with two postdocs and a computational biologist. The postdocs called this system CHyMErA because it was a chimeric guide. The acronym is Cas Hybrid for Multiplexed Editing and Screening Applications, and a chimera is a mythical creature with the head of a lion, the body of a goat, and the tail of a serpent.

So the problem when we first started this was that we had a couple of these chimeric guides that worked, but we had no idea what the rules were. So we spent a long time making libraries and doing screens to understand what the rules were, and we came up with a convolutional neural net that allowed us to predict highly active guides at 80% efficiency. I'm not going to go through all of that effort, but what I will show you are some of the applications that we used to show that the system worked well. One, we made a large library against human alternative exons to look for alternative exons that contributed to fitness. Two, we built libraries that targeted the same gene twice, because we felt that if you hit the same gene twice, or if you took a large chunk of a gene out as opposed to just making an indel, we might get more information, it might be more sensitive. And three, we took our system and looked at all human paralogs that are duplicate genes, so strict duplicates. I'll just go through a couple of examples for two and three. So this was the first data we got.
So the libraries we were making to test the system and show that it worked were mostly targeting essential genes, because we knew what to target and we knew what the phenotype was. What I'm showing you in these box plots is targeting of core essential genes, which are the red box plots, with dual guides where either only Cas9 or only Cas12a targets that gene. The other guide in those dual-guide pairs targets an intergenic sequence somewhere else in the genome that we know doesn't have any effect on fitness. And then on the left-hand side are dual guides that target the same gene, so the Cas9 and Cas12a guides target the same gene. What you can see here is the log fold change. When the essential genes are targeted once, by either Cas9 or Cas12a, you get a significant drop, a negative log fold change, that's consistent with what we see using single-targeting Cas9 libraries. But what's cool is that when you target the same gene twice, we get a substantially more negative log fold change. So this gives us a better signal-to-noise ratio that we can measure.

This is also true in RPE1 cells. I think it worked better in HAP1 cells because we only had one copy of the gene to target. In RPE1 cells there were two, since they're diploid. You can see that targeting non-essential genes doesn't really have a big effect, and I'm happy to talk in detail about how we controlled for all of those events. If you simply add up the number of fitness genes based on these data, you can see that you get a significant boost by having dual-targeting guides for the same gene in your library. So we can go from just around 1,800 genes at an FDR of 5% or less to over 2,500 by dual targeting in HAP1 cells. We get a much better effect in RPE1 cells, probably because they're diploid and there are two copies of the gene, so you get four shots on goal, essentially. So the other thing that I'll just mention briefly is our ability to target paralogs.
So we designed a large library that targeted all human paralogs, and here's an example of some of that data. You can see here, for HAP1 cells or RPE1 cells, that targeting SAR1A or SAR1B alone doesn't do anything to fitness, but the double mutant has a profound effect on fitness. Same with SEC23A and SEC23B. So these top three are consistent in terms of their effect across HAP1 and RPE1 cells. But I added a couple of interesting results that we found in RPE1 cells. For example, targeting TET1 or TET2 had this very slight negative effect on fitness, but when you target both of them together, it seemed to eliminate that. And here's something else, a positive interaction, where targeting STK38L or STK38 alone didn't do anything, but the double mutant had a very strong positive effect on fitness.

So here are just some numbers. There are 672 human paralog pairs, which is 90% of the predicted duplicate paralog genes in the human genome. And what we see is that 33% of those, or 219 pairs, have a non-additive fitness phenotype in HAP1 cells, and 18% in RPE1 cells. The majority of the effects are negative, so most of the effects we see by targeting paralogs have a negative impact on fitness. And so maybe this system could be useful for going deeper into the genome for functional discovery.

Now, we haven't forgotten about mouse. We've also made a mouse library. We did this specifically because we were interested in genes that regulate cytotoxic T cell killing, which is an important clinical phenotype. What we did was take six different mouse syngeneic tumor cell lines and express antigens on the surface of those lines that rendered them sensitive to T cells that had transgenic TCRs targeting those antigens. So we took those libraries and performed two separate screens in parallel. We did a fitness screen, and we looked for mouse essential genes in each of these backgrounds.
We also did a T cell killing screen to look for genes that sensitized cells to, or made them resistant to, T cell killing. I'm not going to go into much depth here. What I will say for this crowd is that the mouse library worked really well. We were super happy with it. We generated a test set of core essentials and non-essentials that we could use, based on expression data from mouse tissues. And these are just the PR curves across the six cell lines. For most cell lines, you can see that the essentials and non-essentials worked extremely well. We were able to find all the essentials we were looking for in most of the cell lines. And from this data, we were also able to redefine the set of core essential mouse genes from these types of screens, so we could have a better functional set that's been validated.

And I'll just end on this last slide here to show you that, based on some of the dependencies in these mouse cell lines, we see very similar pathways to what we find in human cell lines, but they represent a diverse set of genetic dependencies that allow us to look at some of the effects of T cell killing in these backgrounds. I'm happy to talk to you more about what we found in those screens if you're interested. But I think I'm out of time, so I'll stop and acknowledge everyone in my lab, and the GIN team, which is the genetic interaction team, a collaboration with the Boone and Andrews labs and the Myers lab. I tried to acknowledge people along the way, so I'll stop there. Questions?

So the paralog targeting is really interesting, that you got some of those sort of counterintuitive effects, I guess. I'm wondering if you looked at whether there was any relationship with the time since divergence, or the duplication of that original gene into those two paralogs. And if there was some relationship, do you see that switching from redundancy to new effects, et cetera?

Yeah, that's an interesting question.
We haven't really looked at when paralogs came online in the human genome and mapped that to our results. We're working in a weird cell line, which we always get criticized for, but we have to start somewhere, right? If people had argued over which yeast strain should have been used in the 60s, they never would have started. So you've got to start, just like you guys are using C57, right? So we haven't done that, but it's a really good idea. I'd love to do that.

The other question I had was about something you noted at the beginning, and I'm just curious, from a screen perspective, how important the time course element is. I'm guessing you sample cross-sectionally, you sequence, and you determine the depletion at time points following selection. How important is that in terms of information gained in order to define that course?

So predicting essential genes, single-mutant fitness, is really easy. Double-mutant fitness and genetic interactions are very, very hard. We've spent probably two and a half years trying to score this data, and we've come up with ways of being able to time-match screens with the wild type so that we can actually answer that question. So almost all the screens we do are with multiple time points, because we're trying to time-match things to predict genetic interactions. But for single-mutant fitness, it's easy. You don't even really have to do it. We can correct easily for single-mutant fitness effects.

Hi. You mentioned something about a big antibody resource that you guys have created. I wonder if you could talk a little bit more about that, because you went through that really fast.

OK, yeah. So actually, it was an NIH project that started many years ago, a Roadmap project, an antibody-based project where there were essentially two teams that were set up to go against each other.
There was the monoclonal antibody team that was run by Jef Boeke. And then there was the synthetic antibody team that was run by Jim Wells and Tony Kossiakoff. And then there were a couple of wacky Canadians in Toronto who were helping out Jim and Tony. And so that effort was called the Recombinant Antibody Network. There's a website for it. I can send it to you. We made a whole bunch of different antibodies for various targets. That initiative was based on targeting transcription factors, which are kind of a bad class of targets to go after with antibodies. But we had a separate initiative where we went after cell-surface targets like Frizzleds, and that worked much, much better. So we did get a bunch of antibodies against transcription factors, and they're all listed on that site. The cell-surface antibodies were a separate set. But I'm happy to talk to you afterwards about how we did it and what we've got.

OK. That'd be great. Thank you.

Hi. Thanks. You may have said this, because I got interrupted in the middle of the talk. The synthetic lethals in yeast, have you looked at how that translates to what happens in human cells? Are they the same genes, orthologous genes?

There are a number of relationships that are conserved, but not a lot. So we can't predict what's going to be synthetic lethal in human cells based on yeast. Even going from Saccharomyces cerevisiae to Schizosaccharomyces pombe, there's only 30% conservation amongst genetic interactions. So it's a hard ask. There are some very specific interactions that many people have published on that are conserved, but systematically, it's not a good predictor.

And one other question: if you scale up to go after the 400 million interactions, are there other phenotypes you can look at at scale besides fitness?

Yeah, that's a great question. I mean, that's why our second phenotype was T cell killing. And it took a long time to set that assay up.
So I think there are all kinds of cool phenotypes one can look at. I wish we could accelerate the process. And even if you're looking at the transcriptome, people have argued that the transcriptome is 1,000 phenotypes: if you can do single cell and you get the transcripts, you get 1,000 phenotypes. But the reality is you're just looking at the transcriptome, so it's another phenotype. How diverse that is, I'm not sure if that's going to get us to the end, where we can build a very comprehensive and robust understanding of even one cell type. We don't even have that for HeLa cells. I mean, we don't know what every gene does in HeLa cells. So I think we have a long way to go.

A lot more. So is this all just non-adherent? Are you using non-adherent cells for robotic liquid handling?

Actually, the HAP1 cells are adherent. They're haploid. But then we converted them to non-adherent because it's easier to screen them. We can do it in spinner flasks, and we can reuse the spinner flasks. We were using so much plastic that we started feeling really bad.

So, OK, thank you.