So, I'm John Stamatoyannopoulos from the University of Washington and Altius Institute, one of the co-organizers of the meeting, and I'll be chairing this session, where we have now been appropriately set up by Nancy's talk, really kind of exposing the tidal wave of connections between gene regulation and disease that are coming forth. And our first speaker is one of Nancy's protégées, Barbara Stranger, talking about the genetic regulation of gene expression variation.
Thank you. It's a real pleasure to be here. I really appreciate the invitation to tell you about some of our work. So I've been working on the genetics of gene expression regulation for a while, and I'm going to tell you about one particular project in immunology that we've been doing. So the project is the immunological variation project, and I'm going to tell you about how we've done QTL mapping in baseline cells but also in activated cells representing adaptive and innate immunity, and we're looking really at context specificity of association. So we're really interested in this question of what does a particular polymorphism do? Does that function change with context? We're going to talk about how this helps us to understand disease risk, not just in resting cells but also in individuals' responses to various stimuli and to being activated. So let me start by telling you about the project. The goal of this was really trying to understand how genetic variation translates into gene expression variation among individuals in both adaptive and innate immunity, and how that relates to higher-order phenotypes. So this is a collaborative project that was started when I was at Harvard, and so it involves a lot of people at Harvard, and also Stanford. And we collected a cohort of just over 400 individuals in the Boston area representing three different ancestry groups: European Americans, African Americans, and East Asians.
We sorted cells, CD4 T cells and monocytes, from all of these individuals, and we also had two branches of the study where we activated all of these people's T cells and where we activated monocyte-derived dendritic cells for all these people. But I'm first going to focus just on the baseline QTLs in these two cell types. So we did transcriptome profiling on arrays at the time. This was a pretty big project in terms of the number of conditions and states and populations that we were doing. So unfortunately, we didn't have RNA-seq for this, but we did have the exon arrays and genotyping for all these people, and we did eQTL mapping, both cis and trans. But I'm going to focus on the cis-eQTL analysis. So for every gene, what we did is we defined a window around the gene, a two-megabase window, and we test for association between SNPs in that region and expression levels. And so what we're looking for are patterns like what you see in this example here, where there's a SNP whose genotypes are associated with expression levels, with each individual represented by a dot. So you see the people carrying the T, they all have lower expression. This is what we're looking for. And for those of you interested in the technical details, you can see how we built the model and how we assess significance. But the key for everybody to know is that we analyze each of these populations and cell types separately, and then we also do a multi-ethnic meta-analysis for each of the cell types. All right, so I don't want you to focus too much on the numbers in this table. What I do want you to see, and that didn't come out very well, I'm sorry, is the number of genes that we're detecting with at least one cis-associated SNP, which is what you're seeing in this column here. And it scales with the sample size, right? Here's the sample size for each of those that went into the analysis.
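
As a rough illustration of the per-gene cis test described above, here is a minimal sketch in Python. It is not the model from the slide, which the transcript does not reproduce: it assumes a plain least-squares regression of expression on allele dosage with no covariates, assumes the two-megabase window means one megabase on each side of the gene, and uses a simple permutation p-value. All names (`in_cis_window`, `slope`, `permutation_p`) and the toy numbers are hypothetical.

```python
import random
from statistics import mean

FLANK = 1_000_000  # assumption: 1 Mb on each side of the gene


def in_cis_window(snp_pos, gene_start, gene_end, flank=FLANK):
    """True if the SNP lies within `flank` bp of the gene body."""
    return gene_start - flank <= snp_pos <= gene_end + flank


def slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1 or 2)."""
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    return sxy / sxx


def permutation_p(dosages, expression, n_perm=1000, seed=0):
    """Empirical two-sided p-value from shuffling expression across people."""
    rng = random.Random(seed)
    observed = abs(slope(dosages, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(dosages, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: six people; dosage counts copies of the T allele.
dosages = [0, 0, 1, 1, 2, 2]
expression = [10.0, 10.2, 9.0, 9.2, 8.0, 8.2]
beta = slope(dosages, expression)  # negative slope: T carriers express less
p_value = permutation_p(dosages, expression)
```

In a real analysis this test would be run for every SNP inside each gene's window, separately per population and cell type, with covariates and multiple-testing control that this sketch leaves out.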
And we do see more eQTLs in the monocytes than in the T cells. And if you look across these rows, you can sort of get some idea about sharing. But we're going to get into the details of sharing across populations and across cell types in just a minute. All right, so this is a big overview, a Manhattan plot of the associations. Each one of these dots is the most significant SNP for each gene that had an eQTL, distributed across the genome. And so in the top panel, those are the monocyte eQTLs, and the strength of the association is going up the axis. And in the bottom panel are the T cells, with the strength of the association going down. And so highlighted are some key immune genes people might be familiar with. You notice some of these associations are pretty strong. These are on a log 10 scale of the P values. We did notice something interesting here, too. I just wanted to point out that about 17 percent of the genes that we identified as having a cis-eQTL had more than one signal. So if you condition on the most significant SNP, there was still another significant SNP there, suggesting more than one regulatory variant in the region. And so on the whole, about 30 percent of the genes that we tested had at least one cis-eQTL. All right, so let's talk about context specificity for a second. So for the eQTLs, I'm going to just give you some cartoon examples of how you could imagine context specificity manifesting. So if you think of two conditions, this could be two different cell types. In our case, it could be different populations. It could be males and females, activated, not activated. So you can think of an eQTL being present in one condition and absent in the other condition. So this is just kind of a cartoon of the association of a SNP with expression of a gene. You could have a case where you've got a single gene that has an eQTL in both conditions, but it's actually associated with different SNPs.
You could imagine a SNP associating with one gene in one condition, but with different genes in a different condition. You can have changes in effect size. So here you see an association where there's a large fold change in one condition, but a smaller one in the other. And you can even have these cases, like you see down here, where the allelic direction flips. So the high-expressing allele in one condition is the lower-expressing allele in the other condition. All right, so what do we see? So first we can ask the question about population specificity of the cis-eQTLs, because we have these three ancestry groups. So we see that only about 4% to 6% of the cis-eQTLs are population-specific. And most of these are actually driven by allele frequency differences, but there are some where the allele frequencies are exactly the same, and yet the association is present in some populations and absent in at least one. And so we can ask about the shared associations and say, well, what about this effect size modulation? Do we see much of that? And actually we see the effect sizes are very highly correlated across populations for the shared eQTLs. So there really isn't much in the way of presence/absence or effect size differences across the populations. But the things that we do see that are different might be really important for population-diverged phenotypes. Here are some examples of the population-specific eQTLs. So in the panels over here, we're looking at associations in each of the populations, so there's African Americans, Europeans, and East Asians, and a gene and a SNP, and we're showing what the associations look like in each of the populations. So in this top panel, you see a European-specific eQTL. In this panel, you see an African American eQTL that's not present in the others, and here you see an East Asian-specific cis-eQTL. This one over here, this gene Tarsal II, this shows an association only in the Europeans, not in the African Americans or East Asians.
This one's kind of an interesting one because it's one of the regions of the human genome with the most evidence for selection, strong evidence for positive selection in the Caucasian population. We don't know what the target of that selection was exactly, but it's really a standout on the distribution of selection coefficients. So what about cell-type specificity? Here we see more variation than we did across the populations. So we see that about 40% of the cis-eQTLs that we detect are cell-type specific. About 30% of that is coming from the monocytes and about 8% from the T cells, and so, again, when we look at this piece where the shared eQTLs are and ask, well, what about the effect size differences? We see more variation here too, so this is less highly correlated than what we saw across the populations. So we have evidence for cell-type specificity both in presence or absence and in fold change. So here are some examples looking at this specificity in a rheumatoid arthritis risk locus. So here's the GWAS signal in this region for rheumatoid arthritis, and then you see that there are multiple genes in this region, and you can see a CD4-specific eQTL for this gene FADS1. For FADS2, there's an eQTL in both of the cell types, and again, you can see a T cell-specific eQTL here. I mentioned that one of the ways that you can have context specificity involves flipping of the direction of the allelic effect, and that's exactly what's shown in this plot for the gene CD52. So the top panel shows the eQTL across populations in the monocytes, and you see that the T allele is the high-expressing allele in each of the populations. So this is significant in all three populations, and when you look in the T cells, you see the opposite. So the T allele is the low-expressing allele. Again, this eQTL is shared across all three populations, but in exactly the opposite direction. And this particular gene, I should say, is a drug target for chronic lymphocytic leukemia.
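
The cartoon categories of context specificity described above (present versus absent, effect size change, allelic direction flip) can be sketched as a small classifier over per-condition effect estimates. This is an illustrative sketch only: the function name `classify_context`, the labels, and the two-fold `ratio` cutoff are assumptions, and it does not cover the cartoons involving a different SNP or a different gene. A non-significant result in one condition may also simply reflect low power, which is not modeled here.

```python
def classify_context(beta_a, sig_a, beta_b, sig_b, ratio=2.0):
    """Label one SNP-gene pair given its effect (beta) and a significance
    flag in two conditions, e.g. monocytes (a) and T cells (b).

    `ratio` is a hypothetical cutoff for calling an effect-size change.
    """
    if not sig_a and not sig_b:
        return "no eQTL"
    if sig_a != sig_b:
        return "condition-specific"    # present in one, absent in the other
    if beta_a * beta_b < 0:
        return "opposite direction"    # high-expressing allele flips
    big = max(abs(beta_a), abs(beta_b))
    small = min(abs(beta_a), abs(beta_b))
    if big >= ratio * small:
        return "effect-size change"    # same direction, different magnitude
    return "shared"


# Toy examples; the numbers are made up for illustration.
flip = classify_context(0.8, True, -0.7, True)
specific = classify_context(0.5, True, 0.02, False)
shared = classify_context(0.6, True, 0.5, True)
```

The CD52 pattern described above, significant in both cell types with the T allele high-expressing in one and low-expressing in the other, would fall in the "opposite direction" category.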
So we've been doing a lot of work thinking about sex as a context too, and so we can separate our individuals into males and females and look to see whether there are sex-specific eQTLs. Now, I've purposely put an X here, X% of cis-eQTLs exhibit sex bias, because we're not quite sure what that number is yet. We're still trying to work out how we assess the significance of this. It's tricky, but we're looking for patterns like what you see in this example here, and we do see associations that look like this, some even much stronger than what we see here, but trying to do this on a genome-wide level and trying to consider all our covariates and exactly how to do permutations for interaction tests has been a little bit tricky. But so what we see is a male-specific eQTL here, where there's no signal in females, and this one is pretty interesting because this is a chronic lymphocytic leukemia risk locus where there's a difference, a sexual dimorphism. Male risk is twice that of females. All right, so we can ask about these eQTLs and their role in disease. And so colleagues of mine at the University of Chicago, including Nancy Cox, several years ago were the first to show that genome-wide association study SNPs were more likely to be eQTLs. So that kind of heralded this whole idea of, can we use eQTLs to help understand our GWAS hits? And so the idea is really simple: if you have some association via GWAS, a trait association, and the same region is associated with expression levels of genes, then you have a potential hypothesis about which gene might be involved, and a mechanism, expression, linking it to your disease. So these become testable hypotheses. So you kind of overlay the GWAS signal with the eQTL signal, trying to look for a pattern where the two signals match each other. And there's been a lot of statistical methodology development in this area, a particularly interesting area for people going forward now.
These methods have gotten more and more sophisticated. And so we took our eQTLs and related them to the information that was in the NHGRI GWAS catalog. And so you can just see at the beginning, I mean, if we redid this now, we would have some different numbers here, but you get a pretty good picture of what's going on from this. And so we took SNPs out of the GWAS catalog and looked to see how many of them were eQTLs that were specific to monocytes, specific to T cells, or shared. And we can parse this out by trait category as we have here. And so all the things that you see in these columns over here are these cases of testable hypotheses now. So we have a GWAS signal. We have a SNP associated with a particular gene, and we know something about the risk allele and which way it's influencing expression of those genes. But these are immune cells, and I didn't show you anything here about the immune traits. So we can do the same thing, parsing out across autoimmune disease SNPs. And you can look at the numbers here. These are the number of SNPs that went into the analysis, and then how many genes had any eQTL for them in our study, and you can see some of these totals down here at the bottom. And so we used one of the metrics that I was telling you about, one of the statistics trying to say whether the same causal variant is likely to underlie the eQTL and the GWAS hit. And we come up with 106 genes for the T cells and 123 genes for the monocytes. And so there's really a lot of data mining that you can do with this, and there are people following up many of these with studies right now. So one of the things is, we have these two cell types, so we can ask: are particular traits associated with regulatory variation in one of the two cell types more than in the other? And so this is kind of a complicated slide, so let me walk you through it.
If you remember, when we asked what the cell-type specificity is like, we said 60% were shared. So that's what this bar is representing: this blue bit is the 60% shared across cell types. This is all the eQTLs; 30% were monocyte-specific and about 10% were T cell-specific. So then we could take the whole GWAS catalog and say, okay, if we take the GWAS SNPs, the ones that are eQTLs, do they tend to be monocyte-specific, shared, or T cell-specific? And what you can see is that the GWAS catalog as a whole is enriched for the more specific eQTLs rather than the shared ones, okay? So we can take different traits, as you see here, and the SNPs for them, and ask the same question. Do the SNPs associated with Alzheimer's disease, for example, tend to be eQTLs in monocytes, in T cells, or in the shared eQTL set? And you can see here, so what we're comparing is the size of this proportion to the size of this proportion, right? Actually, I guess it's this proportion because that's the GWAS. But so you can see this massive enrichment of the Alzheimer's disease SNPs in monocyte regulatory effects. And the traits here are ranked in order by their enrichment either for monocytes or for T cells. So more T cell-specific down at the bottom, and monocyte-specific up top. And it's not just a matter of overlapping this and saying one is more likely than the other; we can actually go and do some permutations to develop a background. So we can take the Alzheimer's disease SNPs and go and get matched sets of SNPs, and do this over and over and over, and ask how often we end up with as drastic a pattern as you see here. And that's what we've done, and that's where we get the P values. So you see these traits are significantly enriched for the monocyte-specific signals and these traits for T cell-specific signals. Here's an example. This is a cis-eQTL for CD33 associated with Alzheimer's disease.
And so this one's quite an interesting one because there's matching data for this: CD33 expression on the cell surface is increased in post-mortem brains of Alzheimer's patients, and it's also associated with beta-amyloid protein and plaque accumulation. So there's a genetic predisposition there. All right, so just to bring the point home, it's important when you're doing this kind of work to be looking across cell types. So I said we want to look for an association that looks, for example, like this, but if you're in the wrong cell type you might never see it, and that really was part of the big motivation for the GTEx project that we heard about already. So I'm a little short on time, so I'm just going to jump right into the activation QTLs. Our questions here were: is there inter-individual variation in immune response? Does it have a genetic basis, and can we see that at the level of transcription? And so we did two different cell activation studies that I told you about before. So we had monocyte-derived dendritic cells that we stimulated with LPS, influenza, and interferon, and also the T cells, but I'm not going to focus on that part too much. So we first collected the cells from these people and stimulated them in a small subset of individuals, using a whole-genome microarray, so we could see which genes are up-regulated or down-regulated in response. From that we could develop a smaller set, a code set, that we used with NanoString to assay all the individuals, and we had to do these experiments to find out what the right time points were, and so on, and so we did that. We mapped the eQTLs in each of the conditions and did some fine mapping. So these are the stimuli we're looking at. I just put this up here for people to be able to look and see which kinds of pathways are activated through interferon beta. And what do we find? So of course we find eQTLs that are common to all conditions, baseline and all the stimuli.
We find some where they're specific to one or another of the pathways, so here's flu only, here's LPS and flu, and here's all the stimuli but not at baseline. So what we're seeing is genetic variation influencing the response to these different stimuli; the stimulus itself is uncovering the function of some of these SNPs. So we can also create our own trait, if you want to call it that, which is the actual response: the fold change between activated-state transcription and baseline-state transcription. So this is how much each individual changes, and we can map that as a trait, and we find associations, response QTLs, that are specific to one or another of these conditions or shared across all stimuli. Now one of the things we saw when we did this was that the response eQTLs were enriched for binding sites of the STAT family of transcription factors. And then we could take a look at some of these. So these are some example genes that came up from the response eQTLs, and these are just the associations themselves, but we can look and see that the most significant SNP for this gene, for example, is right in the binding site for STAT2. We can see SNPs that change the response element motif in these two genes. And we went and did some validation of these associations, doing luciferase assays looking at allele-specific changes in the response, and we validated those three. We also did some CRISPR, where we changed a heterozygote to a homozygote and applied the stimulus, and then we could see the fold induction changing, so these are ones that validated. Now of course these could have roles in disease, as we said before, and so this is just showing you a bunch of the different GWAS diseases that are autoimmune and infectious diseases, the genes they regulate, and whether those are response QTLs or state-specific eQTLs. Again, generating a lot of hypotheses that we can look at going forward.
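
The response trait described above, the fold change between activated and baseline expression mapped against genotype, can be sketched as follows. This is a minimal illustration under stated assumptions: expression values are taken to be on a raw (non-log) scale so that a log2 ratio is meaningful, the association is a plain least-squares slope with no covariates, and the names `log2_fold_change` and `slope` and all numbers are hypothetical rather than taken from the study.

```python
from math import log2
from statistics import mean


def log2_fold_change(baseline, stimulated):
    """Per-individual log2(stimulated / baseline) for one gene."""
    return [log2(s / b) for b, s in zip(baseline, stimulated)]


def slope(dosages, trait):
    """Least-squares slope of a trait on allele dosage (0, 1 or 2)."""
    mx, my = mean(dosages), mean(trait)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, trait))
    return sxy / sxx


# Toy data: six individuals, same baseline, genotype-dependent induction.
dosages = [0, 0, 1, 1, 2, 2]
baseline = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
stimulated = [200.0, 200.0, 400.0, 400.0, 800.0, 800.0]

response = log2_fold_change(baseline, stimulated)   # the mapped trait
response_beta = slope(dosages, response)            # change in log2FC per allele
baseline_beta = slope(dosages, baseline)            # no baseline eQTL in this toy
```

In this toy case there is no association at baseline, yet each additional allele doubles the induction, which is the kind of signal a response QTL is meant to capture.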
So I'll just stop here and summarize by saying we made this reference map of the genetic basis of transcriptome variation in adaptive and innate immunity. People are using these data a lot. We're really excited about that, and I think we've shown for sure that there's context specificity of these eQTLs. In this study alone we've looked at population, cell type, sex, and activation state, and just in these two cell types. And so on this question of what does a given SNP or polymorphism do, I think the answer is very clearly: it depends, and it depends on context. And we don't have enough contexts represented yet in these kinds of studies to have a very good sense of what's common to the ones that tend to be more state-specific versus not, and it's a really exciting area to look at going forward. And I'll just stop there and acknowledge many collaborators on this. Most of the analysis was done by my old postdoc Towfique, who's now got his own position, and I'll just point out my lab down here, and that we're always looking for good people to join us. So thank you for your attention.
She's got one.
We did, we did. We're pretty underpowered for that. The question, I'm sorry, let me repeat, was did we look at trans-eQTLs, and the answer is that yes, we did. We're pretty underpowered to do that, even in the meta-analysis framework, because we're only at 500 individuals. And there is cell-type specificity among the trans-eQTLs, but it's sort of hard to know whether those might just be false positives that don't replicate. It's kind of a tricky thing with the trans. Yeah, I can repeat the question. Sure, so the question was, is there a difference in the strength of associations for context-specific versus shared eQTLs? Is that right? Oh, okay. Is there a difference between the context-specific eQTLs and the response QTLs? I'm not sure I understand what you mean.
So when you go from baseline to flu, you said that you run the QTL analysis on the fold change of expression.
Yes, yes.
Are these typically different? Are they different from the state-specific ones?
Yes. Yeah, they are. I don't have a slide that shows this. There are some that are the same and some that aren't. I don't really remember off the top of my head the exact proportions, but there are some that were the same and some that weren't. I just can't remember the ratio, sorry.
You showed those three pathways for the three different stimuli.
Yes.
So for the stimulus-specific responses, could you actually understand them through the three pathways?
Yeah, yeah, I think by looking at which ones are specific and which ones are shared, you can see which pathways those are going through. Absolutely, yeah. It informs about mechanism for sure.
I just had a very quick question. The sites where you see the context specificity, have you looked in ENCODE data, perhaps, to see if those are also the sites that, for example, have high cell-type variability?
That's a good question. So what we saw when we had looked at the ENCODE data was really just this enrichment for the STAT signaling, but that's a good question. I don't know the answer to that. We should.
Okay. Thank you.