All right, so our next speaker is Seth Ament. He is from the Institute for Systems Biology in Seattle, and he's going to talk about reconstruction and analysis of tissue-specific transcriptional regulatory networks with TReNA.
Hi, thanks everyone for sticking around until the end. This is work that I'm doing in the lab of Nathan Price up in Seattle. I'll be setting up my own lab at the University of Maryland this fall, so I'm excited to expand all of this work there.
The big question that I'm interested in is really, how do we go from DNA sequences towards higher-level phenotypes? I think that's obviously the big picture that many of us are interested in, and it involves a transition from DNA through a series of networks to the kinds of higher-level phenotypes, like behavior, that we really care about modeling. But I'm interested in this first phenotype of molecular networks and gene expression as a step where we can perhaps begin to make some progress in understanding. And that, I think, is the power of ENCODE for many of us. This is my first ENCODE meeting, so it's great to be able to thank all of you in person for the data.
The approach that we've been taking to reconstruct transcriptional regulatory networks involves using machine learning to integrate many different kinds of data to make predictions in two different domains. First, we'd like to understand where transcription factors bind in the genome, given information about the motifs that they recognize and a variety of annotations of the chromatin, including DNase, as well as enhancer annotations and evolutionary conservation. And then, given that we can predict those binding sites, can we use that information together with gene expression to predict how transcription factors regulate their target genes across the genome, and then use that to understand the kinds of changes that occur in disease?
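As a rough illustration of the first prediction task described above, the sketch below combines several per-site evidence scores into one binding prediction. It is not the speaker's method: the four feature names, the synthetic data, and the choice of a plain logistic regression fitted by gradient descent are all assumptions made for this example, and the "bound" labels stand in for overlap with ChIP-seq peaks.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, epochs=300, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability that a candidate site is bound."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def make_synthetic_sites(n, seed):
    """Hypothetical candidate sites; label 1 stands in for a ChIP-seq peak overlap."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        bound = 1 if rng.random() < 0.5 else 0
        # Assumed features: motif score, two footprint scores, conservation.
        # Each is drawn with a higher mean at bound sites (synthetic, not real data).
        X.append([rng.gauss(float(bound), 0.5) for _ in range(4)])
        y.append(bound)
    return X, y

X_train, y_train = make_synthetic_sites(400, seed=0)
X_test, y_test = make_synthetic_sites(200, seed=1)
w, b = train_logistic(X_train, y_train)
accuracy = sum(
    (predict_proba(w, b, x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test)
) / len(y_test)
print(f"held-out accuracy on synthetic sites: {accuracy:.2f}")
```

The point of the sketch is only the shape of the problem: train on sites where binding labels exist, then score sites where they do not. Real features would come from motif scans, footprinting tools, and chromatin annotations rather than a random number generator.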
In each of these, we've developed ensemble machine learning approaches to solve these problems. This is a work in progress, but this is sort of a status report on what we've learned so far.
The first is that there are a handful of methods that have been developed so far for DNase footprinting. What we're finding is that if you combine those methods together and include additional kinds of annotations, from chromatin states and evolutionary conservation of particular sites, one can improve the ability to predict transcription factor binding sites in the absence of ChIP-seq. In other words, we can predict ChIP-seq peaks from the other data, in this case with a single transcription factor, but this appears to generalize: we can build models using the ChIP-seq from a subset of transcription factors and find binding sites for additional transcription factors, allowing us to build models for hundreds of transcription factors in any tissue for which there's DNase and some chromatin annotations.
Second, I think a harder problem is, given a distribution of transcription factor binding sites, can we go from there to predict the target genes of those transcription factors? This relates to the problem of connecting enhancers to their target genes, but I think it's even a bit harder, because what we'd like to know is whether a specific transcription factor is likely to regulate a target gene. In this case, I'm learning on shRNA microarray experiments in lymphoblastoid cells, where each of 59 transcription factors was knocked down and microarrays were then used to figure out a set of genes that were differentially expressed. I think, strikingly, a number of methods that have been used to build transcriptional networks in the past have no enrichment whatsoever for targets found by shRNA knockdown.
So this includes, for instance, the ARACNE method, which is widely used for reconstructing transcriptional networks: there's essentially no relationship between the targets it predicts and the targets found in these knockdown experiments. That's, I think, a bit disappointing, because obviously we'd like to be able to make these predictions. There seems to be a small amount of signal from simply looking at Pearson correlations, and from transcription factor binding sites predicted by our method across a variety of different ranges around the transcription start site. But what we're able to find now is that by using an ensemble of these predictions, including both transcription factor binding sites and co-expression, we can begin to predict in a generalizable way which genes are going to be differentially expressed when you knock down the transcription factor. So that suggests that we're starting to be able to get some signal here, and I think we can probably continue to bump up these areas under the ROC curves. So that's encouraging.
What I think is a little bit easier, and actually quite encouraging, is that if we do know a set of transcription factors that we think bind in the region around a particular gene, and we combine their expression patterns, we can do quite well in predicting the expression of those target genes. That suggests that, given that we do know which transcription factors are relevant, we can learn an enormous amount about the activity of a network based on just that relatively small number of regulators. In this case you can see that we're able to predict the expression of over 10,000 genes in the human brain based on the expression of their regulators.
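To make the evaluation mentioned above concrete, here is a minimal sketch of scoring target predictions against knockdown results with the area under the ROC curve. The rank-averaging ensemble, the function names, and the six-gene toy data are assumptions for illustration only; the numbers were made up to show the mechanics and say nothing about real performance.

```python
def rank_normalize(values):
    """Map values to (0, 1] by rank so evidence on different scales is comparable.
    Ties are broken by input order, which is good enough for this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank / len(values)
    return ranks

def ensemble_score(coexpression, binding):
    """Average the rank-normalized co-expression and binding evidence per gene."""
    a = rank_normalize(coexpression)
    b = rank_normalize(binding)
    return [(x + y) / 2 for x, y in zip(a, b)]

def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative.
    Tied pairs count as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data for one hypothetical transcription factor and six candidate genes.
# labels: 1 = differentially expressed after knockdown, 0 = not.
labels = [1, 1, 1, 0, 0, 0]
coexpression = [0.9, 0.7, 0.2, 0.6, 0.3, 0.1]  # e.g. |Pearson r| with the factor
binding = [1, 4, 5, 0, 3, 2]                   # e.g. predicted sites near the TSS

combined = ensemble_score(coexpression, binding)
print("co-expression AUC:", roc_auc(coexpression, labels))
print("binding AUC:      ", roc_auc(binding, labels))
print("ensemble AUC:     ", roc_auc(combined, labels))
```

An AUC of 0.5 corresponds to no enrichment for knockdown targets, which is the situation described for the earlier network methods; values above 0.5 indicate that true targets tend to be ranked higher.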
So putting these things together, we're able to build a model for transcriptional regulation in the human brain, and now we're expanding this to a number of other tissues. We start out with around four and a half million transcription factor binding sites predicted from DNase-seq and other data from ENCODE, together with external transcriptomic data, in this case 2,700 microarrays from the Allen Brain Atlas, and we end up with a transcriptional network that incorporates targets for 700 transcription factors regulating 11,000 target genes through 200,000 interactions.
The real key, I think, for any such method is, does it allow us to learn interesting things about biology? I'm a neuroscientist by training, and I'm very interested in understanding the mechanisms of psychiatric disorders. So we applied networks built with these methods to try to understand master regulator transcription factors and causal regulatory variants in three psychiatric disorders: bipolar disorder, schizophrenia, and major depression. We identified a set of transcription factors whose predicted target genes were strongly enriched among differentially expressed genes in the prefrontal cortex in each of these three diseases. Among these, the one I've circled here, POU3F2, is especially interesting because it's among a small number of genome-wide significant loci for bipolar disorder risk. So that's independent evidence that that transcription factor may be involved in a causal way in bipolar disorder.
We then tried to use the transcription factor binding sites annotated from our model to understand, among the haplotypes associated with risk for bipolar disorder and schizophrenia, whether we could identify causal variants on those haplotypes.
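The enrichment of a transcription factor's predicted targets among differentially expressed genes, described above, can be tested in several ways. The talk does not say which test was used, so the sketch below assumes a one-sided hypergeometric test; the function name and the 20-gene example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, n_de, n_targets, n_overlap):
    """One-sided p-value P(X >= n_overlap), where X is the number of
    differentially expressed (DE) genes among n_targets genes drawn without
    replacement from a universe that contains n_de DE genes."""
    upper = min(n_de, n_targets)
    total = comb(universe_size, n_targets)
    tail = sum(
        comb(n_de, k) * comb(universe_size - n_de, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 expressed genes, 5 of them DE in disease,
# and a factor with 5 predicted targets, 4 of which are DE.
universe = {f"gene{i}" for i in range(20)}
de_genes = {"gene0", "gene1", "gene2", "gene3", "gene4"}
tf_targets = {"gene0", "gene1", "gene2", "gene3", "gene10"}

overlap = len(de_genes & tf_targets)
p_value = hypergeom_enrichment_p(len(universe), len(de_genes), len(tf_targets), overlap)
print(f"overlap = {overlap}, p = {p_value:.4g}")
```

In a real analysis this would be run once per transcription factor, with the universe restricted to genes that are both expressed in the tissue and present in the network, followed by a multiple-testing correction across factors.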
And again, we were, at least for a few loci, able to identify very interesting variants, including in this case a predicted binding site for the same transcription factor, POU3F2, in the promoter of the VRK2 gene. We've now done luciferase assays and some additional functional validations to demonstrate that, in fact, this variant in the VRK2 promoter modulates the activity of that promoter in a POU3F2-dependent fashion. So it seems like we're able to identify at least one functional variant on that risk haplotype using our methods.
The software for this is up on GitHub through the Price Lab, and we're rapidly trying to get this out there so that we can generalize this and build a whole set of tissue-specific models. Thanks.
Thank you.