Hello, this is John Stamatoyannopoulos, and I'm going to be speaking to you about high-resolution maps of regulatory DNA, offering some insights and a forward perspective. In outline, I'll be talking first, briefly, about the global indexing of regulatory DNA marked by DNase I hypersensitive sites, and also of the transcription factor footprints therein. My talk here is going to be a general overview, presaging the talks of Wouter Meuleman and Jeff Vierstra later this morning, which will go into more detail. I'm also going to offer some key insights at the interface of genetics and gene regulation and their integration, and talk generally about next steps.
So what is indexing? Indexing is really about leveraging all of the data we have to create the best possible maps. To understand how this works in principle, we have to understand how we used to do it. For DNase-seq maps, essentially what would happen is that we'd get a cell or tissue sample, we'd perform DNase-seq, we'd call peaks and create a map for a given cell type, and then we would do this over again for another cell type or tissue, and another one, etc. So essentially all of these maps are independent of one another. The same procedure was also applied for calling DNase I footprints within hypersensitive sites.
This approach has a number of drawbacks. First, you obviously need to redraw the map after every experiment. Second, there's no fixed genomic reference for each element; they're all discovered de novo. Likewise, there's no standard nomenclature for referring to specific elements at specific genomic positions in specific genome builds. And finally, it's difficult to study the behavior of the same element across different cell contexts.
So with indexing, instead of looking at the data sets individually, we're leveraging all of the data together. Essentially what we're doing is combining maps along a genomic position axis, which is what individual data sets give you, with a cell context axis.
And this cell context axis really encompasses all of the different cell types and tissues. The basic idea here is that you can combine the wealth of these cell context maps to increase the precision on the genomic axis, because many of the sites turn up over and over again in many different cell and tissue contexts.
So how does this look for indexing DHSs? Here's a roughly 75 kilobase region on chromosome 1, overlying a couple of genes, as an example. If we look in thymus, we find that there are some hypersensitive sites in the promoters and the introns of these genes, and they replicate reasonably well between samples. But if we now expand the range of cell types sampled across a wide range of hematopoietic cell types and tissues, we see the following. First of all, we see that there is organized biological behavior and that there's very high positional stability of the elements. We can zoom in on this positional stability for a few of these elements here, where you can see what's going on. Essentially you have several different types of elements. The ones in the middle are very highly positionally stable. The ones on the left show a little bit of wiggle between different cell and tissue types, which coincides with a kind of expansion and contraction of the site. And the one on the right shows, again, different forms of a site that may appear in conjunction with neighboring elements.
So essentially indexing is about taking all of this information, combining it together and integrating it, using a technique developed by Wouter Meuleman, to identify three things for every element: a centroid; a core region, which represents the confidence bounds on the mobility of that centroid between different cell and tissue contexts; and a consensus start and end of the site. Once you've done that, you've essentially resolved all of this different data, collected over dozens or hundreds of different cell context experiments, down to a single archetypal genomic element.
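As a rough illustration of what such a summary could look like, here is a minimal Python sketch. It is not the integration technique referred to above, which the talk does not spell out. It assumes, purely for illustration, that the per-sample peak calls for one element have already been grouped together, and it uses the median summit as the centroid, the interquartile range of summits as a stand-in for the core region, and median boundaries as the consensus start and end. The `Peak` class, the `summarize_element` function and the coordinates are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class Peak:
    """One per-sample peak call (hypothetical structure, for illustration)."""
    sample: str   # cell or tissue context the peak was called in
    start: int
    end: int
    summit: int   # position of maximal signal within the peak


def summarize_element(peaks):
    """Collapse peak calls believed to be the same element into one record.

    Assumption: median and interquartile range are simple stand-ins for
    the centroid and its confidence bounds; the real method may differ.
    """
    summits = [p.summit for p in peaks]
    centroid = int(median(summits))
    if len(summits) >= 2:
        q1, _, q3 = quantiles(summits, n=4, method="inclusive")
        core = (int(q1), int(q3))
    else:
        # A single observation carries no information about mobility.
        core = (centroid, centroid)
    return {
        "centroid": centroid,
        "core": core,
        "start": int(median(p.start for p in peaks)),
        "end": int(median(p.end for p in peaks)),
        "n_contexts": len({p.sample for p in peaks}),
    }


# Made-up coordinates for the same element seen in three contexts.
peaks = [
    Peak("thymus", 100, 300, 200),
    Peak("T cell", 110, 310, 205),
    Peak("B cell", 90, 290, 195),
]
element = summarize_element(peaks)
print(element)
```

The point of the sketch is only that each additional context adds another observation of the same summit, so the estimate of the element's position tightens as more cell contexts are added.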
And to that element we can assign a unique identifier. An identifier can carry meaning, unlike, for example, the rs numbers of SNPs. Here we can construct the identifier from the chromosome, a dot, and a positional number that represents approximately the percentage along the chromosome where the element occurs.
So this process is now repeated over the breadth of ENCODE data, comprising about 733 different experiments across approximately 440 different cell and tissue types and states; those are the contexts. If you go and call all of those data sets individually, you come up with about 70-odd million different hypersensitive sites that are mapped somewhere. But if you integrate it all together into a consensus index, you can see that all of those elements are really essentially manifestations of approximately 3.6 million archetypal elements that are precisely encoded in the genome. And again, for each of these we can identify its center position, the variability of that center, and a consensus start and end.
A similar approach can be applied to indexing transcription factor footprints. Here the resolution is vastly higher. We're now looking within an individual hypersensitive site at the nucleotide level, where we have base-by-base DNA protection data. And again, this can be integrated across a wide range of different cell and tissue contexts, and done not only in a counting fashion, but also by taking account of a full Bayesian model that encompasses all the data. This will be discussed further by Jeff Vierstra later in the morning. Essentially what this allows you to do is to identify reference TF-contacted nucleotides and to call consensus transcription factor footprints. Out of this, one can integrate 243 different cell and tissue contexts into around 4.5 million consensus footprints, and also assign unique identifiers to them that are essentially extensions of the identifiers of the DHSs in which they occur.
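To make the identifier idea concrete, here is a small Python sketch of a position-encoding identifier of the kind described above: chromosome, a dot, then a number reflecting how far along the chromosome the element sits. The exact format used in the actual index is not given in the talk, so the six-digit zero-padded fraction, the function names and the footprint suffix below are all assumptions made for illustration. The chromosome 1 length is the GRCh38 value.

```python
CHR1_LENGTH_GRCH38 = 248_956_422  # length of chromosome 1 in GRCh38


def dhs_identifier(chrom, centroid, chrom_length, digits=6):
    """Build a hypothetical '<chromosome>.<fractional position>' identifier.

    The fractional position is the centroid's distance along the
    chromosome, scaled to `digits` decimal places and zero-padded.
    """
    if not 0 <= centroid < chrom_length:
        raise ValueError("centroid must lie within the chromosome")
    name = chrom[3:] if chrom.startswith("chr") else chrom
    # Integer arithmetic avoids floating-point rounding surprises.
    fraction = centroid * 10**digits // chrom_length
    return f"{name}.{fraction:0{digits}d}"


def footprint_identifier(dhs_id, index):
    """Extend a DHS identifier for a footprint inside it (assumed format)."""
    return f"{dhs_id}.{index}"


# An element whose centroid sits exactly halfway along chromosome 1.
dhs_id = dhs_identifier("chr1", 124_478_211, CHR1_LENGTH_GRCH38)
fp_id = footprint_identifier(dhs_id, 3)
print(dhs_id, fp_id)
```

One consequence of zero-padding the positional part to a fixed width is that, within a chromosome, sorting identifiers as strings follows genomic order, which is part of what makes a meaningful identifier more useful than an arbitrary serial number.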
It's also notable that around 83% of the footprints lie in DHS core regions. And finally, what this provides is a framework for integrating transcription factor identifications. We can actually do this by consensus motif matching, and now annotate all of the footprints by their predicted consensus motif. Essentially this creates a matrix, if you think about it, of all the footprints against all of the different biosamples out there, recording whether that motif is occupied.
If we look across all hypersensitive sites, we find that a typical regulatory element encompasses about 200 base pairs and five to six transcription factors that are directly bound and generate footprints, with quite ample spacing, about 21 base pairs on average, between them. One key finding is that this occupancy architecture at most elements is actually invariant across cell contexts. Essentially it means that when a given DNase I hypersensitive site appears in a different cell context, its TF occupancy pattern appears to be constant. In other words, most of the regulation is driven by the coincident activation of elements rather than the turnover of transcription factors within an element.
Now I'd like to move on to some key insights from this work at the interface of genetics and gene regulation. Some key take-home messages from the analysis of consensus DNase I hypersensitive site indexes are, first, that incorporating the cross-cell-type behavior of DHSs markedly enhances the enrichment of GWAS signals. Wouter will be talking later about a technique to capture the biological behavior of DNase I hypersensitive sites and to summarize it in regulatory components that can then be used for a variety of powerful analyses. One of these reveals that GWAS variants are distributed across collections of co-regulated elements that span gene bodies. So this is a fundamentally different paradigm from current thinking around individual elements or individual variants that may be acting alone.
From the footprint perspective, consensus footprinting has enabled us to finally distinguish between variants that land in DNase I hypersensitive sites and are in footprints versus those that are not in footprints. And now we know that essentially all of the enrichment in GWAS signals that we see in DNase I hypersensitive sites is actually accounted for by the variants that fall into the footprints, not between them.
So on a higher level, we can now integrate genetics and regulation on three basic axes. The first is a genomic position axis of finely resolved consensus DNase I hypersensitive site summits and transcription factor footprints. The second is a cell context axis, which is captured in the DHS regulatory components that Wouter will describe, and these in turn capture the cross-cell-type behavior of DNase I hypersensitive sites. And the third is a gene context axis that represents the coherent co-localization of similarly regulated DHSs over gene bodies and the apparent enrichment of genetic signals across those co-regulated elements.
So I want to take a brief look forward and talk about expanded data and also leveraging the power of reference indexes. ENCODE data are ever expanding, but now almost exponentially so, in the sense that we are currently on pace to roughly double the density of ENCODE DNase I hypersensitivity data by the end of 2021, including a wealth of conditional and perturbation data, which were previously missing. These data are going to be particularly strong for the immune system, which, as many of you know, plays a key role in diverse diseases that affect virtually all major organ systems.
These expanded data are going to enable several things. First of all, the identification of new hypersensitive sites: we predict that this will expand the DHS index by about 25 to possibly up to 30%, and it will certainly more than double the transcription factor footprint index.
Secondly, because all of the data are additive and self-reinforcing, they are going to enable refinement and sharpening of previously annotated elements on the genome. And finally, they're going to give a new perspective on the combinatorial activation of DNase I hypersensitive sites and the understanding of cell-context-selective regulation.
In terms of leveraging reference indexes, there are really many things that one can do with these that greatly increase both the power and the facility with which genomic data can be analyzed. First of all, having a reference index that's largely complete has the potential to replace de novo peak-calling paradigms with reference-based detection, meaning that you can do an experiment, even one that is relatively sparsely sampled, and use the reference index to identify elements with much greater power than you otherwise could based on the data alone. Having so many fixed references out there will also enable new approaches to quality metrics for genomic data. And more broadly, it will systematically anchor and expand sparse data from single-cell and even single-molecule experiments. Here the indexes are already being leveraged to this end, and really I think they are going to have tremendous power to do this and to anchor a wide variety of single-cell analyses.
On a broader level, what we're talking about here is a transition from an era of discovery of novel elements to one of rapid, accurate and highly sensitive detection in the context of diverse experimental modalities.
So with that I'll close, and thank the team at Altius, and particularly Jeff and Wouter, who are speaking later this morning, and we'll turn to questions. Thank you.