All right, good afternoon, everybody. My name is Yan Li, and I'm actually in Dr. Yue's lab, so his compliments mean nothing to me. Today I would like to present the ENCODE Element Browser and the 3D Genome Browser, which our lab has built to hopefully increase the accessibility of the ENCODE data. The ENCODE Element Browser is the first thing we're going to cover. It is a suite of four tools, and it covers two types of data: one is gene expression, and the other is cis-regulatory elements. This part ties back into the data set that Michael talked about. The next browser is the 3D Genome Browser, which visualizes Hi-C and ChIA-PET data.
So what are the goals for these browsers? Why did we make them? As I said, one of the goals is to let the user query for the most relevant ENCODE data. ENCODE has generated an enormous amount of data, and it's important to isolate the data that you need for your own biological experiments. The next component is visualizing complex data, and this is especially true for Hi-C and ChIA-PET. The last part is providing an additional layer of evidence. As was said, it's difficult to know what the target gene is even given the cis-regulatory element, because the cis-regulatory element may not regulate the gene closest to its position. So Hi-C and other chromatin ligation experiments are important for finding the chromatin loops that form between the cis-regulatory element and the gene promoter, and the Hi-C browser is geared toward that. Hopefully those can be used to guide any biological validation experiments.
Okay, so without further ado, let's dive in. Well, I see a bunch of Macs. Oh my gosh, that's a lot of Macs. Okay, but let us get started. First we are going to do the ENCODE Element Browser. Let's first go to the encodeproject.org website and then just follow what this animated GIF is showing. So click on Data, then Annotations.
And below Annotated Genomic Regions, there is "query tool at Penn State"; click that to arrive at the Penn State website. Pardon me? I think so, it's not? Does anybody else have this issue? I think the most important thing is to click Data and then click Annotations. It will show you this link. Okay, and click on the Human tab to access the human data. Let us enter the gene IKZF1, the Ikaros DNA-binding zinc finger 1. Enter it under option one, like the GIF is showing. Did anybody get this page here? Everybody? Yeah, okay.
So as you can see on this page, option one is gene expression. We're looking at the expression of the Ikaros family gene here across 80-plus tissues. And here is the gene ID with a lot of synonyms. Oh, and before I forget: here we entered the gene symbol, but you can actually enter the RefSeq ID or UniProt ID as well. When you enter the ID or symbol, the website should prompt you with the correct spelling. If you see no prompts, that means there's something wrong with your spelling, so be careful with that.
Okay, so we see the bar graph in RPKM, and below that is the information that the bar graph conveys, in table format. The IKZF1 gene encodes a protein that is involved in hematopoiesis as well as immune system development. So it makes sense that, as you can see, the cells that have relatively high expression are CD20, GM12878, K562, and so on. If you think the label on the bar graph is too small, you can click on it to get an enlarged image and read the label from that.
Okay, so the next two options, options two and three, stem directly from Michael's presentation, the data set that he presented. They query for the candidate cis-regulatory element regions with a DHS or transcription factor binding site.
And it's a fast and easy way to determine the cis-regulatory elements as well as their tissue specificity. So without further ado, click on the back button, and under option two, select chromosome 7, enter 50,300,000 for the start and 50,305,000 for the end, and then click Submit. Okay, so you'll see a pretty busy page, but the gist of what you see is the DHSs and the TF binding sites, the identity of the transcription factor that binds, if applicable, and the tissue that the region occurs in. As you can see, there are a lot of immune-related cells, which is very applicable to Ikaros.
Okay, so people don't really remember the regions that they want off the top of their head. So the next tool complements option two: in option three, you can enter a gene as well as an extended window, and then search for these cis-regulatory elements. For option three, let us enter IKZF1, and for the extended region, if you don't put anything there it is 20 kb, but let's put 1 for today because there are a lot of people trying to access the data at the same time. Okay, so you see this page, which is very similar to the page that you saw before, except this time it is honed in on the gene, and as you can see the cells here are also immune-related, which is very appropriate.
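The two query styles described above (option two by region, option three by gene plus an extended window) both come down to an interval-overlap search. The following is a minimal sketch of that idea, not the browser's actual implementation: the `Element` record, the function names, and every coordinate in the toy list are made up for illustration. Only the chr7:50,300,000-50,305,000 query region and the 20 kb default window come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A hypothetical record for one annotated element (DHS or TF binding site)."""
    chrom: str
    start: int
    end: int
    kind: str
    tissue: str


def overlaps(el, chrom, start, end):
    """True if the element shares at least one base with [start, end)."""
    return el.chrom == chrom and el.start < end and el.end > start


def query_region(elements, chrom, start, end):
    """Option-two style query: every element overlapping the region."""
    return [el for el in elements if overlaps(el, chrom, start, end)]


def gene_window(gene_start, gene_end, extend_kb=20):
    """Option-three style window: the gene body padded on both sides.

    The 20 kb default mirrors the default mentioned in the talk.
    """
    pad = extend_kb * 1000
    return max(0, gene_start - pad), gene_end + pad


# Toy annotations; none of these coordinates are real ENCODE records.
elements = [
    Element("chr7", 50_301_000, 50_301_400, "DHS", "GM12878"),
    Element("chr7", 50_304_900, 50_305_200, "TFBS", "K562"),
    Element("chr7", 50_400_000, 50_400_300, "DHS", "HepG2"),
    Element("chr1", 50_302_000, 50_302_500, "DHS", "GM12878"),
]

hits = query_region(elements, "chr7", 50_300_000, 50_305_000)
for el in hits:
    print(el.kind, el.chrom, el.start, el.end, el.tissue)
```

A real server would use an indexed lookup rather than a linear scan, but the overlap test itself stays the same.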
Okay, the next one is option four. As Michael talked about, option four tries to correlate the activity of a proximal DHS with that of a distal DHS. As you may recall, a proximal DHS is a DHS that is near the transcription start site of a gene, while the distal DHS has some sort of cis-regulatory function, such as enhancer function. As you see in this example, if we correlate the activity of DHS A with the distal DHS, there is very low correlation between the two, because DHS A is active in tissues in which the distal DHS is not active. This is not the case with DHS B: DHS B and the distal DHS are active in the same or similar tissues, so they have higher correlation. When this occurs, the pair of DHSs is referred to as linked, so DHS linkage, and when they're linked it means there's some sort of biological connectivity between the two. For more information, you can refer to the Thurman et al. paper in Nature, 2012.
So without further ado, let's try this out on the Element Browser. Under option four, enter the IKZF1 gene again and click Submit. Okay, so you should see a page that looks like this. The first three columns are the location of the proximal DHS, which is near the gene in the center column, the next three columns are the location of the distal DHS, and the last column is their correlation. These DHS pairs are only recorded when their correlation is above 0.7. I took one of the higher-correlating DHS pairs, 0.96, this location here, and ran it through option two, which searches for elements in a given genomic region, and I saw that there are transcription factor binding sites for EBF1 and ELF1. These two are also transcription factors that are involved in immune development, which agrees with Ikaros's function.
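The DHS A versus DHS B comparison above can be illustrated with a small correlation calculation. This is only a sketch under stated assumptions: the talk does not say which correlation measure is used, so Pearson correlation is assumed here, and the activity vectors are invented toy values across five hypothetical tissues. The 0.7 cutoff is the one mentioned in the talk.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length activity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (norm_x * norm_y)


LINK_THRESHOLD = 0.7  # cutoff quoted in the talk

# Toy accessibility signal per tissue; the values are made up.
distal = [0, 0, 9, 8, 1]
proximal = {
    "DHS_A": [7, 9, 0, 1, 0],  # active where the distal DHS is silent
    "DHS_B": [1, 0, 8, 9, 2],  # active in the same tissues as the distal DHS
}

linked = [
    name
    for name, activity in proximal.items()
    if pearson(activity, distal) > LINK_THRESHOLD
]
print(linked)
```

Only the proximal DHS whose activity rises and falls with the distal DHS passes the threshold, which is the sense in which the pair is called linked.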
So this means that there is some sort of biological connectivity between the two, and indeed if you knock out any of these genes, the patient is going to develop ALL. So this illustrates that this suite of tools is meant to be used in concert to really find the answer to your biological question. And keep in mind that correlation doesn't imply causation: just because these two transcription factors correlate with Ikaros doesn't mean that they necessarily directly regulate it. So we need an additional layer of evidence to see if regulation indeed happens, and this brings me to the 3D Genome Browser. The 3D Genome Browser's URL is 3dgenome.org.
Yeah. Can you repeat the question? Yeah. So the question was, can you download the data somehow? You could save the page as HTML and then export it to Excel. Yeah, so let me answer the question. I think all the files can be downloaded. The majority of the files we used are from the ENCODE portal, the annotation page, and can be downloaded there, and the linkage file we used is from John Stam's Nature 2012 paper, and that file can be downloaded from that paper. We will use an updated version soon. Well, the question is not about downloading the whole data but part of the data, right? Yeah, we'll add that option in the future if you don't know how to use XML. That's fine. Okay, so is everybody at the 3D? Yeah. I think it would be really nice if you could just click through to the option, do it from the result of the option seven. So, so that- Yeah, actually I had an epiphany this morning that I should do that. So, yeah, that'll be done soon. Okay, so is everybody at the 3dgenome.org website? Okay, so there are two parts to this website.
The first part is the visualization of Hi-C, the Hi-C Genome Browser, and the second part is the visualization of virtual 4C. 4C, as you know, is a one-to-many query of chromatin interaction data, so you're looking at the interactions of your locus of interest with the other loci. The reason it is called virtual 4C is that it's derived from the Hi-C data; it's not experimentally derived, so that's why it's virtual 4C. Along with the virtual 4C is the ChIA-PET data. Unlike the Element Browser, this part of the website requires JavaScript and HTML5. Most modern browsers include these two, but if you haven't updated your browser in a long time, I really recommend you do that, not only to access the website but for security reasons as well.
Okay, so the main features of these two browsers: you can easily browse some of the high-quality published Hi-C data available, including the data sets that were generated for ENCODE; you can contextualize the data with a customizable UCSC browser session; and lastly you can browse your own Hi-C data, and we'll show you how to do that.
Okay, so let's click on the Hi-C Interactions tab at the top, enter the gene SOX2, and click on Show Interactions. As with the Element Browser, we have two options: one is search by gene name, and the other is search by location, if you know the exact location. And this part I'll explain later: the UCSC browser session, where you can upload your own session, not the default one that's loaded here but your own session with your own customized tracks. After you click Submit, you might have to scroll down a little bit, because the Hi-C image is not shown at the top of the page. Yeah, make sure you scroll down.
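The idea that virtual 4C is derived from Hi-C rather than measured directly can be sketched as reading one row out of the contact matrix: fix the bin containing the bait locus and report its contacts with the neighboring bins. This is an illustration only; the function name, the 40 kb bin size, and the toy matrix are assumptions, not the browser's code or data.

```python
def virtual_4c(matrix, bin_size, bait_pos, window):
    """Return (bin_start, contact) pairs around the bait locus.

    matrix   : square Hi-C contact matrix as a list of lists
    bin_size : matrix resolution in base pairs (assumed value below)
    bait_pos : genomic position of the locus of interest
    window   : how far to look on each side of the bait, in base pairs
    """
    bait_bin = bait_pos // bin_size
    flank = window // bin_size
    first = max(0, bait_bin - flank)
    last = min(len(matrix) - 1, bait_bin + flank)
    return [(b * bin_size, matrix[bait_bin][b]) for b in range(first, last + 1)]


# Toy symmetric contact matrix at a hypothetical 40 kb resolution.
contacts = [
    [50, 12, 4, 1, 0],
    [12, 60, 15, 3, 1],
    [4, 15, 55, 20, 6],
    [1, 3, 20, 70, 9],
    [0, 1, 6, 9, 65],
]

profile = virtual_4c(contacts, bin_size=40_000, bait_pos=90_000, window=40_000)
for start, value in profile:
    print(start, value)
```

The largest value in such a profile is normally the bait bin's contact with itself, so the interesting signal is the next-highest peaks.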
So for this session only, I filled in a customized browser session that people could use. Okay, so this is the results page, so remember to scroll down. In the center is your Hi-C heat map, and you can adjust the intensity with this bar up here. At the top is the navigation bar, with which you can zoom in or zoom out and move left or right from the region that you're currently in. Below is the UCSC Genome Browser, and it should be aligned to the Hi-C data. So everybody got that page?
Okay, so what am I looking at with that weird triangle? Normally Hi-C data is shown as a heat map visualization of the contact matrix, an N by N matrix, where the size N is determined by the resolution of the matrix. The Hi-C matrix is symmetric about the diagonal, which means that if you want to look at the interaction of locus three with locus M minus two, it's going to be the same whether you're on this side of the diagonal or that side of the diagonal. So to save some time and energy, we cut one of the triangles out. What you see is actually the rotated upper triangular part of the matrix.
Okay, so this part I'm going to actually go to, okay. So does everybody have this, right? Okay, so you can adjust the intensity of the matrix with this bar here. If you set the value up here, the Hi-C matrix values above it are going to be cut off and represented as red, and if you increase this value down here, the values that are less than that are going to be represented as white. If you move the bars closer together to increase the contrast, you can see more localized interactions. If you want more precise control of the values, you can use the arrows up here to either increase or decrease the values and then click Refresh. Or you can directly enter the values you like and then click Refresh. Okay, so this is, oh, it's not showing up?
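The two points made above, keeping only one triangle of a symmetric matrix and mapping values to colors between a low and a high cutoff, can be sketched as follows. The function names and the linear white-to-red scale are assumptions for illustration; the talk only states that values above the upper cutoff show as red and values below the lower cutoff show as white.

```python
def upper_triangle(matrix):
    """Keep only entries with column >= row; the rest are mirror images."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def to_intensity(value, low, high):
    """Map a contact value to 0.0 (white) through 1.0 (full red).

    Values at or below `low` clip to white, values at or above `high`
    clip to red, and values in between are scaled linearly (an assumed
    color ramp, not necessarily the browser's).
    """
    if high <= low:
        raise ValueError("high cutoff must be greater than low cutoff")
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low)


toy = [
    [9, 4, 1],
    [4, 8, 3],
    [1, 3, 7],
]

kept = upper_triangle(toy)  # 6 values instead of 9
shades = [to_intensity(v, low=2, high=8) for v in kept]
print(kept)
print(shades)
```

Moving `low` and `high` closer together spends the whole color range on a narrower band of values, which is why the contrast goes up.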
Oh, okay, I've been talking to myself. Okay, there we go. Yeah, sorry about that. Let me go over that part again. You can slide these sliders to increase or decrease the cutoff values for your matrix, or you can manipulate the values with the arrows here and then click Refresh, or you can directly input the values you want as the cutoff and click Refresh. That's the contrast I like. Then if you scroll down a bit, you can notice some really high signals. You can find out which two loci contribute to a signal just by clicking on it, and then it will extend a gray bar into the UCSC Genome Browser. It looks like it's really near the transcription start site of SOX2, and it looks like over here we've got some histone modifications, the K27 acetylation, so it could be a potential enhancer here. If I'm not sure, I can just double-click on the region and then we zoom in to the region of interest. Not much here, huh? And then we can adjust the intensity, as I said, and remember these navigation bars here: you can always click here to zoom out, okay?
And then is the UCSC Genome Browser aligned for everybody, or is it off by a bit? Yes. I'm not sure whether it is aligned, because if you scroll the bottom, it ends up in different positions. So the bottom scroll bar, just to the bottom of it. Yeah, yeah. Oh, so yeah, this is meant to be scrollable. The user is supposed to find the optimal scroll and then click on Set UCSC Scroll. There's a default value at which the tracks should align to the Hi-C matrix. If you have a different browser or a different operating system, it's possible that you're off by a little bit, and when that happens, you can just manually align it yourself and click on Set UCSC Scroll. And then as I said, we can manipulate this UCSC browser session however we want. So let's give an example.
So let's say, let me change the ChromHMM data. Instead of dense, I want pack here, and then scroll up and click Submit. You interact with this window here as you would if you were on the UCSC page. And then we notice that the alignment is sort of off. So we want to scroll all the way up so that you see zero for the horizontal and vertical scroll here and you see the upper left corner of the page. Then you click on Align UCSC and the alignment is done automatically for you. Then you can scroll down to see the different tracks. I changed this track to pack, so that's what you see here. Now with this session, whatever region you navigate to, the tracks stay constant. So this is a customized session just for the user.
Let's go back to the slides. As I said, it's possible to use your own data. To do that, we can convert our Hi-C contact matrix data into a file format called the BUTLR format, the Binary Upper TrianguLar matRix file. This is a file format pioneered by our lab, and the goal is for it to act the same way bigWig or bigBed does with UCSC. It's a binary indexed file, so we can query just the regions that we want. As long as you put a file in this format on a remote server and enter the URL, our 3D Genome Browser will query the file for the particular region that you want, without your having to upload the entire matrix file onto our server. The BUTLR file has many advantages. It decreases memory usage because it's binary and indexed, and it allows random access. This improves the portability of the file, the speed of access, the memory use, and so on.
Okay, so the last part is the virtual 4C and ChIA-PET data. If you go up to the tabs, click on Virtual 4C, and this time we're going to do the gene BCR1, the B cell receptor one. And then for the extended region, the default is 500 kb.
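Returning to the BUTLR description above, the general principle of a binary, upper-triangular, randomly accessible matrix file can be sketched with Python's standard library. To be clear, this is not the BUTLR specification: the byte layout here (a bare row-major sequence of 4-byte little-endian floats with no header or index) and all function names are assumptions chosen only to show why one contact can be read with a single seek instead of loading the whole matrix.

```python
import io
import struct

VALUE = struct.Struct("<f")  # assumed encoding: 4-byte little-endian float


def packed_index(i, j, n):
    """Position of entry (i, j) in a row-major packed upper triangle."""
    if i > j:
        i, j = j, i  # symmetric matrix: (i, j) and (j, i) share one slot
    return i * n - i * (i - 1) // 2 + (j - i)


def write_packed(matrix):
    """Serialize only the upper triangle (column >= row) to bytes."""
    n = len(matrix)
    out = io.BytesIO()
    for i in range(n):
        for j in range(i, n):
            out.write(VALUE.pack(matrix[i][j]))
    return out.getvalue()


def read_contact(handle, i, j, n):
    """Random access: seek straight to one entry and decode it."""
    handle.seek(packed_index(i, j, n) * VALUE.size)
    return VALUE.unpack(handle.read(VALUE.size))[0]


toy = [
    [9.0, 4.0, 1.0, 0.0],
    [4.0, 8.0, 3.0, 2.0],
    [1.0, 3.0, 7.0, 5.0],
    [0.0, 2.0, 5.0, 6.0],
]

blob = write_packed(toy)          # 10 values stored instead of 16
handle = io.BytesIO(blob)         # stands in for a file on a remote server
print(len(blob), read_contact(handle, 3, 1, len(toy)))
```

With a file served over HTTP, the same seek-and-read pattern would become a byte-range request, which is how formats such as bigWig avoid transferring the whole file.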
That is the value used if you leave the field blank, but you can enter 500 there. Then you can click on Go to retrieve your virtual 4C region. As you can see here, you can choose the ChIA-PET data that is relevant to the cell type, which is GM12878, and you can change the tissue of interest that you look at. Also down here is the UCSC Genome Browser session that you want to enter. As I said, I provided a default one for us today for the demo.
So after you click Go, here is the page you should see. The colors here might be a bit different, but we have the navigation bar, so you can zoom in and out of your region of choice. If you enter a gene or SNP, it takes that point as your bait locus, or locus of interest, and grabs the virtual 4C for you. If you enter a gene, it takes the TSS as the region of interest and then grabs the virtual 4C plot for you. This is supplemented with the DHS linkage data produced by Dr. John Stam's lab. Different colors represent different sets of proximal and distal DHSs that are highly correlated. Then down here you see the ChIA-PET data, in which the loci that interact are connected by an arc. And of course down there is the UCSC Genome Browser.
So if I go to the page, here. So the, what is it? Yeah, we show it. Up here, as I said, you can click on whichever TSS you want; the BCR1 gene actually has two TSSs. This is one, and you can click on this tab to get the other one. And as you can see, this is not aligned, so you can click on Align, and it is automatically aligned for you. Then if you mouse over any region of interest on the virtual 4C plot, you can see the corresponding region in the other tracks. So here we go back to the first region.
You can see a peak here; of course the TSS is going to interact strongly with itself. But the next strongest point is here, where it looks like there's some weak H3K4 monomethylation. And as you can see, there's a ChIA-PET arc that goes from here to a region that's close to the TSS, not quite, but pretty close. And in the DHS linkage data, I think there's a little bit of brown there and a little bit of brown here, which indicates that those two regions correlate activity-wise. Okay. Whoops.
Okay, so those were our browsers. I would like to thank everybody who provided the data for us: Dr. Stam and Dr. Wong, as well as Dr. Hardison, and the entire ENCODE group. The browsers are still in their infancy, so we welcome feedback. Thank you.
I think we have asked many questions. Let's just move on to the next speaker, Dr. Luca Pinello from Dana-Farber. He's going to give a tutorial on ChromHMM, which is one of the most popular tools these days. And hopefully by the end of this half hour, you will be able to claim that you can run it. Or that you've seen how it is run.