So I just wanted to start by thanking you guys for inviting us to participate in this workshop. I know that some of the ENCODE resources, especially in the genome browser, can be a little bit overwhelming, so hopefully we can get at a little bit of that today. And I especially wanted to thank all of you. There are so many people here, and it's really exciting to see so many people motivated to dive deeper into this stuff. We're going to take a really quick blitz tour of the browser resources. Normally when I give workshops like this, we have several hours, so this is going to be kind of condensed. But I'm trying to cherry-pick some of the things that are going to be most useful for the kinds of questions that I'm assuming this group is interested in asking. But before I get started, I wanted to get a quick show of hands. Who considers themselves either a really new user of the genome browser or has never used it before? Okay. Not nobody, but not everybody, so that's good to know.
So the first thing that I always like to show is just this summary of resources for finding more information later. Obviously, there's only so much that I can show you in an hour. We have a lot of great stuff up there. There's a wiki with documentation on the genome browser. Some of you may know that we maintain a mailing list where you can write in, and no question is too basic or too hard. As somebody who has done some of the work supporting the mailing list, I can say we've seen our fair share of every type of question imaginable. And to that extent, if there's a question that you have, there's a good chance that it was also asked before. So if you don't feel like waiting for a reply to your email, an even better approach is to search through the archives and see if somebody has had the same issue, because we maintain a full archive of all the questions that we've answered before.
So it's a pretty rich resource for getting at things that you might be having trouble with, if something doesn't behave the way you expect. If you have questions about the data that you're looking at, anything you can imagine, there's a good chance it's been asked before. We also maintain a training page with links to various resources. We launched a YouTube channel earlier this year with a few tutorial videos on how to use the browser for various things. And we also have Twitter, where we announce new releases of data and features, if you want to keep up to date on that kind of thing. If you're on Twitter, check that out. And especially for those of you that have limited experience with the genome browser, we also have an agreement with an external organization called OpenHelix that has really great, fairly comprehensive tutorials on how to use the genome browser, everything from very high-level and comprehensive material to more detailed stuff, like if you're trying to get better at using the Table Browser, that kind of thing.
So, since I only get an hour to talk at you about this stuff, the coolest thing that I'm telling people about these days is actually this thing called Genome Browser in a Box. This is like a 15-second pitch for it. We're gonna be having a poster in the poster session. So if you wanna hear more about this, or I'll even do a live demo for you at the poster, come and find me during the poster session. The motivation for bothering to mention this is that everything I'm gonna show you can be done in this Genome Browser in a Box context. And especially if you're doing work with sensitive data, especially clinical data, that has restrictions on where you're sharing it or looking at it, that kind of thing, this is a mechanism whereby you can run an instance of the browser on your laptop. I'm running one on this laptop, so you don't need high-powered hardware for that. You can run it offline.
And the idea with that is just that rather than your data traveling back and forth to the browser over the internet, which isn't necessarily as secure as you need, our data travels to your machine and your secure data stays on the machine. So everything that you're gonna see can be done in that context and is possible for more sensitive work. If you're curious about that, I don't have time to get into it right now, but part of the poster that will be up later has details on that and I can also show you what it's about.
So, what I was hoping for people to take away from this fairly brief browser session was just a little bit of what is possible in terms of using the ENCODE resources to annotate your data. A lot of times people will ask me why they should use our genome browser as opposed to any other genome browser. And one of the things that I think we do well, or that we have going for us, is that we have a lot of data. And so a lot of the reason that people come to our browser is to see their research in the context of other things that have been done. And so what I'm gonna focus on today is how to see your data, or data from the ENCODE portal, in our genome browser, and then how to use that to grab existing annotations for your BED file or your VCF file, or maybe you have RNA-seq data: how you get that data into the browser and also how you find annotations that are gonna be relevant for what you're doing.
Just a quick word about the format of what you're gonna see. The exercises are at the end, if we have time. I'm also gonna cheat a little bit because I know the break is right after my session, so you could also maybe use that to work on the exercises if you're feeling motivated. But in particular, the exercises just mirror what you're going to be seeing in the slides. So the idea is I'm gonna show you how to do it once.
And then if you wanna go off and see if you can reproduce what we're doing and find the same resources within the website, the exercises have a step-by-step walkthrough of how to do each of those things. And I've also created what's called a session to show you the answer, or finished product, of what you should be looking at. And I'll explain what a session is towards the end of this little bit so that you can see how to look that up for yourself.
So I'm gonna start with some basics of the browser, since I did see a fair show of hands of people that aren't that familiar with the layout and what it is. I'm gonna go through that kind of quickly to get at the more interesting stuff. And then we're gonna talk about a couple of really new tools that we've launched within the past couple of years. One of them is very hot off the presses, which seems to be the phrase of the day, and it's great for grabbing annotations from multiple different data sets, especially if you have positional data like a BED file, for example peak calls that you might be getting out of your own work. Maybe you wanna find all the SNPs that overlap with the regions that you're interested in. Maybe you wanna find the genes that overlap with the regions that you're interested in. This is a mechanism for collecting that all into one place. So in a very quick and fast-talking nutshell, that's what we're trying to achieve with these next few slides.
And then we're also gonna talk about track hubs, which you may not realize you already saw in the portal demonstration, because the thing that they were using is a track hub. And that's the mechanism whereby you can display stuff in the browser but have that data be hosted somewhere else, such as Stanford, for example. So in looking at the main display, hopefully this image is generally familiar to most of you.
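To make the overlap question above a bit more concrete (which SNPs fall inside the regions I care about), here is a minimal Python sketch. It is not how the browser's tools do it; it only illustrates the idea on toy data. The peak and SNP names and coordinates are made up, and the only thing taken from the BED format itself is that intervals are 0-based and half-open.

```python
from collections import defaultdict


def parse_bed(text):
    """Parse minimal BED lines (chrom, start, end, optional name) into tuples."""
    regions = []
    for line in text.strip().splitlines():
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        regions.append((chrom, start, end, name))
    return regions


def overlaps(peaks, snps):
    """Map each peak name to the names of the SNPs that overlap it.

    BED intervals are 0-based and half-open, so two intervals overlap
    when each one starts before the other one ends.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in snps:
        by_chrom[chrom].append((start, end, name))
    hits = {}
    for chrom, start, end, name in peaks:
        hits[name] = [
            snp_name
            for snp_start, snp_end, snp_name in by_chrom[chrom]
            if snp_start < end and start < snp_end
        ]
    return hits


# Toy data, invented for illustration only.
PEAKS = """
chr1 100 200 peakA
chr1 500 600 peakB
chr2 100 200 peakC
"""
SNPS = """
chr1 150 151 snp1
chr1 200 201 snp2
chr2 199 200 snp3
"""

result = overlaps(parse_bed(PEAKS), parse_bed(SNPS))
print(result)
```

Note that `snp2` starts exactly where `peakA` ends, so under the half-open convention it does not count as an overlap. This simple loop compares every peak with every SNP on the same chromosome, which is fine for a toy example but would be slow on genome-scale files.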
And for all of these slides, where possible, the URL is listed at the top, in case you wanna try and follow along. As was mentioned previously, the slides are available online, so you can go look them up later. I mostly just want you to see what's possible and see where we're visiting. And then when we get time at the end, you can go and see if you can walk yourself through the same steps. And obviously you don't have to use the toy example that's in the exercises. Feel free to try your own data set or your own list of genes, that kind of thing. I just picked a couple of workflows that I thought might be of interest to people as a mechanism for getting a tour through two tools that I think are pretty relevant for what you guys are trying to do.
So looking at the main display, the thing that people think of when they think of the genome browser is this main screen. I've seen it in some of the screenshots in the talks today. And for those of you that don't know, as was also mentioned earlier, when I talk to relatives, or people that, I guess I sort of lump it all into one, but people that aren't that familiar with genomics work, I tell them that you can think of it like Google Maps. Underlying everything is the physical map of the location of everything. And what we have are annotation data sets. If you're thinking of the Google analogy, those might be all of the locations of the businesses or all of the traffic data that we might have. And each of those layers of annotation is built upon the underlying map. So each horizontal thing that you're looking at is basically a data set, or what we call a track. And as you can see, just by looking at this one image with a few of the data sets set to display, there are many different types of data even within this one shot. And you can tell that because they look different.
Some of them are bars, some of them are graphs, some of them are alignments. And that's intrinsic to the kind of data that you're seeing. So each horizontal row you can think of as a data set. And as I was saying, there are different flavors of the different types of data. We're gonna talk about file formats a little bit later in the slides, just to demystify that a little bit. But just to give you a sense of what it is you're looking at: you're looking at stacks of data sets. And as complicated as this image is right here, if we were looking at this in a live web browser and we scrolled down below, you would see that the list of data sets available for display is actually really, really long, into the hundreds or maybe even thousands by now. But what we're looking at are only the data sets that we've told the browser to actually display in this main box. And I'm gonna show you how to turn the data sets on and off within there.
So up above that main display is what we call the position or search bar. And, I guess I'm referencing Google a lot, you can tell I'm from the Bay Area, you can think of that like the Google search bar, in that you can put in basically anything you're looking for: positions, gene names. It'll also take motifs. So if you get to the browser and you're not sure where to start looking for your thing, that is the spot where you would type it in and search for it, and then either it'll give you a list of results or, if it's pretty obvious what you're looking for, it'll just take you to that thing. So that's where we would put in coordinates, gene names, that kind of information. And then up above that, there are buttons for navigating within the genome, so either zooming in or out or moving side to side. Those are fairly self-explanatory.
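As a small illustration of what goes into that position box, the sketch below tells a coordinate range apart from a free-text query such as a gene name, and builds a link that opens the browser at that spot. The `hgTracks` URL with `db` and `position` parameters is the browser's usual linking form; the default assembly name `hg19` and the example inputs are assumptions made for the example, not something stated in the talk.

```python
import re
from urllib.parse import urlencode

# Matches positions such as "chr21:33,031,597-33,041,570" (commas optional).
POSITION_RE = re.compile(r"^(chr[\w.]+):(\d[\d,]*)-(\d[\d,]*)$")


def parse_position(text):
    """Return (chrom, start, end) for a coordinate range, or None otherwise."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        return None  # Not a coordinate range; treat it as a search term.
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        return None
    return chrom, start, end


def browser_url(query, db="hg19"):
    """Build a link that opens the browser at a position or search term.

    The db value is an assumed default for this sketch.
    """
    params = {"db": db, "position": query.strip()}
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)


print(parse_position("chr21:33,031,597-33,041,570"))
print(parse_position("BRCA1"))
print(browser_url("BRCA1"))
```

The same box accepts both kinds of input in the live browser; the split here is only to show that a range can be recognized by its shape while anything else is handed over as a search.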
Then the other thing that I wanted to show you about the main display was how to configure the tracks that you're looking at. So I'm going to take a slight leap of faith and go outside of PowerPoint and into a web browser so that I can show you this, because it doesn't translate well into a static slide. So if I could just go over into Firefox. We are at genome.ucsc.edu. And if I click in the top left-hand side, right up here where it says genome browser, that takes me to a page that we call our gateway page. This is where you would select the assembly or organism that you're interested in from the pull-downs. We're just going to go with the default, which is the human assembly, not the current one, but the next most recent one, which is the best annotated one currently. So I'm going to hit submit. But just before I do: when you choose one of these assemblies, you'll notice that all the text down below is changing and the picture is changing. And that's because this is where you would go to find information about where we got that assembly, along with links to more information about it.
So as I'm going through this, the main thing that I want people to take away, especially people that are new to the browser, is all the different ways in which you can find more information. Because there's only so much that I'm going to be able to teach you in an hour. But if you can learn where to look for information later, it's the teach-a-man-to-fish idea: hopefully you can go and find that out on your own once you've decided whether or not it's worth investing your time. So we're going to accept the defaults and hit submit. And that's going to take us to the main browser that I was showing you earlier. So this has mostly the default tracks on.
And it looks a little bit different from the thing that you saw before, because we have different data sets turned on or off. So each of these rows, as I was telling you, is a data set. And the first thing that I can do is reorder them. So if I click, and that's just a normal click, not a right click, and drag, I can actually change the order of these. So if you wanted to have these up next to each other because you're trying to compare two things or generate a figure, that's how you would change the order of the tracks as you're seeing them. You can also zoom in. So maybe there's an item in the visual display that you're trying to zoom in on. Rather than having to guess at what the boundaries of that are, if you take your mouse up to the top where you see the base pair coordinates, and you click and drag over a region that's of interest to you, you can choose either to highlight that region or to zoom in. So we're going to zoom in, and now we're just focused on that middle region. So very simple, but just to give you some sense of how to change the display that you're looking at.
And for each of these horizontal rows, or data sets, or tracks, as we call them, you can also change the visibility. One of the settings that you could have them set to is hide, where they don't display at all. And now that we're looking at the web browser I can actually show you. This is a slightly bad example, because I have a hub open. But if you were to scroll down in the browser and look at some of these groupings of tracks, you would see there are quite a few, and most of them are grayed out because they're set to hide. So I can change how much information is displayed, and how much screen real estate is taken up, by changing the visibility settings.
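The visibility settings can also be thought of as a small fixed vocabulary. The sketch below lists the browser's display modes (`hide`, `dense`, `squish`, `pack`, `full`) and adds them to a browser link, since the browser accepts `trackName=mode` pairs in its URL. The track name `exampleTrack` and the `hg19` assembly are placeholders chosen for illustration, not real settings from the talk.

```python
from urllib.parse import urlencode

# Display modes, from taking no screen space to taking the most.
VISIBILITY_MODES = ("hide", "dense", "squish", "pack", "full")


def with_visibility(base_params, track_modes):
    """Return a browser link with a visibility mode set for each track.

    base_params: dict of existing URL parameters, e.g. {"db": "hg19"}.
    track_modes: dict mapping a track name to one of VISIBILITY_MODES.
    """
    params = dict(base_params)
    for track, mode in track_modes.items():
        if mode not in VISIBILITY_MODES:
            raise ValueError(f"unknown visibility mode: {mode!r}")
        params[track] = mode
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)


# "exampleTrack" is a hypothetical track name used only for this sketch.
url = with_visibility({"db": "hg19"}, {"exampleTrack": "full"})
print(url)
```

Setting a track to `hide` is the URL equivalent of the grayed-out entries in the track list, and `full` corresponds to showing every individual item on its own row.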
So for example, I'm going to change the setting of this publications track now to full, and when I do that, rather than having everything compressed down into one horizontal row, it's going to show me each of the individual items in that data set within that genomic window. And I picked the publications track in particular because it's one that most people don't know about, but it's a really neat data set where one of our engineers took all of the sequence data that he was able to find in publications and mapped it to the genome. So it gives you an orthogonal way to do a literature search. You can go to a genomic region and see what has been published there. So in particular, if you're trying to design primers and want to see if there are published primer sequences, this is a cool way of doing it. It's just a different way of doing a literature search from the way that we normally think of it, in terms of date or keyword, that kind of thing.
The other thing that I want you to see: you saw that I could change the visibility by using the pull-downs that are below the main display. I can also get to the settings for any given track by clicking on the bar on the left-hand side. So this blue bar, I'm going to click on it. And now, for the publications track in particular, and for every track, up at the top you'll see a pull-down to change the display mode. So I could either hide it or change how much I'm displaying. And then the data sets are all different types of data, so the menu that's offered to you at the top is going to be different depending on what you're looking at. For this track in particular, you can see that you can change the filtering. You can hide some of the items. You can look only at more recent publications if you want to filter by date, that kind of thing.
And then some of the data sets are also grouped, and there'll be a list of what are called subtracks here, and you can turn those on or off. So all of this to say that if you click on that gray bar or on the track title down below, it takes you to a menu where you can set up the display conventions for what you're seeing. Some of that will be changing the coloring of the track if you don't like it, or maybe turning on and off different types of names. In some of our gene tracks you can turn on OMIM IDs, you can turn on RefSeq IDs, or you can turn those off, depending on what you're trying to show in that main display.
And something that is a little bit more subtle that you might not have noticed, and I'm going to show you this in slides as well: if we go back to the genome browser by clicking on genome browser up top, everything in this main display, and a lot of the things that you're looking at on the website, is clickable. So if you don't know what it is you're looking at and you're trying to find more information, a lot of times, either by clicking on something or by hovering your mouse, you'll get a little pop-up with more information or a click-through to a page that describes it. And so that's a way of teaching yourself and finding more information based on what it is you're looking at.
So the reason to dive into the browser now is just to show you how to set up the display. You can click on any of these gray bars and change the track settings. So we did that for publications. A quick way to do that: if you right-click while having your mouse on this left-hand side of any given data set, it'll give you a very short list of settings, so it offers to let you change the visibility.
And then it also has some quick links to things that people usually like to do, without having to go into that other menu, click around, and then come back. So that's a way of setting up the display. You saw that you could change the order and zoom in and out. And then the last thing to note is that down below the main display, there's a bunch of buttons. We're going to get into some of them in a little bit. One that a lot of people don't know about is a master settings button here called configure. This is something that'll come up once you're a little bit more familiar with the browser and you don't want to waste your time clicking in different places and then resubmitting. If you click on it, it takes you to a master listing of all the settings for all the tracks, and you can just go in and tweak them how you want them and then hit submit all at once. So it's a nice central place to come to for that. There are some settings up at the top, so if you want to change the size of the box that you're looking at, or maybe change the font size of the labeling, you can do that.
And then there are also some neat tools for display listed up at the top. The one that I really like, which not a lot of people know about because it's turned off by default, is this next/previous item navigation option here. If you turn that on and hit submit up at the top to go back to the browser, you'll notice that in the main window we get little gray chevrons on either side. You can kind of see them here. What that is is a toggle to go to the next item upstream or downstream. So if you're looking at a region of the genome where there isn't any information for your track, or you want to skip ahead to the next or the previous thing, it's a nice way to do that. There's an analogous button for gene prediction tracks, if you're trying to toggle between exons or introns and you want to jump ahead.
So those are just some quick time-saving navigation tools. I don't want to spend too, too much time on all that, since it's probably a little bit basic for most of you. So that's the main display and how to do the configuration. And as Dr. Hong was saying, once you have one of their things visualized in the browser, those are all settings that you can apply, as I was just showing you, to the data that's been loaded from the portal. So we saw all these display configuration things that you could do. You saw you could reorder things and zoom in and out, and you saw the configure page for any given track.
And I wasn't going to show this, but since there's a focus on how to find things, I wanted to just quickly show it to you. Even something as simple as where to look in the genome browser can be a little bit overwhelming. There are all these different spots in the browser and it's not clear: I want to search for my gene, I want to search for my region, where do I even type that in? So there are a bunch of different places that you can do this. As I was showing you, on the gateway page there's the same position search bar that's above the main display. That's a spot that you can think of like the Google search bar for putting something in. With those search bars, so the one on the main gateway, if you type something in, maybe you type in BRC, and I'm sure for this audience it's obvious what the next letter would be, it'll suggest some things that you might be interested in: BRCA1, BRCA2, that kind of thing. You can choose one from there and hit submit, and it'll take you to the browser at that spot. Similarly, if you're in the main browser and you type in part of a name, it'll also suggest things. Or you can just choose to do a search with whatever it is you've typed in. So those are spots to look for items in the browser, and you've seen them already.
One more spot might not be that obvious, especially if you're not that familiar with the genome browser. So you saw the main display and you saw the long list of tracks, or data sets, that we have in the browser. Those are all data sets that live on a server in Santa Cruz, and they're displaying in the browser by virtue of being stored by us. But more recently, we've come out with this thing called track hubs, which is a mechanism for data that isn't hosted by us, so data that's not sitting on our computers but is contributed by some external group. Those files live on that group's site, but we have a little bit of code in the browser that will go and grab data from those files and put it up for display in the browser as though it were sitting alongside one of our native tracks. And the reason to mention that, especially for ENCODE, is that when you were hitting that visualize button, it was generating one of these tracks in the browser. So I think there's a little bit of confusion about where that data is sitting. It's actually sitting at, I assume your servers are at Stanford, or wherever it is that the portal data is stored. It's not hosted by us, but it's displaying in the browser.
So beyond even the hundreds of tracks that I was showing you, there's all this other data from external sources that can also be visualized in the browser alongside the data that we have, as well as your own data if you want to upload some. So I wanted to show you a couple of the other spots where you could look for stuff outside of the hundreds of tracks that we have down below this main display. And the first stop on that tour would be public hubs.
So if you click on the hubs button below the main gateway, the one that says track hubs, or if you go up to the My Data menu and click track hubs from there, or, the third place, there's a track hubs button below the main display down there, all three of those will take you to the same place. That's the hubs area of the browser. And the hubs come in kind of two flavors. Basically anybody that is interested can set up one of these hubs. It's a set of files that you put somewhere that is internet accessible, and you feed the URL into the genome browser, and then it goes and fetches the data and displays it alongside the data that we have. The reason that there are two tabs is that there's one spot for you to load your own data, which you can choose to share or not. But we also have this other tab called public hubs, which is a public listing of external groups that have constructed one of these hubs and contacted us to say, we want to have our hub listed here in the directory so that people know that we exist, and they can search through our resources and have those display on your site.
So we've added this search bar here as well. This is a separate search. It searches through all these external hubs. You could type something in here, maybe methylation or acetylation, something that's relevant for whatever you're doing, and it'll search through the hubs and give you the list of hubs that have data relevant to whatever you've typed in. And that's going to be a different list than if you had done the search on our site, which would just be searching our data. And that's especially useful here: I've already heard several people mention the Roadmap data, and we have a hub from them as well. And there's an ENCODE hub in the listing as well.
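To give a feel for what "a set of files that you put somewhere" looks like, here is a minimal sketch that generates the three small text files a basic hub is built from: `hub.txt`, `genomes.txt`, and a `trackDb.txt` for one assembly. The setting names (`hub`, `shortLabel`, `longLabel`, `genomesFile`, `email`, `genome`, `trackDb`, `track`, `bigDataUrl`, `type`) follow the track hub file layout; every value below (hub name, labels, email, assembly, data URL) is a made-up placeholder, and a real hub would also need the actual data file at that URL.

```python
from pathlib import Path


def hub_files(hub_name, short_label, long_label, email,
              genome, track_name, big_data_url):
    """Return the text of a minimal three-file hub as {relative path: contents}."""
    hub_txt = (
        f"hub {hub_name}\n"
        f"shortLabel {short_label}\n"
        f"longLabel {long_label}\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )
    # One stanza per assembly, pointing at that assembly's track settings.
    genomes_txt = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    # One stanza per track; the data itself stays at big_data_url.
    track_db_txt = (
        f"track {track_name}\n"
        f"bigDataUrl {big_data_url}\n"
        f"shortLabel {short_label}\n"
        f"longLabel {long_label}\n"
        "type bigBed\n"
    )
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": track_db_txt,
    }


def write_hub(files, root):
    """Write the hub files under root, creating subdirectories as needed."""
    for rel_path, text in files.items():
        target = Path(root) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)


# All values here are placeholders for illustration.
files = hub_files(
    hub_name="exampleHub",
    short_label="Example peaks",
    long_label="Example peak calls hosted outside the browser",
    email="you@example.org",
    genome="hg19",
    track_name="examplePeaks",
    big_data_url="https://example.org/hub/hg19/examplePeaks.bb",
)
print(files["hub.txt"])
```

Once files like these sit on a web-accessible server, the address of `hub.txt` is the URL you would feed into the browser. The data file referenced by `bigDataUrl` never moves; the browser fetches only the parts it needs to draw the current window.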
And so you can go to this site yourself if you click on one of those track hub buttons and see the long list of external sources of data. I think probably most of the people here are working on human or mouse, but if you're working on something outside of the standard list of model organisms, this is also a nice place to search. Because another nice thing with hubs is that they can be done on any organism for which you have the genomic sequence. There's a little bit of setup, but it's not too, too bad. And once you have that set up, it works for whatever organism you're interested in. So for example, we don't host information on Arabidopsis, but I'm going to show you in a couple of slides that I've set up a hub on Arabidopsis. And once you load that into the browser, it displays as though plants are one of the options in that main pull-down. So all this to say, this is spot number two where you could search for things in the browser.
And then the last thing that I wanted to show you: with those two things, you'd be searching for an item in a data set, so a gene, a region, a motif. If you wanted to search for a whole data set, there's a separate search for that, called track search. So you'd be searching for what we call a track, which is basically a data set. You could click on the track search button on the main gateway or below the main display. Or if you go up to the main menu, there's a pull-down to get to track search from there as well. And what track search looks like is this. I guess for some reason we really like offering things in two flavors. Track search also comes in two forms. There's a general search, if you don't really know that much about what you want and you want to do a search across the description pages and the titles.
Just kind of throw it at the wall and see what sticks. And the other tab, the Advanced tab, offers you more granularity. So you can search on track names. You can search on track descriptions. And especially if you're doing ENCODE work, you can search on all of the metadata fields that are available in ENCODE. You can look at tissue types, targets, all the different parameters by which you could slice and dice ENCODE data. You can limit your search so that you find just the particular data sets that you're interested in. So this is another way that you could be searching in the browser. Hopefully, I haven't already overwhelmed you. Just to give you a sense of the long list of options: if you know you want a particular data format, that's a way that you could limit your search. Maybe you know you want a particular type of cell or tissue, or any of these other variables: GEO accession number, a particular type of mapping algorithm. These are all variables that you can search on in this track search tool.
And as you, I'm sure, know from having worked with ENCODE data, it's necessary to have something like this because the data is so voluminous. The way that we would search through our native browser tracks doesn't really scale to ENCODE tracks, just because there are so many of them. And so you could go to the portal, find something that way, and display it in the browser. Or if you're trying to find something among our data, you could use the track search. And if you're trying to search on more than one thing, you can click and then shift-click on subsequent things to add terms to your search. So these are all just ways to explore in the browser and to give you some ideas of starting points and where to look.
So if we get outside of thinking about track search, the other thing that I think is really, really important to take away from this, if you don't already know it, is that there are two different ways to find more information. I really want everyone to learn at least this one thing, and that is how to find information about the things you're looking at in the browser. If you can figure that much out, I think you can pretty much figure things out from there. So there are two levels of description. There's a description of the whole data set, which is usually a high-level view of the methods that went into generating the data set, credits and references for papers, if there are papers associated with that data set, as well as a description of display conventions, maybe what the colors mean or what the symbols mean, that kind of thing. So if we click on, as we saw before, the gray bar on the left or the track title, it takes us to a thing called the track description. And if we click on an item within a track, so if we wanted more information just about that particular gene entry or SNP or whatever individual thing, clicking on that will take us to a separate description of just that item.
So if I had clicked through to the track description, here's an example of one, which is a description of a whole data set. And you can see when we look at one of those, we get the display configuration up top, as I was telling you about, and some text down below telling you what it is you're looking at. If we clicked on an item in that same track, for gene tracks in particular anyway, we get something that looks more like this, which has information specific to that gene, to that item.
So when you're looking at the main display, if you don't know what something is, there's a good chance you can click on it, and clicking on it usually takes you to a description page of what it is you're looking at. And this is extra relevant for ENCODE information in particular. Just as I was telling you before, there are track-level and item-level descriptions, and because ENCODE has all these nice groupings, because there are several data sets that have to do with some particular type of question you might be asking, there's also a description one step higher up, covering all the transcription factor binding tracks or all of the regulation tracks. So clicking on that will take you to one more flavor of description page, basically. And you can also, again, click on the bar or on an item in the track, and that'll take you to a description of the item. So these are all just to give you a sense of where the information lives, where to click and what you might see, so that you have some idea of how to go and find information as you're looking at stuff and hopefully not getting too lost. The ENCODE resources in particular can be a little bit daunting, there is so much stuff. The last thing to note about how ENCODE stuff typically displays in the browser is that, because most of these studies have been done across several cell lines, several factors, several of whatever variable it is, it's not possible to just have a short list of five to ten data sets at the top of any particular description. So if you click through on an ENCODE track, what you'll typically come to is one of these huge matrices, which is dissecting the data, separating it out into cell lines or factors, different things like that, so you can choose which components or slices of the data you want. It's just a way for you to navigate to it.
So just so that you're not overwhelmed by what this matrix thing is, that's what it's there for, to organize what would otherwise be a long, long list of data sets within any given group. And it's particular to ENCODE because ENCODE has all of these things done across broad swaths of different types of variables. So the item description pages for ENCODE things are just like item description pages for anything else in the browser. You would click through and see more description of that particular item, and it'll also load the description of the whole data set down below, so you can also read about how the data set that item comes from was generated. And you would get to that by clicking on the item in the main display. And finally just a quick word about some of the resources that we have for navigating ENCODE stuff. If you go to our main page, genome.ucsc.edu, on the left-hand side down near the bottom there's a link called ENCODE. It takes you to this page where we've gathered a bunch of stuff: some tools for exploring the older ENCODE data that lives just in the genome browser, but also a link to the portal. So if you happen to be looking for the portal but end up at our site first, this is a great way to get there. There's a link a little bit further down where you can explore some of the other utilities and tools for navigating through ENCODE stuff. And just one more look at the track search, so that you can see that it's a nice way to figure out what you want from the ENCODE data that we have and also, if you have a hub that's loaded, to look through that as well. And so some of those tools are specific to ENCODE. They were developed for ENCODE to address the specific problem of having so much data and finding the specific things you want.
Usually people want a particular tissue or cell type and there's just so much stuff. And so really, as was mentioned before, a thrust of the workshop is to teach you guys how to find what it is you're looking for, because there just is so much stuff to sift through. So the other thing that I wanted to show you was how to get your data into the browser and what that means. Before I can talk about that, I was just going to talk very, very briefly about file formats, because I know that can be a little bit tricky sometimes. So as we saw pretty early on, just from looking at a basic browser shot, the different types of data just look different. And you could tell that you're looking at several different kinds of data because one looks like a graph, one looks like bars, another one looks like weird alignment things, maybe you don't even know what those are. And I'm not even gonna talk about all of the file formats because there are a lot of them, but just some that are definitely gonna come up with some of this ENCODE stuff. So in particular, BED files are meant to delineate presence or absence of an item. So you'll see them especially for gene prediction tracks, but also I think with ENCODE you'll see them associated with peak calls. And I think that's a little bit counterintuitive for some people, because you think peak and you wanna see something that visually looks like a peak, but for these data types, what you're being given are the determined positions of where the peak starts and where the peak ends. And so for those files, even though what you're picturing when you think of a peak is something that looks more like what we call a wiggle file, which is a graph, what you're actually getting is just a bar that delineates where this peak is. And we'll see that in a couple of slides. So BED is just typically bars. You can vary the height of the bars.
You can have a little chevron showing the direction of transcription, but basically it's just showing you where something is located. It's giving you a location. The wiggle format is giving you some sort of signal strength: GC percent, methylation levels, acetylation levels, different kinds of things like that, but it's giving you some sort of continuous information. Something else that'll come up a lot in NGS analysis is BAM files. They're alignment files that will come out of RNA-seq pipelines, different things like that. And those tend to look more like this. And then finally VCF, variant call format, for delimiting the variants that you found and where they are. And I'm not sure if you can see it because it's so small, but each of these little things is showing you the alleles at that position for that variant. And as I was telling you before, you can click through on any of the items. If you click through, it gives you more information about the alleles that were found and allele frequencies and that kind of a thing. And so in one snapshot, you can see what these four pretty major formats look like. If you wanted to load this image and see what it is, and this is in the slides as well, I've made a session. Those of you that are familiar with the session tool will know how to load that. I'm gonna show you, when I start into the exercises, how to load a session, and then you can come back to this URL if you really wanna see this image again. And not to belabor it, but just to reiterate the difference between the different file formats. With the wiggle, we have some kind of signal strength. Good examples are DNase hypersensitivity and also ChIP-seq signals. And then something that can come out of that: there's often an associated BED format file, which we think of as the peak calls. And those are the locations of where the peaks are. So where the peak starts and where the peak ends.
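To make the wiggle-versus-BED distinction concrete, here is a minimal sketch in Python. It assumes a plain fixed threshold, which is only a stand-in for illustration; real peak callers use statistical models, and the function name `call_peaks` and the toy signal are hypothetical. The point is only that a continuous signal goes in and a list of start and end positions comes out.

```python
from typing import List, Tuple


def call_peaks(chrom: str, signal: List[float], threshold: float) -> List[Tuple[str, int, int]]:
    """Turn a per-base signal into BED-style (chrom, start, end) intervals.

    An interval covers each run of positions whose value is >= threshold.
    Coordinates follow the BED convention: 0-based start, exclusive end.
    """
    peaks = []
    start = None  # start of the run we are currently inside, if any
    for pos, value in enumerate(signal):
        if value >= threshold and start is None:
            start = pos  # a run begins here
        elif value < threshold and start is not None:
            peaks.append((chrom, start, pos))  # the run ended just before pos
            start = None
    if start is not None:
        # the signal ended while still above the threshold
        peaks.append((chrom, start, len(signal)))
    return peaks


# Toy wiggle-like signal: one value per base.
signal = [0, 1, 5, 8, 6, 1, 0, 0, 7, 9, 2]
peaks = call_peaks("chr1", signal, threshold=5)
for chrom, start, end in peaks:
    print(f"{chrom}\t{start}\t{end}")
```

Each printed line is the kind of thing the browser would draw as a bar, even though the input was a graph.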
And so what you end up with is one of these bars that's showing you a location, even though what you're thinking about is signal strength. A BAM file is usually for alignments, most commonly encountered if you're looking at RNA-seq stuff. And finally, as I was saying, variant call format is for SNP data, indels, copy number variants, structural variants, that kind of a thing. And the last thing to say about file formats is this. You'll remember I was telling you about track hubs, and those are this sort of weird thing where you can display stuff in the browser, but the files underlying the data don't actually live in Santa Cruz, they live somewhere that you put them on the internet. These four are certainly not all of the file formats you can put in and see in the browser, but for these four in particular, because a lot of the data now come in these huge, huge files, it's been necessary to create this mechanism whereby you can display them in the browser and see them in real time. If you're not used to thinking about these computing problems all that much, it might not be that obvious, but when files get to the size that they are for these types of formats, it's not possible for you to upload the whole file to the browser every single time you're trying to move to a different region. If the browser had to read in your whole file just to then grab whatever piece it was, whatever genomic region you were looking at, it would take too long, and the image wouldn't respond in real time the way you're used to when browsing the internet.
So out of that came this need for a subset of file formats which are called indexed file formats, which are basically file formats that are optimized for larger data sets because they've been indexed. So what the browser will do is this: you give it the URL to the location of the actual file, the BAM file, the VCF file, bigBed or bigWig, you put that URL into the browser, and I'm gonna show you in a couple of slides what it means to put it into the browser. The browser then knows where that file lives, and it just goes and grabs the snippet of that file that's relevant for whatever spot you're looking at on the genome, so that the whole file doesn't have to travel back every time, because that would be too expensive. So for indexed file formats, as we saw, let's go back one more time, BAM and VCF already are their own kind of indexed file format. BED and wiggle format have existed for ages, they're older, and so more recently we've developed bigBed and bigWig formats, which are the indexed versions of those same things. So conceptually, the picture that you should have in your mind is still the same, and the data that comes out the other side still looks like this. Wiggle looks like a graph, bigWig looks like a graph, it's just been optimized for displaying these larger wiggle files. Similarly, BED is a bar, it's some kind of location of something, and bigBed is the big version of that, which is a slight variant that's been optimized for displaying really big files full of BED locations.
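The idea of grabbing just a snippet can be sketched with a toy index in Python. This is not how bigBed, bigWig, BAM or VCF indexes are actually laid out (the real formats use more elaborate, compressed structures). It is a simplified illustration, with a hypothetical `BIN_SIZE` and made-up data, of why an index lets a reader seek straight to the relevant bytes instead of scanning the whole file.

```python
import io
from collections import defaultdict

BIN_SIZE = 1000  # hypothetical bin width, chosen only for this toy example


def build_index(handle):
    """Map (chrom, bin) -> list of (byte offset, length) for lines touching that bin."""
    index = defaultdict(list)
    handle.seek(0)
    offset = 0
    for line in handle:
        chrom, start, end = line.decode().split("\t")[:3]
        start, end = int(start), int(end)
        for b in range(start // BIN_SIZE, (end - 1) // BIN_SIZE + 1):
            index[(chrom, b)].append((offset, len(line)))
        offset += len(line)
    return index


def fetch(handle, index, chrom, q_start, q_end):
    """Read only the lines the index points to, then keep the true overlaps."""
    spans = set()
    for b in range(q_start // BIN_SIZE, (q_end - 1) // BIN_SIZE + 1):
        spans.update(index.get((chrom, b), []))
    hits = []
    for offset, length in sorted(spans):
        handle.seek(offset)  # jump straight to the record, no full scan
        fields = handle.read(length).decode().rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        if start < q_end and end > q_start:
            hits.append((fields[0], start, end, fields[3]))
    return hits


# An in-memory stand-in for a large remote file of BED-like records.
data = io.BytesIO(
    b"chr1\t100\t200\ta\n"
    b"chr1\t950\t1050\tb\n"
    b"chr1\t5000\t5100\tc\n"
    b"chr2\t100\t200\td\n"
)
index = build_index(data)
hits = fetch(data, index, "chr1", 900, 1100)
print(hits)
```

In this sketch the query for chr1:900-1100 only touches the records in the two bins it overlaps, which is the same reason the browser can redraw quickly when you move to a new region.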
And so as I was saying, it optimizes the speed of display, and we really encourage people to move more towards this remote hosting paradigm, where you have your files sitting somewhere on the internet, if they're not too sensitive, and are building track hubs, using track hubs like they have in the portal, just because then you maintain control over your data. You're not handing it over to the browser and hoping that nothing happens to it, or coming back to us in a couple of months and being upset that it's not there anymore. You should always be in control of your data, because we all know how important data is. So we encourage people to do that. The display is better, there's a million different reasons. And also, for those of you that might have seen that histone track that we have, it's a multicolored overlay of transparent wiggles. A lot of people are really keen on getting that going for their own data, and that's only possible in the track hub thing that I was telling you about, which is based on this remote hosting paradigm. That said, what I'm gonna show you now are custom tracks, which are an earlier way, a precursor, I guess, to track hubs. So this is the most basic way for you to get your data into the browser. If your data is in VCF, BAM, bigBed, or bigWig format, you could put it in as a custom track, but what you would be pasting into the input box when you go to the menu would be not the actual data itself, but just the URL to the file location, and you'll see what that means in a second.
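As a rough sketch of what gets pasted for these four formats, the Python helper below builds a one-line track declaration that points at a remote file through the `bigDataUrl` setting. The helper name `remote_track_line`, the track names and the example.org URLs are hypothetical. The `type` values are the ones the browser documentation uses for these formats, but check the custom track help page for the settings your file needs.

```python
# Format name as used in the talk -> "type" value in the track line.
TRACK_TYPES = {
    "bigBed": "bigBed",
    "bigWig": "bigWig",
    "BAM": "bam",
    "VCF": "vcfTabix",  # VCF compressed and indexed with tabix
}


def remote_track_line(fmt: str, name: str, url: str) -> str:
    """Build a custom track line that points at a remotely hosted file."""
    if fmt not in TRACK_TYPES:
        raise ValueError(f"unsupported format: {fmt}")
    if not url.startswith(("http://", "https://", "ftp://")):
        raise ValueError("the file must be reachable over the internet")
    return f'track type={TRACK_TYPES[fmt]} name="{name}" bigDataUrl={url}'


# Hypothetical URLs: replace with wherever your own files are hosted.
print(remote_track_line("bigBed", "My peaks", "https://example.org/peaks.bb"))
print(remote_track_line("VCF", "My variants", "https://example.org/calls.vcf.gz"))
```

Note that nothing in the pasted text is data. It only tells the browser where the file lives.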
So I went through the file format stuff pretty quickly. When you're coming back to this later and wanting to reread and refresh, the place that you could look for that in the genome browser is this: if you go up to the Help tab and go down to the FAQs, one of the FAQs, because a lot of people get confused about this, is data file formats, and that'll take you to a page where there's a long, long list of all the different formats. Some of them are ENCODE specific, some of them are just general, and each entry describes what that format is for and what its syntax looks like, and especially for this meeting, there's a separate section just for the ENCODE stuff. If you're trying to come at that from the portal angle, there's also a similar page on the portal describing all of these formats, so you can unconfuse yourselves or figure out what it is that you have. As somebody that's worked on the mailing list, I can tell you a lot of people get confused about what kind of format they want, and I didn't even talk about bedGraph as a separate thing, or a slightly different thing, but there are all these different formats, so there are lots of resources for educating yourself about what kind of format you might have. Pretty much all of them are types that you can load into the browser. It's just a matter of getting your data tweaked slightly to match the syntax of what we're going for. And so the reason that you would bother is that you could load them into the browser as a custom track, and then take advantage of this long list of data that we have to see what data we have that might be relevant to whatever it is you're studying. And so for that, we have something called custom tracks, and you can get to those by clicking on one of three places.
There's a custom tracks button on the main gateway, there's one below the main display, or if you go up to the top you can click on My Data and then Custom Tracks. Those all take you to this one menu that will look like this. And at the custom tracks entry spot, there's a blurb of text up at the top with a hot link to each of the different file formats that are accepted and documentation about that format, so the time that you would actually be thinking about what format you need would be when you go to upload a custom track. So there's a link there for you already. You don't necessarily have to go and find the file format document on its own, and you can click through to one of those and it would show you how to set up your custom track if it's BED or how to set it up if it's VCF or BAM. And each of those help documents looks a little bit like this. There's gonna be a description of all the different fields in that particular format, and then down below that, there's usually a toy example of that format, so you can practice loading that format of data into the browser and convince yourself that you understand how it works before trying it with your data, which is probably gonna be a much bigger file. And how would we actually load something into the browser? You would grab your data. So in the case of a really, really simple example of a BED custom track with just one data point in it, it would look something like this. Nominally I have to give it this first line, which is kind of like a header line, for those of you that are familiar with that language.
Basically I'm putting a data set into the browser, so I have to tell the browser what I wanna call that data set. So I give it a name, and you give this little declaration to tell it I'm gonna give you a chunk of data, and then this one line is basically one thing in this data set. This is enough to make a custom track, basically a data set with one thing in it in the browser, and if I pasted this into this box and hit submit, it would look like this little chunk of the browser display down below. As I was telling you, BED makes bars, and so when I load that into the browser, it looks like this black bar. Not that much to it. It labels it as gene one. It's not very interesting, but it gives you a sense of what the process is. You would paste something into that box and it would appear in the browser in some fashion. In the case of BED it's very simple, and because it's not bigBed, it's just BED, we're actually loading the data. When I paste this line of text, this is the data. If I was loading something that was bigBed, rather than pasting actual data in here, actual locations of things, I would instead be putting in a URL to the bigBed file. And so especially if you're doing a lot of the peak call stuff, I think you'll probably have that set up somewhere, and this would be the mechanism whereby you're trying to load stuff into the browser. I think tomorrow you're gonna be seeing, I think it's called ChromHMM, sorry if I'm butchering that name, but I ran through the demo that was sent out and one of the outputs is a list of custom tracks you can then visualize in the browser. So now that you've seen it and you know how to load a BED custom track into the browser, you can try that out, and you can also click through to the output from that and convince yourself that the output you're looking at is of the same format as this, just that there are more lines of data.
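Since the slide itself isn't reproduced in the transcript, here is a small Python sketch of what that pasted text could look like: one track line that names the data set, followed by one BED line per item. The helper name `bed_custom_track`, the track name, the description and the coordinates are made up for illustration, not the values from the slide.

```python
from typing import Iterable, Tuple


def bed_custom_track(name: str, description: str,
                     items: Iterable[Tuple[str, int, int, str]]) -> str:
    """Build the text of a plain BED custom track.

    The first line is the track declaration (the "header" line).
    Each following line is one item: chrom, start, end, label.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in items:
        if not 0 <= start < end:
            raise ValueError(f"bad coordinates for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines)


# One hypothetical item, analogous to the single "gene1" bar on the slide.
text = bed_custom_track(
    "My first track",
    "One toy item",
    [("chr21", 33031597, 33041570, "gene1")],
)
print(text)
```

The printed text is what would go into the paste box. With more items you just get more BED lines under the same track line.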
So great, we have a bed in the browser, so what? Basically we've told the browser, here's a position, a region that we're interested in or maybe we have a list of regions, hopefully we have a list of regions that we wanna grab data from. So maybe you've run an experiment and it generated some signals and from the signals you decided where the peaks were and now you wanna know, well what overlaps with my peaks cause that's what's biologically interesting. What are the SNPs that fall, where my peaks are, what are the genes that fall, where my peaks are and so we have other slightly more antiquated tools in the browser for doing that but we very recently released this thing called the data integrator. It's on the poster that I'm gonna be showing at the poster session. You can ask about anything in the browser when you come to the poster session, I'm not gonna say no. But this is certainly the most recent and most interesting because you can load one of these bed files into the browser and then you can go to this thing called the data integrator which you would get to by going to the tools menu, clicking on data integrator and it'll take you to a menu that looks like this and so what this is is basically it has you choose some data sets from the browser to add to whatever output you're getting. So as you're sort of envisioning it, think of sort of a table and so you have your list of the first few columns are the coordinates of all your regions of interest listed and now what you might want are all the SNPs that go with each of those regions or maybe all of the genes or all of the some other encode data that we have within the browser. Oh yeah. Oh no, just that I find that, sorry. So you would pick them from here and that's how you could annotate your bed file. I'm gonna speed up slightly to try and get the rest of this in. 
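The table being described can be pictured with a small Python sketch. This is not the Data Integrator's own code, only an illustration of the overlap idea with made-up regions and gene names: for each region of interest, list the features from another data set that overlap it.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]        # chrom, start, end (BED-style, end exclusive)
Feature = Tuple[str, int, int, str]  # chrom, start, end, name


def annotate(regions: List[Region], features: List[Feature]) -> List[Tuple[str, int, int, str]]:
    """Add one column to each region: names of overlapping features, or '.' if none."""
    rows = []
    for chrom, start, end in regions:
        names = [
            name
            for f_chrom, f_start, f_end, name in features
            if f_chrom == chrom and f_start < end and f_end > start
        ]
        rows.append((chrom, start, end, ",".join(names) or "."))
    return rows


# Hypothetical peaks and genes, only to show the shape of the output table.
regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
genes = [("chr1", 150, 400, "geneA"), ("chr1", 190, 250, "geneB"), ("chr2", 300, 400, "geneC")]
rows = annotate(regions, genes)
for row in rows:
    print("\t".join(str(field) for field in row))
```

Adding SNPs or another data set would just mean one more column built the same way.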
So you could pick them from these pull downs and you would hit add and you can kind of walk yourself through the exercises to see the steps in using the data integrator and the other thing I'm gonna show you in a second is the variant annotation integrator. It's kind of similar in that you can walk yourself through the exercises that are in the slides so that you can see how those work. Basically I'll just skip ahead to what the baked cake would look like if we were on a cooking show. So you go through and you pick the stuff that you want in your output. Remember I've put in my list of regions and now what I want is, so there are my list of regions that I uploaded as a custom track, imagine. And now maybe I wanted the genes that go with that. Using the data integrator I've added these columns and this may not seem that magical but what you can do is you can add up to five different annotations, types of annotations to your things. You could also have the SNPs you could also have something else that we have a data set for in the browser. So especially if you have that data like that it's a great way for getting at that. And then very, very quickly I wanna tell you about if you had a VCF file and you wanted to annotate that. So remember VCF is one of the things you can load as a custom track in the browser. One of the exercises walks you through how to do that. This is an example of what a VCF custom track might look like. The only thing to draw your attention to is just down at the bottom, notice that a URL is part of the input and that's as I was telling you you would have your VCF file sitting somewhere that the internet can see it. So we would paste these things into that custom track box and it would load it into the browser. Once you've done that you can go to this tool called the Variant Annotation Integrator. 
And the way to get to that is you go up to Tools and you pick Variant Annotation Integrator, and each of these click-through steps is also in the exercises. And once you get to the interface, this is what it looks like, and this is the thing that I wanted to actually show live in my last like 2.5 minutes. If you go up to Tools, pick Variant Annotation Integrator, and I personally think this is very slick. For any of you that have wrestled with the Table Browser, you might recognize the subtlety of why this is so magical compared to the way that it used to be back in the day. But you would load your VCF track into the browser and then it would appear in this pull-down of things that you could choose. So I have already loaded a toy example in there. That's my VCF custom track, so maybe you have a VCF file from your data analysis, and then there are all these annotations that you can add in. You could pull in transcription factor binding sites, which is an ENCODE data set. You could also pull in functional implications of your variants, so whether or not they fall within coding regions, UTRs, that kind of a thing. So each of these headings holds a whole section of things that you could add to the annotation. And if you work on cancer, we also have COSMIC annotations as well. So you can add them or not add them, and then we're gonna just check out what the output looks like. This gives you a little warning that you're not supposed to use this for self-diagnosis or for legal purposes. And what it gives you basically is a big table where you've got your variants listed on one side and then all of the different annotations that you've chosen to include in the output. So you could include gene names where your variant falls, and it pulls it all into one place, which I think is really slick and easy to use. It's nice to have that all in one place.
You can also grab this output in Ensembl's Variant Effect Predictor format, for use over there as well. And the last thing to say in like my 30 seconds remaining is just to tell you what this track hubs business is about. So if I only got to show you two slides about track hubs, it would probably be these. We already saw the menu. What you need to make your own track hub: you need data, obviously, in one of the formats that track hubs support. Remember it's bigBed, bigWig, BAM and VCF, at least for right now. You need a couple of text files to tell the browser where those files live and how to name them, that kind of thing. And then if it's on a genome that we don't host, you would also need the genomic sequence. And so schematically this is what that might look like. You'd have a directory with some of these files sitting in it, and I'm gonna show you an example in a second so you can see what this is. So I give this demonstration to a lot of audiences where they see me coming from UCSC and assume that I'm like a high-powered bioinformaticist and that's why I'm able to make a track hub, and that's actually not true. I'm a biologist by training, and if you can just be comfortable moving files around a little bit, this technology is very accessible. And it's great because you can do things like having the fancy multicolored wiggle overlay, you can organize tracks into the big subsets that you see in ENCODE, all these kinds of things, and they're all possible with track hubs. And if you work on an organism that we don't even have in the browser, you can load that into the browser and have it display as though it was one of our tracks. So just to show you what a hub looks like, if I grab this URL and I copy it and now I go up to My Data, Track Hubs, go to that My Hubs tab now, and I'm gonna disconnect from the hub I was looking at. Oh and it's gonna do that, that's not right. Okay so we're clicking on My Hubs and it's loading in a weird way because I think the internet is slow.
And now I'm gonna paste in this URL and click Add Hub. So now it's loaded my hub into the browser, and you'll remember we don't host Arabidopsis at UCSC. Now I can pick my hub as though it's one of the things in the database, even though these files live somewhere else outside of the UCSC context. Technically I work at UCSC, so the place that I've personally stored those files is physically at UCSC, but this is a technicality. So now I'm gonna choose Plants and click Submit, and so with the files that I've set up you can see it looks like an Arabidopsis browser, even though it's not something that we're hosting at UCSC as one of our genomes. So for a lot of people that are working on non-model organisms this is a nice thing, but also if you're working on particular iterations of human genome sequences, this is also something you could attempt. If you're worried about security, come and talk to me about Genome Browser in a Box, because you can actually use that tool in concert with this to run your own genomes off your machine and have them maintained in a private context. So you can see that there is an Arabidopsis browser in here because I've loaded my hub, and if I now, instead of pasting that into the genome browser, just go look in Firefox at that URL that I put in, I can see what these files look like. This is that hub file. It's five lines long, you can make it in any text editor, and it's just a series of what we call name-value pairs. So we tell the browser what to call the hub, what to put as the label, and who to list as the reference contact if it breaks, and then it has a couple of pointers to other files. If I go up one level to look at this directory to see what's sitting in there, there's a couple of other small text files that you can see are quite short, and if you go to our website and go to the hubs page, there's a link to the help documentation for constructing your own hub.
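A five-line hub file of name-value pairs might look like the text embedded in the Python sketch below. The setting names follow the track hub help documentation as I understand it, so check them against that documentation. The hub name, labels and email are hypothetical, not the contents of the Plants hub shown in the demo. The small parser just shows how simple the format is: the first word on each line is the name and the rest of the line is the value.

```python
# Hypothetical hub.txt contents (not the demo hub's actual file).
HUB_TXT = """\
hub myExampleHub
shortLabel Example Hub
longLabel A hypothetical hub used only for illustration
genomesFile genomes.txt
email someone@example.org
"""


def parse_name_value(text: str) -> dict:
    """Parse 'name value' lines into a dict, skipping blank lines and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        pairs[name] = value.strip()
    return pairs


hub = parse_name_value(HUB_TXT)
for name, value in hub.items():
    print(f"{name!r} -> {value!r}")
```

The `genomesFile` entry is one of the pointers to other files mentioned above. That file in turn points to the per-genome track settings.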
There's a template, which is this Plants hub, and you can mimic that and use it as a template for your own data if you're trying to set up your own hub. I know they're strongly encouraging it for ENCODE especially. And the thing that you've been looking at, and I think this is confusing for some people, is that when you go to the portal and you choose to visualize something, it's generating one of these hubs on the fly. They've got some fancy code in place, and so that data is actually sitting with the DCC but it's displaying at UCSC. So just to give you a better handle on where the data lives, why you're seeing it in the browser but it's living over here: this is the mechanism, it's called a track hub, and especially if you're setting up a complex suite of files for ENCODE analysis, it's a great mechanism for doing that, keeping everything organized and then having control over updating stuff. So unfortunately I talked so long that we ran out of time for exercises, but they're in the slides and they're just walking you through the same things you just saw me do, which is how to annotate a BED file using the Data Integrator and how to annotate a VCF using the Variant Annotation Integrator. So it's the stuff that you already saw, but broken down into a step-by-step click-through, so if you have time later you can try doing it. If you get stuck and you want to come find me, feel free to do that as well. So I think, oh, and I'll get in trouble if I don't show our funding sources.
So you can just look at those while you think about whether you have any questions. These are the people on our team, this is what we look like, and here are our funding sources. And one final plug, and that is, if you have a moment, if you could take this survey that just asks you questions about whether you thought the material was too hard or too easy, things you might like to see in future genome browser workshops. Hit up that survey link, it's just bit.ly and encode workshop all one word, and it takes you through a very simple set of questions, and it helps us with getting funding for training in the future. Thanks. Thank you for doing this.