So it's my great pleasure to introduce Cricket Sloan. Cricket is the lead data wrangler here at Stanford at the ENCODE DCC. And if you don't know what a lead data wrangler does, she's pretty much the most important person in the whole ENCODE project. So I'll let her take it from here. She's going to talk today about our website and our methods for accessing the data.

So I am excited to talk to you all about the ENCODE portal. And I wanted to start out with the fact that, unlike the previous talks, this one has demos already posted. This is intended to be a workshop where you are actually poking around in our site. The demonstrations are really simple; they're just making sure that you have what you need to interpret what you're seeing on our site and to navigate through it. The first part of my talk will be without demonstration, so you can use that opportunity to follow along with the URLs and to get any computer issues, like connectivity, handled. Which brings me to my first slide.

I'm starting with the acknowledgments because all of the people here in the DCC are also super important. I know Ben said the lead data wrangler is the most important, but we're a huge team, we're all important, and we're all here. You will see people wandering around who have these little dots; maybe they could stand up, that would be great. Those are also the folks with the microphones moving between the tables. So if at any point you're unable to connect or you're having trouble following along, those are the people to talk to.

The other thing here is the reference material. These two links lead to the slides as they are, which have lots of URLs on them, and to the demos that we're going to do. (My slides were not advancing for a moment there. The guy told me that I should rub my tummy to make the slides advance, which is a true story. OK, now they're advancing; I can make them work.) So there's everyone in the DCC, and there's the reference material. Great.

I wanted to start out, as I try to do in all talks, with the goals. But before I get to the goals, I would like to get a feel for who my audience is, so I have three questions I'd like to see a raise of hands for. Before you enrolled in this meeting, who here had already gone to the ENCODE portal? Awesome. OK. And who here has a really good concept of what a DCC is and what its role is in a large consortium like this? OK, fewer. And who here is planning on interacting with ENCODE data primarily programmatically, so that you're not really going to be navigating our site, but you're going to be downloading piles of data? OK.

All right, so let me give you a little orientation. For those of you who are planning on just dumping all the data programmatically, the first part of my talk might seem irrelevant. However, I'm hoping that what you're going to glean from that part is the data model behind what you're seeing. So even if you're thinking, this isn't really much detail, I'm hoping you're going to come away understanding our data model.
For those of you who are primarily going to be navigating our site: when I get to the programmatic part, where I'm actually putting little snippets of code up, I'm deliberately doing it in a simple way so that you have a take-home for the programmatic people you work with. You can say, ah yes, we need to know about JSON, the REST API, this and that, and have a list. So, OK. Excellent.

The first part of my talk is an overview of what a DCC is. The role of the DCC is pretty wild, and it's essentially a parallel process to publication. The data comes in from the production labs, and the production labs and the DAC give us a lot of information about what they want their standards to be. The DCC looks at all of that data, looks for anomalies, and gives feedback about what is and isn't meeting those standards. Once we've collated all that data, we distribute it to other relevant resources. And our central role is to disseminate that data to the scientific community. That would be you.

I'd like to address the fact that on our project we're integrating not just ENCODE data; we also bring in Roadmap Epigenomics data and modENCODE and modERN data. We're also bringing in the Genomics of Gene Regulation (GGR) project, and we may have other projects in the future. One of the tasks we do here is integrate all these data using ontologies and controlled vocabulary, so that what one person is calling heart and another person is calling cardiac tissue comes together as one item.

One of our goals is to collect very rich metadata. We collect metadata on biosamples, on the antibodies, on the files themselves, on the software and the pipelines being used, on the libraries that were created as input to a ChIP-seq or RNA-seq experiment, and on the donors and/or strains. One thing you'll note in our system is that it was originally designed with human in mind, so mouse strains are called mouse donors, which I think is really interesting. I like to think of them signing off on their consent forms.

One of the things we do is identify those reusable resources by giving them actual names, so people can talk about a biosample as ENCBS000AAA. I have highlighted here in yellow the encoding system of the accession: an experiment has SR in it, a file has FF, an antibody AB. All of these identifiers exist so that you can share these items. And when you're talking about a pipeline, you won't be saying, oh, it's the ENCODE ChIP-seq pipeline; you'd be saying, oh yes, it's ENCODE pipeline 00AAA. (There's a tiny code sketch of this encoding just below.)

Now I'm going to take a little detour on identifying biosamples. I find that a lot of people coming to our site have a little trouble understanding what they're seeing in our list of biosamples. We don't consider the cell line K562 itself a biosample in our system; K562 is a biosample term. You can have a K562 sample. So here is a cell line; it has two independent growths. Each of those independent growths has its own identifier. Each would go into its own library, and those libraries themselves have individual identifiers. Similarly, over here in a human, there might be heart tissue and liver tissue that come out. That liver tissue might be further separated; the heart tissue might not be separated, but might then have two separate libraries prepared. Every one of these gets its own biosample identifier, even if the biosample term for all of them is heart or liver.
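To make that accession encoding concrete, here is a minimal sketch of how you might recognize object types from accessions in code. It assumes the general pattern of ENC plus a two-letter type code plus digits and letters, and it lists only the type codes mentioned in this talk; treat anything beyond those as something to verify against the portal.

```python
import re

# Two-letter type codes embedded in ENCODE accessions, as described above.
# Only the codes mentioned in this talk are listed; verify others on the portal.
ACCESSION_TYPES = {
    "BS": "biosample",   # e.g. ENCBS000AAA
    "SR": "experiment",  # e.g. ENCSR000CUR
    "FF": "file",
    "AB": "antibody",
}

def accession_type(accession):
    """Return the object type implied by an ENCODE accession, or None."""
    match = re.match(r"ENC([A-Z]{2})\d{3}[A-Z]{3}$", accession)
    if match:
        return ACCESSION_TYPES.get(match.group(1))
    return None

print(accession_type("ENCBS000AAA"))  # biosample
print(accession_type("ENCSR000CUR"))  # experiment
```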
Okay, so one of the key things we care about at the ENCODE DCC is tracking the provenance of the data and where it came from. I'm sure all of you have had the experience where you've downloaded a bunch of files with names like heart_filtered_trimmed_chip plus some coded target name. We are really trying to get away from that and to have every single detail of what went into making that processed file. So here you'll see a processed file that you might download, like the peak files Zhiping was talking about. We track the pipelines and the software that went into that file, all the processed files that went into it, the software used to develop those, going all the way back to the raw data, the replicate structure, the particular biosamples, and which experiment it belonged to. This is a key slide to refer back to for those of you who are accessing programmatically and trying to navigate through our metadata.

So for those of you who had never heard of a DCC, or had heard of one but didn't understand what a DCC does, which seemed to be most of you: we do curation, we do data integration, we do data standardization and resource identification, we track provenance and context, we promote reproducibility, and our ultimate goal is data sharing.

All right, so our next topic, now that you've been oriented to our DCC, is site navigation. I'm hoping that all of you are on encodeproject.org and looking at our front page. I think you saw this in Mike's slides at the beginning. The key things I want to point out, and we'll go in a circle here: we have a quick help at the bottom that will help get you started; we have recent news, where we post things like our updates; we have the keyword search, or text search, which we'll talk a lot about later; and then we have each of these menus. This entire bar is persistent throughout every page, and these menus are persistent.

In those menus, we have first the help and resources. I just want to point out that the people we showed you here are live all the time at the help desk, and when you email the help desk, you're getting one of our smiley faces answering you. Also, if you want more information about the data, there is an announcement list, so you can get announcements as they're made. I believe you've been told before that we have our tutorials and our release policy, and I'll move on because I believe Mike Pazin covered this; I think he also covered our documentation. We have our publications and our experimental standards, and there's more information in the tutorials.

On the publications pages, there are some publications where we know the data set that went with them. You can see this link here that leads to a data set, and that will list all of the files that went into that publication. We do this to promote reproducibility.

Here is our biosample page; that was under Materials & Methods, where you'll see both biosamples and antibodies. We're not going to delve too much into the biosamples and antibodies, other than I'd like you to know that they're there, and that we have these pages with all of the details that we collect, in keeping with our goal of deep and rich metadata. Again, here's the antibody page: under Materials & Methods you can move down and see the whole list of antibodies.
You can drill down further to see a particular antibody lot, drill down further to see characterization information, and drill down further still to get the exact characterization.

The next item on that bar is the encyclopedia. I believe Zhiping has already talked about the encyclopedia. You have both the About page, which leads to the whole section she talked about earlier, and then we have a matrix that identifies individual annotation files, primarily in BED and bigBed format, that you can download.

Okay, so starting here is where we're actually going to try searching data together. What I'm after here is just for you to understand how you can browse and search through our data. When you're at our site, there are three primary views in which you can see experiments. So now we're back on that front page you saw: in the menu bar there's the item that says Data, and under it a search view and a matrix view. The search view is our primary page; it was the first one we developed. We have the action buttons up here, and we have filters, which we're going to talk a lot about later. The key thing I'd like you to understand about this page is that it's on a per-experiment basis: each one of these rows represents an experiment.

You can use these little view changers here to get to the report view. What has happened here is we've collapsed everything into a tabular view. The features that come with the tabular view: you can sort by any of these columns, so if you came from that front page to the report view, at this point you could click on accession and sort by that, or target and sort by that. You can also click on this Columns button and adjust which columns are shown in the report view. And the third feature of this view is that you can download the whole page as a TSV.

The next view is the matrix view. What is key about this view is that it is collapsed by biosample term. We've just discussed that I have biosamples where every growth of K562 is its own biosample. This view is collapsed by biosample term, which means this row of K562 represents not only every growth of K562, but any K562 that has been modified in any way. The top of the matrix view is organized by assay, giving you a perspective on which biosample terms have been covered by the most different assays. I believe Mike and Zhiping covered that we have DNase assays, we have RAMPAGE to discover transcription start sites, and we have various ChIP-seq assays as well.

So, as I said earlier, we're going to exercise this search box pretty thoroughly, and that brings us to demo one on our bit.ly: the free-text search of ENCODE. What I'm going to have you do in that free-text search is enter skin into your search box. If you wait just a second, I'm going to bring it up on my machine so I can make sure I'm with you. Cool. All right, so you type skin and you hit return. The first page you're going to see gives you a choice of which data type you want to click on. We're going to look at experiments, as that's the primary way to access ENCODE data; however, there are many ways to access it. So you'll click on Experiment, and it will come down to a page that looks like this, and we're going to filter by ENCODE data.
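If you'd rather drive this same free-text search from code, here is a minimal sketch using Python's requests library against the portal's /search/ endpoint, asking for JSON instead of HTML. The searchTerm and type parameters mirror what the demo just did in the browser; this is a sketch, not official client code.

```python
import requests

# The same free-text search as the demo: "skin", restricted to
# experiments, returned as JSON instead of an HTML page.
url = "https://www.encodeproject.org/search/"
params = {"searchTerm": "skin", "type": "Experiment", "format": "json"}
response = requests.get(url, params=params)
response.raise_for_status()
results = response.json()

# Each hit in "@graph" is one experiment, like one row of the search view.
for experiment in results["@graph"][:5]:
    print(experiment["accession"], experiment.get("description", ""))
```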
This is a place for me to point out once again that we're incorporating data from other projects, like modENCODE and Roadmap. Although those projects are incorporated, their data are not necessarily processed in the same uniform way as our ENCODE data. Another thing to note here is that the skin you entered earlier wasn't just a text match; it is an ontological search. We are using all of the relationships we've brought in from CL and Uberon to recognize that a fibroblast of dermis is a data type you might want to be looking at if you're looking for skin.

All right, on to demo two. So that was one way in: just put in a term of interest, search, and see what you get. Another way is to use our browsing and filtering features. This starts from Search, or from Data here (I'm pointing Data to Search), which gets you to the search view I discussed earlier. If you want to make decisions here, you can look at these facets and make selections, like clicking on the project, the assay, and the organ. In the demo we've asked you to choose ENCODE, and we arrive at that same group of data by selecting skin under the organ facet. No one has any questions; everyone's site works perfectly. Yay. Just making sure I'm with you.

Okay, so now we come to how you would combine these together. Demo three is a combined search and filter of the ENCODE data. In this case, we start by putting skin in the search box and selecting Experiment. When you get to this page, we're going to select RNA-seq as our assay, and we're going to select adult as the life stage. And if no data has changed in the last week, I believe you will be down to 14 items on your list. In this way, you can start with, I think, 10,000 experiments on our site and very quickly narrow down to what you're looking for: I just want RNA-seq, I just want ENCODE data, I just want adult. We went from 10,000 to 14 in just a few moments. We modeled this faceted browsing after Zappos.

Okay. You can then click on one of these RNA-seq experiments; I hope I have the right one, the RNA-seq of melanocyte of skin. That brings us to the experiment details page. On the experiment details page there is a lot of information, so much that it takes me two slides. The top of this page has two sections. On this side, we have a lot of details about the experiment, and on this side we focus on what we call attribution: what project it's from, who did it, who was the PI for the grant. It also points to other places where you could find that data; if it came from ENCODE or Roadmap or GEO, it may have links to other sources for that data. Over here, you're going to find details like the biosample summary and the target, what controls are used, more of the biological information.

The next section is our replicates. In ENCODE we try to replicate our data, and replicate structure can vary significantly for many reasons. We list here each replicate with the biosample link, information about the library, and again a summary of what that data is.

For the next part, I actually want to explain the files and then come back, so let me move a little further down. You'll see down here the list of all the files that go with that experiment. We've divided the files that go with an experiment into raw data; these are FASTQs or array files.
And then we have the processed data. This is data that has been processed in some way from that raw data, whether it's a mapping or a mapping plus peak calling. For each of those, you can filter here; I'm hoping you can see those little boxes. If we have multiple assemblies available, you can click on that, and you will see a choice of all assemblies, or GRCh38 or hg19, and that will filter your files for you.

Going back to this image, it now has context, because what it's trying to show you is how all of those files are related. In this case, we have some FASTQs that go into processing: they get mapped, we have a genomic mapping and a transcriptome mapping, and signals are called from that mapping. I'd like you to take a moment to start clicking on that graph; underneath, you'll start to see information about that particular file or that particular software. So that graph is meant as a visual cue of how the experiment's files are related, and to allow you to delve deeper into pipeline and software information.

The next part of this table is the controlled-by, or controls. Some experiments are used to control other experiments; a lot of the RNA-seq experiments are used as controls for the RAMPAGE experiments, and that relationship is indicated there. And then the final section here is the protocol documents. It has all the protocols the lab has given us, regarding, say, library production, or how the biosample was collected, or how the software was handled.

I wanted to take a little side note on a slide that I don't have, but that I realized was needed while listening to Zhiping. Zhiping was talking about the annotations that we have. In addition to experiments, where we consider an experiment to be a wet-lab endeavor in which we have taken some sort of tissue, harassed it in some way, and then done sequencing on it, the annotation objects hold what you might consider a computational experiment: they've taken all of this ground-level data, as she called it, combined it in some way, and come up with an annotation, like candidate enhancer-like regions. So there are objects in our system where you will come to pages that look very similar to this, but they will be annotation pages.

Another thing I wanted to mention: I think we had a backup site for this. Okay, thank you. So we had a contingency in case we had too much traffic on the site at once. Okay, awesome. I'm going to pause for a minute while we sort that out. She's saying that Stanford requires you to reconnect to the Stanford visitor network every once in a while, but I think that's a distinct problem from not being able to access the site. I believe I was told this morning, when I was worrying about this, that the answer to my worries is to not do live demos. So the users site is up; I have no way to enter it or write it out for you: users.production.encodedcc.org. This is our backup site. You'll see a red bar across everything indicating that it's not the live site. So, are we on? Well, wait till you're all on it. Ben, should I direct some people to test? You can also go to test.encodedcc.org as well, if people want to be on that one, so we have some distribution. So are we back on target? I have no idea how long ago I lost y'all. Much further? So about here; should I go back? Okay. So this is a great moment to talk about URLs.
So you can just ignore this whole first part that says encodeproject.org. You take whatever base URL yours is and add /experiments/ENCSR000CUR. Or you can take that accession, ENCSR000CUR, and just put it in the search box, and you should get this experiment coming up, so we're looking at the same page. I also want to take this moment to point out that the reason Ben was really excited to introduce me is that it meant he wasn't giving this talk. Okay.

All right. So some of you can go back to the main site, some of you can go to users.production.encodedcc.org, and some of you can be on test. We have three different versions of this site up and going, and now we've recovered.

All right. I'll try to go over this more quickly: now that you've gone through all your searching and narrowed in on your experiment, I wanted to orient you to what you're seeing and what kinds of data are available, for those of you who are going to access it programmatically. We have a couple of sections. The attribution section, which I believe we discussed, has what project it's from, who owns it, links to other resources for it, and the release date. Then we have information about the actual biosample, the controls, the more scientific core information of that experiment. This section here is the replicates and the replicate structure, which has the specific links to the biosamples, a summary of each biosample, and all the details that go into it. Also, if this were a ChIP-seq experiment, the antibody would have a link here as well. And then we discussed how you have files here, which you can filter by assembly or view across all assemblies. This graph here is designed to show you the relationship between all of the files you see in that list. Sometimes you see these giant lists, and it's really unclear how the files are related or what you're looking at; this is a dynamic graph that uses the derived-from information and the software-processing information to build a picture of what those relationships are.

Okay, that's a fun question. What she's asking is: how final is the data that's out there? The simple answer is that for processed data, there will always be a newer, faster, better version. We are working right now to bring all of the human data we have to GRCh38 in a fixed pipeline. You can ask us at any point what the most recent pipeline is; that's one of the reasons we identify pipelines, so that you can recognize if you have one set of data processed with a pipeline that's different from the others. You will see that our ENCODE 2 data has not been brought up to the level of ENCODE 3. We are working on that. The goal is that every bit of data that has enough reads, enough replicates, and meets all the criteria of ENCODE 3 data will be run through the ENCODE 3 processing pipelines, brought up to uniformity with ENCODE 3, and brought up to GRCh38. So you will see a diversity throughout the site: some data at hg19, some at GRCh38. And with mouse we have even more complexity, in that you will see some at mm9, some at mm10-minimal, and some new ones coming out on our full mm10.

So, yes: an experiment accession is centered on the actual experimental data. Consider the experiment accession as saying we did this experiment. The processing may keep improving, and that processing has its own accession, which is the pipeline itself.
So if you were to poke around in this page and click on any of these blue boxes, which I can't do for you live, you would see somewhere where it says pipeline, and that has an accession number for the pipeline along with our name for it. So the answer to the question is that we don't plan on changing the pipelines frequently and dramatically; changes come in larger chunks. For ENCODE 3 we have a pretty much fixed version for RNA-seq and ChIP-seq, and maybe two versions for DNase. They will be labeled with different pipeline versions on the software; however, the experiment accession is for the experiment itself. Imagine the wet-lab person at the bench: that's what the experiment accession is accessioning. Does that answer your question? Okay.

All right, so you've had a little time to navigate through our graph, and the site's still working. Excellent. I wanted to introduce you to one other thing. I threw this slide in late; we haven't really developed this for human yet, so I'm switching over to mouse so you can see the difference. One of the other things we're doing is generating more curated sets, where we collect data together for a particular reason. In this case, over in the attribution, you would see a related data set. If you hover on it, it says this is a reference epigenome for embryonic facial prominence at day 15.5. This is a collection of data that we intend to keep up to snuff with the IHEC standards for what a reference epigenome is, and we have selected data as close as we can to represent that reference epigenome. When you click on that, you can see the other experiments related in that data set. Another relation we have is organismal time point: it could be that a ChIP-seq assay was done on the same tissue but over different developmental time points. We can collect all of that into a data set, and each experiment would then say, oh, I'm also part of this data set, so you can see that collection of data together.

Okay, so where are we with time? Okay, good. On we go to another kind of search. We have just released our first version of region search, which I believe both Mike and Zhiping showed you. If we go to the main page again, you'll see Data, you can go down to Search by region, and it comes up with this page right here. As this is our version one, I want to tell you some details and caveats about it. It takes as input a gene ID or an HGNC symbol; if you start typing a symbol, it will try to guess for you and ask, are you thinking about this or that? You can also put in a coordinate range, like chr1 with actual coordinates. It takes rsIDs and Ensembl IDs. It then maps your identifiers to genomic coordinates and converts them to a specific assembly. This is version one, so your only assembly choice is hg19; we are working on letting you use GRCh38, and mouse mm10 and mm9. The search then goes through all of the available BED files we have and looks for an intersection between those coordinates and each BED file of the supported types. Again, this is version one, so we are only intersecting with TF ChIP-seq and DNase-seq at the moment.
And I know all of you are thinking, oh wait, why don't I have the histones? Well, they will be there. When all of this has happened, it returns for you a list of experiments that have files intersecting your region. I believe you have a demo that takes you through BRCA1 and then has you filter down by MCF-7. Just a few days ago, one of our colleagues and users of ENCODE data pointed out that they had been doing research on this SNP; you can see the reference here for the abstract. When we put in that SNP, she, interestingly, found DNase-seq overlaps in nine different cell types that were interesting for her. So I wanted to show you another example of how one might use this. I also hear the objection: that's great if you have one SNP, but what happens if you have 50? What happens if you have 500? Again, we have been prioritizing the breadth of the search before the depth. We want to make sure we get all the histone data and the mouse data in, and then we can start working on the interface for what it means when you ask for 500: do you want it to intersect all 500? Do you want that to be an AND or an OR? We will work on that as well. (For those who want to script region search, there's a small sketch coming up after the visualization demo.)

All right, so now we're on to visualizing and downloading the data, and you have your next demo, the visualize-data demo. When you have filtered the data, whether by region search, by using the search box, or by using these filters, once you get below a particular number you will get a button that says Visualize. I think the cutoff is 100 at this point, because if you try to visualize more than 100 tracks on the genome browser, it doesn't really make that much sense. When you click on that button, if there is a choice to make, it will ask which assemblies you would like to visualize on, and that depends on what data you've selected. You select that, and you are magically transported to the UCSC Genome Browser. How is that working? I wish I had a mirror so I could see whether everybody's pages are up. Oh, you're getting all yellows; that's interesting. Okay.

So once you're at the UCSC Genome Browser, hopefully you're seeing something more like this screen and not like his screen. Each one of these tracks represents a file. And if you look below, you'll see something called hub search, which is their track hubs, which is a bingo word in case bingo is still going. The track hubs aren't working if you're on users.production. Do they work at test? Okay, so they'll work at test, and they'll work at our main site, encodeproject.org.

One thing I want to make clear, if you're used to working with the UCSC Genome Browser: this comes up as a track hub, and each experiment comes up as an individual item in the track hub. You can click on those and see metadata that links you back to the experiment of choice. I want to put a little plug in here that right now, one of the issues is that for the short names, because you can only have 16 characters there, we only have the ENCFF numbers. However, if you hover over a track, or if you expand it, you get a longer name that tells you in more detail what that file is. This summer we are working on improving our track hub interface and making better track hubs. All right, so were people able to get to the UCSC browser? Yay.
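Returning to the region search for a moment: for those who want to script it, the same page should be reachable as JSON. Here is a minimal sketch assuming a /region-search/ path with region and genome parameters that mirror the form fields; those names are my assumption, so confirm them in the REST API help before relying on this.

```python
import requests

# Hypothetical sketch of driving region search programmatically.
# The /region-search/ path and the "region"/"genome" parameter names
# are assumptions based on the web form; verify against the REST API docs.
url = "https://www.encodeproject.org/region-search/"
params = {"region": "BRCA1", "genome": "hg19", "format": "json"}
response = requests.get(url, params=params)
response.raise_for_status()

# Each hit should be an experiment with files intersecting the region.
for experiment in response.json().get("@graph", []):
    print(experiment["accession"])
```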
All right, so batch download of data. One thing I didn't talk about earlier: for any given file, if you get to that file, there's a little button you can click and it downloads; you can download individual items that way, like protocol documents. This, however, is batch download: I want to download huge amounts of your data. There is a demo six for that. Again, once you've filtered to the collection you're interested in, you have a Download button. That download button gives you one single file that is a list of all the files you're interested in, plus an HTTP link to the metadata. I recommend not using that metadata link; I recommend using JSON and the REST API, which we're going to talk about, to get the metadata. But for those of you who are not used to using JSON, the metadata does come as a TSV.

So you click on the link and you'll see this box. It tells you where to get help, and it tells you the command you're going to need to use on your machine. Some of you will be set up to use this right away and say, of course, I use curl all the time. Others of you at this point are going to say, wait, where do I type in a command line? Okay, so you click on Download and you'll get a little files.txt, a file that looks something like this. Part one is the metadata. Don't take files without the metadata; that's my number one lesson to everyone. Right here is an HTTP link to a TSV of the metadata for everything in this file. However, you can also get that metadata as JSON from the link we were just at, and I will show you that when we get to the REST API. And part two is the actual files. Back to this box: you now have this file, files.txt, downloaded on your machine. You put this command in, and it starts downloading the files. I don't recommend everyone does that right now, because it's a lot of files, but I wanted to be clear on how one might go about it. Is that good? Okay.

So finally we're at the REST API, and I also see my minutes ticking down, so we'll try to get through this somewhat quickly. We do have help on our REST API, which I start with right here, for those of you who are not familiar with what I'm talking about. All of our metadata is kept in a database. It's served up as JSON to our portal, and our portal interprets that JSON to make those pages for you. You can access that JSON directly, with queries using our REST API, to get all of the information that's being used to generate those pages. I want to reiterate that all of the portal content is available in JSON format. Right here you'll see this link, and you'll see at the end I've put ?format=json. The link where you once were seeing a target, now you're seeing all of the details in JSON.

Some of you are going to click on that link, see a bunch of gobbledygook, and say, wait, what? I recommend that, either later today or soon or even right now, you go to Firefox, Chrome, or Safari and download yourself a pretty-printer for JSON. That will magically transform your gobbledygook into something organized like this, where you can click on the minus and plus signs to expand or collapse the data, and you can look at the model there. A slide that I just realized I should have had in here, which you will find in our help docs: we have a profiles page, at /profiles, with a list of all of the fields that we use in the JSON.
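If you'd rather pretty-print from code than install a browser extension, the Python standard library can do the same job. A minimal sketch, fetching the experiment we were looking at earlier with ?format=json and printing it indented:

```python
import json
import requests

# Fetch the underlying JSON for an experiment page (?format=json)
# and pretty-print it, like a browser pretty-printer extension would.
url = "https://www.encodeproject.org/experiments/ENCSR000CUR/"
response = requests.get(url, params={"format": "json"})
response.raise_for_status()
print(json.dumps(response.json(), indent=2, sort_keys=True))
```

The /profiles page mentioned above can be fetched the same way if you want the field definitions as JSON rather than a web page.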
Okay, the other take-home message I want you to learn here is that the search queries, when you're clicking on these little boxes, are building a URL for you. So I click on RNA-seq here, and you'll see over here that it now says the assay term name is RNA-seq. I click here, and it says the assay term name is DNase-seq. I think when the site went down I missed a bit, which is that when you click two items within one group, that functions as an OR, and when you click items in two different groups, that functions as an AND. So this will get you anything that is RNA-seq or DNase-seq, and that will also have to be mouse. Here are some examples of searches: I just want ENCODE 3 data; I want all data from ENCODE; I want ENCODE 3 mouse data. If you're trying to do a search through our data and you're having trouble navigating our profiles and our data model, please let us know; the wranglers are usually super fast at generating those queries. Thank you.

The next thing to know is that your JSON can be retrieved with HTTP requests. Here is an example where I have a small bit of Python code, and this should be in your take-home messages: I can import requests, I can construct my URL (in this case I just say I want this particular experiment), and I can use the requests library to get that URL. Now I have this experiment, I turn it into JSON, and I can use that response as a dictionary to query the metadata any way I want. Here is a more complicated one where I've built an entire search: I want assays that are ChIP-seq, I want human, I want the target investigated as a transcription factor, and I specifically want in vitro differentiated cells. (A concrete sketch of this query appears just below, after the wrap-up.) My hope is that those of you who are planning on accessing our site completely programmatically are thinking, oh look, I can just build these URLs; and if you're confused, you can email us and say, I'm trying to get human transcription factors, but just these particular transcription factors, and we can help you build that query or teach you more specifically how to build it.

So yes, the URL is the way to do that. One of the things we are discussing is building some sort of shopping cart, where, having done a query at a particular time, that query becomes a particular list of files that you can save permanently. Right now, though, you just have the URL, which means the results are live: if someone adds more data, a URL that returned 14 results yesterday might return 15 tomorrow. So yes, we are talking about a shopping cart feature where you would be able to save a fixed list.

So, wrapping up: I hope you got information about what the DCC is. I hope those of you who had not been to our site can now navigate through it. I hope you understand our vision of browsing and searching, so that you can get through the 10,000 experiments to the experiments you find interesting; that you understand our vision for downloading and visualizing data, and that you don't download all the files without the metadata; and that you understand where to get the resources for the REST API and how to use it, building the URL to get the information you're searching for.
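As promised above, here is roughly what that more complicated search looks like as runnable code. The facet field names below (assay_term_name, target.investigated_as, and so on) are reconstructed from the URLs the facets build; the surest way to get them right is to click the facets in the browser and copy the query string out of the address bar.

```python
import requests

# Build the search described above: human ChIP-seq experiments whose
# target is a transcription factor, in in vitro differentiated cells.
# Field names are reconstructed from facet URLs; verify them in your
# browser's address bar after clicking the facets.
url = "https://www.encodeproject.org/search/"
params = {
    "type": "Experiment",
    "assay_term_name": "ChIP-seq",
    "replicates.library.biosample.donor.organism.scientific_name": "Homo sapiens",
    "target.investigated_as": "transcription factor",
    "biosample_type": "in vitro differentiated cells",
    "format": "json",
}
response = requests.get(url, params=params)
response.raise_for_status()
results = response.json()

print(results["total"], "matching experiments")
for experiment in results["@graph"][:10]:
    print(experiment["accession"])
```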
And tomorrow we will discuss the other fabulous things the DCC does, which is that we build these data processing pipelines so that we can have all of the data analyzed in a uniform way, which is not what you found with ENCODE 2. If you go to our site and look at most of the ENCODE 2 data, sometimes you'll find uniformly processed files, but in ENCODE 2 every single lab processed their own files, so there was a lot of variation in how the processing was done. We are working on bringing that all up to the same harmonized vision, where everything has gone through the same exact pipeline. So tomorrow we're going to have the pipeline workshop. As always, I want you to remember our help desk, and tomorrow night at the poster session all of the DCC people will be around for help desk sessions. You can talk to us about pipelines, about data access, about features you would like, about errors you find in the data, and you can complain about the fact that the site went down today. And again, I want to acknowledge my staff; we all work together, we are fabulous, and I love this team and my PI, and my colleague Ben, who did not give this talk. And that's it. Questions? I'm hoping most of your questions were answered.

Yeah, actually, I can't hear him; I got the first part but couldn't hear the second part. Can you use this JSON tool to automatically download files? Not just data and experiments, but the files you showed in this list, with the curl command, but from the JSON, to get particular files you're interested in from given experiments? So the answer is sort of yes and no, in that if you pull the JSON, it will have the address to pull the file, but JSON itself isn't a method to pull the file. You will have to use something to pull files. So the JSON stores the metadata for the files, and one of those pieces of metadata is not the file itself, but the HTTP link for the file. It has the HTTP link; if you can then go through and find a way to pull it, that's great, but the file is not embedded in the JSON. That is different from our smaller documents, like our protocol documents: those actually are embedded in the JSON, so when you pull down the JSON object, the file itself is embedded in there. We didn't embed 25-gigabyte FASTQs in the JSON; we gave HTTP addresses.

Yes, I'm going to turn that one over to Ben. Should you mirror the data? We believe that you shouldn't have to; however, it wouldn't hurt at this point. We're working really hard on increasing the stability of our site. We very recently increased its complexity quite a lot, and the fractures are showing, so we're sorry. Other questions? Oh, one more. So for modENCODE, I believe we have all of the FASTQs at this point that we're going to get. For Roadmap, the FASTQs that are not behind dbGaP, that have no accessibility issues, are starting to come in; we got our first 100 of them last week. So those will keep coming in, and we'll have those links there. GGR is giving us all of their FASTQs before they give us the processed data, and the same with modERN. Does that answer that? Roadmap is coming; well, it's coming, and of course in Roadmap there's a bunch of data that we can't have because it's not freely accessible.
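To close the loop on that last question, here is a minimal sketch of the yes-and-no answer in practice: pull an experiment's JSON, walk its file metadata, and use the HTTP link stored there to fetch each file. I'm assuming the link lives in each file object's href field, and that file entries may come back as bare @id strings that need a second fetch; check one file's JSON on the portal to confirm before relying on this.

```python
import requests

BASE = "https://www.encodeproject.org"

# Pull the experiment's JSON; its "files" list holds file metadata.
# Each file object carries an HTTP path (assumed here to be "href")
# for the actual file; the JSON stores the link, not the file itself.
experiment = requests.get(
    BASE + "/experiments/ENCSR000CUR/", params={"format": "json"}
).json()

for entry in experiment.get("files", []):
    # Entries may be embedded objects or bare @id strings; fetch if needed.
    file_obj = entry
    if isinstance(entry, str):
        file_obj = requests.get(BASE + entry, params={"format": "json"}).json()
    href = file_obj.get("href")
    if href:
        print("would download:", BASE + href)
        # To actually download, stream the response to disk, e.g.:
        # with requests.get(BASE + href, stream=True) as r:
        #     with open(file_obj["accession"], "wb") as out:
        #         for chunk in r.iter_content(chunk_size=1 << 20):
        #             out.write(chunk)
```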