I'm going to reload that. If you go here, you can see it's been running for 38 minutes, so actually it might be done. Indeed. If you got your job started when mine did, or within a few minutes of it, you can see that the two alignment jobs are done and the two BAM-to-bigWig jobs are done, but the quantitation from RSEM is not done yet. Maybe yours is done, if you got it started the first time. We can look here at the same sort of graph plot that Seth was showing for the ChIP-seq one. RSEM can't even start until STAR finishes. In here you can actually see that it lists all the inputs for all the steps, and all the outputs, which is overwhelming, I know. But sometimes you need this. Otherwise, you can look at it. And this, where I'm holding my hand, you can see is just a bunch of numbers. If you wanted to look at that, that's what that JSON-formatted string was. I just thought I would bring it up again.
So we're going to go to this step here. It doesn't matter a whole lot which one you pick, since they are both bigWigs and they'll both be visualizable. But if you want to follow along exactly, you'll want to pick the stranded bigWig that was generated from STAR, which will be the one that started immediately after STAR finished. TopHat, as you can see, took maybe 50% longer. If you just click on that, that's one way to get there. Now, for this individual step, you can see the input here is the STAR genome BAM. Again, we have the chromosome sizes, which it needs in order to calculate the bigWigs. And as I mentioned before, there are four different BW, or bigWig, files created by this step. It's four because it's a stranded calculation. If it were done unstranded, there would just be two: an all and a unique. But here we have plus strand all, minus strand all, plus strand unique only, and minus strand unique only.
I'm just going to go ahead and do the plus all. I'm going to tell you ahead of time what we're going to do and then walk through it. We're going to get a download URL from DNAnexus. We are not going to download the file. It's not going to work; it's too big. We're just going to cut and paste that URL into the Santa Cruz genome browser, just as Pauline showed as an example yesterday. It should be only a few steps. As you can see, this is a very small bigWig file. It's only a small chunk of the genome. But it's still a megabyte, and there are 300 of you, so downloading is not going to work.
So this file is selected. You can tell because it's blue and it's got a check mark. Everyone with me so far on this part? If you don't have a file, you can follow along later. We're going to click Download, which doesn't download the file. It gives you an option to download the file later. You can click Get URL. Don't download it; I don't know how the wireless will tolerate it. This gives you a temporary URL, and note that it expires. As somebody asked earlier, if you're dealing with PHI, this is now a live link to your file. I don't know how that works, so just be a little careful about this. Right, there's the warning: anyone with this link can access this file. It does go away in 24 hours. So I'm just going to Command-click and copy the link address.
Now, I had some issues with Santa Cruz, so I'm going to try this one here. We go to just the genome browser page. I think this is the default page, the gateway, if you just go to genome.ucsc.edu. We're going to click Add Custom Tracks, then paste in that URL with Command-V. That's it. It actually can read it. Let's read it again to make sure we're not doing the wrong one: chromosome 21 hemi, STAR genome, plus all. Submit that.
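Pasting the bare URL into Add Custom Tracks is enough for the demo. As a minimal sketch, the same thing can be written out as an explicit UCSC custom-track line, which lets you name the track and set its visibility. The helper name `bigwig_track_line` and the example URL are hypothetical placeholders, not anything from the workshop; a real temporary DNAnexus link would go in place of the URL.

```python
def bigwig_track_line(url, name, description=None, visibility="full"):
    """Build a UCSC custom-track line pointing at a remotely hosted bigWig."""
    description = description or name
    return (
        f'track type=bigWig name="{name}" description="{description}" '
        f"visibility={visibility} bigDataUrl={url}"
    )


# Hypothetical placeholder URL; substitute the temporary link you copied.
example_url = "https://example.org/temporary-link/chr21_star_genome_plusAll.bw"
print(bigwig_track_line(example_url, "STAR plus all"))
```

The printed line can be pasted into the same Add Custom Tracks box in place of the bare URL. Because the link is temporary, the track stops loading once the link expires.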
It takes a second as Santa Cruz makes a connection to the DNAnexus system, which goes to Amazon, which then ships the file back. And faster than you can say it, there it is, right? So it's got my file in Custom Tracks. I'm just going to go to the genome browser. It's actually there at the top in a format that's hard to read, but it's a little easier to see if you go to Custom Tracks here. I'm not the browser expert, but I can click Full and hit that Refresh button. You can see RNA-seq of superoxide dismutase. OK. Maybe it wasn't that abrasive. Coincidentally, SOD happens to be on chromosome 21, and that's UC Santa Cruz's default position. But you can also go to another gene like amyloid precursor, which was my other example. Click amyloid precursor. While this is doing its thing: it's always nice to see when your RNA-seq lines up on the exons of your gene.
Did you get a bigWig file? Or are you stuck somewhere else? Oh, it didn't work, huh? You're on chromosome 11. Oh, I should have mentioned this, because I actually forget it all the time. If you use the UCSC genome browser regularly, it stores all the information about where you last were, including your last genome reference and your last position. So you may have to either reset it to hg19, or you can specifically go to View, Reset All User Settings. Obviously, if you want to keep your user settings, you shouldn't reset them; you should then figure out how to visualize it in this browser. This is not a problem; it's just a couple of steps, but they have to be done perfectly. So you probably want to reset. I'll just reset the browser so that you guys know what I'm talking about, if my mouse will cooperate. Then you can either go to My Data, Custom Tracks, or, if you're on this particular window, you can click Add Custom Tracks.
It's a custom track as opposed to a track hub because it's fetching the file directly. A track hub requires a little bit more finessing. I assume you're going to have to set that up yourself. Paste it in here. If we wanted to, we could go back and repeat this for all the bigWig files. I'll submit that to upload it again, just wasting bandwidth. Once you're at this page, Santa Cruz has found your file and decided that it is indeed a bigWig file, as you said it was. If we go to the genome browser, it should draw it. Because I reset, it should put me right at SOD again, which is the default gene. But it should work for any gene located in about this central region of chromosome 21, which is about the half that I picked. I think it's 12 million to 36 million in coordinates. So how many people here actually got bigWig files out of this? You all deserve a beer after this. What time is it?
People wanted to know how to get a FASTQ file from ENCODE onto DNAnexus. Should I move on to that, or do you guys want to mess with the browser a little bit more? Sure. So I think what you want to know is the FPKM value for each gene that you calculated? OK. That's from the RSEM output, the quantification files. I think I can show that pretty easily here, if I can pick the right window. This link has expired, so we'll close it. I'm going to go back to the, oh, look. It looks like my job is done. There's no number one here anymore, so that means my job is finished. Look, it ran successfully. The RSEM quantitation step has two output files, and they're both tab-delimited files, right? Because it's bioinformatics. This one is for every isoform in GENCODE v19, which is what we input. And this one is for just the genes in GENCODE v19, plus the tRNAs as well. There are no spike-ins in my sample input.
But if you do RNA-seq with spike-ins, then you want to include the spike-in sequences in the genome that you are aligning to, so that you can quantitate relative to the spike-ins. I don't have a visualization for this kind of file. I mean, it's a 9 meg file. But I think I can preview it and show roughly what this file looks like. It's going to be mostly zeros, because it has an entry for every gene in the human genome, and not all of them are in the middle of chromosome 21. You can see that these have funny IDs at the top, because they're tRNAs. I can go down. All zeros. Oh, there we go. There are some genes. Right? So this is the length of the gene. These are still all zeros. If you get this RSEM file and download it, it's a little bit tricky because of the format. But if you look for genes that are on chromosome 21 by their Ensembl IDs, you can actually get a number. We put in GENCODE version 19; that's what these IDs are here. This is the Ensembl gene ID, and these are the transcript IDs for that gene that it's quantitating across, based on GENCODE 19. Oh, you guys can't read that at all. I'm sorry. This says ENS, ENSG000. And this one actually looks like it has a little bit of a, no, maybe not, still zero. You have to find the nonzero ones, because it's a toy example. If you run a real example on your own time, it will have values for genes across all the chromosomes, not just 21.
That's the output of RSEM, which in this case is just called a results file. It doesn't have a named file format. It's just tab-delimited. The first line of the file is a header reporting what the columns are. I don't remember all the columns offhand, but it's essentially the Ensembl ID of the gene, the comma-delimited list of Ensembl isoforms, the length, I think another length-based metric, and then FPKM, TPM, something like that.
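To make the file layout described above concrete, here is a minimal sketch of reading an RSEM gene-level results file. It assumes the usual RSEM `genes.results` header (`gene_id`, `transcript_id(s)`, `length`, `effective_length`, `expected_count`, `TPM`, `FPKM`); the speaker did not recall the exact columns, so check the header of your own file. The function name `read_rsem_genes` and the sample rows are invented for illustration.

```python
import csv
import io


def read_rsem_genes(handle):
    """Yield one dict per gene from a tab-delimited RSEM gene results file."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield {
            "gene_id": row["gene_id"],
            # The isoform IDs for a gene are packed into one comma-delimited field.
            "transcript_ids": row["transcript_id(s)"].split(","),
            "length": float(row["length"]),
            "tpm": float(row["TPM"]),
            "fpkm": float(row["FPKM"]),
        }


# Invented sample rows (not real data); real files use Ensembl IDs from GENCODE.
SAMPLE = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "GENE_A\tTX_A1,TX_A2\t1500.00\t1350.00\t0.00\t0.00\t0.00\n"
    "GENE_B\tTX_B1\t2200.00\t2050.00\t120.00\t35.10\t28.40\n"
)

for gene in read_rsem_genes(io.StringIO(SAMPLE)):
    print(gene["gene_id"], len(gene["transcript_ids"]), gene["fpkm"])
```

For a downloaded file, pass `open(path, newline="")` instead of the `io.StringIO` wrapper.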
I don't see any frantically waving hands. I thought I'd do this because the question came up and it seems timely. OK, here, this is how I'll do it. This example here is an experiment, ENCSR368QPC, from Barbara Wold's lab, a total RNA-seq on some iPS cells. Let's say I wanted to rerun this experiment. In order to rerun it on DNAnexus, I have to get the FASTQs from, I don't have the slide, but from Amazon, from our storage system, over to there. What you can do is start on the experiment page, or there are various other ways you can use the search portal to find files. I don't really have time to go into them, but I can go back to that if you want. Down here at the bottom, right, here are the raw data FASTQ files. They're organized here by biological replicate. This is sortable. Typically, for an experiment like this where there are four FASTQs for each replicate, our system concatenates them all, right? I have a step that's not in the test pipeline I gave you which, when you give it a list of FASTQs, will concatenate them all and use that as the input.
If you click Download here, it will actually download a two-gigabyte file, so don't do that. But if I go here and copy this link, similar to getting the files to Santa Cruz, I can go back to my DNAnexus window and click Add Data. Now there are three ways you can get data in here. This is very critical for getting files into DNAnexus. One is that I could literally upload it: if I had a FASTQ on my laptop here, it would just upload it, which would take a while. I can get it from another DNAnexus project, which is how I got the ones for the demo example; I grabbed them from my other folder. Or I can get it from a server, which is some URL. I could paste that in there.
And since this is not using our local bandwidth, I can go ahead and add the data there, and it will start to download it. This is a big file, so it will not be done in 10 seconds. I can close this, and you'll see that this built-in DNAnexus program called URL Fetcher is running, waiting to connect to my server and download the files from there. And there are ways to get a large number of files into DNAnexus. So do people have questions, or no? There's a question there. You want to get a mic back there?
Okay, can we get the command line of, you know, whatever steps were done in the pipeline? That's the first thing. Yes. And will we talk about the third one, Visualize, in a moment? Because it's showing some errors that the BAI files are not created.
Right, so we did not build our system to interact with the DNAnexus visualization. They maintain that independently. And the step doesn't actually create a BAI file, so that's why you can't visualize the BAM files there. I think that they are working on a couple of ways that they could visualize some of the files that we produce. It's just not how we do it. I mean, I think that's what you would do if it was your own data. You could add the index files. I'll have to consider adding an index step or saving the BAM indexes. It does add up disk-space-wise, but I can do it.
Your second question: oh, you wanted the command lines. So, when I mentioned the applet source. Here's fine, right? Clicking around a little bit, sorry. Let's just take this STAR paired-end one. In this black box is what DNAnexus calls an applet. An applet is just a piece of code that is effectively, in this case, a shell script, plus the specific executables that the shell script might need, right?
And there are various ways DNAnexus has of getting, say, TopHat binaries or STAR binaries or SAMtools binaries to the virtual machine where it runs. If you go to our GitHub for long RNA-seq, you can read these shell scripts, and they have the exact command line that we use to run it. You do actually have to be able to read shell script to get that out, right? And I think, yeah, we don't store the command line that we run, do we? Oh, it's in the log also. I wonder if it's in that one. Yeah, that's what I'm going to check, whether it's in the log. We'll see. Should be, right? That's still uploading. So, this STAR one. By the way, did anybody's job actually crash? Did it go red? Did you try to restart it? No? Okay. If you are ever developing your own workflows here, yes, I actually think this one doesn't have the whole command line. It might, though. But when they do fail, what you can do, just as if you were running it on your Linux workstation, is go read the log file and see what's going on. So this just sort of tells you what's happening. But actually, yeah, this script doesn't report it. I didn't think that it did. But apparently the ChIP-seq ones do tell you the command line.
When I look at the RSEM file, it's a bunch of zeros. Can you go over that again?
Yes, it's a bunch of zeros because the tiny input that we gave it is only half of chromosome 21, and the applet that we wrote for RSEM is designed to quantitate across every gene in the GENCODE file, the whole genome. So you have to do a little bioinformatics: look through that file with grep or something, or awk or Perl or Python, and find the FPKM values that are greater than zero, or greater than 10. So they are in there, but there are only, I don't know how many genes there are on chromosome 21. Any guesses?
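The "little bioinformatics" mentioned above could look like the following minimal sketch, which keeps only genes whose FPKM exceeds a threshold. It assumes the results file has `gene_id` and `FPKM` columns in its header, as RSEM gene results files normally do. The function name `expressed_genes` and the sample rows are invented for illustration.

```python
import csv
import io


def expressed_genes(handle, min_fpkm=0.0):
    """Return (gene_id, FPKM) pairs for rows whose FPKM is above min_fpkm."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        fpkm = float(row["FPKM"])
        if fpkm > min_fpkm:
            hits.append((row["gene_id"], fpkm))
    return hits


# Invented sample rows (not real data): mostly zeros, like the toy chr21 run.
SAMPLE = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "GENE_A\tTX_A1\t1500.00\t1350.00\t0.00\t0.00\t0.00\n"
    "GENE_B\tTX_B1\t2200.00\t2050.00\t120.00\t35.10\t28.40\n"
    "GENE_C\tTX_C1\t900.00\t750.00\t5.00\t4.00\t3.20\n"
)

print(expressed_genes(io.StringIO(SAMPLE)))               # FPKM greater than zero
print(expressed_genes(io.StringIO(SAMPLE), min_fpkm=10))  # FPKM greater than 10
```

Counting the length of the first list on the real toy output would also settle the jelly-bean question of how many genes in the sampled region have any signal.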
Should we take a bet? It's like jelly beans in a jar. There are more than six. More than 21 too, though that's a good guess. So the chromosome 21 genes are the only ones that will have any value, and not even all of them, just the ones in the middle, because to create this fake example I took a real example and just took the reads that mapped there out of it. I hope that was clear.
Does anyone here do stuff that's not on mouse or human that they might want to use this for? Okay. You may have noticed while we were doing this that the only reference genomes we had were for the organisms that we run. That's because we're not really mandated to run the others. We have some worm and fly data, but we don't do it. If you wanted to create your own indexes, we have the code here, although it's a little bit cryptic how to do it. Oopsie, if you hit the arrow, you go here. But we'll go here. Oh wait, is that the one that ran? Doesn't matter. In here there's a bunch of steps that are called prep. Okay, here: prep TopHat, prep STAR, prep RSEM, and merge annotation. These applets, which can be run individually, independent of a workflow, take input and create indexes, right?
So for STAR, it takes a transcript GTF file. The GENCODE file that you download is a GTF, and it's gzipped. By the way, I didn't go over this because I was trying to get people through, but these actually tell you what file extensions are accepted by the applet, because the code in here will unzip the file and then run on it. It takes a FASTA reference genome. And see how this one is white as opposed to yellow? That means it's optional. This is an optional set of spike-in RNA sequences, which you just leave blank if you don't have spike-ins in your input. Then you would just run this applet for whatever strange or useful organism you have, and it will generate a gzipped tar file, which is the index for that genome.
And you would have to do that for each of the three indexes that you need to create. If you want, you can also modify the workflows and add that step so that it runs at the beginning, and then run it all at once. So that was a good question. I'm not going to go into making workflows. Think we're about done? You want to just finish up? Yeah. Yeah.