So at this point, if all is good, you should have something that looks like this on your desktop. If you don't, it's okay for now, because I'm not gonna ask you to shut down your lids; that would be a very bad idea. But I will continue with the lecture, and soon we're actually gonna do something else. There are other ways to do this; there are many ways.

So what are we gonna do next? We're gonna talk about getting data in and out of Galaxy, doing operations in Galaxy, understanding the user interface, linking multiple steps into a pipeline, doing standard bioinformatics as well as next-gen sequencing bioinformatics, and sharing these. So there's this lab; it's a separate two-pager that's in your handout.

What we're gonna do right now (I sort of anticipated this) is use the public version of Galaxy, because we're not doing next-gen stuff first. Sorry to disappoint you. I'm starting you easy. So set up a separate tab, or a different browser, whatever you like, and type in usegalaxy.org. We're gonna do this lab here, which is from OpenHelix, a company that is contracted by UCSC to put out these free tutorials. So I'm acknowledging them, and I'm using one of their things.

And we're gonna get data out of Galaxy. So what you have to do first is type usegalaxy.org into a new browser window. Don't touch your Amazon one. Leave that one going; leave it up. What we're doing is not gonna affect that. It's just gonna warm us up on using Galaxy, so that later on, when Emily has you doing really, really complicated stuff, it'll be very easy. Because it will get complicated.
I kid you not. Okay, so everybody should have this page. And if you click on Get Data... I don't have it with me here, but in the binder, after the lecture, there are these two pages. That's what we're doing, and we're gonna go through them. What's on the screen and what's on here are not quite the same. This is an old handout; some of the version numbers and the numbers might be different, but the operations are exactly the same. So we're going to start with page one, which is the second page of page one, and we're going to go through this. I'm going to go through it on the screen here; I can go through it live on my machine as well, and we're just gonna follow.

The actual operation we're doing right now is a little simple-minded, but it's just to go through the mechanics of working in Galaxy, and what we're doing is basically the same for all things. So the first page says: go to the Galaxy site, which we did. usegalaxy.org is the same as what they wrote there; galaxy.psu.edu is the same place. Get Data, and then click on UCSC Main. So we're gonna Get Data, and the second item is UCSC Main, and then, lo and behold, we have the UCSC browser that shows up. Let's see if I can make this so it fits on a screen and it's legible. Okay. It actually looks a lot better on the big screen than on my monitor.

So by default it's human, which is what we want. It's hg19, which is also what we want. The group is actually not the group we want: we don't want Genes and Gene Prediction, we want Variation and Repeats. What we're doing is we're gonna go look for the SNPs related to a specific gene. Okay, so we want Variation, SNPs. And the gene we want is BRCA1, B-R-C-A-1. So we're gonna go... it's not that position.
So we're gonna type in the gene name, BRCA1. We're all gonna type in the same gene. Okay, this is not the part of the lab where you type in your own favorite gene. Not now; later, but not now. So we have BRCA1, and we're gonna do Lookup. So now we're running Galaxy in Pennsylvania, right? It's not the cloud one.

If you look at this whole page, BRCA1 has multiple copies, multiple variants, and there's RefSeq genes and there's non-human genes, blah blah. So I'm gonna make this very simple for you: just take the first one, at the top of the list. Okay, so everybody clicks on the chromosome 17 interval there. Once you do that, you should see what it does: by typing in the gene name and doing a lookup, it went from not knowing the coordinates of that gene to knowing chromosome 17 and the coordinates for that gene. I bet you that doesn't include the promoter, but that's another story.

Okay, and so the folks at UCSC and the folks at Galaxy are actually... All right, did I hit the second one? Huh. Okay, actually it doesn't really matter; your numbers will be different. But you want me to do it again? Okay. You don't trust me. I mean, that's okay. BRCA1, Lookup. So I hit the first one. So if you hit the first one it's five hundred, versus four six eight. Which one is yours?
Anyway, I'm gonna click on the first one again, and so you get an interval for that gene. And then I'm gonna Get Output. So this button here; click on this and then you get this window, and here I'm gonna say Send Query to Galaxy. The reason I know this, if I go back a screen, is that up here I told it to send output to Galaxy. So the UCSC and Galaxy folks, they're talking nice to each other. Whenever you want to send the data to Galaxy, it knows that this query came from within Galaxy, so it's gonna send it back to Galaxy. Normally when you go to the UCSC browser, it doesn't say send the data to Galaxy, because you're not on Galaxy; you're just using the UCSC browser. So I'm gonna Get Output and I'm just gonna leave things as is. I'm just gonna Send Query to Galaxy.

So a green box is a good thing. If it's green, it's good; red, it's not good. And right now my box on the right is grayed out. That's not good either. Basically this little clock here is saying it's waiting to start. Once it's running, it's yellow, and once it's done, it turns green. So right now it's still waiting. So that's not a good sign. One thing I can do while it's doing that... so the eye is to see. And it's yellow. That's good; I like yellow. So now it's running, and instead of a little clock it's got a little tumbling thing. And now green, even better. So it's now finished.

So the eye here is to look at it. If I click on the eye, I see the output. Here's my data output in the middle pane, like I told you before it would be. So it's chromosome, position, and all these intervals: 41249363 and ...364. So they're all one-nucleotide intervals. They all have an rs number; that's a SNP number, a dbSNP number. They all have a zero here, which doesn't mean anything. They all have a plus or minus here, which tells you the plus or minus strand.
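As a rough illustration of the six columns just described (chromosome, start, end, rs number, a placeholder zero, and strand), here is a minimal Python sketch that reads one such tab-separated row. The names `Bed6` and `parse_bed6` are hypothetical, and the rs identifier in the sample row is a made-up placeholder; only the coordinates echo the ones read out above.

```python
from typing import NamedTuple


class Bed6(NamedTuple):
    """One row of a six-column, tab-separated interval file."""
    chrom: str   # column 1: chromosome, e.g. "chr17"
    start: int   # column 2: interval start
    end: int     # column 3: interval end
    name: str    # column 4: SNP identifier (an rs number)
    score: str   # column 5: placeholder, "0" throughout this output
    strand: str  # column 6: "+" or "-"


def parse_bed6(line: str) -> Bed6:
    """Split one tab-separated line into typed fields."""
    chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")[:6]
    return Bed6(chrom, int(start), int(end), name, score, strand)


# Hypothetical row: "rsEXAMPLE" is a placeholder, not a real dbSNP identifier.
row = parse_bed6("chr17\t41249363\t41249364\trsEXAMPLE\t0\t+")
length = row.end - row.start  # width of the interval in nucleotides
```

For a single-nucleotide SNP the computed `length` is 1, which matches the "one-nucleotide intervals" seen in the middle pane.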
Okay. It's an empty column. They stuck zeros in it. It doesn't mean anything, so don't worry about it. Everything's got zero. So this is the output.

The name of the query, or the results file, is this name here. You can also click on it once, and when you click on it once it gives you the first five lines and all the columns. It knows the file format, so it knows that column 1 is chromosome, column 2 is start, column 3 is end, column 4 is name, column 5 is nothing, and column 6 is strand. So you get that information here. You click on it again, it disappears.

There's a little pencil here next to the eye. You click on that, and this gives you the opportunity of renaming your file. So instead of having this long "UCSC Main blah blah blah", I'm gonna call this "BRCA1 SNPs". Oh, yeah, that's not good. Thank you. "BRCA1 SNP". And then I scroll down and I hit Save. So now if I click on that, I have "BRCA1 SNP"; that's the name that I gave it. I could delete it if I wanted. There's a pencil, and there's the eye to look at it. If I click on it, the data hasn't changed; it's still the same data. If I click on it once, it also tells me I have 99 rows, or regions, or whatever: intervals, positions of one.

Yes? At each of these positions there's a SNP in that gene, yes. Most of them what? Most of them are length one: 47 to 48, 99 to 100, 65 to 66. SNPs could be more than one, but most of them are one. Sorry, which one? No, the zero has nothing to do with it. Oh, this one, zero. Yes, so that one could be an insertion point.
Yeah. You can go into dbSNP if you're very curious. I'm not right now, because I'm in the middle of a class. If you look up this rs number, you can find out. I encourage you to do that, curious minds, and you can report back to us what the answer is.

So anyway, we have a list of SNPs, each one nucleotide wide. What we actually want is to see the 50 nucleotides around this position. Okay, so next step; let's read our instructions. BRCA1, blah blah... hyperlink. Okay.

One thing you can do also, and this is a good note-taking thing to do, is History. Oh, what did I forget to do? I forgot to log in. I was trying to write things down so I could remember in my history, but since I'm not logged in it's not going to remember my history. So here we're on the public server. Most of you are new to Galaxy and have never logged in here, so you'll have to register. I've logged in before, so I'm just going to log in. But I think registering is pretty fast. Wrong password. So now if you click on User, it knows who you are. I think you're registered right away, right? It does, right? Okay, good. It's been such a long time.

Okay, so now I have "Unnamed history". You can click on that, and we're gonna call it "gene flanks". And if we were so enthused, we're also gonna add some tags. We're gonna say "SNPs", "BRCA1". So I've got a couple of tags. So if I'm looking for them later,
I have things to look for them with. So "Unnamed history"... it didn't remember "gene flanks". So now... we've actually finished part one. Now part two. "If you were logged out", which we haven't done, blah blah. "Operate". So we're gonna operate on genomic intervals. Let me make sure. So you have to find Operate on Genomic Intervals here, but a shortcut is that you have this search window. So I'm just gonna type in "flanks" and there it goes, it pops it right up. So it's really easy to find; there are so many tools on the sidebar, instead of finding them... Bye, Michael. See you soon. Have a safe trip home. Thank you very much.

So Operate on Genomic Intervals; click on that. Sorry, Get Flanks. On the side there's Hide Toolbox, Show Toolbox, or Show Tool Search. If the search box doesn't show, you have to click this little round thing to show it. Then I just typed in "flanks", and Get Flanks is what I was looking for. And Galaxy is really smart; it's already guessing which file. I've lost you? Okay.

So right now we're gonna get the flanks of each position in that file that we have. Get Flanks is one of our tools; it's in Operate on Genomic Intervals. So let me do it the other way. Yes, I'm on page two... page three, step two: "First get the flanking region of the SNPs. In the left tool column, click on the heading Operate..." I just showed you a trick which is not in the tutorial, which is to just look for the word "flanks". But if you don't want to do that, you can look for Operate on Genomic Intervals. Right here it is, and within that there's Get Flanks, and you click on that and it shows up in the middle. Get Flanks. Yes? The what? Okay, actually that's a good break; let's go look into that right now. So I'm in Firefox.
I think it's supposed to be... So if you come back to this window, everybody. In this window, this Access Galaxy should be dark. So who has a grayed-out Access Galaxy? One, two, three. Oh, well, a few more. Okay, keep your hand up. Okay, we know of a problem. Oh, some people started it again. Have we run out of volumes? Okay, so we've run into a problem. This doesn't affect the lab that we're doing right now, but it will affect the other lab. So for that other part, we'll have to team up. I'm sorry. You'll have to find somebody that has a working CloudMan version of Galaxy and you'll have to work with them. Okay? Sorry about that. We have to get more volumes next time. So how many volumes did we have? You had 144? Oh, yeah, take some notes.

Okay, so I'm gonna go back to this one here. I would prefer if they followed me, unless they want to go faster than me, but some people are having a hard time. Okay, well then I'd like to hear about it. Where are they stuck? Which position? Loading data. Okay.

So let's start from the beginning. I'm gonna start from scratch. I'm gonna delete everything and we start from zero. So I have nothing. I'm gonna Get Data, I'm gonna use UCSC Main, which is the second thing under Get Data. Oh, what did I forget to do? I forgot to log in. I'm already logged in. Are you logged in? Is everybody logged in? Somebody not logged in? No, that's it. But if you don't have an account it says... what does it say? It says
Register. If you're not logged in, then you register. It's Login or Register. To register, you put in your email, your name, and a password; then you're logged in. And we're using usegalaxy.org, right? We're not on the cloud one.

The manual says SNP 130 or something like that; we're using 135, because that's the most recent one. The group that's important here is Variation and Repeats. And over here, you highlight this, delete what's there, and type in BRCA1, and I do a lookup. So UCSC does a lookup for the gene name BRCA1. BRCA1 is in there multiple times. It's a very common gene; a lot of people love that gene, lots of work done on that gene, and so it's present many times. The exercise actually tells you to look for a specific one; for simplicity we're gonna avoid that part. We're just gonna take the first one on the list. If you hit the second one, that's okay. Somebody who can't click, like Ian.

And then you get numbers back. The numbers you get back are the chromosome number and the interval of the gene you just selected. Once you've done that, you Get Output. And here again, you just send the query; you don't change anything. I want the whole gene. I'm not saying I don't want the promoters and so forth; I want the whole gene, so I just Send Query to Galaxy, and then you get this green box in the middle, which is good. It's sent the query off, and now it's in the back end waiting to execute. It's gray right now, so it's not started yet; it's in the queue to get processed. Yellow means it's running and green means done. Instead of green, if you configured the query wrong or something like that, it comes back red. And believe me, it will happen to you.
It's happened to me. That means you did something wrong and you have to go back and fix it. But the fact that you got a green box is a good thing.

There are a couple of things you can do first. One, you can look at it. And the other is we can change the name right away, like I suggested before: you highlight this whole thing and you rename it "BRCA1 SNPs", plural, and then you hit the Save button, which is down here somewhere. There it is. Save. What that does is just rename the file. So you can click on that; you see the first five or six lines, and on the black strip there it has the names of all those columns. It only has five or six lines here, and if you click on the eye you have the whole file, the whole output. In this case the whole output is only 80 or a hundred lines or what have you, so it actually shows you everything. But if the output was a million lines, it would just show you the first hundred or so and it would warn you that this is only a part of it: I'm not showing you everything because the file is too big. But in this case it's only 99.

We've renamed it and so forth, so that's part one. The other part of part one is that if you've logged in, you're able to name the history. We're gonna name this history "get flanks", and I'm gonna hit return so it's saved. Then I'm gonna add some tags. This is not required, but it makes it easier later on if you're looking for things. So "SNPs" is one tag; another tag is "BRCA1". I click somewhere else, and now I have two tags, a name for my history, and one data file called "BRCA1 SNPs". These are all the SNPs for that position on chromosome 17, around that BRCA1 gene.
Okay. So has everybody got that? Does anybody not have that? The history? Are you logged in? If you click on User, do you see your name, "Logged in as"? Okay, so you are logged in. So then on the sidebar, in the blue bar there, what does it say below the blue button? Okay, so click on that. And once you hit return it'll save. You have to hit return, yes.

Okay, so what's the next part? We're getting the flanks. There's a couple of ways I showed you. One is Show Tool Search; I just type in "flanks". Oops, I can't spell. So Get Flanks. I click on Get Flanks, and then here, according to step... make sure I'm on the right page... Get Flanks, step three: change the location of the flanking regions to Both. So right now there's Downstream, Upstream, Both; we're just going to click Both. Everything else I leave the same, and then I do Execute. Green box, a good thing. Gray box over here, not so good; I mean, it's just waiting. That means it's going to be 50 on either side. I think it's either side. Yeah, both sides: 50 on either side, not upstream or downstream, but on both sides.

So it's got a little clock, waiting. Anybody yellow or green here? Green, okay, good. Ah, there we go, yellow and green. I'm not going to rename it, but I can look at it. So now if I look at the intervals here, that's 50 on either side, surrounding it. It's not 50 plus 50, but it's 50 in the middle, basically.

So next step: extract genomic DNA sequence. "Click the tool Extract Genomic DNA." So we do step four here. Let's go back to the menu here; I'm gonna click this away. So Fetch Sequences, Extract Genomic DNA. Okay. And the thing here is I'm going to change the output: not FASTA, I'm going to pick Interval. That's the important thing to do here. There's two choices here, FASTA or Interval.
We're going to go with Interval, which is basically a tab-separated format, and then we execute again. So you're getting the hang of this, right? So here, Output data type: Interval or FASTA. I don't want FASTA; I want Interval. You see it right here. Over here on the left panel you have to get Fetch Sequences, and Extract Genomic DNA "using coordinates from assembled/unassembled genomes". Huh? For this one here, remove "flanks": if you typed "flanks" before, you have to delete "flanks", and then you can type "fetch", or type "extract", or scroll down to Fetch Sequences, Extract, and so forth. And then Execute.

So mine executed. If I look at it, I now have those darn lower and upper cases; same answer that Malachi gave before, repeats and non-repeats. I think I should be able to move this. There we go. Squeeze this a little bit here, squeeze this a little bit on this side. Come on.

Yeah, so the rs number is... what? Oh, they're repeated, so they may have the same coordinates? Oh. Yes, segmental duplication, except it's the same rs number on different intervals. So it could be part of a CNV. That's a good question; I actually don't know. No, what upstream and downstream is for is the DNA to the left or to the right. Yeah, but do we have... Oh, they're all in pairs. You're right. I think maybe you're right. Yes, I think you're right. I stand corrected. It's actually 50 one way and 50 the other way. So each one is... yeah, they didn't tack them together. It's not a hundred; it's two 50-base-pair regions. Okay. Good observation.

So "Fetch DNA sequences flanking", blah blah blah. Okay. So now we have it in a tab-delimited format. If I click on this one on the right here... let me make this bigger again. Sorry. So I have chromosome, start, end, name; five is zero, doesn't mean anything; strand; and seven is the sequence.
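The observation above, that choosing Both yields a pair of 50-base regions per SNP instead of one 100-base region, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Get Flanks tool's own code: the function name `flanks_both` is hypothetical, coordinates are assumed to be half-open (start included, end excluded), and strand is ignored because both sides are requested.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def flanks_both(chrom: str, start: int, end: int, length: int = 50) -> List[Interval]:
    """Return the region before and the region after an interval.

    Assumes half-open coordinates and clamps the left flank at position 0
    so that it never runs off the start of the chromosome.
    """
    left = (chrom, max(0, start - length), start)
    right = (chrom, end, end + length)
    return [left, right]


# One single-nucleotide SNP interval in, two separate flanking regions out.
regions = flanks_both("chr17", 41249363, 41249364)
widths = [end - start for _, start, end in regions]
```

Each input row therefore produces two output rows that share the same rs number, which is why the identifiers appear in pairs in the result.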
It is not named, but basically it's a FASTA sequence. Sort of. Sorry, not the FASTA sequence; the nucleotide sequence. Okay.

So then... "the latest dataset"... This is page three; turn to page four. Now we're gonna actually repeat what we just did. But first we're gonna make sure, step three, that all the check boxes are checked. Okay, so "Options, History column, click Extract Workflow". So you click on this at the top here on the right, and Extract Workflow; you select that one. Don't do as I do; do as I say, not as I do. Let me start over.

So basically, three steps. Type in the workflow name; I'm gonna call this one "Get flanks". Now, I don't want to restart my computer; I'm just gonna delete everything else. So I just changed the workflow name, and I'm gonna create a workflow. They're all checked, and I'm just gonna Create Workflow. So if I go look at Workflow, I have lots of workflows from various things, but "Get flanks" is there, the top one. You should have that one; you shouldn't have all these other ones, especially if this is your first time. Once you've saved it, you have your workflow.

So now what we're gonna do is go back here and erase everything, basically. So it blanks out everything, and now we're gonna go look for another gene. There's a gene here, but if you have your favorite gene that you know is there, you can look for your favorite gene. So how do we start? We Get Data.

Yes? So you're gonna upload a file. So it would be a file of coordinates. You could have it in tab-delimited format. Yeah. It would look like the thing we got from UCSC, right?
Yeah, so it'd be SNP coordinates. So you could do it that way too. Right now we're getting the data from UCSC, but if you did an experiment and you had your own file, then yeah.

So we're gonna go to UCSC Main, and now I'm gonna look for a different gene name. I'm gonna look for the gene they tell me to look for, because I'm sort of that kind of person, but if you want to look for your favorite gene, I encourage you to do so. I'm gonna take the first one on the list, because I don't like to complicate my life. Then Get Output, click once, Send Query to Galaxy, and away it goes. While it's getting it, I'm gonna rename it, and this one's gonna be called "CLOCK SNPs", and I'm gonna save. There we go. It's working, it's getting its stuff, and it's green. I can look at it: same file format as before. I click on it; it's 599 lines and so forth.

But now I don't want to redo the same thing all over again. Since we've kept our workflow, I'm gonna click on the Workflow tab and get the "Get flanks" workflow, the workflow that you saved under whatever name you saved it. So I'm gonna run it. Yes, you can edit the workflow, and modify it and do different things and save it under a different name. We'll look at this workflow later. Basically what I could do here is just run the workflow, and it's taking the intervals, getting the flanks, and getting the sequence without me, because it remembered the workflow. So I'm gonna let it go right now. So it's got three steps. It's one step
I did by doing the lookup; then it's doing the other two steps automatically. So this is a very simple workflow. You remember how you did it, you save all the steps you did as a workflow, and then you give it a different dataset, basically. So we did it on the BRCA1 gene, and then I looked up the CLOCK gene, and I gave it the SNPs I found, and I'm processing them through. The first step I did manually. That part, looking something up, I have to type it in and go look at it. But if it's something like what Joe was mentioning, if it's a file that you already have, then you can just run on that file. You don't have to look anything up; you just load the file and feed it in.

And here's my third step, and it's done the same thing again. What you can also do is go back to Workflow, look at the first workflow, and instead of saying Run, I'm gonna say Edit. What it does is bring it up, and it's a very simple workflow. Three steps... I can click on every step... or it's two steps: getting the data, Get Flanks, and, oops... And then it's got all of that.
It's actually got the details of that workflow, so you can change it here too if you want to. You can move these things around and you see the connections between them. Right now only the last step has this little thing highlighted, so it doesn't show you all the intermediate steps. But if I'm debugging something, or there are intermediate steps which are quality controls or tests like that, then I would highlight those steps so that I would also see that data. So you can see every step of your workflow. It's assuming that you don't want to see them, that you only want to see the last one. But if you do want to see one, you just click on it and it'll show you that step; it'll be part of the output that you can look at. So this was a really, really simple two-step workflow. Okay, did everybody get that?

Yes? Getting the data was not part of the workflow, you're right. Make that part of the workflow? Intuitive to get the data... I guess the workflow works on a dataset. The first step is you're getting data, uploading data, starting on something. Which data do you want to work on? So we got some data, to which we applied that. And if you remember when I said save workflow... let me see if I can go back to that page. I'm gonna say Leave Page, okay. No, not this one. So actually when I saved the workflow it didn't have that first step as a step. It was basically those two other steps; it showed it to you as such. I can't go back; I mean, I'd have to run it again. Maybe later on, if I have time,
I'll run it again and show you. Okay, Emily. Like a hundredfold, on your own, in Galaxy.

So now we're gonna go into the cloud again. Wow, that's a lot of people. So basically everybody's got to pair up. Okay. What they should see... they should see a... If it doesn't work for you, remember, click... No, they're not gonna be happy. Well, the thing is, you're gonna bog it down if we do that. So this is an example of unkind use of shared resources, which were not really meant for next-gen sequence analysis. Although, that said, the files we're working with are relatively small, so we could try. Those of you who do not have the cloud version working in the browser should try; it's basically the same thing.

So the first thing to do over there on the cloud is to log in again. And since this is a new instance, you'll have to register, because it doesn't know you. So we did all this; I showed you. Actually Emily's gonna walk you through this, so I'm not gonna do this.

So this is the workflow that we're gonna run; we're actually gonna create one of these workflows. The pane that I'm looking at here is a small part. This is the whole workflow here, right? And this box is just one part of the workflow. You have to move your box if you want to see the whole workflow. The file for this workflow is on the wiki, so you can download it when you go home and run it on your own instance of Galaxy, on your home computer, on your server at UPenn, wherever. So you have it and you can do whatever. This is what it looks like. Basically we're gonna copy files, and we could run this workflow, but Emily's gonna do it a better way: she's gonna run some of the steps.
So basically, after you've done that, I'm just gonna remind you... this is sort of a plug for another project that some of you may have heard about. It's called GenomeSpace. Some of you did the pathway workshop last week and some of you may have done some other workshop. GenomeSpace is a workspace that brings together Cytoscape, Galaxy, GenePattern, Genomica, IGV, InSilico DB, and the UCSC Genome Browser, all in one workspace. It's another free tool from the folks at the Broad. They're trying to take over the world, and they're well on their way.

On the Galaxy website there's lots of videos on how to do things; I recommend you look at those. Don't forget your files. Try out GenomeSpace. And then here I have links to the Galaxy project; Biostar, which is a really nice Q&A site where people can ask questions about next-gen sequencing stuff; OpenHelix, they have a really great blog and a very active Twitter account; the UCSC browser, also good documentation; SEQanswers is another good place for questions on the technology, the lab, and the bioinformatics of things.

Okay, so we are going to try to reproduce the lab that we did this morning, for the RNA-seq analysis. Basically I'm going to guide you through the first steps: how to upload the data, how to rename the data, and how to start the analysis. But then you will be on your own, trying to reproduce all the different steps of the lab: running TopHat, Cufflinks, Cuffdiff, and so on. As Francis says, all these different steps are available on the wiki. So later, if you want to, you can download the workflow and then open it and view it in Galaxy, so you will have the complete workflow corresponding to all the different steps that you did this morning.

So if you didn't do it yet, you have to register, because if
you don't, you won't have access to different features such as renaming histories or opening workflows. So it's important to register first. Is everyone registered? Okay.

So the first step is to create the history in which we will be working. You can just rename the history if your history is empty, or you can create a new history by selecting this little wheel here and Create New History. Then you can rename it by just clicking on the name. Here I'm just calling it "module five", and then you press return to save the name.

The other step is to upload all the files on which we worked this morning. To make that a little bit easier, we have concatenated all the normal and also the tumor FASTQ files together, so you will have the FASTQ file of normal read one, normal read two, tumor read one, and tumor read two. So you will have only four FASTQ files, two for normal and two for tumor.

So you select Get Data and then you click on Upload File. So now we're uploading. Contrary to what we did in the last lab, where we got the data from UCSC, now we have data files, so we're going to upload these data files. The files that you want to upload are in the lab. You can just select all of the URLs at the same time and paste them into the URL/Text box here. I'm just using Upload File under Get Data. Yeah, it will unzip them, so that's not a problem.

Then you can leave everything at the defaults, except that here you will select a genome, which would be hg19.
So you can just type hg19 and select this one, the hg19 genome, and then you can click on Execute.

Okay, so you get the green box in the middle, that's good. Now we may have another coffee break. Hopefully... oh, it's good. See, we're all running separate instances now. We're not all going to the same place. So the limiting factor is probably the bandwidth between your laptop and there. It's not the actual...

So now the files are being downloaded. Normally Galaxy will be able to detect the format of the file, so it will know if it's a BAM file or a BED file and it will annotate it accordingly, but we are going to check that.

The first step will be to rename the files, because, as you see, Galaxy saved the entire URL as the file name. That's not very easy to look at, so we are going to edit all the file names. You just click on the pen icon to edit the attributes, and you can delete the beginning of the URL, like that. You can check here that it has automatically detected the format, but if that's not the case you can select the format and save it as well. You can do that for all of your files.

Can I scroll down? Yeah, I have to save, but I don't have access to the Save button. Yeah, and that works as well. Oh yeah, here, you're right.

So now you have all your files downloaded and renamed. The first step with Galaxy when you want to analyze a FASTQ file is to use a tool called FASTQ Groomer. What this tool does is standardize the quality score format of the FASTQ file. You can give it a large variety of different kinds of quality score. So if you have an old Illumina machine, or if you have SOLiD reads, that's not a problem.
You just use FASTQ Groomer and everything will be converted to a quality score format that is understandable by Galaxy. For doing that you can search for tools, but for that you have to select Show Tool Search in the options, and you can type groomer. It will be this one.

You run Groomer on every one of your FASTQ files. First you select the file on which you want to run Groomer, then you just click on Execute, and you do that for all the FASTQ files that you have. Sanger, yeah. I think the latest Illumina sequencing machines use Sanger scores. Yeah, for each file, each FASTQ file. So you have normal read 1, normal read 2, tumor read 1 and tumor read 2.

Groomer standardizes the quality score format. Let's say you want to use SOLiD data or Illumina data: you don't need different versions of the same tool. Everything is encoded in what is called Sanger format, and then all the tools can be run on the same format. In the lab I think I provided a link to the Galaxy website that explains a bit more about this tool. This is all Illumina data; Sanger is just a way of encoding a quality score, and it's the one that Galaxy has chosen for all the tools.
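To make the idea of "one quality score encoding" concrete, here is a minimal sketch of the kind of conversion a grooming step performs. It assumes the input uses the older Illumina Phred+64 encoding and re-encodes it as Sanger (Phred+33). It is not the FASTQ Groomer implementation, which also handles other variants; the function names and the example read are made up for illustration.

```python
def illumina64_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger)."""
    converted = []
    for ch in quality:
        phred = ord(ch) - 64  # Phred+64: ASCII code minus 64 is the score
        if phred < 0:
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        converted.append(chr(phred + 33))  # Sanger: score plus 33
    return "".join(converted)


def groom_record(record: list[str]) -> list[str]:
    """Groom one four-line FASTQ record; only the quality line changes."""
    header, sequence, separator, quality = record
    return [header, sequence, separator, illumina64_to_sanger(quality)]


# Hypothetical read: 'h' is score 40 in Phred+64, which is 'I' in Sanger.
print(groom_record(["@read1", "ACGT", "+", "hhhh"]))
```

The score itself does not change, only the character used to write it, which is why downstream tools can then all assume the same format.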
So you want all the quality scores to be in the same format, that's it. No, you have to run FASTQ Groomer four times, once on every file. So you select the tool, and you select the file you want to run it on. Yeah, okay.

And to be sure that you have done the right thing, my advice is to rename the files that you have created. Here you can see that it's "FASTQ Groomer on data 1", and data 1 is the normal read 1 FASTQ. You can check that: if you click on the little information icon here you can see exactly what you have done. You can see that you have run Groomer on this particular file, and you can see exactly which parameters you chose for running the analysis. What you can do is rename the file to reflect what you have done and call it, for example, "groomer" and then the name of the previous file. You can do that for the four files, to be sure that you will be using the right one when you do the comparison of normal and...

Okay, so once you have finished this step you will be able to do the labs that we did this morning on this particular data set.
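The naming convention described here (drop the URL prefix, then prefix each derived file with the step that produced it) can be sketched in a few lines. This is only an illustration of the convention; in Galaxy the renaming is done by hand with the pen icon. The URL and the helper names are hypothetical.

```python
from posixpath import basename
from urllib.parse import urlparse


def short_name(dataset_name: str) -> str:
    """Keep only the last path component of a URL-style dataset name."""
    return basename(urlparse(dataset_name).path)


def step_name(dataset_name: str, step: str = "groomer") -> str:
    """Prefix the short name with the step that produced the new file."""
    return f"{step}_{short_name(dataset_name)}"


# Hypothetical upload URL, used only to show the effect.
url = "http://example.org/module5/normal_read1.fastq"
print(short_name(url))
print(step_name(url))
```

Keeping the original file name inside every derived name makes it easier to pick the matching normal and tumor inputs later on.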
So we're just starting by using TopHat. When you are running TopHat in Galaxy, think about using the groomed FASTQ files, not the first ones: the ones that you have just created, with the right quality score format. And also think about renaming: once you've finished a step and you have a result file, think about renaming it so that you can keep track of what you have done all along.

Yeah, so you can just erase what is written in the search box. And now you can, for example, search for TopHat. And you will have... now you can select from 13 different... But yeah, Galaxy is quite smart about that: once you load, for example, TopHat, if it is looking for a FASTQ file it will only show you the FASTQ files, and here you only see the groomed files. You won't be able to launch it on the others, which shows that you need to do the grooming as a first step before using TopHat.

One thing is that you should create several histories, one for every analysis you are doing, and you can download the histories as well.
So you can save what you have done and then import it into another instance as well. Once you have done all that and saved them, you can import them into a new instance if your instance is broken.

You don't need to have an index, you just provide the FASTA file, and then you... Okay, I didn't realize that. Yeah. Because first we need to align the reads to the genome, and then we will use the alignment with Cufflinks to reconstruct the transcripts. Yeah, but you have to run TopHat twice, once on the normal and once on the tumor. Here it shows "Use one from the history".

So, for TopHat, several things. Everybody please pay attention, important stuff here. For TopHat there are several important things if you want to reproduce what was done this morning. Here you will only select chromosome 22, not the entire reference genome, because we selected the reads mapping to chromosome 22. So you select "Use one from the history", and then here you will automatically see the chromosome 22 FASTA that you have downloaded. You can cancel any step that is not okay with the cross. Then you select paired-end as well, because you want to run it in paired-end mode. You select, for example, normal read 1 and normal read 2, and then what is the mean inner distance between mate pairs? Does anybody remember that number?
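As a reminder of what the mean inner distance setting means: it is the expected gap between the two mates, that is, the fragment length minus both read lengths. The sketch below uses made-up numbers, not the values from this morning's lab, so use the value for your own library.

```python
def mean_inner_distance(mean_fragment_length: int, read_length: int) -> int:
    """Expected gap between mates: fragment length minus both reads.

    The fragment length here excludes adapters. A negative result
    means the two mates overlap.
    """
    return mean_fragment_length - 2 * read_length


# Hypothetical library: 300 bp fragments sequenced as 2 x 100 bp reads.
print(mean_inner_distance(300, 100))
```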
Yeah, I wasn't even here. From last week? So you can execute that one and do exactly the same for the tumor.

Yes, because it's a way of checking whether the reads respect the distribution of the insert size. It sort of limits the space in which to look for the pair, so it makes it faster. If there's nothing that matches there, then it will go look somewhere else.

Yeah, so you can do all the cuts, cutting columns; that is in the workflow.

One thing as well: if the TopHat step is not finished, you can still launch Cufflinks, because Galaxy knows that the step is not finished yet and will wait until it is finished before running Cufflinks. So you can queue several steps at a time instead of waiting until TopHat is finished.

Except that here we will use a reference annotation, as we did this morning. Not for this case, but you could if you were doing that on your own. If you want to repeat exactly what we did this morning, you can change it. So while the jobs are queued, we can set up Cufflinks on the accepted hits, and the main thing is that we need the UCSC gene reference. Okay, and then there's nothing else, like max intron length, that kind of stuff? You could do that on your own, but for this one just leave it as it is. Oops.
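The point about launching Cufflinks while TopHat is still running comes down to dependency ordering: a job is held until the jobs that produce its inputs have finished. The sketch below shows that idea with the Python standard library. It is not how Galaxy's scheduler is written, and the step names are labels chosen for this example.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it needs.
DEPENDENCIES = {
    "groomer_normal": set(),
    "groomer_tumor": set(),
    "tophat_normal": {"groomer_normal"},
    "tophat_tumor": {"groomer_tumor"},
    "cufflinks_normal": {"tophat_normal"},
    "cufflinks_tumor": {"tophat_tumor"},
    "cuffdiff": {"cufflinks_normal", "cufflinks_tumor"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the queued steps can run."""
    return list(TopologicalSorter(dependencies).static_order())


for step in run_order(DEPENDENCIES):
    print(step)
```

The normal and tumor branches are independent of each other, so only the order within a branch is fixed, and Cuffdiff always comes last.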
Oh, this morning I think we chose a parameter for the end width. We did. Yeah. Okay, so should we do it now? I didn't do it, but I don't think that's... Yeah, okay. Leave it as it is.

How do you do it? Okay. Oh, you can... Here, for example, if you select this one, I've renamed it, but you can see that it's BAM format if you click on the link. You don't see the name because I renamed it, but it's the accepted hits, it's BAM. Yes.

Do you think I should go on? Yeah, maybe. I mean, the thing is, if they start the workflow they won't be able to finish it. But it'd be good for them to see how it feels.

So, has everyone finished TopHat yet? Is anyone having issues with TopHat? Is anybody listening? Who's not finished TopHat, of those who are on usegalaxy? Yes. Okay. Could you explain the steps, the pipeline? Yeah, okay.

So once you have run TopHat, you will obtain four different files. You'll see one that is called accepted_hits.bam. This is a BAM file of the mapping of the reads to the genome. You also have the insertions, the deletions and the splice junctions files, which makes four files for the tumor and four files for the normal. Once you have those, you can run Cufflinks. For Cufflinks, the only thing we have modified is that instead of having No for "Use Reference Annotation", I've selected to use a reference annotation. It should be something like UCSC genes.gtf, which is the UCSC reference annotation. You can do that on the tumor and on the normal.

Yeah. Okay, that's fine. That's why you also have to check that the format of the files you've uploaded has been correctly guessed by Galaxy, because that is how it knows which file format it is dealing with.

So if I didn't have that file loaded in already, would I have a chance to load it in now? Yeah, you could also create it using the UCSC tables. So, like this one. Oh, this one.
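On checking that a file format was guessed correctly: format detection generally works by looking at the first bytes of a file. The sketch below is a deliberately crude version of that idea, not Galaxy's own detection code. It relies on two facts: a BAM file is gzip-compressed and its decompressed content starts with the bytes `BAM\x01`, and FASTQ and FASTA records start with `@` and `>`. The function name is made up.

```python
import gzip


def sniff_format(path: str) -> str:
    """Guess a file format from its first bytes (crude illustration)."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number; BAM is gzip-compatible
        with gzip.open(path, "rb") as handle:
            if handle.read(4) == b"BAM\x01":
                return "bam"
        return "gzip"
    if magic[:1] == b"@":  # note: SAM headers also start with '@'
        return "fastq"
    if magic[:1] == b">":
        return "fasta"
    return "unknown"
```

A one-byte check like this can be fooled, which is the reason to look at the detected format after an upload and correct it by hand when needed.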
And you can create the file right here. Okay. Do you have the workflow, or actually that one? I don't know. That's fine. Okay.

So what we are going to do is open the workflow that I have created with all the different steps that you've been doing so far, painfully. All the steps that you did this morning as well: I've created a workflow with all of them so you will be able to reproduce them. I'm going to show you how to open it in Galaxy. You go to the top panel here, where you can see different options, and the one that interests us is Workflow, so you click on Workflow. Now you can download the workflow from the wiki. You have to save it to your computer; it's a small file. Save it to your computer, then we're going to upload it to Galaxy. I'm sure you can upload it directly from the URL, but anyway.
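If you are curious about what is inside the small workflow file you just saved, you can open it outside Galaxy. The sketch below assumes the exported file is JSON with a top-level "steps" mapping from step number to a description containing a "name"; that is an assumption to check against your own file, and the miniature workflow used here is invented for the example.

```python
import json


def list_steps(workflow_text: str) -> list[tuple[int, str]]:
    """Return (step number, step name) pairs in step order."""
    workflow = json.loads(workflow_text)
    steps = workflow.get("steps", {})
    ordered = sorted(steps.items(), key=lambda item: int(item[0]))
    return [(int(step_id), step.get("name", "")) for step_id, step in ordered]


# Invented miniature workflow, shaped like the assumed export format.
example = json.dumps({
    "name": "Module 5 sketch",
    "steps": {
        "0": {"name": "Input dataset"},
        "1": {"name": "FASTQ Groomer"},
        "2": {"name": "TopHat"},
    },
})

for number, name in list_steps(example):
    print(number, name)
```

To try it on a real export, read the file with `open(path).read()` and pass the text to `list_steps`.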
So everyone has found the file? Now you can upload it. There's a button on the top right side, "Upload or import workflow". You click on Choose File and select the file; the extension of the file is .ga. Then you click on Import, and now you should see it. If you click on the workflow name you get different options: you can download, share, rename, a lot of different things. What interests us here is Edit.

If you look at the bottom you will see a little window that shows you the real size of the workflow, and you can grab the square and move around. If you look at it, the beginning is really the steps that we have just done: importing the files, using FASTQ Groomer, then using TopHat. Here I've also added a quality control on the reads using FastQC. Then you have Cufflinks. You can access the parameters: you click on a box and you see on the side all the parameters that have been selected for this workflow. You can see that we have set the inner distance, so you will have to change it if you want to run the workflow on another data set. And you have the same for Cufflinks.

This workflow is quite large, because we start with TopHat, then we have Cufflinks, then we run Cuffdiff, and then we do some cut, select, paste and different kinds of Unix commands that you can also do in Galaxy. That's what I've added here. You can cut different columns like that, and you can reproduce exactly what we did when we parsed the Cuffdiff and Cufflinks results to look at different kinds of data. For this I've made a single workflow for everything, tumor and normal, so you have two sides. If you look here, at the
beginning, one side will be the normal. Should I have written this somewhere? Yeah, so this one is TopHat for normal. You can also add a step to rename the files, and that's what I've done here. If you click on the TopHat box you'll see the TopHat parameters, and then at the end you see "Edit Step Actions". What I've done is add an action that renames the files automatically once they have been created, so that you don't have the default Galaxy names like the ones you had, but file names that you will be sure to recognize once they have been created.

Yeah, sure. I think the best way of doing this would have been not what I've done here, but to split the workflow into, for example, three different workflows: one workflow for aligning, and maybe Cufflinks, that you can repeat several times on the different samples; then one Cuffdiff workflow to compare two results of the previous workflow; and then a last workflow for post-processing, for example. In this case I put everything in the same one for you to download, but that would be a better way, because then you will be able to reuse the workflow blocks on different data sets.

This renaming step, is that something that you did separately? No, I don't think so. Once you're extracting a workflow from a history, you can select the different boxes and add a renaming action to them. Can't they keep the same name? Say, from one file, when you're running TopHat, it should keep the same name and just get "TopHat" added. Yeah, well, I guess it's... maybe. I don't know. Maybe you can email them and suggest that. That would make sense. That would make it too easy,
right? But it would be a different name. Yeah, but if it was one thing you were taking and doing something... but if it's X and Y that you're doing it on, then it's X, Y. Is that okay?

So, I mean, to me this looks like a great interface and I really want to use it. So why should I not? Oh, so yeah, there are obviously some issues with using Galaxy. One of them is that if you run it on your own machine, you're limited by your own machine. That's one issue. If you run it on the cloud it's the same: you have to ask for a big instance. You'll see that today we have selected a large instance so that it can handle this kind of analysis. But if you run it on a cluster, as we're doing at the OICR, the problem is that we can't select the number of processors on which we want to launch each job. It's a single choice that we make; we can't set it as a parameter. I think they will work on that, but for the moment, to run on one processor versus a hundred processors, you can't adjust that per job. You have to pre-select up front: I want a hundred processors. And even though you may need only one or two for certain steps, it blocks a hundred for you.

The other thing, and I think they have now kind of found a way to handle it, is that tracking the versions of the tools you are using is sometimes difficult. Before, you couldn't have two versions of TopHat, for example. If you wanted to try the new one that had just been published, you would still have to use the one that is in Galaxy, unless you knew how to modify Galaxy to import your own version. So the issue is not tracking the version numbers, it's tracking multiple versions of the same tool. But I think now, because there are so many
people, they have TopHat 2 in the main instance of Galaxy. Yeah, sure, you can do that. We have done that here: we have used the Tool Shed a lot at OICR for importing the tools that we're interested in. And you can write a wrapper around the tool you're interested in. If it's a simple program that has inputs, outputs and some parameters, such as TopHat and Cufflinks, you can write a wrapper around it and add it to Galaxy.

No, under Galaxy you can't specify how many nodes you want to use right now. It's still under development, so that may change. You have to choose a default value.

Okay. Well, you can download the workflow and look at what the file looks like. Yeah, but it's not a script. It only records which tool would be run with which parameters.

I'm not sure. Do you know how we can find out which version of the tool has been run? Some tools give you the information. Oh yeah, here, right here. But this is not the newest instance; it's from March 2011.

What about the standard error? When everything runs okay you don't see that. But when there is an error the box turns red, and there is a little icon you can click on, and then it will display all the information about what went wrong and what the error was. The files must be there, they're just not readily accessible. If everything's okay, Galaxy displays the result; if there's an error, Galaxy displays the standard error instead of the result.

And can anybody comment on the differences between Galaxy and SeqWare? We're using one. SeqWare is all command line, so every time you have a tool that you want to add to SeqWare you have to build a file, an XML file, describing all the
actions that your program is doing. But the thing with SeqWare is that it follows what is happening on the cluster. It follows the job status: whether the job that you have launched is finished, and whether everything ran okay. But everything is done with command lines and you have to write XML. It helps you build a workflow. For example, if you want to use TopHat, Cufflinks and Cuffdiff, you can tell SeqWare: first I want to run this tool with these files; once I have the result, I want to launch the next one. It checks that every step is finished and that the result file is what was expected.

It also manages the jobs: the cluster job submission and the job monitoring. SeqWare takes care of paths for you. For a set of files, those files should not get moved: SeqWare maintains its own database. I don't think SeqWare allows you to delete files right now. You don't have permission to delete files generated by SeqWare, and you cannot move them. It has its own meta-database to track all your files.

Oh, sure. So you have several options. First option, you can download the file: you just click on Save here to download the data set, and you can download the index as well, which was created at the same time. Or you can use the UCSC Genome Browser for visualizing your results, or the Ensembl genome browser. You have a link for starting IGB, but sometimes it doesn't work. And another way is using Galaxy's own visualization system, which is called Trackster, right here. With that you can look at the data, and it works much the same way as the UCSC Genome Browser, because you can add tracks. For example, if you are
visualizing the BAM file right now and you also want to see the junctions or the insertions file, you can add tracks and select the ones that you want to visualize at the same time, like that. For example, if you want to see the tumor and the normal files, you can select both. I've already loaded the tumor so I'm not loading it again. And if you want to see the splice junctions for both, you can select them as well. The good thing is that it should be fast, because it's using the files directly. Now it's loading. So it's another way of looking at your data.

Any questions? So, in a few minutes, do the survey at the top of the wiki page. Yeah, I won't allow you to leave until you do the survey. You have to tell me the last word of the last question in the survey, so I know you got it right.

Okay, can you speak up? Under Galaxy? Oh, okay. So please don't run it. You can look at it, but don't run it. But once you click on Run on the workflow, you can see what it looks like. Basically it makes you choose the different files that you want to use, and at every step you can scroll down and see them. What I've done is put annotations on the workflow I've created, so that every time I know what I want to have at each position. Here, for example, as an annotation I've put "normal read 2 FASTQ". So now when I'm looking at the workflow and I want to launch it, I know that I want this file at this position. You can scroll and look at the different steps like that. Every time you have a choice of inputs, you can select the input that you want. And if the steps are just internal steps that don't need any other kind of data, then you don't have anything to
select. So here, for example, you can see all the different steps, and as I've already set everything for these steps, you don't have anything to select. So that's for the workflow. And if you want to use only a part of the workflow, you can clone it, work on the clone and delete all the steps you are not interested in. For example, if you don't want to do the FastQC, then you just delete it.

Yeah. For example, I'm going to add a step here. Let's say I want to filter the TopHat results before running Cufflinks. I want to select only the reads for which both mates are mapped, so I want to remove reads for which only one mate is mapped. You can add a step. This tool is called flagstat, for example. So here you have this new little box... no, that's not flagstat, I'm making a mistake. What's the name of the tool? Oh yeah, it's Filter SAM, sorry about that. So it's Filter SAM, and here you want to filter the BAM file. The BAM file is here, so you take the arrow here and just drag it into the box you want to use, and the same thing on the other side to go back to Cufflinks. Then you have a new step, and you can delete this one, but that is more tricky because there are a lot of things here. So now you have a new step, you can select the parameters, and the next time you run the workflow you will have this new step added.

It depends. You can run your own instance of Galaxy on your own laptop, and then it depends on your computer. You can also run Galaxy on the main website that you went to, but that is quite slow. Well, it depends on the type of data you're analyzing. If you're analyzing a FASTQ file with a lot of reads, then it might be really too
slow. You also have a limit on uploading data to the main Galaxy website, so I think you won't be able to upload a file that is too big. I don't know what the limit is, maybe one gigabyte or something like that. I think it might be 10 gigs on the main Galaxy website. They used to have no limits, but then people started uploading FASTQ files, so they have limits now.

But I mean, we talked a bit about that at the beginning of the workshop: if you have large data sets and you don't have a compute farm in your own institute, then you have to consider Amazon, or other cloud services. If you search there are other ones that are free, but you may not find Galaxy on there, so you'd have to upload it yourself, and it becomes sort of an administrative nightmare. We did a little finicky stuff getting Galaxy running ourselves; they don't have any of those tools, basically a user interface to get an instance running on the cloud. It's still something new, and a lot of places are developing it.

The same way we looked for Galaxy among the community AMIs, I should mention that there's CloudBioLinux, Bio-Linux, which is a Linux with all the bioinformatics tools already preloaded. That's a useful one; it has most of the tools, mostly up to date. So the same way, you just search for Bio-Linux and you'll find that one. That's on Amazon, and if you search for the Bio-Linux website they may tell you where else you can get their instances. So at this point, at the high-performance computing end, it's either you build it and have it in house, or you go and buy it somewhere. Either
way it costs. So I think the concept that all of this is free is not true. It costs something to somebody somewhere. There are salaries involved in maintaining these resources. For Amazon, one of their biggest costs is cooling, so electricity for cooling. When you cost out how much it takes to do a transcriptome or a whole genome or whatever, the markup in the prices that Amazon charges is not very different from what it would cost you if you had enough jobs to fill up a whole compute cluster for one year, counting the cost of the cluster and the cost of the employees. They're not very different, except that most people don't have a year's worth of jobs for a whole cluster all to themselves. Then it becomes a shared resource, and then it becomes a matter of measuring the various costs that are involved. So universities maybe have a shared infrastructure. Compute Canada has various shared clusters across Canada, in Ontario, in the Maritimes and so forth. So there is a lot of compute infrastructure.

The nice thing about this one is that there are a lot of people behind it. The whole Galaxy staff is about 15 to 20 people, a fraction of whom are making sure that the Galaxy version works in the cloud. So the software is free, but the time... basically they get grant money, but then you have to spend Amazon dollars. And I don't know how much it would cost to do a full transcriptome. It'd be thousands of dollars; it wouldn't be tens of thousands. That's the order, I think. These two days, for the whole class, we've probably spent about four or five thousand, but we did part of chromosome 22, and we tried and made some mistakes and tried again and so forth. This is just giving you an order of magnitude. So we
spent... so in the one-week workshop that we did a couple of weeks ago for cancer genomics, where we used the cloud for maybe two of the four or five days, we used it heavily for a couple of days, and we spent four thousand dollars on that one. Here I think we actually used it a bit more, so I'm guessing we've probably used four or five thousand. And a grant is for