Hello everyone, let's continue with the rest of the tutorial on Arabidopsis whole transcriptome analysis. You can always quickly access the training material from here: first scroll down and click on the Transcriptomics topic, then go to the bottom of the page, and you will find our hands-on here.

So far you have analyzed the microRNA sequencing data, as described in the upper part of this figure. You started with the microRNA reads, did some quality control using FastQC and MultiQC, and then trimmed the low-quality reads and clipped the adapters using Trim Galore. Then you quantified the microRNAs using the MiRDeep2 tool, continued with DESeq2 for differential expression analysis, and selected the upregulated microRNAs from the results.

The second part of the tutorial deals with the mRNA data analysis. You start with the mRNA reads here and do a similar quality assessment as before. This time we use Salmon for quantification and DESeq2 for differential gene expression analysis. Then we extract the downregulated mRNAs, and finally we use a microRNA target prediction program called TargetFinder to find putative targets of the upregulated microRNAs among the downregulated mRNAs. That's the overview of the workflow. Now let's jump into the hands-on.

If you haven't already, please import these four FASTQ files that correspond to the mRNA sequencing data into your history. You can either follow these instructions or watch the previous tutorial. If you're using the history from the previous tutorial, you should already have these FASTQ files in your history. I'm continuing my analysis in the history from the previous tutorial, so I should have them in my history already.
There they are. First things first, let's add some tags to our datasets, similar to what we did with the microRNA dataset collections. For this collection we give the tags control and mRNA. For the treated samples we give the tags BR and mRNA.

Now let's go into the training material and jump directly to the mRNA data analysis. First we do some quality control using the FastQC tool. Just click on it, select the collection of control mRNA datasets first, and then click on Execute. While it is running we can rename the outputs. Copy this text from here and rename the FastQC raw data collection. Then go back again, copy the text for the web page, and rename the web page collection.

Now the FastQC jobs are complete. Repeat the same steps for the BR treated samples: select the FASTQ collection of the BR treated mRNA and click on Execute. Now go back and copy the text to rename the files again, but this time we have to change control to BR treated. Then copy this and change the other collection name, the one for the web page. Now we are done with both FastQC jobs.

Next we merge the collections of the FastQC outputs. We select the raw data collection from the first round and the raw data collection from the second round, and then click on Execute.

Now we can use MultiQC to summarize the FastQC results in one single report. Select FastQC here, choose data collections, and then click on the merged collection. We should also change the title of the report to mRNA quality check, so copy this and paste it here. Click on Execute.

The MultiQC reports are ready now. Let's visualize them. Okay, so if it's not loading, just click again.
Yeah, here we see some general statistics on GC content, how many millions of reads there are, and duplicates. If you scroll down, we see the absolute number of reads. The sequence quality histograms look fine, the per-sequence quality scores also look fine, and the GC content is also fine. There are also not many duplicates. But it seems there are some adapters: the Illumina Universal Adapter is present in the reads. It is present in all the samples, but not in many reads; less than 2% of the sequences contain the adapter. Everything else is fine. Let's go back.

Try to answer these questions while you're doing the hands-on. If you cannot, you can always look at the solution here.

Although there are adapters in all samples, we are not going to clip them. The main reason for this is that we use Salmon for quantification. Salmon does simultaneous mapping and quantification. It does not do any base-to-base alignment; instead it uses the mapping to infer which reads originate from which transcripts. As the k-mers that correspond to the adapters in the reads are not part of the reference transcriptome, they are not counted. Hence it is not mandatory to clip the adapters when using Salmon.

Let's now perform the mRNA quantification with the Salmon quant tool. Choose to take the reference transcriptome from the history, and then choose our transcriptome FASTA file. Leave the k-mer size as it is. Now we select our FASTQ files as a collection, and we start with the control mRNA collection. Then we set Validate mappings to Yes. Then we have to select the file that contains the mapping of transcripts to genes, which is our GTF file. That's it, I guess; we have all the options set. Now we go back and click Execute.

While Salmon is running, let's rename the outputs. First copy this and rename the corresponding collection, then go back and rename the other collection, adding "gene". We are done. Now we have to repeat all the steps again for the BR treated samples.
So click on Salmon quant again and select the transcriptome FASTA file. This time we select the collection that belongs to the BR treated mRNA samples. Set Validate mappings to Yes, select the GTF file, and then click on the Execute button. We have to rename the files again. Copy the names and paste them, but this time replace control with BR treated. We also have to do the same for the gene quantification files: replace control with BR treated and add "gene". The Salmon run will take some time to finish.

Salmon produces two output files. The first one is the quantification file and the second one is the gene quantification file. The quantification file summarizes the quantification at transcript level, and the gene quantification file summarizes it at gene level. If you look at one of the files, it contains these five columns. The first one is the name of the transcript or the gene. The second one is the sequence length of the transcript. The third one is the effective length, which is calculated by Salmon; it is a kind of corrected length that takes sequence-specific bias, GC bias, and the fragment length distribution into account. Salmon also calculates normalized expression levels in TPM, which stands for transcripts per million. The TPM calculation is based on the effective length. Salmon also outputs the absolute number of reads per transcript in the last column. Now let's go back to the tutorial.

So far we are done with the quantification of the mRNAs. Now we are ready to perform the differential gene expression analysis using DESeq2. First we need to give a factor name. As we want to know the effects of brassinolide, we give the factor that name, and the first factor level is the condition we are interested in.
So brassinolide is the first factor level, and for it we select the samples that are treated: go to the collections and select the gene quantification files. Write control for the other factor level, and now select the control collection. Then we tell DESeq2 that the values are coming from Salmon, and we also give the annotation file that maps transcripts to genes. Then we click on Execute.

DESeq2 is running, so let's rename the outputs. Click on Edit attributes and change the name to DESeq2 plots mRNA. Save it, and then change the name of the results file to DESeq2 results mRNA. Save it.

Let's inspect the DESeq2 plots. The first plot you see here is the PCA plot. As you can see, the samples from the same experimental condition cluster together, signifying high similarity between the biological replicates. You can see the same trend in the heat maps too. As you can see in red, even with this subsampled dataset we found some differentially expressed mRNAs. Let's go back to the tutorial.

All these plots and all the DESeq2 results were generated from subsampled data. So before continuing with the filtering, we get the results from the complete datasets. Copy this link and upload the data: click on Paste/Fetch data, paste the link, and then click Start. We should rename this file to mRNA DESeq2 results table and save it. Then we need to add all the tags that are related to the mRNA data analysis, so we add mRNA, BR, and control. Okay, let's save it again and go back to the tutorial.

What we do now is filter for the differentially expressed mRNAs, both upregulated and downregulated. First select the significantly differentially expressed mRNAs: as the condition you put c7<0.05, where column 7 holds the adjusted p-value. The result contains the significantly differentially expressed mRNAs. Now we run the Filter job again to select the upregulated ones.
So now we have to select this filtered list of genes as the input. Then we use c3, the column that contains the log2 fold changes, with greater than 1, which corresponds to at least a two-fold change, for the upregulated genes, and then less than minus 1 for the downregulated genes; for that we again select the list of differentially expressed genes as the input. Now rename them: the first one to differentially expressed mRNAs, this one to upregulated mRNAs, and this one to downregulated mRNAs. Save it.

Now you can try to answer these questions by yourself. First we can look at how many genes are differentially expressed. There are 4176 genes that are significantly differentially expressed, and there are 778 upregulated and 328 downregulated genes. To answer the third question, what is the most significantly upregulated gene and what is its function, you have to copy this gene ID and search for it. You will probably find a result in the TAIR database. The gene encodes a P450 enzyme that catalyzes a reaction in the production of brassinolide. So this seems to be a very relevant gene for our analysis, and it is no surprise that it is the most significantly differentially expressed mRNA. Let's go back to the tutorial.

Now we have all the upregulated microRNAs and the downregulated mRNAs. It's time to find targets of the microRNAs among the downregulated mRNAs. To do this we first need the sequences of the microRNAs and the mRNAs. We have to first extract the IDs of the genes and then extract the sequences.
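If you prefer to see the filtering logic outside of Galaxy, the three Filter steps described above (c7<0.05, then c3>1 and c3<-1 on the filtered list) can be sketched roughly like this in Python. This is only an illustration under assumptions: the rows below are made up, and the seven-column layout (ID, base mean, log2 fold change, standard error, Wald statistic, p-value, adjusted p-value) is assumed for the results table.

```python
import csv
import io
import math

# Hypothetical rows, not real results. Assumed column order:
# ID, base mean, log2 fold change, std. error, Wald statistic, p-value, adjusted p-value
SAMPLE_TABLE = (
    "GENE_A\t100.0\t2.5\t0.3\t8.3\t1e-10\t1e-08\n"
    "GENE_B\t50.0\t-1.7\t0.4\t-4.2\t1e-05\t0.001\n"
    "GENE_C\t75.0\t0.4\t0.2\t2.0\t0.04\t0.03\n"
    "GENE_D\t20.0\t3.0\t1.5\t2.0\t0.04\tNA\n"
    "GENE_E\t10.0\t-2.0\t1.2\t-1.6\t0.1\t0.2\n"
)


def to_number(text):
    """Return a float, or None for missing values such as 'NA' or NaN."""
    try:
        value = float(text)
    except ValueError:
        return None
    return None if math.isnan(value) else value


def filter_results(handle, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split a results table into significant, upregulated and downregulated IDs."""
    result = {"significant": [], "up": [], "down": []}
    for row in csv.reader(handle, delimiter="\t"):
        lfc = to_number(row[2])   # column 3 (c3): log2 fold change
        padj = to_number(row[6])  # column 7 (c7): adjusted p-value
        if lfc is None or padj is None or padj >= padj_cutoff:
            continue              # same idea as the condition c7<0.05
        result["significant"].append(row[0])
        if lfc > lfc_cutoff:      # same idea as c3>1
            result["up"].append(row[0])
        elif lfc < -lfc_cutoff:   # same idea as c3<-1
            result["down"].append(row[0])
    return result


result = filter_results(io.StringIO(SAMPLE_TABLE))
for name, ids in result.items():
    print(name, len(ids), ids)
```

Note that a gene can be significant without passing the fold change cutoff in either direction, which is why the upregulated and downregulated counts do not add up to the number of significant genes.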
So we start with the microRNAs. Use the Cut tool to cut out the first column, which contains the ID, from the upregulated microRNAs. We cut the first column, which is c1, and then click on Execute. The job has started and finished, so we can rename the output to upregulated microRNA IDs and save it. If we inspect the file, there are in total 16 IDs of mature microRNA and microRNA star sequences.

Now we have to filter for the sequences using the Filter FASTA tool. The first time we select the mature microRNA sequences FASTA file, and the second time we select the star microRNA sequences FASTA file. We start with the mature microRNA sequences. This file contains all the mature microRNAs in FASTA format: you have the ID, and then on the next line the sequence. So select the mature microRNA FASTA file, and for the filtering provide a list of IDs, selecting the upregulated microRNA IDs. Keep the default for matching IDs, which is simply a greater-than symbol followed by the ID. Then we click on Execute. There are nine sequences out of 16 which are mature microRNAs, and the remaining seven are microRNA star sequences.

So we repeat the filtering step, but this time we select the FASTA file of the star sequences. Select the star microRNA sequences FASTA file, keep the list of IDs as upregulated microRNA IDs, and then execute. This file should contain seven sequences.

Once we have both files, we concatenate them using the Concatenate tool. You see here the seven sequences that belong to the microRNA star sequences and the nine mature microRNA sequences. To concatenate them, we first select the result of the first filtering, then the result of the second filtering, and then click Execute. Now we rename the output to upregulated microRNA sequences and save it. Let's inspect the final file: we have 16 microRNA sequences in total. Now we go back and continue with obtaining the sequences of the downregulated mRNAs.
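To make it clearer what the two filtering runs and the concatenation do, here is a small Python sketch of the same idea: keep only the FASTA records whose ID is in a list, once per file, and then join the results. It is not the code of the Galaxy tools, and the IDs and sequences are made up for illustration.

```python
# Hypothetical example records; the IDs and sequences are invented.
MATURE_FASTA = ">mir-a-5p\nUGACAGAAGAGAGUGAGCAC\n>mir-b-3p\nUUGGACUGAAGGGAGCUCCC\n"
STAR_FASTA = ">mir-a-3p\nGCUCACUGCUCUUUCUGUCAGA\n>mir-c-5p\nAGAAUCUUGAUGAUGCUGCAU\n"
WANTED_IDS = {"mir-a-5p", "mir-a-3p"}  # stands in for the list of upregulated IDs


def read_fasta(text):
    """Yield (header, sequence) pairs; the header is returned without '>'."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_ids(fasta_text, wanted_ids):
    """Keep records whose ID (first word after '>') is in wanted_ids."""
    kept = []
    for header, sequence in read_fasta(fasta_text):
        words = header.split()
        if words and words[0] in wanted_ids:
            kept.append((header, sequence))
    return kept


def to_fasta(records):
    """Write records back as FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)


# One filtering run per file, then concatenate the two results.
combined = filter_by_ids(MATURE_FASTA, WANTED_IDS) + filter_by_ids(STAR_FASTA, WANTED_IDS)
print(to_fasta(combined), end="")
```

The number of records in the combined result equals the number of IDs in the list only if every ID occurs in exactly one of the two files, which is what we saw above with nine plus seven sequences.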
So now we do a similar step as before to extract the IDs. We rerun this step and select the first column, but this time from the downregulated mRNAs. So we are selecting the first column, that is, the IDs of the genes, from the differentially expressed, downregulated mRNAs. Rename the output to downregulated mRNA IDs. If you inspect it, we have 328 IDs, so 328 downregulated mRNAs in total.

This time extracting the sequences is not so straightforward. We have all the sequences in the transcriptome FASTA file, but its header is a little different from the microRNA FASTA files we have seen before; it is a bit more complicated. It has a transcript ID first, and then the gene ID. What we have to do is select this gene ID, but it is not at the beginning of the line, so we cannot directly use the default option. We need a trick to extract the sequences here.

You again use the Filter FASTA tool to extract the sequences from the transcriptome FASTA file, but this time we have to write a slightly tricky regular expression. We'll see how to do it. First we select the transcriptome FASTA file, then a list of IDs, and the IDs are the downregulated mRNA IDs. As we can see, the ID is not at the beginning of the FASTA header but somewhere in the middle, so to match it properly we have to write a so-called regular expression that matches this part in the middle of the header. If you look at the gene ID, it starts with GENE= and then there is the ID of the gene. The ID always has AT at the beginning, followed by seven alphanumeric characters. This is what we are going to use. The regular expression is GENE=(AT.{7}), so we copy this. What we are doing here is selecting everything that is inside these brackets.
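If you want to try out such a pattern before using it in the tool, you can test it in a few lines of Python. This is just a sketch: it assumes the pattern is written as GENE=(AT.{7}), and the headers and IDs below are hypothetical examples in a "transcript ID first, then GENE=" layout, not lines taken from the real transcriptome file.

```python
import re

# Assumed pattern: the literal text "GENE=", then a captured group of
# "AT" followed by any seven characters.
GENE_ID_PATTERN = re.compile(r"GENE=(AT.{7})")

# Hypothetical headers for illustration only.
HEADERS = [
    ">AT1G00001.1 GENE=AT1G00001",
    ">AT1G00001.2 GENE=AT1G00001",
    ">AT2G00002.1 GENE=AT2G00002",
    ">header_without_gene_tag",
]
DOWNREGULATED_IDS = {"AT1G00001"}  # stands in for the downregulated mRNA IDs


def extract_gene_id(header):
    """Return the captured gene ID, or None if the header has no match."""
    match = GENE_ID_PATTERN.search(header)  # search, because the ID is mid-header
    return match.group(1) if match else None


# Keep the headers whose captured gene ID is in the ID list.
selected = [h for h in HEADERS if extract_gene_id(h) in DOWNREGULATED_IDS]
for header in selected:
    print(header, "->", extract_gene_id(header))
```

Only the part inside the brackets is compared against the ID list, so all transcripts of a gene are kept when the gene ID is in the list.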
So now we are matching everything that starts with AT and is then followed by anything: the dot corresponds to any character, seven times. AT followed by seven characters is what we are going to match, so GENE= and then AT followed by any seven characters. Now we rename this dataset to downregulated mRNA sequences.

Now we have both the upregulated microRNA sequences and the downregulated mRNA sequences, so we can continue with the microRNA target prediction using TargetFinder. Here we select the upregulated microRNA sequences, and here, as targets, the downregulated mRNA sequences. In this case we are a bit lenient and set the prediction score cutoff to 5.0. Then we select the tab-delimited format, which is more condensed than the default output format. We'll see what we get.

It's ready now. If you view the file, you see that TargetFinder found targets for some microRNAs but not all; there are no results for some of the microRNAs. You can also see that some microRNAs have multiple targets. For example, this first one has multiple targets. The first column is the microRNA ID and the second column is the target ID. This is the position on the target where the microRNA aligns, and that's the strand. The next column is the score; as we set the score cutoff to 5, everything below that is reported. Then we have the sequences of the microRNA and the mRNA, and in the middle is the actual alignment between the microRNA and the target mRNA sequence. You can click on each of these links to learn more about the target genes. We propose the following hypothesis from our results.
So if you inhibit any of these genes, the plants could have a higher resistance to drought conditions. As a validation experiment, you can acquire mutant seeds and wild-type seeds and grow them under two conditions: one watered, as the control, and the second one with drought stress. Then you analyze the plant weight after 33 days and see whether the mutant plants have improved resistance to drought.

That's all for this tutorial. If you want to analyze more data, we provide an optional exercise. If you are done with this analysis, you can do it now, or on the last day of the workshop when you have more time. That's all for now. Please give us your feedback, and thank you for your participation.