 Welcome! In this tutorial, we'll try to answer two main questions. Firstly, which microRNAs are regulated in response to bracenosteroids? And secondly, which gene are potential targets of bracenosteroid-induced microRNAs? Bracenosteroids are a group of plant hormone essential for growth and development, as well as for coping with biotic and biotic stress. Structurally, are polydroxylated sterol derivates with great similarity to animal hormones. Multiple studies suggest the great technological potential of this growth of compounds, not only for the ability to mitigate the negative effects of biotic and biotic stress, but also for the potential on the improvement of pran growth and crop yields. MacroRNA precursors are synthesized in the nucleus by the RNA polymerase 2. Unlike inonimals, their preprocessing also takes place inside this organelle. The transcripts, after being processed by different enzymes, are methylated and incorporated into the risk complex. Finally, this complex is exported to the cytoplasm, where the RNA-induced gene-silencing process takes place. Forming factors used to define why the microRNA are considered master regulators. First, a large number of microRNAs are regulated in response to environmental conditions. Second, computational prediction estimates that each microRNA regulates hundreds of genes. Third, most plant microRNAs regulate transcription factors. And finally, the targets include not only coding genes, but also non-coding RNAs. In this tutorial, inspired by a paper by Parker Hall, we'll analyze the interplay between Brazilian asteroids and the microRNA-mediated silencing pathway, considering one of the most versatile regulatory mechanisms in plants. The training is structured in two parts. In the first part, we'll analyze the expression of microRNAs, starting from reads obtained in two different conditions, control and Brazilian asteroid treatment. In the next step, we'll perform the quality assessment by using different tools. Then, we'll quantify the transcript by using Miradip2. And finally, we'll perform the differential expression analysis using the Sector2. In the second part, we'll analyze the expression of total mRNAs, from reads obtained under similar experimental conditions. Similarly, we'll first analyze the quality of the sequence. Then, we'll quantify them by using SalmonQuant. And after that, we'll perform a differential expression analysis by using the Sector2. Finally, we'll use the target finder tool to identify potential targets for microRNAs induced by Brazilian asteroids. In this training, we'll use three groups of datasets. MicroRNA reads, mRNA reads, and different datasets. The microRNA datasets consist of six fast-q files, obtained by using the Illumina GAX2 sequencing platform. The plant samples were obtained from wild type WS2 seedlings, treated with moch or 1-micromolar epiviracinolite for 90 minutes before harvest. On the other hand, the mRNA datasets consist of four fast-q files, generated from the Illumina GAX2 sequencing system. The plant samples were obtained from wild type Columbia seedlings. Finally, we'll use two additional datasets, a reference Arabidopsis transcriptome, and the PRM database, which includes more than 20,000 annotated microRNA sequences. First, we should download the reads from Zenodo and organize them into collections. Then, we are going to copy the data table. Next, we should select upload data, then click on the rule basic tab, select upload data as collection, paste the data table, and click on build. Next, we are going to specify the type of data for each column. To do this, click on rules, add modify column definition, and add the definition for each column. The column A corresponds to the identifiers. The column B corresponds to the names of the collections. The third column corresponds to the address of the file hosted in Zenodo, and the column D are the data types. Now, we should select apply and upload. In this way, we'll obtain our reads, organize it according to their data type, and the experimental conditions. With this, we have imported all the RNA seed reads necessary for the analysis. Next, we are going to add the tags, which will allow us to easily identify the samples through the process. Let's click on the tag icon, and we are going to use VR tag and mRNA tag. Then, select the control mRNA sample, and add the tags. Now, we should do the same with the Brassianosteroid treated microRNA data set. And finally, we are going to add the tags to the control microRNA data set. Perfect. Now, we can assign a name to the history. For example, Abelabidopsis transcriptome analysis. Next, we are going to import the rest of the data sets. In order to achieve that, we are going to copy the table again, select upload data. Now, we should replace the previous table with the one we just copied, select upload data as data set, and click build. Again, we must identify each column. Ok, the first column corresponds to the name, the second column to the address in cenotl. Finally, we should click on apply and upload. Ok, let's continue. As indicated above, for this training, we are going to use a subset of the original samples, in order to reduce the time needed to perform the analysis. To be more precise, we are going to use 1% of the total reads, randomly selected. Once we have obtained the necessary data sets, we can start with the analysis. First, we are going to carry out the quality assessment of the microRNA sequence. This is a very important step, since the sequencing process is a sort of error, which, if not identified, can lead to inconsistent results. It's also important to evaluate the presence of adapters, which can also interfere with the subsequent analysis. We'll start with FATQC. Click on the tool, then select the collection corresponding to the control microRNAs, and click on run. Now, we are going to repeat the same procedure with the brassinosteroid treated samples. In the meantime, let's rename the data sets. Although it's not absolutely essential, it can help us to avoid confusions. Now, we are going to rename the data sets of the brassinosteroid treated samples. The next step is to combine the data set generated by FATQC for each of the treatments. This is necessary in order to be able to visualize the whole set of samples using MiltQC. We should select the corresponding raw data set generated by FATQC for each of the treatments, and click execute. Finally, we are going to use MiltQC to generate the quality reports. We should click on the MiltQC button, select the tool with which we have generated the data, in this case FATQC, and select the collection that we have just generated. We can assign a name to our report. For example, microRNA and processed QC, and click on execute. Next, we are going to visualize the report generated by MiltQC. You should click on the next to the white page generated by this tool. First, we are going to display the statistics for the percentage of duplicated reads. As we can see, around 70% of the read code are pumped to duplication. The following plot shows the average thread score of our read in different position. All of them show acceptable values about FATQC. This other plot shows the relationship between the number of reads and the average quality. In the case of the GC content, we can see that three of the data sets have value far from the value in considered acceptable, which could interfere in the subsequent analysis. The duplication percentage are relative high. Finally, in the case of the garter content, we can observe that the values are about the acceptable limits. All samples include illuminous small RNA adapters in more than 40% of the reads. With this information, we can now answer the first question. Is it necessary to trim the reads? To remove the adapter sequence, we'll use the Tringolor tool, a cut-adapter wrapper. So, click on the tool, select the collection corresponding to the micro-RNA control samples, select Illuminous small RNA adapters and finally execute. Then repeat the procedure with the grassy nosteroid treated samples. In the meantime, we'll rename the Tringolor generated data sets. Then first, let's go to rename the control samples. Click here and we should paste and now we are going to rename the treated samples. Just replace control by VIA, perfect. Once the samples have been processed by Tringolor, we'll re-evaluate the quality of the samples in order to confirm that the quality parameters have improved after processing them. To do this, we are going to repeat the same procedure as before. Click on Fast QC, select Processed control samples and execute. Now repeat the procedure with the grassy nosteroid treated samples. In the meantime, we'll assign appropriate name to the data sets. As not before, it's not absolutely necessary to rename the samples, but it might avoid future confessions. Next, we are going to use the Merge Collection tool again to combine the results generated by Fast QC. Remember to select the raw data files. Finally, we are going to use MiltQC to generate a quality report of our sample from the statistics obtained by Fast QC. Select the collection we have generated and once again, it's advisable to give the report a title, so it can be properly identified. For example, process it QC and click execute. Our thought evaluation of the quality of the samples is a tedious process. It's an indispensable step to prevent the inclusion of the FSD reads which could interfere in the subsequent analysis. Next, we are going to check the quality parameters of our samples before and after processing them with Tringolor, using the Scrapboard feature. Let's compare the quality parameters of the reads. As we can see, the percentage of duplicated reads have been reduced after processing them. One factor that could explain this is the existence of highly conservative motif. In the case of the quality of the sequence, in both cases the patterns are similar. One of the parameters that show clear differences after processing is the GC content. As you can see, it has improved in most of the datasets. In the case of the read lengths, we can also see clear differences. The original samples have a homogeneous length of 35 base pairs, whereas after processing them with Tringolor, as a result of the removal of the adapter sequence, reads of different lengths appear. Finally, we can see that in the processed samples, adapter contamination is insignificant, with values below 0.1%. We have previously discussed some of the factors that might determine the high number of duplicated sequence. In addition to the possible existence of highly conservative motif, this could also be explained by different causes, such as the existence of overrepresented sequence or contamination, which foreign sequence. Once we have accessed the quality of our reads, the next step is to quantify them. For this, we list two different tools, mid-deep to mapper and mid-deep to quantifier. With mid-deep to mapper, we are going to group the reads with identical sequence. The quantification will be carried out by mid-width to quantifier. These take place in two steps. Firstly, the predefined match or microRNA sequence are mapped to the predefined precursor. And secondly, the reads are mapped to the precursors. Click on mid-deep to mapper, then select the control-processed reads, enable the selection of reads with no standard nucleotides, and in the collapse reads or map option, select collapse, and then execute. Then, we will repeat the same procedure with the samples treated with plasma steroids. Select, once again, mid-deep to mapper, now select the brassinosteroid microRNA treated dataset, enable the remove read with no standard nucleotide option, select collapse, and execute. While the process finished, we are going to runate the generated files. Now, we are going to carry out the quantification of microRNAs using mid-deep to quantifier. These two require, in addition to the collapse reads obtained in the previous step, three additional fastafies, the precursor sequence, the major microRNA sequence, and the star sequence. Then, let's select first the collection of the control-microRNAs, select the major microRNA sequence, now select the star microRNAsic fastafile, we can unset this option, and execute. Now, we are going to repeat the procedure with the collapse reads corresponding to the brassinosteroid treated samples. Let's runate the files. Now, let's take a look at the collapse reads generated by mid-deep to mapper. As we can see, to each sequence, this tool assigns an identifier, followed by a value indicating the number of time it appears in our reads. Before moving to the differential expression analysis, we need to make some modification in a result generated by mid-deep to quantifier. In particular, we need to extract the first two columns. But first, let's examine the content of the file generated by mid-deep to quantifier. The first column can respond to the identifiers of the major microRNAs. The second column contains the number of times each sequence appears. The remaining columns contain additional information, including the identifiers of the precursor sequence and normalized quantification values. In our case, we'll only use the data included in the first two columns. So, we are going to select the cat column from a table tool, select the raw data file corresponding to the counter-sample, and execute. Once again, we should repeat the procedure with the bracenosteroid treated samples. While the process is running, we'll rename the generated datasets. Now, we can carry out the differential expression analysis by using Desk2. Desk2 is a popular tool for general-level differential expression analysis. It uses the negative inability distribution, employing a slightly more stringing approach compared to other methods, yet having a good balance between sensitivity and specificity, reducing both false positives and false negatives. Then, select Desk2. Now, we should specify a factor name, for example, fxbracenosteroids. Now, specify the first factor name, for example, bracenosteroids. Now, we should select the bracenosteroid treated microRNA counts, specify the second factor name, CTRL, select the CTRL microRNA counts. Now, we should disable the files have a header option. The rest of the options are fine. It's not necessary to modify them and execute. Perfect. It's running. We are going to rename the generated datasets. Let's rename the result dataset and now the blood file. Let's pack. As we have seen, Desk2 generates two outputs, a table with the normalized counts and a graphical summary of the results. First, we are going to inspect the quantification table. As we can see, it includes seven columns, the gene IDs, the mean expression, low radio, standard error, world stats, p-value and p-adjust value. In our case, we are going to use the p-adjusted value and the low radio. Next, we'll inspect the graphical summary. The principal component analysis plot provides useful information to evaluate the similarity of our samples. The distribution of the samples suggests that there's not so much difference between both treatments. The sample-to-sample distance gives us similar information over the similarities and the similarities between our samples. Finally, the MA blood shows the log-to-fold chain as signable to a given variable over the mean of normalized counts. Next, we'll proceed to extract those genes whose differential expression is statistically significant by selecting those genes whose adjusted p-value is less than 0.05. We'll use the p-adjusted value because it allows us to take in account the false discovery rate. Applying the false discovery rate becomes necessary when we are measuring thousands of variables. We are going to use the filtered data on any column tool. Now we should select the file generated by the sector. We should select the condition column 7 less than 0.05 and execute. Now we can answer the proposed question. How many micro-urbanage shows significant differential expression? Let's have a look at the result. Unfortunately, we haven't identified any gene. The reason is that the su-sample dataset doesn't have sufficient read depth. In order to get a significant result we are going to import the DeskToAnalysis result generated from the whole dataset. Then, let's copy the cenotl link, click in Upload data, click in PasteFageData, paste the URL and start. Now we are going to rename the dataset and add all the microRNA-related tags. Let's copy the proposed name Now we can paste it in this field and save. As we can see, we have 442 genes. Now we are going to add the tags microRNA, resin asteroid and control. Like this. And add the tags. MicroRNA, control, resin asteroid. Ok. Now let's repeat the filtering process with the new dataset. In that case, we'll start filtering those genes with significant expression. Then we'll get the abbreviated genes whose low rate is higher than zero. As we explained it before in the first step, we'll use the PiJ asset value lower than 0.05 and then we filter those whose low rate is higher than zero and then we'll sort the result. Ok. Then set the condition column 7 lower than 0.05 and execute. Now we are going to get the abbreviated genes set the condition column 3 higher than 0 execute and finally we are going to sort the result. We should set the column 3 descending order and execute. Ok. Now we are going to rename the new datasets. Perfect. First of all, let's check if in this case we have got any gene whose differential expression is significant. Now we can respond to questions. In this case, we have got 39 differentially expressed genes. Now let's answer the second question. How many microRNAs show statistical significance of regulation? And what's the most abbreviated microRNA? In that case, we have identified 16 abbreviated genes. The most abbreviated is MIRI 150CG. Here we can see the low radius and the B adjusted value. This concludes the first part of the analysis. To recapitulate, we started from microRNA raw rates on which we performed a quality assignment. Then we quantified them and finally we performed a differential expression analysis in order to identify the mRNAs. In the second part, we start with raw mRNA rates, then we'll perform a quality assignment, we'll quantify them with salmon and we'll use the SAC2 in order to identify the unregulated mRNAs. Then let's start with the second part of the analysis. The procedure to carry out the quality assignment of the mRNAs is similar to the one followed in the first part. We start with the FASTQC tool, select CTRL-M RNA collection and execute. Then repeat the procedure with the bracenosteroid treated samples. Next, we are going to rename the datasets generated by FASTQC. Now let's merge the files, select the bracenosteroid and the CTRL-RowData files and execute. Finally, we can use mutQC in order to generate the quality report. Select FASTQC, select the merget dataset, now provide a title to the report, for example, mRNA QC report and execute. We just need to wait until the report is generated. It's done. Plotting the quality metrics of the different samples and comparing them using mutQC can help us to detect potential problems. This is essential, because if we start with poor quality data, the obtained result won't have any relevance around a biological point of view. As we can see, after using a tool, Galaxy suggests us potential useful tools for the subsequent analysis step. Those recommendations are based on predictions obtained by machine learning from other users' data. Let's have a look at the report generated by mutQC. In this case, as we can see, the duplicated reads are below 5% of the total, a very acceptable value. The quality of the samples are also above 30, which indicates that the reads are of poor quality. The IGC content is perfect. There's no overrepresented reads at all. Finally, we can see that the adapter content is also very low in all the samples, being found in less than 2% of the reads in all cases. With this information we can answer the question, are there any data that indicates the need to process the samples? Although some of the samples have adapters, we'll move directly to the quantification. In the following section we explain the reason. To carry out the quantification, we'll use the Salmon Quantool. One of the characteristics of Salmon is that it doesn't require performing base-to-base alignment. Salmon relies on the quasi-mapping concept, a new mapping technique that allows the rapid and accurate mapping of RNA's secrets. After determining the best mapping for each read, Salmon generates the final transcript abundance estimation after modeling sample-specific parameters and bias. If you are interested in additional details about the Salmon algorithm, I recommend you to have a look at this information block. Then, let's start with the quantification. We should click the Salmon Quantool. Now we should select the transcriptome FASTA file. Next, we should select the control mRNA collection. Enable the Validate mappings. Now we should select the annotation file and execute. This section explains why it's not necessary to process the reads when using Salmon Quantool. This is because if the cameras of the reads are not in the hash table, they are not taken in account. Now, let's rename the datasets generated by Salmon. We are going to repeat the previous step. Selecting in this case the collection corresponding to the Brass and Asteroid treated samples. Now, we will assign the appropriate names to the generated dataset. Salmon generates two outputs for each sample. The quantification per transcript and a gene quantification dataset. Each output consists of a regular dataset with five columns. The name, target transcript length. The computed effective length. The relative abundance of the transcript in units of transcript per million and the estimation of the number of reads mapping to each transcript. Now, we can respond to this question. Could you explain why did we not read the read before? The reason is that since the cameras which contain the adapter sequence are not present in the transcript from which the hash table have regenerated, they are omitted. Let's have a look at the dataset generated by Salmon. The first column includes the gene IDs. Then, the length and the effective length. As we can appreciate, both length differs. The reason is that the effective length takes into account all the factors that can affect the probability of sample fragments. Then we find the TPM estimated number of reads. Here, we can see the total numbers of gene quantified. Next, we are going to perform the differential expression analysis using the SAC2. The same tool we used in the first part of the analysis. Then, let's select the tool. Now we should specify a factor name, for example Brassian-Osteroid treatment. Now we should specify the first factor, Brassian-Osteroid. Select the gene quantification generated by Salmon. Now we specify the second factor, CTRL. Next, select the quantification of the CTRL samples. Now we should leave this option the choice of input data. We should select TPM values. Select Salmon. Select the annotation. And just execute. Again, Galaxy suggests us different tools for the next step of the analysis. Ok. Now let's rename the generated datasets. We are going to copy the proposed name. Just click here and paste. Now we are going to specify the Salmon outputs. Let's have a look at the result generated by the sector. Here we can see as before gene ID, mean expression and the adjusted value. Now let's compare the similarity between the mRNA samples and the microRNA ones. For which we'll use the scratchbook feature. As we can see, the effect of the steroid treatment is more prominent in the case of the mRNA samples than in the macroRNA ones. The heat mat of sample to sample distance reflects a similar trend. With the mRNA sample of each treatment clustered together. Now we can respond to this question. Here is the solution. Now as we did before we are going to import the desired result generated from the full mRNA dataset. Then let's copy this another link. Click in upload data now paste and start. Now let's rename the new dataset. Paste and save. We are going to add some tags mRNA control and brassinasteroid. As we can see the difference in special analysis includes more than 80,000 genes. The next step is to extract those genes whose expression is statistically significant. Then let's click the tool. Select the condition column 7 less than 0.05 and execute. Now we are going to rename the dataset and save. Perfect. Now we are going to filter. They have regulated genes column 3 higher than 1. Now let's rename again the dataset and finally we are going to add the genes which are don't regulate it. Column 3 less than minus 1. Let's rename the dataset. Don't regulate it. mRNAs. Ok. Now we can respond to the questions. First, how many genes show statistically significant differential expression? Second, how many of them are regulated? I'm downregulated and the third one is what is the most downregulated? Here we can see the number of differentially expressed genes. Here, the total number of upregulated genes and the don't regulated ones. The results match with the expected result. Now, let's check what is the most don't regulated gene. Here we can see the first row. Let's copy the gene ID. Ok, it's the same. In order to evaluate the biological function we can use the database. We should paste the gene ID and search this one. According to the database this gene encode a Cytocoron B450 enzyme that catalyzed the blood reaction in the production of Bracenolyte. In other words, it's involved in the Bracenosteroid biosynthesis To predict the mRNA target first we need a transcriptome sequence in faster format. Then we will obtain the sequence of the microRNA whose expression is induced by Bracenosteroids. It will require four consecutive steps. First, we will track the gene IDs from the upregulated microRNAs then we filter the transcriptome sequence from the major microRNA faster file and then from the star sequence. Let's start. We should select the column 1 and the file of the upregulated microRNAs and execute. We are going to rename the dataset. The next step is to get the transcriptome sequence. We should select the major microRNA faster file list of IDs select the upregulated microRNA IDs and execute and now we should repeat the same but in that case we should select the star sequence select list of IDs upregulated microRNA IDs and execute. And finally we should combine both files. Let's rename the merget file. We can have a look. You can see it just contains the gene IDs and the filtered files contain both the IDs and the sequence. It contains 9 sequences. Finally, let's rename the merget file. Ok. Next, we are going to get the sequence of the downregulated mRNAs, which we'll use to identify potential targets. Again, first we need to extract identifiers of the downregulated transcripts. Select the column 1 not select the file of the downregulated mRNA and execute. Ok. Now, let's rename the new data set. Finally, we'll extract the sequence of the downregulated mRNAs for which we'll use the completed abelocid tranquilitum together with a list of identifiers obtained in the previous step. In this case, it will be necessary to use a reject search pattern. Let's have a look at the transcriptome FASTA file in order to understand the reason. As you can see, the header name includes mod information. Let's compare with the micro RNA FASTA file. Yeah. The header line includes only the gene ID. Let's continue. Select the transcriptome FASTA file now list of IDs downregulated mRNA IDs custom reject pattern and paste the expression and execute. Now, let's rename the data set Now, we can use the target finder tool which is specially designed to predict potential mRNA targets in plants. Requires two inputs the upregulated micro RNA sequence and the downregulated mRNA sequence. Okay. Then, let's start. First select the upregulated micro RNA sequence Now, we should select the downregulated mRNA sequence Here, we can check the content of each of the files Now, set a prediction score of 5.0 and as output format select tap the limited format and execute. Okay. Now, let's have a look at the results. Okay, the output of target finder includes different columns the first is identifier of the micro RNA then identifier of the targets as you can see we have found 5 different potential targets then includes additional information such as the score value the sequence of the target and the micro RNA Here are the targets Let's check the biological function of one of them According with the Tyreotabase, these gene encodes allow affinity, sulfate transport and express it in the root cap and the setter cylinder We have identified 5 genes whose expression is potentially not regulated in response to Brazil asteroids In order to confirm the results we could acquire seeds with mutation in those genes and carry out an experiment under control conditions As additional activity you can try to repeat the workflow by using the additional dataset that we provide at the end of the training In that case we'll compare the gene expression patterns of mutant overexpressing the Brazil asteroid receptor BRL3 under two experimental conditions control and drug stress This exercise can take longer because we provide the original datasets As a final summary we have used a combination of different tools for the identification of genes potentially regulated in response to Brazil asteroid through a micro RNA mediated mechanism Here you can find the bibliography that we have used and we encourage you to leave your impression Thank you so much for your attention Bye