Whole transcriptome analysis of Arabidopsis thaliana. Before diving into this slide deck, we recommend that you have a look at the following.
Let's start with the introduction. Because plants are sessile organisms, their survival under adverse environmental conditions depends, to a large extent, on their ability to perceive stress stimuli and respond appropriately to counteract the potentially damaging effects. The coordination of phytohormones and reactive oxygen species is considered a key element for enhancing stress resistance, allowing fine-tuning of gene expression in response to environmental changes. These molecules constitute complex signaling networks, endowing plants with the ability to respond to a variable natural environment.
Brassinosteroids are a group of plant steroid hormones essential for plant growth and development, as well as for controlling responses to abiotic and biotic stress. Structurally, brassinosteroids are polyhydroxylated sterol derivatives with close similarity to animal hormones. Brassinosteroids have the ability to stimulate plant growth, influencing germination, rhizogenesis, flowering, senescence, abscission and ripening processes. In addition, several experimental results have demonstrated their ability to confer resistance to several types of abiotic and biotic stresses, such as heat, cold, salinity and drought.
MicroRNAs, mainly 20 to 22 nucleotide small RNAs, are characterized by regulating gene expression at the post-transcriptional level. MicroRNAs are distinguished from other small RNAs by being generated from precursors harboring an imperfect stem-loop structure. Unlike in animals, the pre-processing of plant microRNAs occurs in the nucleus. The pre-microRNAs are then exported to the cytoplasm after methylation and incorporated into the Argonaute 1 protein to form an RNA-induced silencing complex. The microRNA itself does not have the ability to cleave mRNAs or interfere with translation, but it plays a role in scanning for the appropriate target.
Four factors explain why microRNAs are considered master regulators. The first one is that multiple microRNA genes are regulated under given environmental conditions. The second one is that computational predictions estimate that each microRNA regulates hundreds of genes. The third factor is that the majority of plant microRNAs regulate genes encoding transcription factors, and the last one is that targets include not only mRNAs but also long non-coding RNAs.
We will now introduce the experimental design on which the analysis is based. As shown here, we can divide the analysis into three stages: differential expression analysis of microRNAs, differential expression analysis of mRNAs, and microRNA target identification. The starting hypothesis is that there should be sequence complementarity between up-regulated microRNAs and down-regulated mRNAs, signifying microRNA regulation.
Now we will provide some details about the data and the tools that are used in this tutorial. For the microRNA-seq data analysis, we will use a total of six samples, three of which are mock-treated biological replicates and the remaining three are biological replicates of the brassinosteroid treatment. For the mRNA-seq data, we have two replicates of each condition. In order to simplify the analysis, the biological replicates will be grouped in Galaxy into data collections. Collections allow numerous datasets to be combined into a single entity that can be easily manipulated.
Now we will look at the tools that are used in the tutorial. We categorize both the microRNA and mRNA sequencing data analysis into three analysis stages, namely quality assessment, quantification and differential expression. On the left side you can see the tools used for microRNA-seq data analysis and, on the right side, the tools for mRNA-seq data analysis. We first assess the quality of the raw sequencing reads using FastQC and MultiQC.
FastQC performs quality control checks on raw sequencing data from individual samples. MultiQC aggregates the results generated by FastQC across several samples into a single report. Trim Galore is a wrapper tool around Cutadapt and FastQC that consistently applies quality and adapter trimming to FASTQ files. To carry out microRNA quantification, we will use two modules belonging to the MiRDeep2 tool, MiRDeep2 Mapper and MiRDeep2 Quantifier. We will use Salmon for mRNA quantification. For the differential expression analysis, we will use DESeq2, a package for differential expression analysis of count data based on the negative binomial distribution. Finally, we will use the TargetFinder tool for microRNA target prediction.
Note that the adapter clipping step by Trim Galore is not included in the mRNA-seq data analysis. There is also no additional mapping step. The reason behind this is the usage of the Salmon tool. Salmon quantifies the transcripts or genes without needing to align the reads base by base to the reference. Because of the working principle of Salmon, we gain little to no advantage from clipping the adapters from the reads.
Here is an overview of Salmon mapping and quantification. Salmon accepts either raw or aligned reads. When raw reads are provided, Salmon uses its lightweight, ultra-fast mapping model called quasi-mapping for further abundance estimation. Then an online inference algorithm estimates initial expression levels and model parameters. Following that, an offline inference module learns the background bias models from the initial abundance estimates and corrects the effective transcript lengths. Finally, an expectation-maximization algorithm is used to estimate the relative abundances.
Now we will look into some details about quasi-mapping. Quasi-mapping allows quantification without generating any intermediate alignment files. It is faster than conventional mapping and saves a considerable amount of time and space.
The quasi-mapping algorithm makes use of two main data structures: the generalized suffix array of the transcriptome, and a hash table that maps each k-mer in the transcriptome to its suffix array interval. During the quasi-mapping procedure, a read is scanned from left to right until a k-mer is encountered that appears in the hash table. That k-mer is looked up in the hash table and the suffix array intervals are retrieved. Then, the maximal mappable prefix is computed by finding the longest substring of the read that matches the reference suffixes. Owing to sequencing errors, the maximal mappable prefix may not span the complete read. In this case, the next informative position is determined. The next informative position is the position on the read that has a unique matching base on the reference transcripts after a mismatch or a deletion. The suffix array search continues from k bases before the next informative position. Finally, for each read, the algorithm reports the transcripts it mapped to, the location, and the strand information.
This figure is an illustration of the quasi-mapping of a read using k equals 3. A hash table lookup of the k-mer ATT returns the suffix array interval bounded by b and e. The current k-mer and its matching suffix intervals are colored in green. The base G at position 6 of the read has a mismatch with the C on the reference transcripts, representing a possible sequencing error. Hence, the first five bases, ATTGA, form the maximal mappable prefix, and the suffix array interval of this maximal mappable prefix is bounded by b' and e. ATTGACTA, which is in the red-colored box, is the longest common prefix of the suffix array interval bounded by c and e. So the next informative position on the read is the ninth base, T. In the end, the read in this example most likely maps to the suffix array at e.
Salmon has several advantages and a few limitations compared with traditional alignment methods. Salmon provides fast and accurate quantification.
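To make the two data structures from the quasi-mapping description more concrete, here is a minimal Python sketch. It is a toy illustration and not Salmon's implementation: the function names `build_index` and `map_read`, the two example transcripts, and the naive suffix array construction are all assumptions made for this example. It covers only the hash table lookup and the maximal mappable prefix; the next informative position and strand handling are left out.

```python
from bisect import bisect_right


def build_index(transcripts, k):
    """Toy index: generalized suffix array plus a k-mer -> interval hash table."""
    names, starts, parts, offset = [], [], [], 0
    for name, seq in transcripts.items():
        names.append(name)
        starts.append(offset)
        parts.append(seq + "$")  # '$' separates transcripts
        offset += len(seq) + 1
    text = "".join(parts)
    # Naive construction by sorting all suffixes; only suitable for tiny inputs.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    table = {}
    for rank, pos in enumerate(suffix_array):
        kmer = text[pos:pos + k]
        if len(kmer) == k and "$" not in kmer:
            begin, _ = table.get(kmer, (rank, rank))
            table[kmer] = (begin, rank + 1)  # half-open interval of ranks
    return text, suffix_array, table, starts, names


def map_read(read, index, k):
    """Scan the read for the first indexed k-mer and extend it to the
    maximal mappable prefix (MMP). Returns None if no k-mer is indexed."""
    text, suffix_array, table, starts, names = index
    for start in range(len(read) - k + 1):
        interval = table.get(read[start:start + k])
        if interval is None:
            continue
        best_len, best_positions = 0, []
        for rank in range(*interval):
            pos = suffix_array[rank]
            n = 0
            while (start + n < len(read) and text[pos + n] != "$"
                   and read[start + n] == text[pos + n]):
                n += 1
            if n > best_len:
                best_len, best_positions = n, [pos]
            elif n == best_len:
                best_positions.append(pos)
        hits = []
        for pos in best_positions:
            t = bisect_right(starts, pos) - 1  # transcript containing pos
            hits.append((names[t], pos - starts[t]))
        return {"read_offset": start, "mmp_length": best_len, "hits": sorted(hits)}
    return None


# Hypothetical transcripts; the read differs from t1 at its sixth base.
toy_index = build_index({"t1": "GGATTGACTAGC", "t2": "CCATTCAAG"}, k=3)
result = map_read("ATTGAGTAG", toy_index, k=3)
```

In this toy case the k-mer ATT occurs in both transcripts, but only the suffix from `t1` extends to the five-base prefix ATTGA, so the read is assigned to `t1` without any base-by-base alignment being stored.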
Since the k-mers that contain adapter sequences are not present in the transcriptome index from which the hash table is generated, the adapters are not mapped. If there is no significant amount of adapters present in the reads, adapter clipping can safely be skipped. However, it is always good to trim low-quality reads. As for the limitations, hash tables and suffix arrays have large memory footprints. In addition, a transcriptome FASTA file is generally used for Salmon quantification instead of a whole-genome FASTA file. Therefore, it is advisable to use this tool on well-annotated organisms.
Now we will look at some important details about microRNA target identification in plants. In animals, base pairing in the seed region of the microRNA, that is, bases two to eight, is usually enough for target recognition. Plant microRNAs require more stringent base pairing. Generally, a near-perfect pairing in the 5' region and a substantial pairing in the 3' region are necessary. Due to this strict pairing mechanism, plant microRNAs have significantly fewer target genes.
For this reason, we use a plant-specific microRNA target prediction algorithm called TargetFinder. TargetFinder takes a small RNA sequence and a FASTA-formatted file of the target sequence database. It uses the popular Smith-Waterman local alignment algorithm for aligning the small RNA sequence to the target sequence database. A penalty of one is added for each mismatch, gap, and bulge, and 0.5 for each non-canonical G:U base pair. These penalties are doubled from positions 2 to 13. It allows at most one single-nucleotide bulge or gap in the duplex. If the sum of mismatches, G:U base pairs, bulges, and gaps of a duplex is more than seven, then it is discarded. Duplexes with more than four total mismatches or four total G:U base pairs are also discarded.
Thank you for watching.
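As a supplement to the TargetFinder scoring rules described above, the following Python sketch applies them to a duplex that has already been aligned. It is an illustration under stated assumptions and not TargetFinder's own code: the function name `score_duplex`, the event labels, and the representation of a duplex as one label per microRNA position (counted from the 5' end) are hypothetical simplifications, and the alignment step itself is not shown.

```python
# Penalty per event type, as stated in the transcript.
PENALTY = {"match": 0.0, "mismatch": 1.0, "gap": 1.0, "bulge": 1.0, "gu": 0.5}


def score_duplex(events):
    """Score a microRNA-target duplex, or return None if it is discarded.

    events[i] is the label for position i + 1, counted from the 5' end of
    the microRNA (an assumed, simplified representation of the duplex).
    """
    counts = {kind: events.count(kind) for kind in PENALTY}

    # At most one single-nucleotide bulge or gap is allowed.
    if counts["gap"] + counts["bulge"] > 1:
        return None
    # More than seven mismatches, G:U pairs, bulges and gaps in total: discard.
    if counts["mismatch"] + counts["gu"] + counts["gap"] + counts["bulge"] > 7:
        return None
    # More than four mismatches or more than four G:U pairs: discard.
    if counts["mismatch"] > 4 or counts["gu"] > 4:
        return None

    score = 0.0
    for position, kind in enumerate(events, start=1):
        weight = 2.0 if 2 <= position <= 13 else 1.0  # doubled at positions 2 to 13
        score += weight * PENALTY[kind]
    return score


# Hypothetical 21-nt duplex: mismatch at position 1, G:U pair at position 5,
# mismatch at position 20.
example = ["match"] * 21
example[0] = "mismatch"
example[4] = "gu"
example[19] = "mismatch"
example_score = score_duplex(example)
```

Here the G:U pair at position 5 falls inside the doubled region and contributes 1.0, while the two mismatches outside it contribute 1.0 each, for a total of 3.0. Lower scores indicate closer complementarity.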