In the last few lectures we have been discussing one of the revolutionary technologies, next-generation sequencing (NGS), where you have been given the concepts starting from the basics to the latest advancements happening in this area. We are very fortunate to have some of the leading industries and their application scientists directly sharing their experience with you. In this light, in today's lecture we have Mr. Rahul Sulanki, a senior field application scientist from Premars Life Sciences, who will talk to us about Illumina next-generation sequencing workflows. So, let me welcome Mr. Rahul Sulanki for his lecture today.

Let us talk about protein sequencing first: what exactly is sequencing? It is to know the sequence of amino acids. If I speak about DNA sequencing, why is it needed? You need it to identify genes and to study mutations, because ultimately all the functions related to proteins are coded by genes, right? That is correct.

So, I will start with Sanger. We do not have a Sanger sequencer with us, but just to clarify the basics I will start with Sanger. The principle is this: let us say you have a sequence. What you flow into the chamber is the four kinds of dNTPs, and the mix also contains ddNTPs, right? Can you see the stretch of sequences with the labeled bases? Those are ddNTPs, which are labeled with a fluorophore. Let us say the first base added is A. The polymerase cannot extend further. Why? Because the base added is a ddNTP; it lacks the 3' OH group, okay? And so on; you will have fragments of different lengths over here. After that you run them through a capillary; the capillary is nothing but a gel, right?
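As an aside to the lecture, the chain-termination idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the template is written 3' to 5' so that the copied strand reads left to right, exactly one terminated fragment exists for every position, and the function names `sanger_fragments` and `read_by_size` are made up for this illustration.

```python
# Toy model of Sanger chain termination (illustrative only, not a lab protocol).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template):
    """Return one (length, labeled_end_base) pair per template position.

    Assumption: the template is written 3'->5', so the newly made strand
    reads 5'->3' from left to right. Each fragment ends where a labeled
    ddNTP was incorporated and extension stopped.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def read_by_size(fragments):
    """Read the labeled end bases in order of fragment length.

    Sorting by length stands in for the capillary run, where the
    shortest fragment reaches the detector first.
    """
    return "".join(base for _length, base in sorted(fragments))


if __name__ == "__main__":
    fragments = sanger_fragments("TCGA")
    fragments.reverse()  # the order of fragments in the tube is arbitrary
    print(read_by_size(fragments))  # AGCT
```

Reading the end labels in size order gives back the newly synthesized sequence, which is the complement of the template.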
So, after the sequencing reaction is done, we flow all the fragments through a capillary, and you can see there is a detector. The last base of every fragment is labeled with a fluorophore. So which fragment will flow through the gel first? The upper one, the smallest one, okay? Through the gel, that topmost sequence will pass first. Let us say it ends in A, so you will get a signal for A. Similarly, let us say the second base is G in the second strand, so G will be called, and so on. So, this is the basis of Sanger sequencing.

What is the maximum number of base pairs that we can sequence using Sanger with good-quality data? 1000 base pairs, correct. So Sanger is still the method of choice when you wish to sequence a single gene, and the typical length of a gene in human is almost around a kb, correct?

But why do we need NGS? Now the point is why NGS is required. Great. So the right answer is throughput, right?

Any idea how long the first human genome took to get sequenced? It took over 10 years. And how many million dollars were invested? Billions of dollars. How many? Almost 3 billion dollars were invested for our first human genome to be sequenced. So, what is the length of the human genome? Great. It is a common question in CSIR NET, for those of you who have appeared in the NET, I guess. I hope you are writing this down. It is 3 billion base pairs, 3 × 10^9.

So just think: if you wish to sequence a human genome, how many Sanger reactions do you need to carry out? The length of the human genome is 3 billion base pairs, 3 × 10^9, and the maximum length you can sequence in one reaction is 1000 base pairs. So, 3 billion divided by 1000, how much is it? How many reactions do you need to carry out? 3 million. So, you need to carry out 3 million Sanger reactions to complete the human genome sequence, and the sequence you get will be covered at only 1x coverage.
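The arithmetic above can be written out as a short Python sketch. It is an idealized back-of-the-envelope estimate: it assumes reads of exactly 1000 bases that tile the genome perfectly with no overlap, and the helper name `reads_needed` and the 30x figure in the second call are illustrative assumptions, not values from the lecture.

```python
GENOME_BP = 3_000_000_000   # human genome, about 3 x 10^9 base pairs
SANGER_READ_BP = 1_000      # approximate upper limit of a good Sanger read


def reads_needed(genome_bp, read_bp, coverage=1):
    """Minimum number of reads to cover the genome `coverage` times.

    Idealized: assumes reads tile the genome perfectly with no overlap.
    """
    total_bases = genome_bp * coverage
    return (total_bases + read_bp - 1) // read_bp  # integer ceiling division


if __name__ == "__main__":
    print(reads_needed(GENOME_BP, SANGER_READ_BP))               # 3000000
    print(reads_needed(GENOME_BP, SANGER_READ_BP, coverage=30))  # 90000000
```

The second call shows how quickly the number of reactions grows once each base has to be covered more than once, which is the throughput argument for NGS.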
So, even if you carry out 3 million reactions, how many times will you be covering your bases? One time. The coverage will be 1x. So, if you wish to sequence a human genome, do you wish to invest 10 more years? No. With the help of NGS technologies, you can sequence 50 human genomes in a span of 48 hours. How many? 50 human genomes in a span of 48 hours.

The key idea remains the same; the principle remains the same. Let us say you have this human genome. What we do initially is fragment this human genome into very small fragments, and unlike in Sanger, we do not need cloning at all in the case of NGS. I will be covering all the parts one by one. The steps involved on all the sequencers are: first library preparation, which is followed by cluster generation, then comes sequencing, and finally the data analysis part, okay?

So, first is library preparation. It is common for all the platforms; though the sequencers are different, the concept remains the same. First of all, what is a library in terms of sequencing? In Sanger, what do you do? You clone your fragment into a vector. That vector contains known sequences; it already contains the sites for primers whose sequences are known. So, using those primers, you can sequence your gene. In the case of NGS, what do we do? We chop the entire human genome into smaller fragments, but those are unknown, right? Correct.

Again I am asking the same question. Let us say this is the human genome. For any genome, human genome or big genes, the first step is that we fragment it into various small fragments. Let us say we chop the human genome down into millions of fragments of 600 base pairs, but these are unknown. So, to get them sequenced, what do we need? Right, you remember now. So, what we will do initially is, again, use a polymerase.
So, let us say you have this many fragments. Initially we convert all the ends into blunt ends, and then we add a polymerase. The tendency of the polymerase is that it will add an A to all the 3' ends, right, you know that. Then we have adapters which contain a T overhang, and there will be a simple ligation step. So those fragments will be converted into fragments ligated with the adapters. So, in library preparation, the ultimate goal is this: first of all we start with the DNA, we fragment it into fragments of a chosen length, let us say 600 base pairs, then we repair the ends so that they are converted into blunt ends, we add the A, and finally we ligate the adapters.

So, what are adapters? Adapters are stretches of DNA for which the sequences are already known, right? So, this is how the library finally looks. It will contain the DNA insert which you wish to sequence in the middle, and you can see there are two regions, RD1 and RD2, and these are essentially part of the adapters. We have two kinds of adapters, P5 and P7. All right, is it clear?

I will tell you the function of each. This is the unknown sequence, all right? Here you can see there are two regions. Here our read one sequencing primer will bind, so this is called the read one sequencing primer binding site. Flanking both ends are the indices. What are indices? What is an index? These are barcodes. So, what is the function of a barcode? In NGS, as I said, using NovaSeq you can sequence 50 genomes all together, so ultimately, with the help of the barcodes, you can identify which sequences belong to which of the samples. Am I clear?
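To illustrate what the barcodes are for, here is a minimal Python sketch of sorting pooled reads back into their samples. The index sequences, sample names, and the `demultiplex` helper are all invented for this example, and matching the index exactly is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical 8-base index sequences, invented for this example.
SAMPLE_INDEXES = {
    "ACGTACGT": "sample_A",
    "TTGCAAGC": "sample_B",
}


def demultiplex(reads, sample_indexes):
    """Group insert reads by the sample whose index they carry.

    `reads` is an iterable of (index_read, insert_read) pairs.
    Reads whose index matches no known sample go to "undetermined".
    Assumption: indexes must match exactly.
    """
    bins = defaultdict(list)
    for index_read, insert_read in reads:
        sample = sample_indexes.get(index_read, "undetermined")
        bins[sample].append(insert_read)
    return dict(bins)


if __name__ == "__main__":
    pooled = [
        ("ACGTACGT", "GATTACA"),
        ("TTGCAAGC", "CCGGA"),
        ("ACGTACGT", "TTTT"),
        ("GGGGGGGG", "ACAC"),  # index not in the sample sheet
    ]
    for sample, inserts in demultiplex(pooled, SAMPLE_INDEXES).items():
        print(sample, inserts)
```

Because a single wrong base in a short index can send a read to the wrong sample or to the undetermined bin, the index bases need to be read accurately.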
So, this is how the library looks. The ultimate goal of library preparation is to ligate the adapters at both ends of all the DNA fragments. Clear? Great. So that is all about library preparation.

Coming on to cluster generation: what is cluster generation? Essentially the whole process happens on the flow cell, right? The flow cell is nothing but a glass slide. You can see there are two flow cells: this is a HiSeq flow cell, which is meant for sequencing human genomes, and this is a flow cell meant especially for targeted sequencing. It has eight lanes, and it has a common link. Why do we call it a flow cell? Because the sequencing happens with the help of the flow of reagents through this flow cell, okay?

The flow cell contains a lawn of oligonucleotides. Essentially we have two kinds of oligos: one oligo is complementary to the P5 region and one is complementary to the P7 region. Before loading onto the flow cell, we denature the double-stranded DNA fragments into single-stranded DNA fragments so that they can go and bind to the surface of the flow cell, right? Once you have denatured them, since the lawn oligos are complementary to P5 and P7, one of the fragments will go and bind to the flow cell at this region. You can see how it has bound here: simply with the help of hydrogen bonding, because these regions are complementary. You can see that this oligo is covalently bound to the surface of the flow cell, and because this region is complementary, the fragment will go and bind to it with the help of hydrogen bonding, okay?

Then we extend this end with the help of DNA polymerase, so you can see you will get a structure like this. We retain the newly made strand simply because it is bound covalently to the surface of the flow cell. Because the original strand is held only by hydrogen bonding, and these are weak bonds, we denature it and we wash away this original
template; we retain this one. Clear? Then what you can see is that this strand is left over here, and since its free end is also complementary to this other oligo, it will flip over and bind here. Again we add DNA polymerase, and you will see a bridge kind of structure, okay? Then again we denature it, and because both these strands are bound to the surface covalently, you will see two strands over here. This process is repeated over and over, and ultimately what you can see is that we will have millions of fragments on the surface of the flow cell, okay?

So again we denature all the fragments. Essentially you can see each cluster contains two kinds of strands, forward and reverse; it contains both. So what we do initially is cleave the reverse strands and carry out the sequencing for the forward strands, okay? So now on the flow cell you can see which strand is retained, all right?

So again there is a question. If I skip one step, what is going to happen? Again the strand will flip and bind to the surface. What should I do to prevent it binding to this primer again? Good. What I will do next is block all the free 3' ends so that they cannot flip over and prime again, okay?

Now you are done with cluster generation, and you are set to go for sequencing. You remember that in the library I showed you a region, the RD1 region, which is meant for hybridization of the read one sequencing primer. So now you flow the read one sequencing primer onto the flow cell, and you can see that the primer goes and binds over there. Then we are set to go for the sequencing. With Illumina, what we do is essentially mimic Sanger sequencing, just like the natural phenomenon. We have four kinds of bases. Let's say this is your strand and we have added a primer; you can see all these bases are labeled with a
fluorophore. So one at a time, one base will go and bind to the sequence; it will be detected just like in Sanger, we will wash away the unbound bases, and then we are set to go for the next cycle. So there is direct detection of the bases. Let's say you are running a sample for 300 cycles, so 300 bases will go and bind one after another; this is your one fragment, okay? So if you are going for 300 cycles, 300 bases will be added and we are done with the sequencing part.

So what is the key difference here? We essentially mimic the natural phenomenon; we do not detect a change in phosphate or something like that. The best part is this: if you detect by secondary methods, like phosphate release let's say, what will be the hurdle there? If one base is added, let's say there is one call for A in a cycle, you get a signal; if two A's are added, the signal will be double; but if you have many A's or many G's in a row over there, you cannot resolve the signal.

In Sanger, how do you judge the quality of the data? By looking at the peaks. You get a peak, you remember your data; if the peaks are sharp, your call is good. In the case of NGS we define the quality in terms of Q values, quality values. Illumina uses Q30 scores. A Q30 score means one error in 1000 bases called, so the accuracy rate is 99.9%, okay? So how do we use Q30? Let's say you performed a run which contains an E. coli genome. We will report that, say, 80% of the bases sequenced were above Q30. Getting it? So the accuracy rate in those 80% of base calls was at least 99.9%, and the error rate was at most 0.1%. Am I clear?

So what exactly is paired-end sequencing? You remember, as ma'am said, initially we generate both strands. If you wish to sequence only the forward strand, you cleave away the reverse strand and you sequence only the forward strand. In paired-end sequencing, what you do is,
you regenerate both strands. The second time, you retain the reverse strand and you cleave away the forward strand. In this manner you can sequence both strands of the stretch of DNA: one is forward, one is reverse. So initially we cleave away the reverse strand, and we retain and sequence the forward. During paired-end, what we do is regenerate: we de-block the ends, remember, so both strands will be generated. This time we will cleave away the forward strand and we are going to sequence the reverse strand.

So once we are done with read one, okay, let's say I have chosen a 300 base pair read length. This is your read one product and this is your sequencing primer. From here, how many bases were added? Let's say it was a 300-cycle run, so 300 bases will be added over here, and those are detected. Then we denature this product, the product of the read one sequencing primer, and we are set to go for paired-end sequencing. What needs to be done is this: we denature it, and again we de-block this end. Clear? Because we have de-blocked the end, the strand will flip again, and again you can see the clusters will be regenerated. But this time we are going to cleave away the forward strand. Initially what did we do? We cleaved away the reverse strand and went for the sequencing of the forward strand. Good. Okay, so now we will cleave away the original forward strand, and this is how we are ready for the second read.

So this time which primer will go and bind? RD2. You remember the structure of the library; there was an RD2 region. There the read two primer will go and bind, and just like read one we are set for the sequencing, right? All the reagents will flow. This is how you will sequence, just like the first read, but the difference is that here you are sequencing the reverse strand. This is all about paired-end
sequencing. So, in all our platforms, essentially what we do is initially carry out read one sequencing, okay? Then we go for index sequencing, barcode sequencing, where the barcode is essentially 6 to 8 base pairs long, then barcode two, and then finally read two, okay?

Actually, there was one question when I was in Singapore during my training. One of the candidates asked the trainer a question, and he was also not able to answer. The candidate asked: why do you need a separate read for the indexes? Let's say you have this DNA fragment, and you can remember the structure; here are the indexes. So why do you need a separate read? You could directly read through from here itself. Makes sense? The question was logical. Getting it? What I am telling you is that initially we sequence read one, then we go for barcode one sequencing, then barcode two sequencing, and then finally read two.

Good, so your catch is right. In Sanger, if you go beyond, ideally speaking, 600 to 700 base pairs, your peaks will not be that sharp. So as the read length increases, let's say I have a DNA insert of more than 600 base pairs, the quality may drop, and for demultiplexing the samples from the pool I need very accurate data. Because the barcodes are only 8 base pairs long, I should have very good data for demultiplexing the samples from the common pool, correct? That is the reason why we sequence read one, then barcode one, barcode two, and finally read two. That's all; that is the simple logic.

Once we are done with the sequencing, the final stage is the data analysis, okay? There are three kinds of analysis. The first one is the primary analysis. During primary analysis, you know all the bases are tagged with a fluorophore, right? So initially
the camera will record the signals like this. It will be colored though, so red, green, yellow, but these are all your bases being called. There is a software which is called Real-Time Analysis software. Once your bases are added, they are detected in real time, right? So once we are done with cycle one, let's say, you will get the base call files. So primary analysis happens with the help of the Real-Time Analysis software: it extracts the intensities from the images and converts those intensities into base calls. So we will be getting .bcl files on board.

Once the .bcl files are generated, the secondary analysis, using MiSeq, again starts on board itself. What it will do is convert those base call files into FASTQ files. And finally, using this machine, you can also get the tertiary analysis reports on board, everything: which are the pathogenic mutations, which are the silent mutations, every report you will get.

What you need to understand is this: during primary analysis, the intensities are extracted, bases are called, and you get .bcl files on board; during secondary analysis you get the FASTQ files and all the alignments, right? So I will play a small video for you guys. I hope it will clarify everything; the same thing which I have shown will be shown sequentially.
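The Q30 idea and the FASTQ output mentioned in the lecture can be tied together with a small Python sketch. It assumes the common Phred+33 text encoding of quality scores (each quality character's ASCII code minus 33 is the Q value) and simple four-line FASTQ records; the example read and the helper names `phred_to_error`, `parse_fastq` and `fraction_at_least` are invented for this illustration.

```python
def phred_to_error(q):
    """Error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def parse_fastq(text):
    """Yield (read_id, sequence, qualities) from 4-line FASTQ records.

    Assumption: qualities are Phred+33 encoded, one character per base.
    """
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, sequence, _plus, quality = lines[i:i + 4]
        yield header[1:], sequence, [ord(ch) - 33 for ch in quality]


def fraction_at_least(records, threshold=30):
    """Fraction of all base calls whose quality is >= threshold."""
    scores = [q for _id, _seq, quals in records for q in quals]
    return sum(q >= threshold for q in scores) / len(scores)


# Invented single-read example: 'I' encodes Q40, '?' Q30, '5' Q20.
EXAMPLE_FASTQ = """@read1
ACGT
+
II?5
"""

if __name__ == "__main__":
    records = list(parse_fastq(EXAMPLE_FASTQ))
    print(records[0][2])               # [40, 40, 30, 20]
    print(fraction_at_least(records))  # 0.75
    print(phred_to_error(30))          # about 0.001, i.e. 1 error in 1000
```

A run summary such as "80% of bases at or above Q30" is this same fraction computed over every base call in the run.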
Clustering is a process wherein each fragment molecule is isothermally amplified. The flow cell is a glass slide with lanes. Each lane is a channel coated with a lawn composed of two types of oligos. Hybridization is enabled by the first of the two types of oligos on the surface. This oligo is complementary to the adapter region on one of the fragment strands. A polymerase creates a complement of the hybridized fragment. The double-stranded molecule is denatured, and the original template is washed away. The strands are clonally amplified through bridge amplification. In this process, the strand folds over and the adapter region hybridizes to the second type of oligo on the flow cell. Polymerases generate the complementary strand, forming a double-stranded bridge. This bridge is denatured, resulting in two single-stranded copies of the molecule that are tethered to the flow cell. The process is then repeated over and over and occurs simultaneously for millions of clusters, resulting in clonal amplification of all the fragments. After bridge amplification, the reverse strands are cleaved and washed off, leaving only the forward strands. The 3' ends are blocked to prevent unwanted priming.

Sequencing begins with the extension of the first sequencing primer to produce the first read. With each cycle, fluorescently tagged nucleotides compete for addition to the growing chain. Only one is incorporated, based on the sequence of the template. After the addition of each nucleotide, the clusters are excited by a light source and a characteristic fluorescent signal is emitted. This proprietary process is called sequencing by synthesis. The number of cycles determines the length of the read. The emission wavelength, along with the signal intensity, determines the base call. For a given cluster, all identical strands are read simultaneously. Hundreds of millions of clusters are sequenced in a massively parallel process. This image represents a small fraction of the flow
cell. After the completion of the first read, the read product is washed away. In this step, the index one read primer is introduced and hybridized to the template. The read is generated similarly to the first read. After completion of the index read, the read product is washed off, and the 3' end of the template is deprotected. The template now folds over and binds the second oligo on the flow cell. Index two is read in the same manner as index one. The index two read product is washed off at the completion of this step. Polymerases extend the second flow cell oligo, forming a double-stranded bridge. This double-stranded DNA is then linearized and the 3' ends are blocked. The original forward strand is cleaved off and washed away, leaving the reverse strand. Read two begins with the introduction of the read two sequencing primer. As with read one, the sequencing steps are repeated until the desired read length is achieved. The read two product is washed away.

This entire process generates millions of reads, representing all the fragments. Sequences from pooled sample libraries are separated based on the unique indices introduced during sample preparation. For each sample, reads with similar stretches of base calls are locally clustered. Forward and reverse reads are paired, creating contiguous sequences. These contiguous sequences are aligned back to the reference genome for variant identification. The paired-end information is used to resolve ambiguous alignments.

All right, so I am sure by now you are very clear and convinced about the magic which next-generation sequencing platforms have done for us. The pace, the accuracy, the speed, the cost: what one can accomplish with this platform was not even possible to think of 10 years ago. So the rapid advancements which have been made in this area are tremendous, and the major advantage one can see from this is that many applications are now directly reaching the clinics.
So, now doctors are pretty much relying on sequencing technologies and their results for patient care, and this itself conveys that the technology has reached a level of robustness, maturity and accuracy at which it can be brought to the clinics for patient care. So, now you are getting introduced to different types of platforms for NGS. It is entirely up to you to think about the pros and cons of each technology and which technology offers you what advantages, but I must say all these technologies are very good. It all depends on whether your aim is to do whole genome sequencing, RNA sequencing, or only to look at a panel of genes; depending on what exactly you want to address, you can choose the platform accordingly. Many next-generation sequencing technologies are really at an advanced level, and it entirely depends on you which platform you choose.

Nevertheless, keep in mind that NGS is a parallel sequencing technology which has really changed the way we look inside at the genome level, and these applications have brought a tremendous revolution to the entire biological and medical science area. With ultra-high throughput, scalability and speed, NGS technologies enable researchers to perform a wide variety of applications and to study biological systems at a level which was never possible before.

Today, I hope you have learned the basics of NGS, starting from Sanger sequencing to the Illumina platform using the sequencing-by-synthesis method. In the next class, you will study another application of NGS using another leading technology platform, and we will continue our discussion in the next lecture as well. Thank you.