Today we are going to talk about next-generation sequencing technologies. Before we discuss this revolutionary technology platform, let me share a story with you. When I was a grad student like you, finishing my master's around 2000-2001, the various genome sequencing projects were being completed. It was big news at the time, a big breakthrough, that the first draft of the Human Genome Project was completed, and the sequences of many other model organisms were being finished around the same time. Just imagine: for the first time we were learning what makes us human, what our genetic composition is. It was a really big scientific breakthrough, because for the first time we knew how many genes we actually have, what makes us so unique and different from other model organisms. Before that there was only speculation about how many genes humans might have, but now the evidence, our genetic composition, the genome itself, was finally revealed. It was definitely a big revolution in science in general, and especially in biology and medicine, but it did not end there. Many times a technology shows promise and then fades away. What has happened in genomics, though, is that after the draft human genome was released in 2001 and the finished sequence in 2003, tremendous advancement has continued in this field, bringing us to where we are now, with what we call next-generation sequencing technology.
A series of advancements took us through the first, second, and third generations of sequencing technology, and I must admit that the progress made by engineers, technologists, big-data scientists, and biologists together has really brought us to our current scale of sequencing-based experiments. So what has changed from 2002 to 2018? First, the speed at which we can now sequence a genome. Imagine: the Human Genome Project took more than 10 years to accomplish; now, within about 2 days, or 48 hours, you can sequence a given individual with next-generation sequencing technology. That is a humongous change. Additionally, what took billions of dollars to accomplish for one human genome now takes maybe only a thousand dollars. So cost-wise, speed-wise, and accuracy-wise, moving from Sanger-based platforms to next-generation sequencing has been a tremendous revolution. I am sure you are now excited to know what NGS technologies are and which big industry players and technologies are leading this field. In this light we have invited a guest, Dr. Atima Agarwal from Thermo Fisher Scientific, where she is a manager of commercial services, training, and tech support. She will take this lecture, first covering the basics of next-generation sequencing and then going into more detail and specifics for one of the technologies, Ion Torrent. So let me welcome Dr. Atima Agarwal for her lecture on NGS-based technologies and their applications.
I am Atima Agarwal from Thermo Fisher; I have been with this company for many years now. If you remember, or have heard, in the pre-1990 era sequencing used to be done on gels with radioactive labels, and it was a very cumbersome affair, but it contributed a lot to science. The first human genome was sequenced on capillary electrophoresis platforms; if you look at pictures of the Sanger Institute from that time, there were rows upon rows of these systems generating sequences day and night, and that is how the first human genome got sequenced and assembled. Those were the benchmarks in sequencing. Then, somewhere around 2004-2005, we started hearing about next-generation sequencing. Next-generation sequencing is more aptly called massively parallel sequencing. With Sanger, you are running either 96-well plates or strip tubes, and one well gives you one sequence: one stretch of A, T, G, C that can be as long as 1000-1100 base pairs or as short as a 100 base pair fragment. So one well gives you one sequence. With massively parallel sequencing, you can imagine what is happening: you have millions of wells, with sequencing happening in all those millions of wells at once, and that is how you generate a lot of data per run. That is massively parallel sequencing.
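To get a feel for the scale difference, here is a small back-of-the-envelope calculation. The instrument figures below are illustrative assumptions for the comparison, not exact specs of any particular machine:

```python
# Rough per-run yield: one capillary plate vs. a massively parallel chip.
# All figures below are illustrative assumptions, not instrument specs.

def run_yield(reads, read_length_bp):
    """Total bases generated in one run: reads x read length."""
    return reads * read_length_bp

# Sanger/capillary: a 96-well plate, one ~1000 bp read per well.
sanger_bases = run_yield(reads=96, read_length_bp=1000)

# Massively parallel: e.g. 20 million wells at ~200 bp reads.
ngs_bases = run_yield(reads=20_000_000, read_length_bp=200)

print(f"Capillary plate: {sanger_bases:,} bases")   # 96,000 bases
print(f"Massively parallel: {ngs_bases:,} bases")   # 4,000,000,000 bases
print(f"Fold difference: {ngs_bases // sanger_bases:,}x")  # 41,666x
```

Even with these rough numbers, the jump from one read per well to millions of wells per chip is four to five orders of magnitude in yield per run.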
Just to be clear: just because we nowadays talk a lot about NGS, it is not that capillary sequencing has lost its charm or its utility. It is still a very user-friendly technique for many applications. For work where you do not really require a lot of data, like much microbial research or single-gene-based tests, things can still be done very well and very efficiently on capillary sequencers. Capillary sequencing has been the gold-standard technology for targeted work, where you have certain amplicons to sequence, for de novo organisms, where you sequence to fill gaps in assemblies, and for multiple fragment-analysis applications, where you are not actually sequencing the fragment but just looking at its size, and based on size polymorphisms you give certain scientific answers. So that is the capillary sequencer.

Coming to next-generation sequencing: as I mentioned, this is a technology where you have millions of wells. These are the chips used with Ion Torrent technology, and each chip has millions of wells. You can imagine that each well yields a sequence, and that is how you generate millions of sequences in a run. The agenda we have planned for this session is to talk about our new series of NGS platforms, the Ion GeneStudio S5 systems, the technology behind the GeneStudio S5, and some of the applications. In NGS terms this is the fastest sequencer available: once you have started your sequencing run, it can produce data in as little as 3.5 hours. There are lots of different kinds of applications people want to run on NGS, and many of the samples are very challenging: DNA or RNA extracted from FFPE blocks, liquid biopsies where you are really looking at cell-free DNA, or very little RNA from a cell when you still want to study the whole transcriptome. So our technology is compatible with as little as one nanogram of DNA or RNA, and the backbone of this technology is AmpliSeq; we will come to those details later.

Coming to the workflow: how are we generating these millions of reads? You start with nucleic acid, which can be DNA or RNA. Where do you think you would start with DNA, what kinds of experiments? Whole-genome sequencing, yes; plasmids, yes; targeted resequencing. Targeted resequencing is basically when you do not want to sequence the whole genome, you just want to sequence some targeted regions of it. And where would you use RNA as your starting material? Transcriptomes, and again targeted work: you are looking at some targeted transcripts or some targeted fusion events. Anything else? RNA-seq. And why would you typically do RNA-seq? Because if DNA is transcribed into RNA, there is a higher probability that that particular part of the genome is actually being transcribed, or further translated into a protein; though that is just a possibility, since there are also many long and small non-coding RNAs involved in a lot of the regulatory activity that shapes the transcriptome within a cell. People do many different kinds of experiments, and what you should look for in your NGS system is that it can cater to all of them, because the data requirement for each of these experiments is very different. When you are doing a whole genome, it's
a very straightforward calculation: you generally tend to generate about 100x data, so whatever the genome size is, you multiply it by a hundred; for example, a 4 Mb bacterial genome at 100x would call for roughly 400 Mb of data. There are definitely ifs and buts around that which we will not go into, but the point is that the data requirements for these different kinds of experiments are very different, and that is where you need the flexibility of a single NGS system that can cater to all those requirements.

So you started with your DNA or RNA: you isolated it from cells, from culture isolates, or wherever. Once you have isolated the DNA or RNA, the next step is to prepare a library. A library is basically a collection of fragments, whether coming from DNA or from RNA, which you finally want to sequence. This can be an enriched portion: for targeted sequencing you need to enrich those targets first, and only then can you go for sequencing; for whole-genome sequencing, since your aim is to sequence the whole genome, you do not need any enrichment, you just extract DNA and go forward with library preparation.

What is the basic aim in library preparation? These next-generation sequencers, as we discussed, generate 200 base pair reads, 400 base pair reads, 600 base pair reads. Suppose we take whole-genome sequencing as an example. Your aim is to sequence that genome as fast as possible and with reads as long as possible, so you would probably pick a 400 or 600 base pair chemistry. What you then do is shear the genome, break it into smaller pieces, either by enzymatic treatment or by sonication. Once you have broken down the genome, these are the fragments you finally want to sequence, but you do not know what is in them, so you need adapters, or linkers, to help facilitate sequencing these regions. In library preparation you are doing two things: enriching the region of interest, and ligating linkers to both ends of each fragment. These linkers are double-stranded DNA fragments whose sequences we know, because they come from the chemistry, and they are exploited so that a primer can sit on them. You have your fragment of interest in between, you ligate the adapters at both ends, and then the sequencing primer comes and sits on these adapters so that you can sequence the region in between. So, with library preparation, you enrich your region of interest and ligate the fragments to adapters.

Then comes template preparation; these are the steps that are specific to a technology. In template preparation you take these library molecules and amplify them onto Ion Spheres, since we are talking about Ion Torrent technology. These Ion Spheres get loaded onto the chips, and it is finally the Ion Spheres that carry the regions of interest, the library molecules, from which you generate all the reads. That process of taking library molecules and amplifying them onto Ion Spheres is called template preparation. Then comes sequencing: now you have your templated Ion Spheres, which have library molecules amplified onto them, and you want to sequence, so you load these Ion Spheres onto the chip and generate the data. Once you have loaded the chip, you initialize your
sequencer, which as we discussed takes only about 15 minutes, and then your sequencing run goes for anywhere from 2.5 to 4.5 hours depending on the read length you are targeting.

Talking about the instrumentation required in this workflow: library preparation can be done manually or on the Ion Chef; some parts of library preparation can be automated on the Ion Chef, and otherwise you have the manual route, where you use certain magnetic beads and library preparation kits from us. For targeted sequencing or whole-genome sequencing the manual process takes around 4 to 4.5 hours, and RNA-seq is about a day's protocol, because you have to do certain levels of enrichment and more QC. Otherwise you can use the Ion Chef to prepare your targeted sequencing libraries automatically. Then comes template preparation: as we discussed, during template preparation the library molecules are loaded onto the Ion Spheres, and the Ion Spheres in turn are loaded onto the chip. You take the loaded chip, place it on the sequencer, and it is good to go.

Coming to a few details of library preparation: since you may be targeting variations at both the DNA and the RNA level, you isolate DNA and RNA, then reverse transcribe the RNA, so now you have DNA or cDNA. You take a primer pool and use the DNA and cDNA as your target, and all these genes get amplified in one go. Then, since the primers sit in conserved regions, where there are no hotspots or variations to be looked at, we partially digest these primer sequences and then ligate the adapters.

You can also do smaller experiments. Say you have a 52-gene panel and some 12 samples to run: you index these 12 samples with 12 different barcodes. Remember we said that library preparation is all about enriching the targets and ligating adapters; when you ligate those adapters, you use barcoded adapters to differentiate one sample from another. The barcode is a 10 base pair unique sequence present in the adapter, which will finally help you differentiate between the different samples you load on one chip. Once you have ligated the adapters, you pool all these libraries; based on the number of samples you plan to multiplex on a chip, you use that number of barcodes. You also want an equal amount of data for all those libraries, so in terms of molarity you take equal molar concentrations of all the libraries and make one pool. The aim is that when you sequence, you generate an equal amount of data for every sample, because otherwise some samples will work out pretty well and some will not; to ensure this, you pool all the libraries in equimolar proportion. So after the library preparation step, the 12 or 20 samples you started with become one pooled sample. And how are you going to differentiate them when you generate the data? Yes: we use the barcoded adapters. The first thing the system sequences is the barcode, then it sequences your region of interest, and
then it assigns each read to a bin: this read carries this barcode, so it belongs in this sample's bin.

Once we are through with library preparation, the next step is template preparation. The aim here is clonal amplification. Whether you started with targeted sequencing, whole-genome sequencing, or an RNA-seq experiment, you have millions of target molecules to be sequenced, and what we are aiming for during template preparation is that one library molecule gets clonally amplified onto one Ion Sphere. That is the basic aim of this step: one library molecule amplified all over one Ion Sphere, a single clone per sphere. There may also be some Ion Spheres onto which nothing got amplified; you do not want to load those onto the chip, so you remove them at the enrichment step.

How does this work? The Ion Spheres carry an oligo linked to their surface, and this oligo is complementary to one of the adapters used during library preparation. If we go back a few slides: we used two adapters, P1 and a barcoded adapter, and we mentioned earlier that we exploit these known adapter sequences to enable sequencing of millions of fragments on one chip. Here we use the P1 adapter sequence for the library molecules to bind, by complementarity, onto the Ion Sphere. So the Ion Spheres are coated with oligos complementary to the P1 adapter. When the emulsion is created, you are making water droplets surrounded by oil, within which the library molecules are amplified. The emulsion gives a random distribution of Ion Spheres, each droplet ideally containing one sphere, one library molecule, and enough PCR components to amplify that library molecule onto that sphere. What happens is that a library molecule gets denatured and sits on the oligo complementary to the P1 adapter, gets extended, and then goes through cycles of denaturation, annealing, and extension like a normal PCR. And while we talk through all this, remember that you are not manually involved in the process; it is all being done by the systems, either semi-automatically or fully automatically. Each time the strand is denatured, the template binds to another oligo on the same bead, and that is how you clonally amplify this one molecule onto the Ion Sphere. In detail: you have the emulsion oil, a library molecule, some primers, and your Ion Sphere; the library molecule is denatured, binds to the Ion Sphere, gets extended, is denatured again, the template binds to another oligo, and finally you have that one library molecule clonally amplified onto the Ion Sphere.

Now, as we said, some of the water droplets did not get a library molecule when the emulsion was created, so their spheres will not be amplified. We do not want to take them onto the chip, because they would still occupy space there, and why lose that data? So at the enrichment step we exploit the fact that, during amplification on the Ion Spheres, one of the primers was biotinylated; all the amplified Ion Spheres are therefore biotinylated, and we use streptavidin-coated beads to fish out all the amplified spheres while the rest remain in solution. So this is your Ion Sphere, covered all over with copies of the molecule because we clonally amplified it, so you
use streptavidin-coated beads, and since these beads are magnetic, with the help of a magnet all the amplified spheres are fished out, so that only the amplified beads are finally loaded onto the chip. That is the enrichment part of template preparation.

Then what happens at the sequencing level? I showed you the chip: it has millions of wells, and each well generates a sequence. Consider one well: it holds one Ion Sphere loaded with one kind of template. When the Ion Chef prepares the sample for sequencing, it already adds the primer and polymerase to the chip itself. What else is needed for extension? dNTPs. So the system adds dNTPs sequentially onto the chip, one kind at a time. We know that whenever a dNTP gets incorporated, gets polymerized, into a growing chain of DNA, a phosphodiester bond is formed between the phosphate group and the hydroxyl group, and this incorporation releases a hydrogen ion. Suppose the first template base is an A and you have flowed dTTP: it is the complementary nucleotide, so it will bind and hydrogen ions will be released. Because we coated so many copies of the library molecule onto the Ion Sphere, the release of so many hydrogen ions in one well brings about a change in pH for that particular well, and that change in pH is recorded as a change in voltage. So basically you are converting chemical information into digital information, and that is how the system says a signal has been received and a base has been called.

Now suppose A is the first template base and the system has flowed dGTP: G is not going to pair with A, so there is no binding, no hydrogen ion release, no change in pH, and no signal detected by the system. Let us walk through the cartoon: this is your Ion Sphere, these are your adapters, the barcoded adapter and the adapter you used to attach the fragment to the sphere; the primer comes and binds here, and the dNTPs are flowed sequentially. When the flowed nucleotide is complementary, it binds, there is a change in pH which the system records, and since the system knows which dNTP it was flowing at that moment, here dTTP, it can associate this signal with an A in the template. Note that we are not using any fluorescently labeled dNTPs; this is a simple chemistry. In the next flow, say of dATP, the next target base is G; they are not complementary, so nothing binds and the nucleotide is washed off. Likewise, flow by flow, you build up the sequence. And what happens if, when we flow a nucleotide, there are two complementary bases in a stretch on the template? Nothing stops the growing chain, so two nucleotides are incorporated at once: double the number of hydrogen ions are released, which brings double the change in voltage, and the signal is recorded as doubled. That is how you can tell there were, for example, two consecutive A's in the sequence. By adding the dNTPs sequentially, the system builds up a sequence which is finally read out as an ionogram. The ionogram is read from left to right: whether or not you get a signal at each flow defines the sequence of bases, and the intensity
of the signal defines how many of those A's or T's were in a row.

I hope you got a glimpse of next-generation sequencing technologies, which have really revolutionized the field of genomics. You were given the basics and some possible applications of this technology platform, and one specific technology, Ion Torrent, was also illustrated and discussed. These concepts will be covered again in more detail in the following lectures, where we have invited more industry experts representing different technology platforms, who will also give you the basics and the current status of these technologies. So, see you in the next lecture. Thank you.
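As a study aid, the flow-based detection described in the lecture can be sketched in a few lines of code. This is a deliberately simplified model, not the real base-calling pipeline: the flow order, the example sequences, and the assumption of a perfectly clean signal are all illustrative. Each flow of one nucleotide type extends the strand through any run of complementary template bases, and the "signal" is simply the number of bases incorporated, standing in for the proportional hydrogen-ion release:

```python
# Simplified sketch of Ion Torrent-style flow sequencing (illustrative only:
# noiseless signals, idealized homopolymer response, made-up flow order).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def ionogram(template, flow_order="TACG", n_flows=16):
    """Flow one dNTP type at a time over the template strand.

    Each flow extends the growing strand through every consecutive
    template base complementary to the flowed nucleotide; the signal
    per flow is the number of bases incorporated (0 = no signal,
    2 = the 'double' signal of a two-base homopolymer, etc.).
    """
    signals, pos = [], 0
    for i in range(n_flows):
        flowed = flow_order[i % len(flow_order)]
        count = 0
        # One flow runs through an entire homopolymer stretch at once.
        while pos < len(template) and COMPLEMENT[template[pos]] == flowed:
            count += 1
            pos += 1
        signals.append(count)
    return signals

def decode(signals, flow_order="TACG"):
    """Rebuild the template sequence from the per-flow signals."""
    bases = []
    for i, count in enumerate(signals):
        flowed = flow_order[i % len(flow_order)]
        bases.append(COMPLEMENT[flowed] * count)  # flowed T => template A
    return "".join(bases)

sig = ionogram("AAGCT")  # template starts with a two-A homopolymer
print(sig)               # first T flow reports signal 2 (two A's at once)
print(decode(sig))       # -> "AAGCT"
```

The zero/nonzero pattern across flows recovers the base order, and the signal height recovers homopolymer lengths, which is exactly what the lecture's ionogram reading rule (left to right, presence then intensity) expresses.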