recording. All right, so welcome back, everyone. This is an overview that I took from OpenStax College, via Wikipedia, to give a zoomed-in view of how DNA looks in detail. DNA and RNA are very, very similar, so it doesn't really matter whether you look at DNA or RNA here. We see the two strands of DNA: the forward strand and the reverse strand. Each strand has a five prime (5') end and a three prime (3') end, and when we write down a DNA sequence we always write it five prime to three prime. Then we have the coding strand and the template strand, and which one is which is kind of arbitrary: some genes are coded on the negative strand, meaning that they are actually on the other side of the DNA. DNA consists of two major parts. One part is the bases, which are what we always write down, but besides that we have the sugar-phosphate backbone. Modifications such as epigenetic marks can be added on top of this; for DNA the best-known one, methylation, sits on the bases themselves, mostly on cytosines. The two strands are held together by hydrogen bonds between the base pairs, and that is something the PCR technique actually relies on: PCR works by increasing the temperature, which makes the hydrogen bonds break so that the two strands separate from each other, and separated strands are what allow things like copying the DNA or transcribing parts of it. If we look at DNA versus RNA, this is a different representation: again you see the backbone and the different bases. The first thing that is very different between DNA and RNA is that instead of a T (thymine) base, RNA has something called uracil, a U base. So when you write down a DNA sequence you use C, G, A, and T, and when you write down an RNA sequence you use C, G, A, and U, and the only difference between T and U is that this
little CH3 (methyl) group is not there in uracil, so they are very similar. Another thing that is immediately striking from this image is that DNA and RNA both tend to form helices. DNA naturally forms a double helix, but RNA doesn't really do that, because RNA is generally single-stranded. Double-stranded RNA does exist, but it is very uncommon, and eukaryotic cells will more or less destroy all double-stranded RNA because it's considered dangerous: a lot of viruses use double-stranded RNA as their genetic material. So if a eukaryotic cell encounters double-stranded RNA in the cytosol, it will directly degrade it, and RNA silencing builds on this mechanism. So, as a single overview, what are all the differences between DNA and RNA? As I already told you, RNA is generally single-stranded, while DNA is almost always double-stranded. The bases are slightly different: an RNA molecule contains uracil, while a DNA molecule contains thymine. The sugar in the backbone is slightly different: RNA has a ribose backbone and DNA has a deoxyribose backbone, which means that at the 2' position deoxyribose has just an H, while ribose has an OH group. If you think back to chemistry, an OH group (an oxygen with a hydrogen) is much more reactive than an H: the 2'-OH can attack the neighboring phosphate, so in water the RNA backbone breaks down much faster than the DNA backbone. It's just a slight difference in the backbone, but biochemically speaking it is a major difference. The size is different too: RNA is generally relatively small, while DNA is relatively big, because DNA comes in whole chromosomes that are millions to hundreds of millions of base pairs long. And the location differs: DNA is in the nucleus of the cell, while RNA moves from the nucleus to the
cytoplasm, although there is also RNA that is found exclusively in the nucleus. But generally the difference is that DNA is not found outside the nucleus (apart from the small genomes of mitochondria, and chloroplasts in plants), while RNA is. So that's a quick overview of the differences between DNA and RNA. All right, and now we end up in the part of the lecture that I like the least, and that is telling you about all the different types of RNA out there. These are the types of RNA that I will be talking about, and I hope I can make clear what the differences are. There are a lot of short names as well, which you guys just have to remember. The main form of RNA, which we've been talking about a lot, is messenger RNA. Messenger RNA comes in two forms: hnRNA and mRNA. mRNA is the ready, mature RNA, and hnRNA (heterogeneous nuclear RNA) is the pre-mRNA; the difference between the two is that the pre-mRNA still has the introns in it. Besides that we have transfer RNA (tRNA), ribosomal RNA (rRNA), small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), catalytic RNAs, which are called ribozymes, microRNAs (miRNAs), small interfering RNAs (siRNAs), and non-coding RNAs (ncRNAs). All right, let's look at these in a little more detail. The pre-mRNA, called hnRNA, is spliced to produce the mature mRNA by removing the introns. If you look at the structure of an RNA, you have exons, and there's an untranslated region (UTR) at the beginning and at the end; these are for the stability of the RNA and also direct the ribosome on how many proteins to make from this RNA. If you look at the pre-mRNA, the hnRNA, it has an exon and then generally alternating introns and exons. The exons are the parts that code for the protein, and of course different exons can be retained or skipped to produce several protein
transcripts from the same gene. So a single gene in the genome, a single sequence, produces on average, I think, four to five different proteins in the human genome, which is a lot. We have around 20,000 genes encoded in the human genome, but protein-wise a human cell can produce something like 100,000 to 120,000 different proteins, depending on how you count. This is because the cell can couple this exon directly to that exon and just skip the exon in the middle, so there are different ways of combining the exons. Then there's the ORF, the open reading frame. The open reading frame is the final, meaningful sequence of codons that is translated into a peptide chain by the ribosomes, and these peptide chains are what we call proteins. The place in the mRNA where the ribosome starts making protein is, in 99.9% of cases, an AUG codon (ATG in the DNA sequence). AUG codes for a methionine when it's somewhere in the middle of the protein, but when the ribosome starts reading the mature mRNA from the beginning, the first AUG it encounters is generally where it starts translating the protein. That is why essentially every protein starts with a methionine when it is made, although this first methionine is often removed afterwards. At the end there's a stop codon, which can be any of three codons: UAG, UAA, or UGA. These stop codons direct the ribosome to stop translating the mRNA at that point and to release the protein into the cytosol or into the endoplasmic reticulum, depending on where synthesis occurs. And then, of course, the UTRs, the untranslated regions at the beginning and at the end, are there for lifespan control of the RNA: they make sure that the RNA is not degraded during transport, but they also influence the number of proteins that should be made from a
single messenger RNA. All right, we already talked a lot about messenger RNA, so the next form is tRNA. tRNAs are really, really interesting because they are encoded in the genome, and they always have this cloverleaf structure. You have the five prime beginning, and then the RNA folds back on itself three times to make three different stem-loops, and at the end you have the acceptor stem. Here at the three prime end is where the amino acid is actually coupled to the tRNA: a tRNA is an RNA molecule that carries a single amino acid attached to the 3'-OH group of the terminal A of its CCA end. Then there's the loop at the bottom, called the anticodon loop, which contains the three bases that determine which codon this tRNA binds to, and the amino acid loaded onto the other end is the one that matches this anticodon. Then there's the D loop, the TΨC loop, which is here, and this little variable loop, whose length differs between tRNAs. The tRNAs are the physical link between the RNA sequence and the amino acid sequence of the protein. And every animal more or less has its own set of tRNAs, because tRNAs are simply encoded in the genome, which means that mutations occur in these tRNA genes as well, and species differ in which tRNAs they have and how many copies of each. Based on this, you can see whether the codon usage of a sequence is more or less human-specific, mouse-specific, or specific for another animal. The genetic code is more or less the same for everyone, but the tRNAs make certain codons preferred in a specific species. For example, leucine can be encoded by six different codons, and in mouse and human there can be
different tRNAs for reading those codons, which is why there's something called codon optimization, which you can do to optimize the codons of a sequence for a specific species. A tRNA is between 73 and 94 nucleotides long, it always comes in this cloverleaf structure, and it's a very fascinating molecule, so we will come back to it again. So how does this work? How are peptides synthesized? Well, we have the ribosome, and the ribosome consists of two parts; in this figure it's just a single ribosome. The messenger RNA is pulled through the ribosome, and in the ribosome there are three sites. You have the entry site, where the tRNA is matched with the codon on the mRNA; then an intermediate site, which holds the tRNA carrying the growing chain; and then the exit site, where the empty tRNA is more or less thrown out of the ribosome when the next tRNA comes in. If we look at this in a little bit more detail, we have the small subunit, the bottom part of the ribosome, and the large subunit, the upper part of the ribosome. The mRNA is pulled through the ribosome, and tRNAs are matched to the sequence. First we go to the A site, which is where the matching occurs. There the growing chain is joined to the new amino acid, so the amino acid is coupled to the newly forming protein, and then the tRNA moves from the A site to the P site. After the P site comes the exit site, where the empty tRNA is discarded. So, in detail: the A site is the aminoacyl site, the binding site for the charged tRNA; the P site is the peptidyl site, which holds the tRNA linked to the growing polypeptide chain; and the E site is the exit site, the final binding site for the tRNA before it is ejected from the ribosome. It's like a three-stroke engine: as soon as something
new binds into the A site, the previous A-site tRNA is moved to the P site, the P-site tRNA is moved to the E site, and the E site is emptied. So it's an engine that just continuously runs and synthesizes proteins. All right, the ribosome itself also contains RNA. The ribosome is not purely a protein: it is a complex of proteins and several RNAs, and these RNAs are called rRNAs. In prokaryotes we have the 23S, 5S, and 16S rRNAs, where S stands for Svedberg, a unit of sedimentation speed, so it has to do with the size of these RNAs. In eukaryotes we actually have four RNAs inside the ribosome, called 28S, 5.8S, 5S, and 18S. These RNAs help the ribosome recognize the different tRNAs, because the tRNAs have to be coupled to the mRNA, and they help recognize the mRNA sequence; the rRNA even catalyzes the formation of the peptide bond itself. RNA is well suited to this because it can recognize other RNA directly through base pairing. So the ribosome contains either three or four RNAs to facilitate this binding and to match the proper amino acid with the proper codon using the tRNAs.
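The strand conventions discussed earlier in this section (sequences are written 5' to 3', the two strands pair A with T and C with G, and RNA uses U where DNA uses T) can be made concrete with a small Python sketch. It assumes plain uppercase strings containing only A, C, G, and T, and ignores ambiguity codes such as N. The function names are hypothetical, chosen for illustration.

```python
# Minimal sketch: strand conventions and transcription on plain strings.
# Assumption: input contains only A, C, G, T (no ambiguity codes like N).

# Watson-Crick pairing in DNA: A pairs with T, C pairs with G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    # Reverse first, because the partner strand runs antiparallel.
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def transcribe(coding_strand: str) -> str:
    """mRNA matches the coding strand, except that T becomes U (uracil)."""
    return coding_strand.upper().replace("T", "U")


def mrna_from_template(template_strand: str) -> str:
    """The polymerase reads the template strand, so the mRNA is the
    template's reverse complement with U in place of T."""
    return transcribe(reverse_complement(template_strand))


if __name__ == "__main__":
    coding = "ATGGCCTAA"
    template = reverse_complement(coding)
    print(template)                      # TTAGGCCAT
    print(transcribe(coding))            # AUGGCCUAA
    print(mrna_from_template(template))  # AUGGCCUAA
```

Reading the template strand and rewriting the coding strand with U give the same mRNA. That is why the coding strand carries that name, even though it is the template strand that is physically read.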
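The exon-skipping idea described earlier in this section (joining one exon directly to a later one and leaving out the exon in between) can be shown with a toy enumeration. This is a deliberate simplification that assumes the first and last exons are always kept and that every internal exon can be included or skipped independently. Real splicing is regulated, and only some of these combinations actually occur. The exon names are hypothetical labels.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[list[str]]:
    """Enumerate transcripts that keep the first and last exon plus any
    subset of the internal exons, always in their original order."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, *internal, last = exons
    isoforms = []
    # Start with the full-length transcript, then skip more and more exons.
    for n_kept in range(len(internal), -1, -1):
        # combinations() preserves input order, so exons are never shuffled.
        for kept in combinations(internal, n_kept):
            isoforms.append([first, *kept, last])
    return isoforms


if __name__ == "__main__":
    for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
        print("-".join(isoform))
    # E1-E2-E3-E4
    # E1-E2-E4
    # E1-E3-E4
    # E1-E4
```

With n internal exons this toy model gives 2^n candidate transcripts. That upper bound helps explain how around 20,000 genes can yield many more distinct proteins, even though the real number per gene is much smaller.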
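The reading rules described in this section (start at the first AUG, read three bases at a time, and stop at UAG, UAA, or UGA) can be sketched as follows. This is a simplified model. It assumes the ribosome simply uses the first AUG and ignores the sequence context around it, and it assumes the standard genetic code. The 64-letter string is the standard codon table listed in UCAG order, with "*" marking stop codons. The function names are hypothetical.

```python
from typing import Optional

# Minimal sketch of translation: find the first AUG, then read codons
# until an in-frame stop codon. Assumes the standard genetic code.
BASES = "UCAG"
# One-letter amino acid codes for all 64 codons in UCAG order; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_first_orf(mrna: str) -> Optional[str]:
    """Return the reading frame from the first AUG up to and including the
    first in-frame stop codon, or None if there is no complete ORF."""
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None  # reached the end without meeting a stop codon


def translate(orf: str) -> str:
    """Translate an ORF into a peptide, leaving out the final stop codon."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 3, 3))


if __name__ == "__main__":
    # 5' UTR (GCC) + AUG GCU UGG AAA + stop UAG + 3' UTR (CCA)
    mrna = "GCCAUGGCUUGGAAAUAGCCA"
    orf = find_first_orf(mrna)
    print(orf)             # AUGGCUUGGAAAUAG
    print(translate(orf))  # MAWK
```

The stop codon is kept in the ORF string but dropped from the peptide. This mirrors the lecture: a stop codon tells the ribosome to release the protein rather than add another amino acid.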
All right, so let's take a look at the 3D visualization that I made. People always say never do a live demo, but I'm still going to do a live demo. I took one of the biggest structures from the Protein Data Bank. Let me shut down the sound, actually, and let's hope that it loads properly. I can add a new window capture, and this is the window that I want to capture. So this is my 3D engine, which can also be used to visualize other things. Let me make it a little bit bigger for you guys. All right, so here we see the upper part of the ribosome. What you see here are the amino acid chains, which are these little kind of squares, because the peptide backbone is planar, and then you have the side groups; you can see them here if you look a little bit closer. Can I zoom in a little bit more? All of these little dots are the individual atoms inside the ribosome. When you zoom out, and I don't know if it's that clear on Twitch, you can see a couple of the tRNAs here at the bottom. I think it's not clear enough for you guys, at least when I look at the recording. This little program that I wrote lets you look at all the different amino acid chains, and in total there are 57 different peptide chains in this ribosome. We can see some dots, yeah. Can you see that it looks like a helical structure with little blue dots in the middle and red dots on the outside? For me the window is relatively small. Okay, so you can see the really nice, kind of DNA-looking structure: this is the RNA inside the ribosome. So in total what you see here are the 57 peptide chains, so it's not a single protein, it's a complex
made up of 57 different protein chains. A ribosome is around 11,000 amino acids in total, and if you look at the whole thing it contains around 150,000 atoms. You can actually see that there's a little hole in here: this is where the messenger RNA is pulled through the ribosome. Then there's another hole, which should be here at the top but is not that clear. Let me see: so this is the hole for the messenger RNA, and at the top, although you can't really see the hole, is where the newly made protein comes out. So you can look at it. I just like visualizing stuff, and this is something that I made. It does other things as well: it also renders a solar system and other stuff, so you can also look at a model of the Sun and that kind of thing. It's something I made which I think is interesting. When people talk, I'm having real issues currently with... okay. But it's just an interesting visualization to show you guys that the ribosome is really one of these immense molecular machines in the cell. It is the link between RNA and protein, the place where RNA is translated into proteins, so it's kind of the most important machine in the cell. All right, let's continue. Besides messenger RNAs and ribosomal RNAs, we also have small nuclear RNAs, called snRNAs. These are RNAs located in the nucleus, and not so much throughout the whole nucleus but in very specific parts of it. One of the things these RNAs do is facilitate splicing. Again, when you splice the introns out of the pre-mRNA to produce mature mRNA, this has to be guided by RNA, because RNA can find the right RNA sequences through base pairing. So there's a protein complex that uses small nuclear RNAs to detect where the introns and exons are, and then a cutting mechanism
to cut out these parts. The spliceosome, the big complex that does the splicing, is a protein-RNA complex, and it has five different small nuclear RNAs in it, called U1, U2, U4, U5, and U6. U3 is missing from this list because, although it exists, it is a small nucleolar RNA involved in ribosomal RNA processing rather than splicing. The whole spliceosome is again made out of around 150 different proteins, and it is built up in parts: first some parts bind, then other parts bind, then the RNA is pulled together into a little loop (a lariat), the intron is cut out, and the RNA is fused together again. These small nuclear RNAs, along with their associated proteins, form ribonucleoprotein complexes called snRNPs (small nuclear ribonucleoproteins), which bind to specific sequences on the pre-mRNAs. The small nuclear RNAs are concentrated in nuclear speckles, also called splicing speckles, and in the Cajal bodies of the cell nucleus. If you look at the cell nucleus, it has pores to facilitate transport in and out, and inside the nucleus, between the chromatin, there are these little speckles where splicing factors are stored and assembled. They are there to do essentially one thing, and that is to support the processing of pre-mRNA into mRNA. Splicing is the process of removing introns, and there are two known types of splicing: U2-type and U12-type splicing. When you look at the pre-mRNA, here you see one exon, here the other exon, and here the intron. The sequences at the ends of the intron are more or less fixed, because these are the sequences that are recognized. You have the splice site at the five prime end of the intron, the splice site at the three prime end, and in the middle something called the branch site, and the branch site is the part that is
different between U2 and U12 splicing. The sequence at the three prime splice site, where the next exon begins, is more or less always the same: YAG. At the five prime splice site it's GURAGU, and in the other splicing type it looks very similar. But the branch site in the middle is what determines whether U2 or U12 splicing will happen. So it's a very highly regulated process in which the intron is cut out at these sites and the exons are fused together again to produce a single mature messenger RNA. I won't go into detail. In previous lectures I always gave a very detailed description of how splicing works and which splicing factors bind in which order, but I don't think that's important. Just remember that splicing factors are concentrated in the splicing speckles in the nucleus, that there are two different types of splicing, U2 and U12, and that three sequences are important for it: one at the five prime end of the intron, one at the three prime end, and one more or less in the middle of the intron, which is called the branch site. Of course, mutations in any of these three sites can make splicing fail, because the sequences are no longer recognized. An intron might then remain inside the mRNA, not spliced out, leading to a protein that can no longer function. All right, besides these small nuclear RNAs we also have small nucleolar RNAs (snoRNAs), a different type of RNA that plays an essential role in RNA biogenesis. These are not there to do splicing; they are there to guide chemical modifications of, for example, the ribosomal RNAs and other RNAs such as the snRNAs. RNA itself also has a chemical function, and sometimes the bases, like the uridine that you see here, need to be adjusted to perform a certain biochemical function. So these small nucleolar RNAs guide the transformation of, for example, uridine
into pseudouridine. This is done, for example, by snoM1. It takes the uridine and isomerizes it: the uracil ring is rotated so that it is attached to the sugar through a carbon atom (C5) instead of a nitrogen atom (N1), which frees up an extra NH group that can form an additional hydrogen bond. This is very important, because this is what produces the weird Ψ (psi) symbol. If you look at this tRNA, you see that some of the bases are not among the usual ones. We said that RNA consists of only four bases, A, C, U, and G, but RNA contains other bases as well. This Ψ residue is actually a U residue that has been chemically modified into a different base, to facilitate binding to the ribosome and to make sure that loops like the anticodon loop and the TΨC loop fold and function properly. So there's a lot of chemical modification going on, and snoM1 is one of the best-studied and best-known small nucleolar RNAs that perform this modification of U into Ψ in RNA. RNA has a very important biochemical function, it can often only perform that function once it is modified, and this modification is again guided by other RNAs. Pseudouridine is the most abundant modification in cellular RNA. You can still see some more if you look very closely: here you see m2G, which is a G base with methyl groups added to it, and it occurs a couple of times. So there are different methylated G bases in there as well, and these are there to make sure that the tRNA folds properly. These Ψ residues are very important for the tRNA to be able to bind properly in the ribosome. All right, so catalytic RNA
is called a ribozyme, short for ribonucleic acid enzyme. These are enzymes made entirely out of RNA instead of protein, and we already saw some of them: the snRNAs at the core of the spliceosome and the rRNA in the ribosome act as ribozymes. They are catalytically active, so they facilitate a reaction that turns one chemical into another without being used up in the process. They facilitate a whole range of RNA processing reactions, like RNA splicing, which is very important, viral replication, and also tRNA biosynthesis, because tRNAs, once they are transcribed from the genome, have to be processed before they can do their proper function. There's a whole range of ribozymes, like the hammerhead ribozyme, the hairpin ribozyme, the leadzyme, and the VS ribozyme, so there are a whole bunch of RNA molecules out there that are biochemically active and function as enzymes in some very important biochemical processes. Ribozymes were discovered in 1982, showing that RNA can be both genetic material, similar to DNA (for example in viruses), and a biological catalyst, like protein enzymes. Around the same time, the RNA world hypothesis came around. This is the hypothesis that early in the history of life on Earth there was no DNA and there were no proteins, only RNA, because some RNAs can facilitate their own reproduction. So the RNA world hypothesis states that before proteins and DNA were more or less invented, there was a whole living world consisting of RNA molecules copying themselves, having biochemical activity, and modifying themselves. In 1989 the Nobel Prize in Chemistry was awarded to Sidney Altman and Thomas Cech for the discovery of these catalytic properties of RNA; they showed that RNA can act as genetic material as well as an enzyme. So again a
very important part of the RNA world. All right, and then we still have to talk about the microRNAs and small interfering RNAs. These are abundant in eukaryotic cells and exert post-transcriptional control over mRNA expression. When mRNA is expressed, there is a first level of control at the DNA level, which decides whether an mRNA is made or not. The mRNA is then transferred to the splicing speckles to undergo splicing, and then it's transported out of the nucleus. Outside the nucleus it can still happen that no protein is made from this mRNA, because another RNA binds to it and removes it before it can be translated into protein. These small RNAs work by binding very specific sites within the mRNA and inducing cleavage of the mRNA through a silencing-associated RNA degradation pathway; in the cell this is carried out by the RISC complex (RNA-induced silencing complex). There's a lot of control at every step. It might be that the cell nucleus got the message "oh, we need more of this protein", so it starts producing the mRNA, but by the time the mRNA is made, processed, and brought out of the nucleus, the protein is not needed anymore. So the mRNA is degraded before it can start producing proteins that were needed a couple of minutes ago but not anymore. At every step in this biochemical process, going from DNA to protein, there is control and a way to stop proteins from being made. All right, microRNAs are 21 to 22 nucleotides long, and they are processed from hairpin RNAs that are encoded by cellular DNA. They regulate gene expression primarily by inhibiting translation and promoting mRNA degradation: they bind to the mRNA, which is then more or less removed before it can be translated. We now know that there are around 250 to 350 of these conserved microRNA genes in the genome, and these microRNA genes are much longer: they're around 250 to 350 base pairs long. They
form something called a hairpin, and this hairpin is cut into little pieces to form the roughly 21-nucleotide microRNAs, which are then used to degrade messenger RNA when needed. So how does this look? You have a gene here, which is transcribed, and the transcript is a pri-microRNA (pri-miRNA). Then the Drosha protein cuts off the ends, so you really get this hairpin structure, then Dicer cuts off the loop at the end of the hairpin, and then you have a mature microRNA. If it is exactly complementary to the mRNA, it binds to the mRNA and the mRNA is degraded. If it is only partially complementary, it causes translational inhibition: when the messenger RNA is translated, the ribosome is inhibited and cannot translate the protein correctly. So that's yet another form of control to make sure that proteins are only made when they are really needed. Situations can change, and the pathway from pre-mRNA all the way to a messenger RNA that can be translated by the ribosome is a long process. This process is much faster, so it can interfere and say: well, this was needed two minutes ago, but it's not needed anymore, so we don't want the protein to be produced. All right, then there are still the non-coding RNAs. Non-coding RNA is more or less a catch-all name for everything in this group. You also have long non-coding RNAs, and the RNAs we talked about before are of course also non-coding RNAs, because they don't code for a protein. Except for mRNAs, all RNAs are non-coding, meaning not protein-coding. Some special non-coding RNA families have been found to function in genome defense and chromosome inactivation. One example is the piRNAs, which are very, very important in germline cells. So when an embryo
starts dividing and making germline cells, for example the egg cells in females, these piRNAs are very important for genome stability: they silence transposons ("jumping genes") so that these don't cause mutations when the cells divide. Another very well-known long non-coding RNA is called XIST, and it provides X chromosome inactivation in mammals. As most people know, females have two X chromosomes, and one of these X chromosomes needs to be shut down; otherwise you would produce double the amount of protein that you need. Males have only one X chromosome, so one X chromosome is more than enough. Having two X chromosomes is not bad in itself, but if genes were expressed from both X chromosomes you would have an issue, because you would produce double the amount of protein. That is where XIST comes in: XIST acts at a very early embryonic stage to shut down one of the X chromosomes. The entire X chromosome is coated with this long non-coding RNA, which X chromosome gets coated is random, and the tagged X chromosome is then packed very tightly around histones, so that it's not readable anymore and no genes can be transcribed from it. So those are two non-coding RNAs that have been found to be very, very important in early embryonic development. So how do these non-coding RNAs work? Non-coding RNAs are more or less made out of different functional domains. For example, part of a long non-coding RNA can bind RNA, other parts can bind proteins, there are parts that can bind DNA, and then you also have conformational switches. A conformational switch is a kind of mechanical switch where, for example, if a certain ion or small molecule binds, the conformation of the RNA changes, and this then affects the other parts, and
long non-coding RNAs have a modular architecture: they are built from these different structures, so it's kind of mix and match. "Barr-Körperchen" in German? What's a Barr-Körperchen? I'm a little bit confused, Florian, what do you mean by Barr-Körperchen? Anyway, a long non-coding RNA is generally made out of several domains, so it's a mix-and-match structure. Sometimes you have RNA binding and protein binding, sometimes protein and DNA binding, and sometimes you have a long non-coding RNA with protein binding and DNA binding that also has a conformational switch, which means that when it binds DNA, part of the RNA changes so that another protein can or cannot bind. So it's a very modular architecture, and these parts can be mixed and matched. Oh, the X chromosome, you mean: the inactivated X chromosome is called a Barr body, a Barr-Körperchen in German. All right, so those were all the different types of RNA that I wanted to discuss. I know it's a lot. Yeah, you can see it in a light microscope, yeah. Let me google that for you. Yeah, no, that's exactly what it means: it's the X chromosome, more or less packed together, and it's very small. Weird, when I look at my stream I'm like half an hour behind. I hope everyone is able to follow the stream; I had a message from Sandra that she had some issues watching it, so I hope everyone is watching more or less in real time and doesn't have too many issues. Okay, and now I'm back, strange. All right, there seems to be some weirdness going on with Twitch. I've been talking for almost 45 minutes again, let me see, and we are not even halfway through the slides, but the last part is... all right. Skirita, very good, you're still here, thumbs up. Florian's still here, my moderator is still here. I think
everyone's still here. "Only some lagging during your galaxy": yeah, it's because it takes over the entire video card and OBS doesn't really like that, so I should have tried that out beforehand, but you can't really try out the stream. All right, Commando's here as well, good. So let's continue: let's do another five or six slides and then we will take a short break again, and I think we should be able to finish everything before five. All right, so back to the mRNA, because it's the most important or the most interesting RNA: it actually codes for proteins. Following the central dogma of molecular biology (DNA to RNA to protein), proteins are the end product and the most important part, because they are the effectors, the molecules that actually do things. So when we are interested in proteins and we want to know how much of a certain protein is there, we generally, or at least in my field, don't measure the protein or its abundance; we measure the RNA expression level. If you measure all the genes along the genome and figure out how much of each RNA is there, you get a kind of map, an overview of how active each part of the genome is, and that is strongly linked to how much protein is actually there: if an mRNA is produced, then in general we assume that the protein encoded by this mRNA is also there. This is not true in all cases, but it's true in enough cases that it becomes really useful. So when we look at gene expression, we are looking at messenger RNA expression, and we want to compare these levels of messenger RNA expression between, for example, different cell types, or between disease cells and normal cells. Normally we use that to estimate environmental or genetic effects on a certain phenotype, or we want to find differences in gene expression that could explain, for
example, different phenotypes. Measuring all of the proteins would be really hard, because measuring proteins is much harder than measuring messenger RNA. Measuring how much mRNA is there can be done in three different ways: you can use quantitative real-time PCR, you can use microarrays, or you can use RNA sequencing. Quantitative real-time PCR is very small scale, and you use a housekeeping gene as a reference. This is more or less how they do the PCR test for the coronavirus. The coronavirus is an RNA virus, and when it replicates, more viral RNA is produced. So you take a little swab, you put it in someone's nose, you extract RNA from this swab, and then you start amplifying it using a standard PCR technique: you have primers which bind to and amplify a specific target, this happens in cycles, and every cycle doubles the amount of target within your sample. Of course, you can't really do this with RNA directly, because the polymerase used in PCR needs a DNA template. So what you first have to do is reverse transcribe the RNA into DNA: you put the little sample from the nose in a buffer, and from this RNA you make DNA, which is also more stable. This is relatively small scale, because you can only do quantitative real-time PCR on a couple of genes; you can't do it on hundreds of genes. Well, you could, but then it would take a long time. For the coronavirus test you only look for one or a few viral genes, for example the spike gene: there's a very specific part of the spike gene which is unique to COVID compared to the other coronaviruses that are around, and the test uses primers which amplify this very specific part. So in the end, when you start doing that, then of course in the beginning, every
time that you double the DNA, the fluorescent signal goes up as well, because the reaction contains a fluorescent marker that reports the new copies. On the y-axis you have the fluorescence, the amount of fluorescence in the sample, and on the x-axis you have the number of cycles that you did. In the beginning there's no detectable fluorescence, because the quantity of copied DNA, RNA that has been copied, is very low, and the more cycles you do, the more copies you make. At a certain point you will see this kind of S-shaped curve going up and up until you reach the maximum fluorescence intensity that you can measure. The cycle at which this curve starts coming up is related to how much RNA there was in the original sample: the earlier it comes up, the more RNA there was. So you use a housekeeping gene and a gene of interest, for example the spike gene from the coronavirus, you amplify both of these genes and measure the fluorescence of both, and the ratio between them determines how much virus there is in relation to a normal housekeeping gene, for example a gene like actin. Actin is produced in every human cell, so if there is more virus than there is actin, that's bad, because then you have a high viral load and you might be infectious, while if there's much less virus compared to actin, it's not that big of a deal. You always get these S-shaped curves, and from these curves you can compute how much RNA of a gene there was originally in relation to the housekeeping gene that you are looking at. Of course, we assume that the housekeeping gene is always expressed at a certain level: every cell in your body needs actin at a certain level, so if I'm comparing a cell from me to a cell from Florian or from Commando, then we will be expressing actin
at a very similar level, and the ratio of how much virus there was compared to actin then gives an indication of how infectious I am. You get these nice curves, and you set a certain detection threshold: the cycle at which the curve comes above the detection threshold is an indication of how much material there was originally. If it comes two cycles later, then there was only one fourth of the material compared to a curve that comes two cycles earlier, because every cycle is a doubling: two cycles means one in four, three cycles means one in eight, and four cycles means one in sixteen. So this is how we measure messenger RNA on a very small scale. When we want to do this on a large scale, we can use microarrays or RNA sequencing. Microarrays are able to measure almost all of the genes in the genome, and RNA sequencing has the added advantage that you can look not only at the individual genes but also at the individual transcripts, because you are sequencing the messenger RNA itself. Microarrays still use probes, so they go on a fishing expedition: if you measure a gene, you don't know which of, say, the five transcripts of this gene is actually being expressed. When you use RNA sequencing you get more information, because you can see exactly which transcript of a gene is there, so which exons are in the transcript and which exons were left out, and that gives you a much better idea of which protein exactly is being produced. All right, quantitative RT-PCR is very similar to normal PCR, which we will discuss in lecture 8 together with the life of Kary Mullis. The qPCR steps are: first you design the primers for your gene of interest, then you design primers for your housekeeping gene, then you do a PCR reaction with both
of these simultaneously, and then you determine the relative expression of the gene of interest versus your housekeeping gene. RNA sequencing is very similar to DNA sequencing; there are just additional steps. The first additional step is reverse transcription: taking the RNA and making it into cDNA, complementary DNA. Then the cDNA is made double-stranded, you add some single-stranded tails to make sure that you can sequence the fragments, and then you just perform normal DNA sequencing. So RNA sequencing is similar to DNA sequencing; it just has three additional steps: first the reverse transcription step, then the step to make it double-stranded, and then the single-stranded tails are added, so that you get constructs on which you can perform normal DNA sequencing. When you do RNA or DNA sequencing, your input and output are in text format. I think I already showed you a FASTA or FASTQ file, but I wanted to show you again, because last time it was not really readable. Raw data comes out of a sequencing machine like this: you have the name of the read, then the bases that have been read, then a plus symbol, and then the quality scores for each of the bases. These quality scores are encoded using an ASCII encoding, one character per base, because writing numbers like 10 or 50 or 100 would take more than one character and would no longer line up with the bases. When you then do the alignment against the reference genome, you get an aligned file in SAM format. The SAM file has the same reads again as in the original FASTQ file, but now it also has the position in the genome where they align, for example that this read matches the genome at this position on chromosome 1. So DNA and RNA sequencing are completely identical to each other, except for three preprocessing steps to go from RNA to DNA, and I just wanted to show you the
input formats. The FASTQ format, like I told you, has four lines for every read. The first line starts with an @ character and is followed by a sequence identifier and an optional description. The second line has the raw sequence letters A, C, G and T as determined by the sequencer. Line number three starts with a plus symbol, optionally followed by the same sequence identifier; 99% of the time this is more or less an empty line with just a plus character. Line four has the quality values for the sequence in line two; it must have the same number of symbols as there are letters in the sequence, and each symbol encodes the quality score of the corresponding base. All right, we're at 55 minutes, let's just do another slide. Reads are aligned to the reference genome based on sequence similarity, and we allow for some mismatches. This allows us to investigate the quality, so which gene transcripts are expressed, and the quantity, so how much of a certain transcript is expressed. Another added advantage of RNA-seq is that we can have a look at SNPs and at insertions and deletions in the messenger RNA. For example, we can check whether SNPs in the genome are also found in the messenger RNA, or we can see whether there is some RNA editing going on: it might be that part of a transcript has been edited by one of the RNA-editing enzymes available to the cell. So RNA-seq is one of the most accurate technologies to determine how much of a certain mRNA is there, it also allows you to determine whether there are SNPs, insertions or deletions, and it gives you both quality and quantity: which transcripts are there and how much of each. So RNA-seq analysis is exactly identical to DNA sequencing analysis except for one additional
step. This one additional step is to extract the expression levels of the different genes. In the DNA sequencing slides, the pipeline finished with calling the variants, so looking at the SNPs and the indels. In RNA-seq we can also do SNP and indel calling, but the thing that we are most interested in in RNA sequencing is of course extracting the expression levels of the individual transcripts of a gene. The rest is very, very similar to DNA sequencing: there's just a reverse transcription step from RNA to DNA at the beginning and the extraction of expression levels at the end. Just like DNA sequencing data, RNA sequencing data is visualized using the Integrative Genomics Viewer (IGV), like we talked about last time: you can see the individual reads, you see the different amino acids that could have been coded, and here you see how many reads there are. This is just an example from our own research. I think we should stop here and have a little break of 10 minutes, and then I will tell you our story about how we used RNA sequencing to identify the differences between the Berlin muscle mouse strains. We have three different Berlin muscle mouse strains: you see the really muscular one here, this is the double-muscle Berlin muscle mouse, so it has twice the amount of muscle, and then we have two other mouse strains which also have much more muscle than a standard mouse, and this is the standard C57BL/6, or Black 6, mouse. I'll tell you how we figured out which region of the genome, and exactly which mutation in the genome, is causing this phenotype, the double-muscle phenotype, in our Berlin fat mice, sorry, Berlin muscle mice. We also have a fat mouse, but I wanted to talk about the muscle mouse. All right, so if there are no questions, then I think we will do a 10-minute break, afterwards we will quickly go through the RNA-seq example, and then we will talk about the free
microarray data that you can get, and we will talk a little bit about how to do RNA structure prediction to predict the secondary structure of RNA molecules, which is part of the assignments for today. All right, so if you have any questions, let me know, and I will stop the recording.
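The qPCR cycle arithmetic from earlier in the lecture (two cycles later means one fourth of the starting material) can be written out as a small Python sketch. It assumes perfect doubling by default. The `efficiency` parameter and the function names are illustrative assumptions, not part of the lecture. Real qPCR analyses often estimate the amplification efficiency from a standard curve rather than assuming it is perfect.

```python
def fold_difference(delta_cycles: float, efficiency: float = 1.0) -> float:
    """Starting amount of a sample whose curve crosses the detection threshold
    `delta_cycles` later than a reference sample, relative to that reference.

    Assumption: each cycle multiplies the product by (1 + efficiency);
    efficiency = 1.0 is the ideal doubling per cycle.
    """
    return (1.0 + efficiency) ** (-delta_cycles)


def relative_expression(ct_target: float, ct_housekeeping: float) -> float:
    """2^-dCt: amount of target relative to the housekeeping gene, assuming
    both primer pairs double their product perfectly every cycle."""
    delta_ct = ct_target - ct_housekeeping
    return 2.0 ** (-delta_ct)


for cycles_later in (1, 2, 3, 4):
    # prints 1 0.5, then 2 0.25, 3 0.125, 4 0.0625
    print(cycles_later, fold_difference(cycles_later))

# Hypothetical threshold cycles: target crosses at 27, housekeeping gene at 20.
print(relative_expression(ct_target=27.0, ct_housekeeping=20.0))  # 0.0078125
```

A difference of 7 cycles corresponds to 2^7 = 128 times less target than housekeeping transcript, under the perfect-doubling assumption.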
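The S-shaped qPCR amplification curve and the threshold cycle discussed in the lecture can be illustrated with a toy model. This is a sketch under simplifying assumptions:

* The curve is modelled as logistic growth that doubles every cycle early on and saturates at a hypothetical `plateau`.
* Fluorescence is taken to be proportional to the number of copies.
* The threshold cycle is the first whole cycle at or above the threshold. Instruments typically interpolate a fractional value instead.

```python
from typing import List, Optional


def amplification_curve(start_copies: float, cycles: int = 40,
                        plateau: float = 1e12) -> List[float]:
    """Copies after each cycle 0..cycles under a logistic model.

    Early on this is close to start_copies * 2**cycle (perfect doubling);
    later the curve flattens out at `plateau`, giving the S shape.
    """
    return [plateau / (1.0 + (plateau / start_copies - 1.0) * 2.0 ** -cycle)
            for cycle in range(cycles + 1)]


def threshold_cycle(curve: List[float], threshold: float) -> Optional[int]:
    """First cycle whose signal reaches the detection threshold, or None."""
    for cycle, signal in enumerate(curve):
        if signal >= threshold:
            return cycle
    return None


threshold = 1e9  # hypothetical detection threshold, in copies
ct_more = threshold_cycle(amplification_curve(1000), threshold)
ct_less = threshold_cycle(amplification_curve(250), threshold)
print(ct_more, ct_less)  # 20 22: four times less template crosses two cycles later
```

The sample with a quarter of the starting material crosses the threshold two cycles later, which is the "one in four" rule from the lecture.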
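The reverse-transcription step that distinguishes RNA-seq from DNA sequencing can be sketched at the sequence level. This toy Python example ignores the chemistry (primers, enzymes, fragmentation, tails) and only shows the base pairing:

* The first-strand cDNA is the reverse complement of the RNA, with thymine (T) in place of uracil (U).
* Making it double-stranded gives back the RNA sequence written with T.

The function names are illustrative.

```python
# Base pairing when reverse transcriptase copies RNA into DNA (U pairs with A).
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Base pairing when the cDNA is made double-stranded.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """First-strand cDNA, written 5'->3', for an RNA written 5'->3'."""
    # The new strand is antiparallel, so read the RNA from its 3' end.
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


def second_strand(cdna: str) -> str:
    """Complementary DNA strand, written 5'->3', for a cDNA written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna.upper()))


rna = "AUGGCU"
cdna = reverse_transcribe(rna)
print(cdna)                 # AGCCAT
print(second_strand(cdna))  # ATGGCT, i.e. the RNA with U written as T
```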
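The four-line FASTQ record described in the lecture can be read with a short parser. This is a sketch that assumes one record per four lines, with no line wrapping and no compression. In practice a library such as Biopython's `SeqIO` handles these cases. The `FastqRecord` type and the example read are made up for illustration.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str   # text after '@' up to the first space
    description: str  # optional rest of the header line
    sequence: str     # line 2: bases as called by the sequencer
    quality: str      # line 4: one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        header = header.rstrip("\r\n")
        if not header:  # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {plus!r}")
        if len(quality) != len(sequence):
            raise ValueError("quality line must have one symbol per base")
        identifier, _, description = header[1:].partition(" ")
        yield FastqRecord(identifier, description, sequence, quality)


example = io.StringIO("@read_1 sample=demo\nACGTTGCA\n+\nIIIIHHG#\n")
for record in read_fastq(example):
    print(record.identifier, record.sequence, record.quality)
```

Checking that the quality line is as long as the sequence catches truncated files early, before any alignment is attempted.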
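Decoding the FASTQ quality line from the lecture is short once you know the encoding. The sketch below assumes the Phred+33 encoding (Sanger, Illumina 1.8 and later). In that encoding, a character's ASCII code minus 33 is the Phred score Q. Q relates to the probability P that the base call is wrong via P = 10^(-Q/10). Older Illumina pipelines used an offset of 64, so check which encoding your data uses.

```python
def decode_phred33(quality: str) -> list:
    """Convert a FASTQ quality string to Phred scores (ASCII code - 33)."""
    return [ord(symbol) - 33 for symbol in quality]


def error_probability(q: int) -> float:
    """Probability that a base with Phred score q was called incorrectly."""
    return 10 ** (-q / 10)


scores = decode_phred33("IIIIHHG#")
print(scores)                        # [40, 40, 40, 40, 39, 39, 38, 2]
print(error_probability(scores[0]))  # about 1 wrong call in 10,000
```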
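The aligned SAM file mentioned in the lecture is also plain text. After optional header lines starting with `@`, each alignment is one tab-separated line. It has eleven mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), followed by optional tags. This minimal sketch splits one line into those fields. The example line is made up, and real pipelines usually read SAM/BAM through a library such as pysam or through samtools.

```python
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")
INTEGER_FIELDS = ("flag", "pos", "mapq", "pnext", "tlen")


def parse_sam_line(line: str):
    """Return a dict for one alignment line, or None for a header line."""
    if line.startswith("@"):
        return None
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs 11 tab-separated fields")
    record = dict(zip(SAM_FIELDS, columns))  # zip stops after the 11 fields
    for key in INTEGER_FIELDS:
        record[key] = int(record[key])
    record["tags"] = columns[len(SAM_FIELDS):]
    record["unmapped"] = bool(record["flag"] & 0x4)         # FLAG 0x4: unmapped
    record["reverse_strand"] = bool(record["flag"] & 0x10)  # FLAG 0x10: reverse
    return record


line = "read_1\t0\tchr1\t10468\t60\t8M\t*\t0\t0\tACGTTGCA\tIIIIHHG#\tNM:i:0"
alignment = parse_sam_line(line)
print(alignment["rname"], alignment["pos"], alignment["cigar"])  # chr1 10468 8M
```

POS is 1-based, so the read above starts at the 10,468th base of chromosome 1.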
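The lecture describes aligning reads to the reference genome by sequence similarity while allowing some mismatches. That idea can be shown with a brute-force scan. This is only a toy: it compares the read against every position and counts mismatching bases (Hamming distance). It does not handle insertions, deletions or splicing. Real RNA-seq aligners such as STAR or HISAT2 use genome indexes and spliced alignment to do this efficiently. The sequences below are made up.

```python
def align_with_mismatches(read: str, reference: str, max_mismatches: int = 1):
    """Return (1-based position, mismatches) for every ungapped placement of
    `read` on `reference` with at most `max_mismatches` differing bases."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        for read_base, ref_base in zip(read, reference[start:start + len(read)]):
            if read_base != ref_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too different, skip this placement
        else:
            # Inner loop finished without break: placement is within the limit.
            hits.append((start + 1, mismatches))  # 1-based, like SAM POS
    return hits


reference = "AACGTTGCATT"
print(align_with_mismatches("ACGTAGCA", reference))                    # [(2, 1)]
print(align_with_mismatches("ACGTAGCA", reference, max_mismatches=0))  # []
```

Allowing one mismatch is what lets a read carrying a SNP, or an edited base, still find its place in the genome.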
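The extra RNA-seq step the lecture describes, extracting expression levels, boils down in its simplest form to two operations. You count how many aligned reads fall inside each gene, then normalize for sequencing depth. This sketch makes several simplifying assumptions:

* The gene coordinates are hypothetical.
* A read is counted by its start position only.
* Counts per million (CPM) is used as the normalization.

Real tools (for example featureCounts or HTSeq) handle overlaps, strandedness and exon structure. Transcript-level quantification needs more than this.

```python
from collections import Counter

# Hypothetical gene annotation: name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 500, 900),
}


def count_reads_per_gene(read_starts, genes):
    """Count aligned reads whose start position lies inside each gene."""
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in read_starts:
        for name, (gene_chrom, start, end) in genes.items():
            if chrom == gene_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


def counts_per_million(counts):
    """Scale counts so that the assigned reads in the sample sum to one million."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total * 1_000_000 for name, count in counts.items()}


# Made-up alignments as (chromosome, 1-based start), e.g. RNAME and POS from SAM.
reads = [("chr1", 150), ("chr1", 180), ("chr1", 600), ("chr2", 150)]
counts = count_reads_per_gene(reads, GENES)
print(dict(counts))  # {'geneA': 2, 'geneB': 1}
print(counts_per_million(counts))
```

Normalizing matters because two samples sequenced to different depths would otherwise show different raw counts for the same expression level.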
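The lecture mentions comparing expression between, for example, disease cells and normal cells. Such a comparison is commonly summarized as a log2 fold change. This sketch assumes the values are already normalized (for example as CPM). It adds a small pseudocount so that genes with zero reads don't cause a division by zero; the pseudocount of 1 is an arbitrary illustrative choice. Proper differential-expression tools such as DESeq2 or edgeR also model the variability between replicates.

```python
import math


def log2_fold_change(condition: float, reference: float,
                     pseudocount: float = 1.0) -> float:
    """log2 of condition / reference; +1 means doubled, -1 means halved."""
    return math.log2((condition + pseudocount) / (reference + pseudocount))


# Hypothetical normalized expression of one gene in disease vs normal cells.
print(log2_fold_change(31.0, 7.0))  # 2.0: four times higher in disease cells
print(log2_fold_change(7.0, 31.0))  # -2.0
```

Working on the log2 scale makes increases and decreases symmetric: a fourfold increase is +2 and a fourfold decrease is -2.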
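The lecture notes that RNA-seq can reveal SNPs and RNA editing. Looking for them starts by stacking the aligned reads at one reference position and counting which bases they report. The sketch below assumes ungapped alignments (a CIGAR of only matches) given as hypothetical (1-based start, sequence) pairs. For example, A-to-I editing by ADAR enzymes shows up as A-to-G mismatches, because inosine is read as guanosine. Whether such a mismatch is editing or a genomic SNP still has to be checked against the DNA.

```python
from collections import Counter


def base_counts_at(position: int, alignments) -> Counter:
    """Count the read bases covering a 1-based reference position.

    `alignments` is an iterable of (start, sequence) pairs for ungapped
    alignments, with `start` being 1-based like SAM POS.
    """
    counts = Counter()
    for start, sequence in alignments:
        offset = position - start
        if 0 <= offset < len(sequence):
            counts[sequence[offset]] += 1
    return counts


def alt_fraction(counts: Counter, ref_base: str) -> float:
    """Fraction of covering reads that disagree with the reference base."""
    depth = sum(counts.values())
    return 0.0 if depth == 0 else (depth - counts[ref_base]) / depth


reference = "ACGTACGTAC"
reads = [(1, "ACGTACGT"), (3, "GTGCGTAC"), (4, "TGCGT"), (2, "CGTAC")]
counts = base_counts_at(5, reads)
print(reference[5 - 1], dict(counts), alt_fraction(counts, reference[5 - 1]))
```

Here half of the reads covering the reference A at position 5 report a G. That is the kind of signal that would then be followed up as a candidate SNP or editing site.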