Start recording. So welcome, everyone. This is the overview for today, in case you missed it. So: history, DNA versus RNA, classic. Okay. Yeah, thanks, Commando. Yeah, I know, I know, I'm not a professional streamer. I just do this as a side job, it's not my main job. I'm a scientist, not a streamer.

So: history, DNA versus RNA, all of the different types of RNA, microarray expression, and then free microarray and RNA sequencing data: where can you get it, and how do you download it? One of the nice things is that you don't have to ask for a 300,000 euro grant to do your research, especially if you're interested in gene expression and gene regulation. There's a lot of stuff that you can get for free. Then we will start with the more bioinformatics part, and then we will do a little bit of structure prediction on RNA, which I think is interesting. I like doing structure predictions. I don't get to do it a lot in my normal work, but every time I can, I think it's interesting to do.

So yeah, first off, of course, we will do the assignments from last week. I hope everyone was able to do the assignment. Just throw it in chat if you were able to do it. I think the assignments are a little bit difficult, right? This was the first assignment where there was a little bit of R programming in there, and it's not an R programming course. So I hope that people were able to do it. If you were not able to do it, I'm not mad, and I don't really...

"I could not get R to work." All right, that's difficult. What kind of operating system are you using then? Are you using Windows or Mac OS X or...? All right. Nope. So, another person. So do you want me to go through how to install R? I can show you how to do it. Windows, okay, that's good.
So Windows is actually the easiest one. Okay, so you used R with RStudio. Commando says that it was doable, but you followed the R course, right, Commando? So that's... yeah. "Directory and file was the problem." Okay. Yeah, because that's one of the harder parts in R, since when you start R... "I got RStudio already, but couldn't do it." All right. Good, so I can show you guys. Well, let me switch to Firefox then. I have to move the window a little bit.

So if you want to get R, you just type "R download" into Google, and then the first hit here is the download. All right, just a... "The source of your PDF is really helpful." Oh, the big one that I... yeah. Yeah, I made it for incoming PhD students. So there's a lot of things in there, and there's a cheat sheet somewhere on my website as well, just a very small, two-page overview of what commands there are in R and how you can use them.

So if you just go here, you say "Download R for Windows", and then you get... you don't see the pop-up, but you get a pop-up. So you save the file. Let me see if I can actually do that, because I have R installed, but I think I can just install over it. So I'll first start the installer for you guys. Yeah, yeah, the RStudio thing is actually something which goes on top of R, and I don't really like RStudio that much. The only real advantage of RStudio compared to standard R is that you can go back to the plots that you made previously.

Let me see if I can actually capture the... So I don't want to capture the whole display, I just want to capture a window, and the window that I want to capture is this one. So when you start the installer, I'm gonna make it a little bit bigger, and then it looks like this, right?
So you say whether you want to have it in English or in German, which is up to you guys. And then it doesn't capture the next one. All right, and then you get this standard installer for R, which looks a little bit like this. Then you have to read through the license text. You should always do that, to see if you're signing away all your rights to your first-born children, but in this case it's quite okay. You just tell it where you want to have it installed. I always install it on the C drive. Then, depending on which operating system you have... if you have a 32-bit Windows, which is relatively old... Hey Sandra, welcome to the stream. So if you still have a 32-bit operating system, which is quite old, then it doesn't allow you to install the 64-bit one. But it's just Next. And then, if you want to specify any startup options: we don't ever want to do that, so it's just No. Create a folder? Well, don't do that. I don't like having R in the Start menu. And then it will save the... and you can do a Quick Launch icon and stuff, and then you just press Next and it will start installing R.

So it's now just copying the files over the files that I already had there, which should all be fine. I'm hoping that it doesn't really screw up my R installation. But on Windows it's relatively easy to install R. It's just Next, Next, Next, Finish, and then it should work. And if you want to have RStudio, then you can download RStudio and put it on top of it. Because that's the thing that you need, right?
If you want to use the RStudio thing, then RStudio just uses the R program that you have installed. So if you have a version of R installed which is, like, version 3.6, then you can still use RStudio, but RStudio is using the 3.6 R version in the path that you installed it to. So when you then click Finish, there should be an R window at the bottom. If you go into your Start menu, you will have two different R's installed, because there will be the R 32-bit version and the R 64-bit version. If you then start R, you get a window which looks like this, and then I have to disable file... And this is just the standard R window that you get. I think when you start it up it gives you some additional text, and here you can just start typing in stuff. So you can just say, well, what is one plus five, and then it will tell you that's six.

And the nice thing about R is that you can do that with lists as well. So I can say: give me a sequence from one to a hundred, going by four. Right, so that's one, five, nine, and so on. So you just get a list of elements (a vector, in R terms). And then you can actually do mathematics with that, because you can just say, well, subtract six from every element, and it will just do that for the whole list. That's one of the advantages of R: it's really good when you're working with arrays or lists of numbers. So let's clear the screen. Clear console.

So that's just the way that you install R. So Young, you had problems getting R to work, right? So did you get it to start up? Were you able to get a screen like this, or did you totally get stuck in the installer, and it just throws a whole bunch of Windows errors in your face? All right, so you're able to get a screen like this. Okay, that's pretty good. That's pretty good.
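As a rough sketch of the console commands described above (one plus five, a sequence from 1 to 100 in steps of 4, then subtracting 6 from every element); the variable names `s` and `shifted` are made up for illustration and are not from the stream:

```r
# Basic arithmetic at the R prompt
print(1 + 5)

# A sequence from 1 to 100 going by 4: 1, 5, 9, ...
s <- seq(1, 100, by = 4)
print(head(s))

# Arithmetic is applied to every element of the vector at once
shifted <- s - 6
print(head(shifted))
```

Nothing here needs a loop: R applies the subtraction element by element, which is the "good with lists of numbers" point made in the lecture.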
All right, so then we can go through the assignment. Let me get my Word document. Okay, so the first assignment was not using R directly. The first assignment was going to the Ensembl database. So let's go back to Firefox, and let's close the R thing. The idea was to go through the Ensembl database and then compare the mitochondria, so the little pieces of DNA which are in the mitochondria and which you inherit from your mother. And then the question was to compare mouse, human and zebrafish, observe that there's a huge overlap, but what are the differences?

So to get to the mitochondria, the way that I always do it is to just go to the karyotype. The karyotype is kind of what you see when you put a genome underneath a microscope. If you go all the way to the back, then the mitochondria are here. So when you go to the home page of Ensembl, it looks like this and you can just click on your favorite genome. In this case we click on the human genome, and I've already opened up the mouse and the Danio rerio, so the zebrafish, on the top. So the idea was to view the karyotype for, for example, humans. There's different ways of going to the mitochondria, but you can just click on the mitochondria and then say, well, give me the summary, because we want to compare them kind of globally, right?

So I'm getting the summary of the human mitochondrial genome. It takes a little while for the image to load, but the summary is here on the bottom, right? You can see that it's about sixteen thousand five hundred base pairs long, there's 13 genes on there, and there's 24 non-coding genes. They're called small non-coding genes, which are generally transfer RNAs and ribosomal RNAs and stuff that you need, and there's some short variants. And if we then look at the mitochondria in mouse, right?
So we can go to the mouse, which I already have open. Again, we go to the karyotype, and then we have to wait a little bit. Ensembl is really slow today. I can't really click on the mitochondria if it's not loading in. That's bad. Let me refresh the page, actually. Is this the ad blocker? No, my ad blocker is disabled. All right, I can just copy the link on the top that I usually use for humans, and then we can do the same thing for mouse to get the overview. And then for Danio rerio we do the same thing. So go to "view the karyotype", this one loads in really fast, you click on the mitochondria, and you say give me the summary of the mitochondria, right?

So now we have the summary, and here the summary picture also loads in. You can see here the mitochondria, and you see the different genes that are encoded. These are protein-coding genes, meaning that they code for proteins. Then we have short non-coding genes, and you can see that those are more or less scattered all over, while the protein-coding genes are in a specific region. This first region of the mitochondria only contains RNA genes, so non-coding genes: genes which do not code for proteins but are still considered genes.

And then here you see the variation part. The variation part is how many known SNPs and indels and so on are at a certain point. You can see that some parts of the mitochondria don't have any, which means that when these regions get a mutation, they are generally not functional anymore. But in the other parts of the mitochondria there can be small variations. So did it load the other ones as well? Yeah, so this is the one for mouse, and for humans it still didn't do that.
So let's see if we can reload it and get the picture. Well, for humans the picture doesn't really want to load. But if we then look at the overview of the mitochondria, we can see that, for example, mouse and humans both have 13 genes. So this is the human one: humans have 13 genes on there. Compared to mouse, we see the same thing: mice also have 13 protein-coding genes on the mitochondria. If we then go to the zebrafish, we see again 13 coding genes, and the length of the mitochondria...

"Didn't know that the info was in the karyotype." Yeah, you just have to click around a little bit, right? You could have just searched for MT, then it gives you the mitochondria as well. I could have shown you that beforehand, but then it's no fun. You have to go to the database and get the information from there, and then we discuss, and then I'm showing you how I do that.

So if we then look at the number of non-coding genes, zebrafish have 24, mice have 24 and humans have 24 as well. So there's no difference in the number of protein-coding genes or in the number of non-coding genes. The only major difference is that there's a massive difference in the number of short variants. The number of short variants is the number of SNPs and indels that are located in the mitochondria. For humans that's 3,700, but for mice you can see that there's only very few short variants, and the same thing holds for zebrafish; zebrafish have even fewer. That has to do a little bit with the amount of sequencing that has been done.

But the conclusion is that mitochondria are very much conserved. It doesn't really matter if you're looking at mitochondria from humans or from mice, they have the same genes on there. So it's one of the few chromosomes which is very much shared between almost all eukaryotic life forms. So, all eukaryotic life forms:
They have mitochondria, and these mitochondria in general have 13 genes on there and 24 RNAs to make the mitochondria work. So that was the first question: just click around a little bit in Ensembl and see if you can get the information out that you need.

You can actually look at some of these tabs here. So you can do comparative genomics, like the synteny. You can click the Synteny tab and then it will show you... so you can actually compare the zebrafish directly with humans. But then I think you have to select the... "Use the links in the navigation box to move to the nearest one." "Change chromosome." Yeah, I do want to do that, but I don't want just this little part, I want to have the whole thing. So I can jump to region and... all right. So here we're then looking at the region, and you see the different genes and the different non-coding genes. The genes themselves have names like MT-ND1, and the non-coding genes have NC identifiers. And then there's the variants that are listed at the bottom if you look there. But the idea was just for you guys to look at the database and struggle a little bit with getting the data out.

All right, so let's move on to question number two. So the question is: what are the differences?
Well, the differences are that there is a different number of SNPs, so short variants, in the mitochondria, but that for the rest the mitochondria are more or less the same between mouse, human and zebrafish. So three completely different species, but the mitochondria you could probably swap in and out. You could take a mitochondrion of a zebrafish and put it in a human and it would probably still work, because they are very similar.

All right. So then the question was: use the Ensembl database to download the FASTA sequence of the mouse gene "mitochondrially encoded NADH dehydrogenase 1". The official name of the gene is mt-Nd1, and it's the first gene on the genome. So what we can do is go all the way back, go to the mouse, and then just type in the real name, so it's "mt-Nd1". And then it will just say there's a human gene, a zebrafish gene and a mouse gene. In this case we want to have the mouse gene, so we click on it, and then it directly brings us here to the mt-Nd1 gene page.

So the question was to use the Export data function on the side. If you go down a little bit, then here you have "Export data". When you click Export data, you get this overview here, and the question was to go for the text output. You can do tab- or comma-separated values, but the text output in this case is just a FASTA sequence. And do we want any three prime and five prime sequences?
No, we just want to have the gene. Yeah, because it allows you to get, like, a thousand base pairs in front or a thousand base pairs in the back, and the maximum number of base pairs that you can get in front of or behind a gene is like 1 million.

And then there's this option here to do it unmasked or masked. That means that if you ask for a masked sequence, it will take the DNA sequence and look for areas of the sequence which are not unique, so which occur at different positions in the genome. When we will be talking about primer design, this will be really important. If you are designing primers, you'll always want to do this on a masked sequence, and this is because primers need to be unique. So you want to block out the parts of the sequence which are also found in the same animal in different parts of the genome.

We can then click Next, and then it will load the content, and you can select the output format, which is the text output format, and then it will look somewhat like this. So it will just give you a single sequence, which I didn't want actually, because I wanted to have the... that's interesting, okay. So here I have to say "select all", because I want all of the sequences and not just the DNA sequence. I also want to have the coding sequence and the peptides and the introns and the exons. So we do it again. We go to text output, and then you see that it gives you different sequences. First it gives you the first transcript of the gene, and this is the cDNA. So it's the complementary DNA, the DNA copy of the transcript.
So the DNA which translates to the protein. So you have here the first sequence, and then you have the same sequence again, which is for the gene. So this is for the transcript, then you see the gene sequence, and then here you get the protein-coding part, which again is exactly the same in the case of this gene. And then here you see the amino acid sequence, which is how the protein is built, right? Amino acids can be coded with a single letter as well, so M stands for methionine and so on. And then of course we have the same sequence again, which is the chromosomal sequence. So in the chromosome, it's written down like this.

So this is a gene which isn't split into introns and exons, and that is because it is a mitochondrial gene. Mitochondrial genes come from prokaryotes, right? It's a prokaryotic cell which has been absorbed into a eukaryotic cell. We talked about gene structure, about the DNA structure, last time, where we showed that there's a massive difference between how bacteria code proteins and how eukaryotes code proteins. Eukaryotes generally have introns and exons, but prokaryotes do not. So of course the mitochondrion, since it is of prokaryotic origin, codes proteins in the same way as a bacterium does, although it is part of the eukaryotic cell.

All right, so then we just save this in a document. There is a way that you can do that in Firefox: it's just File and then Save As, and then you can save it as... All right, so then the next part is after we have this file. You save the file on your hard drive in a known location, such as... and you can pick that for yourself. So let me switch to the R window, and let's close the Firefox one. All right. So here we're in the R window, right? So once we have installed R and that all works, I asked you guys to create a new file called, like, your answers.
So I will show you guys my answer file now. I have here Notepad++. I always use Notepad++, I think it's a nice editor, but you can use any editor that you want. So here it's answers, so 3 DNA dot txt. When I'm doing these kinds of things, I always like to put in a little bit of a header, so that I know what the content of the file is when I open it. So this is where I saved the file. I saved it on the D drive: Projects, Lectures, Bioinformatics and Animal Breeding, Bioinformatics course 2016-2017, which was the first time that I did the course, and then I put it in the Assignments folder there. So it's a long path in my case, but you could have just put it on the C drive.

So then you just do "set working directory". So to make R move from one directory to another, you use setwd. Normally, when you start R, you can use the command getwd, so "get working directory", to see where you are. In this case I already set my working directory to this folder, right? So if I do a setwd to, for example, my D drive, I can do it like this, and now when I say getwd, it says, well, you're on the D drive. And you can do a dir command, I think, and then it lists all of the things which are on your D drive. So you can see that I have a folder called "bull s", which is not bullshit, but it's bull-related stuff, so cows. And I have a couple of other things. I used to have three hard drives, so a C drive, a D drive and an E drive, but then I got a new computer which only had a D drive, so I made a folder called "D drive" and a folder called "E drive" which hold the content of the old ones. But that's just the way that I organize my computer. You of course organize your computer in your own way, so you could put it anywhere.
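A minimal sketch of the working-directory commands just described. The folder here is an assumption: `tempdir()` is used only so the example runs on any machine, where the stream uses a path on the D drive.

```r
# Remember where we started so we can go back afterwards
old_wd <- getwd()

# Hypothetical assignment folder; on Windows this could be e.g. "D:/"
work_dir <- tempdir()
setwd(work_dir)

# Ask R where it is now
now <- getwd()
print(now)

# List everything in the current working directory
print(dir())

# Go back to the original directory
setwd(old_wd)
```

In R on Windows, paths are written with forward slashes (or doubled backslashes), since a single backslash starts an escape sequence inside a string.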
So if you only have a C drive, then you could have made something like C:/Bioinformatics/Lectures or Assignments or something like that. All right, so let's go back to where we want to go. So I just use the Notepad, and then I say, well, set my working directory to here, because that is where I saved the mtnd1.fasta file. So I just set it there.

All right, question: "How do you readLines a .fasta file when it's a TXT file?" Well, you can use readLines on any file, even on an image file. It will just try to load lines, and it is a text file, right? It just has a different extension, it's just called .fasta. Computers don't care about the extension of a file. The extension of a file doesn't mean anything, it's just for us humans to have a handle on what's in there. Windows cares a bit, because if you call a file .exe and you click on it, Windows will try to execute it. But you can save a text file with an .exe extension, and that won't change what is in the file. "And if you save the data as TXT you might have to add .txt." Yeah, but like I said, extensions are meaningless. If you're using a Linux system, then Linux doesn't care what extension you give it. You can even make files without an extension and that's perfectly fine. Okay, yeah, yeah, okay. Because, yeah, Commando, that is because you are hiding the extensions in Windows. So Windows has an option, which is the "hide known extensions".
So then it will not show you the real file name. That's something that you actually have to disable, because it makes you very susceptible to clicking on viruses. If you download a file which is virus.txt.exe, then if you look at the file in Windows, Windows will just show you virus.txt. And then you think, oh, it's a txt file, so it's fine to double-click on it. But then when you double-click on it, because the extension is hidden, you can't see that it's really an .exe.

I can show you where that is in Windows. Let me open up a thing, because I always think that's one of these things that you should change. So let me add a window capture, a new window capture. I want to capture this thing. So when you just have a standard Windows folder, it doesn't really matter where you are, you can go to View on the top here, and it doesn't show you... yeah, there. And then you have here "File name extensions": "Show or hide the set of characters added to the end of files that identify..." So you always want to have this enabled, to show the file name extensions. And then in Options you have this "Change folder and search options", which will open up a little additional window. Let me see if I can capture that window as well. All right. So you then get something which looks like this, and then you can go to View, and here there's this "Hide extensions for known file types", and this should be off. Because otherwise, if you are downloading something called virus.txt.exe, it will hide the .exe, because it knows that .exe is a known file type. So this one should always be off, because otherwise it's just hiding part of the file name, and you want to see it. All right, so that's good, right? So it's just a tip. You can hide it, right?
But then you don't know exactly what the file name is. So if you would change that and then look at your file again, it would actually say that the file is called mtnd1.fasta.txt. Yeah. And then actually there's another one: "Don't show hidden files, folders, or drives". You do want to show hidden files and folders and drives. At least, I want to. "Okay, I totally get the point." Yeah. So showing hidden drives and files and stuff is also useful, because if you have a virus, viruses usually live in hidden files. But this is just some general Windows stuff. On Linux I never have to worry about this, because extensions don't mean a thing, so it's all fine.

All right, let's close these ones up then, and let's go back to R. In R I want to set my working directory and then load in the file. So I'm going to hide the Notepad and then we're going back to R. And here you can also see some of these hidden files, right? The RECYCLER folder is something that Windows hides by default; this is your trash bin. And then you have this $RECYCLE.BIN, which is another recycle bin. And then you have your System Volume Information, which you shouldn't throw away, but you can see it and you can see what's in there.

So once we've loaded it, right, because we read in all of the lines in this file and put them in this variable, we can then just type mtnd1 and that will show us all of the stuff which is in the file. So this is just what I showed you in the Firefox window.
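To illustrate the loading step, here is a small sketch. It does not use the real Ensembl export: it writes a tiny made-up FASTA file to a temporary folder first, and the names `fasta_path` and `example` are hypothetical.

```r
# A tiny stand-in FASTA file with made-up sequences (NOT the real mt-Nd1 export)
fasta_path <- file.path(tempdir(), "example.fasta")
writeLines(c(">seq1 cdna", "GTGTTC", "ATTA", "", ">seq2 peptide", "MFFI"),
           fasta_path)

# readLines() ignores the extension; it reads any text file line by line
example <- readLines(fasta_path)

# Typing the variable name (or printing it) shows every line that was read
print(example)
```

With the real file, the call would point at wherever the export was saved, relative to the working directory that was set earlier.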
It's just the same thing. And it has these little dollars here because the line actually continues and it cannot fit everything on the same line. So if you do it like this, then it will show you dollars and say, well, I cut off some of the lines because they were too long. All right, so this is then the file that we've loaded in, and then we can start doing stuff with this file, right?

So the first question was: load in the file; how many FASTA sequences are in the file? Each sequence starts with this greater-than symbol, or smaller-than depending on how you read it. So you just go through and you say, well, this is one sequence, this is a second sequence, this is three, this is four and this is five. So in total I have five sequences in my file. So the answer to this question, like I've written down in my more or less cheat-sheety thing, is: there are five sequences. I wrote it down. I like writing stuff down, to make it explicit.

All right, and then the next step is to use the table function, because the table function is really, really useful in R to tabulate stuff. It just looks at things which are the same and then counts the number of occurrences. So we want to table the variable, just to see first what happens. So let me disable the window captures and then go back to R. So we want to table the mt... And the nice thing about R is that when you're coding, right, and you type "mt", right, and I already forgot because I don't have the other window open anymore: in R you can press the Tab key and it will show you what you can use, right?
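The by-eye count of sequences from a moment ago can also be done in code. This is a sketch on made-up lines rather than the real file, and counting headers with `startsWith` is one possible approach, not the one used on stream:

```r
# Made-up lines standing in for what readLines() returned
fasta <- c(">seq1", "GTGTTC", ">seq2", "MFFI", "", ">seq3", "ACGT")

# TRUE for every line that begins with the FASTA header symbol
is_header <- startsWith(fasta, ">")

# Each header marks one sequence, so the number of TRUEs is the answer
n_sequences <- sum(is_header)
print(n_sequences)
```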
So this shows all of the variables and functions starting with "mt". There's a variable called mtcars, which is loaded by default in R. You have mtext, which is a function to put text on a figure. And you have mtnd1. So I just do "mt" and then type the "n" and press Tab, and then it autocompletes the thing.

So when I do this, it shows me that, well, for example, this sequence here is in this file four times. This sequence here is also in this file four times. And we see that some of the sequences, like this one, are only in this file one time. But it gives you a kind of overview. And you can see here that the five different names that we have now all of a sudden pop up at the beginning. And this one is in there three times. I think it's three times... one time. I don't know why the three is above the thing. It's probably because it's too long, but... let me see. No, so this thing is in there one time. But it just gives you a table, so it gives you an overview of how often a certain thing is in there.

And of course, doing this on the whole file is not really useful. So what you can do is, for example, say: well, I take the second line from this variable, which is just the letter code, right, and then I can string split this. So I can split it on the empty string, which just means split after every character. When I do this, it separates the different characters from each other, and then it puts this in a list, because I can do this not just for a single line, I could do this for two lines as well, right? So I can say do this for line two to line three. So now it will split line number two and it will split line number three, and since it needs to keep these things separate, it puts them in a list. So this is a list identifier.
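A short sketch of the splitting step described above, on made-up lines (the vector `lines` and the names `one` and `two` are hypothetical, not from the stream):

```r
# Made-up lines standing in for the lines read from the FASTA file
lines <- c(">seq1", "GTGT", "ACCA")

# Split the second line after every character.
# The result is a list holding one character vector.
one <- strsplit(lines[2], "")
print(one)

# Splitting two lines at once gives a list with one vector per line,
# which is why strsplit() always returns a list
two <- strsplit(lines[2:3], "")
print(two)
```

The `[[1]]` and `[[2]]` that R prints in front of each vector are the list identifiers mentioned in the lecture.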
So the double brackets, and this is a... but in this case, in the assignment, we just wanted to use line number two. And then we want to unlist this, to get the thing without it being in a list. So in R this means that this is the first list element, and then this is a vector, and this vector starts at one. Then this is number 28 and this is number 55 in the vector, and the vector is like 59 or 60 long. So if we unlist it, we get rid of this double-bracket one. That is just because we could have done this on multiple lines, but we're doing it only on a single line, so we want to unlist it to get rid of the fact that it's a vector stored in a list.

All right, and then we can use the table function again on top of that. So we then say, well, table, unlist this thing, and then it counts the number of bases for us, which is really handy. So this first line of the sequence contains 13 A's, 20 C's, 9 G's and 22 T's. So you can see it makes some sense to do this, because otherwise we would have counted them by hand. But as a bioinformatician, you don't really want to count things by hand. You want to use something like R or Python or a different language to do the counting for you. And of course we could have done this as well for the first five lines. So we say line two until line six, and then we just say string split them, unlist them and then table them, and then we can see that there are 98 A's, 82 C's... yes. So this allows us to go through big sequences and very quickly get an overview of how many A's, C's, T's and G's are in there. All right, so the question was using the square brackets.
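The table, unlist, string split chain described above can be sketched like this. The sequences are made up, so the counts are not the ones from the real mt-Nd1 file:

```r
# Made-up lines standing in for the lines read from the FASTA file
lines <- c(">seq1", "GTGTTCAA", "ACCATTGA")

# Split line 2 into characters and flatten the list into a plain vector
bases <- unlist(strsplit(lines[2], ""))

# Count how often each base occurs in that line
counts <- table(bases)
print(counts)

# The same chain works on several lines at once (here lines 2 to 3)
counts_all <- table(unlist(strsplit(lines[2:3], "")))
print(counts_all)

# Square brackets pull a single count out of the table
print(counts_all[["A"]])
```

Note the three opening and three closing round brackets in the nested call; they have to balance for R to run the command.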
We can look into the object, and the question is: how many A's are in the first line of the sequence? So there are 13 A's. So here, 13 A's, when we take the first line of the sequence, which is actually the second line of the file, because the first line of the file is the name of the sequence and the second line of the file is the first line of the sequence. So 13 A's. You could have counted them by hand, but you don't have to.

All right, then the next thing is to start using a little bit of programming and a little bit of logic. Let me get my Notepad window back. So the code was more or less given, right? This is just the exact same code as what was in the assignment. So let's see what it does. We can just copy-paste it in. What it does: you go through all of the lines of mtnd1, so of the variable, and every time that you go through this loop, we take the line that we're currently looking at, split it into the individual characters, take the first character, and store that as the first letter. And then we cat, so cat is the way that you can print to the R window. So we say cat the first letter and then add an enter behind it. So when we run this in R, it looks something like this, and it will just go through the whole file and show us the first letter of every line. So when we look at the first line, of course, it starts with a greater-than symbol, because the name of the sequence comes after it. And then the next one is a G, and a G.
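The assignment code itself is not shown in the transcript, so the following is only a reconstruction of the loop as it is described, run on made-up lines. The variable names, and the extra `first_letters` vector that collects the results, are assumptions:

```r
# Made-up lines standing in for the variable read from the FASTA file
lines <- c(">seq1", "GTGT", "", "GCCA")

first_letters <- character(0)
for (line in lines) {
  # Split the current line into characters and keep the first one.
  # For an empty line there is no first character, so this gives NA.
  first_letter <- unlist(strsplit(line, ""))[1]

  # cat() prints to the R console; "\n" adds the enter behind it
  cat(first_letter, "\n")

  first_letters <- c(first_letters, first_letter)
}
```

For these four lines the loop prints `>`, `G`, `NA` and `G`, one per line.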
So these are the first letter of the second line, the first letter of the third line, and so on.
Yeah, so if you get a plus in the command line, Testosterone: a plus means that the command that you're typing is not yet finished. So if you see this little plus here, it means that I opened up a bracket but I haven't closed it yet. So I can just continue typing until I close the bracket. In your case, if you count, you see you have table, opening a round bracket once, then opening a second round bracket after unlist, then you do strsplit, which is the third bracket. But at the end of your command you only have two of them closed, because you have two closing brackets. So you have to add a third closing bracket, because of course it needs to be balanced: if you have three opening brackets, you need three closing brackets as well.
So it's fiddly, right? If you're programming, it has to be exactly correct. It's like doing something in the lab: forgetting a single chemical will make your thing not work. The same thing holds for programming, like forgetting a single closing character. So if you see the plus in front of it, it means that the command has not finished yet. That means that you're either forgetting a closing bracket or you're forgetting a double quote somewhere. And then you can type on and on and on.
If that happens, you can actually just press the stop button here. The stop button will stop the input. So if I give myself a variable in which I would store a string, like this, and I forgot to close it, then I can just continue typing and typing and typing. And if this happens and I want to calculate like 8 plus 9 and I don't get an output, then I know: oh, I'm still inside of a command.
So then I can just press the stop button and it will end the command for me, because I might not know what the thing is that I'm forgetting.
All right, but the first letter part that we did is just a little bit of coding. You go through all of the lines in the variable, and then you use the string split with the unlist to get the first element, the first letter of the line.
Then in the next question you had to think about it a little bit. The same structure holds: for each line in the mtDNA, get the first letter. And then the question was: can you decide if a line is empty? Because there are some empty lines in the file. If the first letter is a greater-than symbol, then of course this is an identifier, a sequence name. And else, if it's not a greater-than symbol and it's not empty, then it's actually a DNA sequence. So the idea was that you can use programming logic to decide what is in every line of the file.
Here I use the else-if structure. So to say: if the first letter is NA, then the line is empty. Else, if the first letter is equal to the greater-than symbol, then it is an identifier. And else, this line contains a DNA sequence. So it's just a way of saying what every line is. I can copy-paste this into R just to show you guys what happens. And that was more or less it for the assignments.
So the idea is to just go through each line of the file, and then you see that it now identifies them. It says: well, the first line contains an identifier, and then there's a bunch of lines which have DNA sequence. And then there's another identifier, and then you get a bunch of DNA sequence again. And you see here that at some point in the file there's an empty line, so a line which has no characters on it. And at the end of the file there's
two additional empty lines as well.
The R is just for you guys to practice. I will never ever ask anything about R on the exam, because it's not an R course, it's a bioinformatics course. But be aware that a lot of the assignments will contain some R, because I do think that you need to be faced with the fact that without being able to program, bioinformatics is not going to work that well. You can only really do bioinformatics if you're able to program. With the short tutorial PDF that I showed you last week, and the coding examples that are in the assignments, you should be able to pick up some basics of R to be able to do some of the coding. And that's also one of the reasons why I don't force people to do the homework, because the homework is just for you guys to learn how to do things.
All right, so are there any other questions, remarks, frustrations that you want to share with me and the other people who are here? I have to wait a little bit. I think there's a little bit of a delay in Twitch. But just let me know. It's up to you guys. If you want to do more R, if you want to have a single lecture in which I explain the basics of R, then we can do something like that. If you say, I don't want to do any R, I just want to look at the databases and have a much more by-hand approach, that's fine too.
Floriano says: neighbor's fault, it's dark and cold. How do you mean it's dark and cold? It's not that dark yet. It can be a lot darker outside.
"Yes, please." What's the yes, please? Is the yes, please no more R, or more R, or have a lecture about R? Okay, it would be nice to have a lecture on R. Okay, then we will do a small lecture about the very basics of R. Yeah, we can do that in the regular time.
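For anyone who wants to practice, the line-classification logic from the assignment discussion above can be sketched as follows. This is an illustration under assumptions, not the assignment code itself: the names fasta_lines and classify_line and the example data are made up. It relies on the fact that splitting an empty string gives a vector with no elements, so asking for its first element gives NA.

```r
# Hypothetical stand-in for the lines of the file, including empty lines.
fasta_lines <- c(">seq_one", "GATTACA", "", ">seq_two", "CCGGTTAA", "", "")

# Decide what kind of line this is, based on its first letter
classify_line <- function(line) {
  first_letter <- unlist(strsplit(line, ""))[1]
  if (is.na(first_letter)) {
    "empty"            # no characters on the line
  } else if (first_letter == ">") {
    "identifier"       # the name of a sequence
  } else {
    "DNA sequence"     # anything else is treated as sequence
  }
}

labels <- character(0)
for (line in fasta_lines) {
  label <- classify_line(line)
  cat("This line contains:", label, "\n")
  labels <- c(labels, label)
}
```

The order of the checks matters: the NA test has to come first, because comparing NA with the greater-than symbol inside an if would raise an error.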
So we will put that in for next week. I won't do that this week, because this week we have to do the RNA stuff, and I really want to get rid of that. Like I told you, I don't really like the RNA lecture. I don't like talking about 60 different forms of RNA.
All right. Why is Floriano actually not a VIP? Is it possible that you give him VIP status, or do I have to do that? I think you should be able to, you're a moderator. So you could give Floriano VIP, the little diamond in front of his name. Although now he really stands out, because everyone talking is actually VIP and Floriano is not.
All right, so that was it for the assignment. So let's close the R window and then go back to the PowerPoint, which is over here. "Exactly, if everyone is VIP, no one is VIP." Yeah, that's true. We can revoke your VIP status as well, Commando, if you want.
All right, so I've been recording for 45 minutes, so we can do a couple of slides before we take a little break. A word in advance, I already told you a couple of times: there will be a lot of theory. There will be a lot of going through "this is snoRNA, snRNA, miRNA, mRNA, pre-mRNA" and these kinds of things. So just bear with me, because we will have some more fun bioinformatics at the end of the lecture.
Oh, you can't promote people to VIP. Oh, that's good. Then Floriano is not a VIP, so that's okay. You could give him a timeout though. Yeah, well, the remark was not really related to the lecture, so you can give him like a 30 second timeout. Or not, he's not talking that much anyway.
Anyway, the nice bioinformatics stuff will be at the end of the lecture, just so that everyone stays until the end. So things like where to download DNA and RNA data, or free microarray data. And there will be a couple of slides and perhaps even a live demo on RNA structure prediction. And somewhere in between the lecture,
I'm also going to show you my 3D engine. I wrote a 3D engine in the D programming language. And since we're talking about 2D and 3D structure of proteins and RNA, well, mostly RNA today, proteins will be next week, I wrote a visualizer which allows you to visualize 3D protein structures, so crystal structures, and allows you to fly through it. You can't really click on it, but you can go through it. I wanted to show you guys that as well, since I've been working on it for five years or something, and it does more than just the 3D structures of proteins, but I think that's one of the things that looks pretty.
All right, so a lot of theory, bioinformatics at the end.
All right, so a question to you guys: what is the central dogma of molecular biology? The way that I structured the lecture should give you a little bit of a hint what the central dogma of molecular biology is. But I think it's good that people know this, and it will definitely be a question on the exam. So you can give an answer now and see if that is the answer that I would give, or the answer that would get you the points on the exam. So in chat, just throw in what you think is the central dogma of molecular biology. And no googling. Well, you could, but...
"It is the coolest of all biologies." I agree, I agree. I have a master's in molecular biology, so I would definitely agree that molecular biology is way, way better than behavioral biology or these kinds of things.
All right, John Hages says DNA to mRNA to proteins. Yes. Yes.
Yes, the best biology. No, it's not the biggest biology, but molecular biology is interesting.
But the central dogma that we have in molecular biology, so when you get taught molecular biology or do a master's, this is always what they teach you: the central dogma is that you have DNA, which is transcribed into RNA, which is then translated into proteins. So that is the central dogma. The central dogma is very basic. DNA is the carrier of genetic information, RNA is more or less the intermediate between the...
"Keep your recording." Yeah, yeah, I just said to you guys, I've been recording for like 50 minutes, so we can do a couple of slides and then we will have a break. I actually have new GIFs during the break, so we can actually vote on which GIFs you want to see.
But back to the central dogma: DNA, transcription, RNA, translation, protein. So if you get the question in the exam, then the central dogma in molecular biology is that DNA is the carrier of genetic information, RNA is the intermediate, so coupling the DNA world to the protein world, and proteins are more or less the effector molecules, the molecules that do stuff. So they make a cell work. And during the lecture we will actually see that this dogma is not entirely correct, because RNA also does stuff itself. So it's not just the proteins that do stuff, RNA can do stuff as well. And of course we will talk a little bit about the RNA world hypothesis.
All right, so I think we can go to like two slides of history. I only have two slides of history, and then we will take a break.
So we already saw Friedrich Miescher. Friedrich Miescher is kind of the godfather of DNA and RNA, because he's actually the guy that more or less discovered DNA, right? He came up with his nuclein.
So, the weird substance that he found inside the cell nucleus, which could not be cut by proteolytic enzymes, and which he then called nuclein. And he actually figured out when he did his experiments that there are two different types of nuclein. So there are two different types, DNA and RNA, but the names were not coined yet in 1868. He discovered that when you look into the nucleus of a cell, there are two different substances in there. That's how he described it: two different nucleins. One which we now know is DNA, and another one which we now know is RNA. And they have different chemical properties. Because he was a chemist, he looked at the chemical properties of the different molecules and he said: well, these are two separate and unique molecules.
Then in 1959 we have Severo Ochoa. Severo Ochoa is actually the guy that, well, not invented, but discovered that there's something called mRNA. He discovered that mRNA is the thing which informs or directs protein synthesis. He has a really good paper on how this happens.
And in 1960, so more or less at the exact same time, a lot of people were working on: okay, so we now know that we have this DNA stuff which carries genetic information. And how does this now tie in?
So this is kind of the discovery of the dogma. When people were looking into the nucleus and into the cytosol of cells, they found that nuclein, so DNA, is more present in the nucleus. And inside the cytosol we normally have proteins, but we also find a large amount of this messenger RNA there. And then, using different experiments, they found out that messenger RNA carries the genetic information from the nucleus to the cytosol to make proteins.
In the 1960s they figured out that ribosomes are the things that make proteins. And then in 1965 Robert W. Holley figured out that there is this physical link, that there is something called tRNAs, so transfer RNAs, that actually couple the messenger RNA with the protein. We will talk extensively about tRNAs.
But before that, of course, in 1957 RNA polymerase was purified, and that allowed molecular biologists to start using RNA polymerase to make RNA from DNA. So if you have DNA and then you add RNA polymerase, then that DNA is transcribed into RNA.
And then in 1983, the year that I was born, we also have one of my heroes in molecular biology, Kary Mullis, who invented the polymerase chain reaction.
So PCR is kind of the fundamental method used in molecular biology to study DNA and/or RNA. And Kary Mullis is a very, very interesting figure, and I could fill a whole lecture just about him. He got a Nobel Prize for his invention of PCR, and it's just a very interesting story, but it's a story for a later time, because otherwise I'm talking for another hour here and we don't get through the whole lecture. But during the primer design slash PCR lecture, which will come up, I will talk extensively about him and about his life and about his theories, and I will probably also make you read his paper on time travel, because it's just one of the best papers on time travel I ever read.
And then in 1989 there was another big discovery.
"Yeah, he's denying HIV and AIDS, and he's a strange guy." Yeah, but he's an interesting guy. I love Nobel Prize winners. I always say, if intelligence is on a scale, it's more like a speedometer, right? So zero is at the top and then you have your maximum speed on the other side. And people who win Nobel Prizes are generally flipping between the two, so they're either in genius mode or they're in complete idiot mode. And Kary Mullis is one of the good examples for that.
All right, the next big step in the polymerase chain reaction, the fundamental methodology in molecular biology, is that from this thermophilic bacterium, Thermus aquaticus,
a polymerase, so a DNA polymerase, was extracted, and this DNA polymerase is stable up to very high temperatures, which it has to be in this bacterium, of course, because it lives near these hot springs in the ocean. So it allows us to do PCR in a much faster way than before. Before, we used polymerases which were not that temperature stable, which would function at much lower temperatures, which makes PCR much harder to do. But with this polymerase, which can actually work at like 70 degrees Celsius, doing PCR became a lot easier, and it became a common technique which is done in every molecular biology lab around the world.
In 1978 there was the discovery that genes are commonly interrupted by introns that must be removed by RNA splicing. So when we're talking about the dogma, where you have DNA and RNA, this falls into the RNA level, where people figured out that a long RNA molecule has something called exons, which are coding for protein, and it has introns. These introns have like a regulatory function, but they do not code for protein, so they have to be removed. And the process for this removal was discovered by Walter Gilbert. So the introns have to be removed, and this is called RNA splicing.
Another nice discovery in the RNA world was done in 1973 and later again: it was predicted in 1973 to be there, and then it was confirmed in 1984. And that is that there is something called telomerase. Telomerase is a protein which uses a built-in RNA template, so it's a protein with a little piece of RNA inside of the protein. And that is maintaining the chromosome ends, because every time that a chromosome divides, the polymerase will copy the DNA, right?
But polymerases have to start somewhere, so they can't copy the whole DNA. They can only copy the middle part of it, and at the end they can't copy exactly up until the end, they fall off. So every time that a chromosome divides, it loses somewhere around 100 to 150 base pairs. That means that the chromosomes become shorter and shorter and shorter. And inside a cell there is this telomerase protein, which uses the RNA inside of it to stabilize the end of the chromosome, so to make these chromosome ends longer again.
We already talked about Barbara McClintock's DNA elements, so the jumping genes, the transposons. The transposons were discovered in 1948, but somewhere around 1970 people realized that many of these transposons actually use an RNA intermediate to jump around. So again, RNA is very important when we talk about transposons.
One of the newer discoveries, in the 1990s, is that small RNA molecules regulate gene expression by post-transcriptional gene silencing. This is also what short interfering RNA technologies are based on: the fact that you can use small pieces of RNA to bind to mRNA, to have this mRNA degraded, so that the mRNA does not produce proteins. This is a common way for the cell to regulate gene expression, but it's also a really common way nowadays to regulate gene expression in the lab. If you want to make a knockout of a gene, you don't have to cut the gene out of the genome anymore. You can use a complementary piece of RNA, which you then bring into the cell using Lipofectamine or other techniques, and this little piece of RNA will bind to the messenger RNA and the messenger RNA will be degraded.
And then in 2001 Eddy came up with the theory that there is non-coding RNA which controls epigenetic phenomena.
So an epigenetic phenomenon is the way that some DNA molecules, or some base pairs, are actually methylated or not methylated, and there are other ways that base pairs can be chemically changed. This chemical changing of base pairs is an epigenetic phenomenon, because it's something that is attached to the genome but not part of the genome. And this is controlled by non-coding RNA as well.
All right, so now we've done one hour of recording. So I will stop the recording and then