everyone to lecture number three of the bioinformatics course. Today we will be talking about DNA, or DNS as they call it in German. I had a couple of questions, so I added a couple of slides, but for the rest I have to apologize. I was not able to change as much of the lecture as I wanted to. I do this every year, right? I try every year to make it a little bit different, and unfortunately I did not really have time this week to update it as much as I wanted to. So I apologize for that, but let's hope that it's all fine for you guys, because otherwise you could just watch the old lectures on YouTube and it would be exactly the same. To prevent that, we have a little bit of change every year: new slides, different figures. So let's hope it's different enough from last time. All right, last week I got a couple of questions on Git: installing Git, using GitHub and setting up the SSH keys. So I wanted to give you guys a very, very brief introduction to the command line, because the command line is very difficult for a lot of people, unless you grew up in the 1980s or 1990s, because back then you only had the terminal. The first computer that I had didn't have things like Windows. It was a Commodore, which used cassette tapes that you would put in, and because there was no Windows and nothing that you could click on, you had to type in all of the commands and use the keyboard to talk to the computer. And 90% of bioinformatics is still that way, unfortunately, because you still have to call programs, give command line parameters and all of these things. So I wanted to talk a little bit about that. Then we will talk a little bit about the history of sequencing and about sequencing in general, mostly next generation sequencing. And then we will talk about genes. We already talked about genes last week, right?
When we talked about phenotypes and Mendelian theory. In Mendel's theory, a gene is a unit of inheritance, but from a DNA perspective, a gene is a completely different thing. And of course, genes are also different if you're a bacterium or a multicellular organism. So I wanted to give you guys an introduction to how biochemistry views a gene, instead of how genetics views a gene. Furthermore, we will be talking about regulatory elements, which is about how genes are regulated and how they are expressed. And I will be talking a little bit about other types of DNA. I think I have two or three slides about mitochondrial DNA and chloroplast DNA, which generally get too little attention, and I think that they're really important. Mitochondria are the powerhouse of the cell, and people should talk more about them, especially in the context of research like the research that we are doing. Like I told you guys, we work on the Berlin Fat Mouse, and there, of course, the mitochondria are really important. And then a few words about biomarkers, although I might have dropped one or two of those slides, but we'll see. It will be a surprise to me as well. I worked on the presentation right up until we started, so it's also not online on Moodle yet. I apologize for that, but it will be directly after the lecture. Okay, so first things first: answers to the previous assignments, which is nice, because I can directly start drawing. And I love drawing. I think drawing is a very important skill when you're a biologist. I didn't study biology at university but life science and technology, which is kind of biology mixed with medicine, and when I started, we did a lot of drawing. You had to look through a microscope and draw the different types of cells that you saw.
And I think that drawing things forces you to look at the object in a different way, and that helps you to see small changes. These small changes, of course, are very important in biology. Being able to spot a mutant chicken in a barn with 10,000 chickens is important, because these mutants might have properties that are very beneficial. So being able to look at a lot of things and see which thing is different is a good skill, and a skill like that you can only really develop by drawing, or at least that's the way that I work. I'm a visual person. So when I do statistical modeling and stuff, I also do a lot of drawing. Around me, I have all of these little pieces of paper with little drawings on them. I don't know if you can see that, but I love drawing, so I do that all of the time. Good. So let's start with last week's assignments. The first assignment was about maps and phenotype databases. The first question: in a two-point cross, we take a heterozygous animal and cross it with a homozygous animal. We did the inheritance diagrams for a single gene, but in this question, we want to do it for two genes at the same time. So question 1A is: draw the Mendelian inheritance diagram for two traits in an AaBb × aabb cross, with of course big A and small a being different alleles. Okay, let me get my drawing thing and draw one of these inheritance diagrams for you guys. Let's hope that everything goes well. I haven't practiced this a lot, and the annoying thing is that the drawing board that I'm using is having an update. All right, let me get rid of that thing, which was just a test drawing. Let me get a pen and a certain thickness. Let's start by drawing an inheritance diagram. We need a straight line like this, and then we need a straight line like that.
And then we just have to start writing down all of the gene combinations. On the top axis, let's use the easiest individual, which is the aabb individual, right? This individual can only produce one type of gamete. It doesn't matter which copy of a you take, and it doesn't matter which copy of b you take: it always generates a small a, small b gamete. All right, so that's the first parent done. The second parent is a little bit more complex, because that parent was a big A small a, big B small b parent, right? Now we end up with a whole bunch of different possibilities, but let's just run through them. First, I have a big A and a big B, and that combination can end up in one of the gametes, so let's write that down: big A, big B. The next one still has a big A, and then we take the small b, so big A, small b. Then the next one has a small a and a big B, so small a, big B. And then of course we have small a, small b. Those are the four types of gametes that can exist. All right, then let me get rid of this line here. Why does that not work? Oh, such a shame, I have to click on it with my mouse. I was hoping that I could just use the little eraser button on the top here, but it doesn't really want to erase. So let me just get the eraser tool and erase. Come on, erase that thing. Very good. All right. So now we have to write down what happens. If you get a big A from your mother and a small a from your father, then of course you are big A, small a. Then you get the big B allele from your mother and the small b allele from your father. So the first child looks like this: AaBb. And of course this child is identical to the mother, right?
That means that this is a non-recombinant, because it didn't recombine: it has the exact same genotype as the mother, so it looks exactly the same as the mother. The next individual gets a big A from the mother, a small a from the father, a small b from the mother and a small b from the father, so Aabb. This is a recombinant, because we haven't seen this combination before. The next one is small a, small a, so one a from here and one a from there. Then we get a big B from here and a small b from the other parent, so aaBb. This is also a recombinant individual, because we haven't seen this combination of alleles before. And the last one is quite easy: small a, small a, small b, small b. This is of course a non-recombinant, because it is identical to the father. So this is what the crossing scheme looks like. Of course, you don't have to do this last column. But now, if we want to calculate the distance between these two genes, we look at how many animals we have in total. Imagine that we have 500 individuals in total. We count the number of recombinants: for example, we have 30 of these and 70 of those. Then we end up with 100 out of 500 individuals being recombinant, which means that the distance is 1 in 5, or 0.2 Morgan, also called 20 centimorgan. That's how it works. So that was question 1A, draw the inheritance diagram. Question 1B is: why don't we take two heterozygous animals? I wanted to give you guys an opportunity to share your opinion on this question. Why don't we just take two heterozygous ones, two individuals with this genotype? I'll give you guys some time to type it in, and I'll just be sitting here doing a little drawing off to the side.
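The two-point testcross can be replayed in a few lines of Python. This is a minimal sketch, not part of the course material: it assumes the heterozygous parent carries AB on one chromosome and ab on the other (so AB and ab are the parental gametes), and the counts of 30 and 70 recombinants out of 500 are hypothetical, chosen to match the 100-out-of-500 example.

```python
from itertools import product

def gametes(genotype):
    """All distinct gametes of a genotype written as allele pairs, e.g. [("A", "a"), ("B", "b")]."""
    combos = ("".join(alleles) for alleles in product(*genotype))
    return list(dict.fromkeys(combos))  # drop duplicates but keep the order

def child(gamete1, gamete2):
    """Combine two gametes gene by gene, e.g. 'Ab' + 'ab' -> 'Aabb'."""
    return "".join(x + y for x, y in zip(gamete1, gamete2))

het = [("A", "a"), ("B", "b")]  # AaBb parent, assumed phase AB/ab
hom = [("a", "a"), ("b", "b")]  # aabb parent
parental = {"AB", "ab"}         # gametes of the AaBb parent without recombination

hom_gamete = gametes(hom)[0]    # the homozygote only ever makes 'ab'
for g in gametes(het):
    kind = "non-recombinant" if g in parental else "recombinant"
    print(child(g, hom_gamete), kind)

# Hypothetical counts of the two recombinant classes among 500 offspring
recombinants = {"Aabb": 30, "aaBb": 70}
total = 500
r = sum(recombinants.values()) / total
print(f"recombination fraction {r}, roughly {r * 100:.0f} cM")
```

Like the lecture, the sketch reads the recombination fraction directly as the map distance. That is a reasonable approximation for small distances; larger ones are usually converted with a mapping function such as Haldane's.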
So if you know the answer to why we don't take two heterozygous animals... I'm just going to draw a little flower while I wait for you guys to answer. Pretty flower. Let's do a little house next to it, in the distance. Of course, the house needs a door. It needs another window. It needs some roof shingles and stuff. A little bit of smoke coming out. Right. So, no ideas? Did people do the assignments? Let me ask that question first. Just in chat, just a yes or no: did you do the assignment? I won't get mad if you didn't do them. I also got one question over email from someone who didn't know what to do with the assignment, whether to send it to me by email, which you don't have to. Assignments are for you guys to practice and to prepare for the exam. The questions on the assignments are very similar to the questions that you will get on the exam, so it's just an opportunity for you to practice beforehand, and if you don't want to practice, that's fine with me. It's a master course, you're not in high school anymore, so I won't punish you for not doing the assignments. That's completely up to you. Right. So some of you did the assignments but didn't know the answer to that question. So, any ideas, or any kind of wild guesses, why this would be the case? Because I can tell you the answer, and then you will go like, oh yeah, that makes sense. The main thing is that it has to do with the number of possible combinations. The nice thing about this individual here, aabb, the homozygous one, which has two copies of the same allele for both genes, is that you only get one possible gamete, while for the heterozygous animal you get four possible combinations. So imagine that you take two heterozygous animals: then instead of four different types of children, you would end up with 16 different combinations. And then it becomes a lot more complex. Yeah, yeah, it's just combinatorics, right?
When you cross a homozygous with a heterozygous individual, you have four possible offspring groups. If you take two heterozygous individuals instead, you also have four gamete combinations here at the top, and you end up with 16 different possibilities. Of course, a lot of these possibilities would be similar to each other, but still, 16 is harder to deal with than four. So it's the strategy of keeping it simple, stupid: the KISS principle. Why make it more difficult if you can do it relatively easily? So that's the answer to question 1B. All right, the next one is a three-point cross. Again, draw the Mendelian inheritance diagram. I will do that, why not? Let me get a new empty slide for you guys and delete these things; I don't need those. So, a three-point cross. Exactly the same situation again. Let me move one of the windows out of the way so that I can actually see. Again, you can see from the question that we have one individual which is heterozygous and one individual which is homozygous. Let's start by putting the homozygous one on top. The homozygous one is aabbcc, so all small. What does a gamete from this individual look like? Well, this individual can only produce one gamete, because it will always pass on the small a allele, the small b allele and the small c allele. So again, very, very easy, and no complexity on that side. In the other parent, we have big A small a, big B small b, big C small c. And now the difficulty is that big C and small c look very much the same, unlike A and B. But let's just start. The first possibility of course is getting a big A, a big B and a big C in the gamete. The next possibility is getting a big A, big B, small c.
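To see the combinatorics behind the KISS argument, this sketch counts the gamete pairings and the distinct child genotypes for the testcross (AaBb × aabb) and for a cross of two AaBb heterozygotes. The helper names are hypothetical and the two-gene notation is re-declared so the block runs on its own.

```python
from itertools import product

def gametes(genotype):
    """All distinct gametes of a genotype written as allele pairs."""
    return list(dict.fromkeys("".join(a) for a in product(*genotype)))

def cross(parent1, parent2):
    """Return (number of gamete pairings, number of distinct child genotypes)."""
    pairings = list(product(gametes(parent1), gametes(parent2)))
    # sort the two alleles of each gene so 'aA' and 'Aa' count as one genotype
    genotypes = {
        "".join("".join(sorted(x + y)) for x, y in zip(g1, g2))
        for g1, g2 in pairings
    }
    return len(pairings), len(genotypes)

het = [("A", "a"), ("B", "b")]  # AaBb
hom = [("a", "a"), ("b", "b")]  # aabb
print(cross(het, hom))  # testcross: (4, 4)
print(cross(het, het))  # two heterozygotes: (16, 9)
```

The intercross gives 16 gamete pairings but only 9 distinct genotypes, which is the "a lot of these possibilities would be similar" remark. In the testcross, every child genotype still points back to exactly one gamete of the heterozygous parent.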
The next one is getting a big A, small b, big C. I think I will run out of spots on this thing. And then we continue: big A, small b, small c. Now there are no combinations with big A left, so we can switch to small a. And now this part here, let me highlight that for you guys, just starts repeating, because we have the exact same combinations, and these four combinations are now coupled to the small a. So it will be small a, big B, big C; small a, big B, small c; small a, small b, big C; and small a, small b, small c. So again, eight possible combinations. And now we can draw all of the children. The first one is big A small a, big B small b, big C small c. This is a non-recombinant, because it is identical to the mother. The next one is a recombinant: big A small a, big B small b, small c small c. Again, A, B, C: here we have big A small a, small b small b, and big C small c. And now you see why people don't always use A, B and C, because big C and small c start looking very much alike if you just draw them on a board. So this one is also a recombinant, recombinant in the B gene, right? And we can just continue and fill them all in. Then of course we can calculate the number of recombinants compared to the total. For each pair of genes we can calculate a distance: from A to B, from A to C and from B to C. And then we can figure out how these genes are located on the genome, whether it's first gene A, then B and then C, or another order. And that is, of course, the way that we do it. Good. So question 2B.
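The last step, turning offspring counts into pairwise distances and a gene order, can be sketched like this. The counts are entirely hypothetical (invented for illustration, not from the assignment), the heterozygous parent is assumed to carry ABC on one chromosome and abc on the other, and a pair of genes counts as recombinant in a gamete when one of its alleles is big and the other small. The pair with the largest distance is taken to be the two outer genes.

```python
from itertools import combinations

# Hypothetical counts per gamete of the AaBbCc parent (phase ABC/abc), 1000 offspring
counts = {
    "ABC": 375, "abc": 375,  # parental
    "Abc": 45, "aBC": 45,    # crossover between A and B
    "ABc": 70, "abC": 70,    # crossover between B and C
    "AbC": 10, "aBc": 10,    # double crossover
}
genes = "ABC"
total = sum(counts.values())

def recombination_fraction(i, j):
    """Fraction of gametes in which genes i and j lost the parental (same-case) combination."""
    recombinant = sum(n for g, n in counts.items() if g[i].isupper() != g[j].isupper())
    return recombinant / total

fractions = {genes[i] + genes[j]: recombination_fraction(i, j)
             for i, j in combinations(range(len(genes)), 2)}
print(fractions)  # {'AB': 0.11, 'AC': 0.23, 'BC': 0.16}

outer = max(fractions, key=fractions.get)  # the most distant pair flanks the middle gene
middle = (set(genes) - set(outer)).pop()
print("order:", outer[0], middle, outer[1])  # order: A B C
```

With these made-up numbers, A to B plus B to C (0.27) is larger than A to C (0.23), because a double crossover restores the parental A and C combination and is therefore invisible when you only compare A with C.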
So 2B: how many different gametes can be produced by the heterozygous AaBbCc parent? And of course, that is eight. Combinatorially speaking, it's just two to the power of three: every gene has two alleles in this parent, and you have three genes. And you can expand this: if you had 16 genes, the number of combinations would be two to the power of 16, which is massive. That's why we only use two-point and three-point crosses in genetics. All right, let me close the drawing thing for you guys, and then we can continue with the other questions. Phenotype databases. Let's go to the IMPC, the International Mouse Phenotyping Consortium. Let me get the Firefox window open. The first question: how many protein coding genes does a mouse have according to the IMPC? This is a little bit more difficult, because the site tells you the number of genes that have been knocked out, but it doesn't easily show you how many there are in total. Let me see how I actually got that. I think if you search, let me just search for a random gene, it should show you the total number, or it used to. Where did they hide that this time? This is always difficult, because they change the website every year. So let me see, pbs7, yeah, I know that one. Where did they mention that? There always used to be a button you could click where it would say that they are this many percent done. All right, let's see the full release. There it is. Here it shows the number of phenotyped genes, the number of mutant lines that they created, the number of phenotype associations, the trends and the data. The total number of genes used to be on this page.
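The 2^n rule is easy to check by brute force. This sketch (hypothetical helper name) enumerates the gametes of the AaBbCc parent and compares the count with the formula, then evaluates the formula for 16 genes.

```python
from itertools import product

def n_gametes(n_genes):
    """Distinct gametes of a parent heterozygous at n_genes genes: two choices per gene."""
    return 2 ** n_genes

# Brute-force cross-check for the three-point case (AaBbCc)
het3 = [("A", "a"), ("B", "b"), ("C", "c")]
enumerated = {"".join(g) for g in product(*het3)}
print(n_gametes(3), len(enumerated))  # 8 8
print(n_gametes(16))                  # 65536
```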
Did anyone find it? The idea is that you guys click around in the database and figure out how many there are. In theory, we could also go to the genome assembly and look up how many genes there are, but the website that they link to for the assembly doesn't provide it either. So this is a little bit annoying. It used to show the total number of genes and say that they're like 25% done, so it's a shame that they don't mention that anymore. They do now mention how many mice were generated by each of the different collaborating groups. Anyway, what did I write down? Someone in chat asks: isn't the number 22,900? All right, where did you get that number from? Let me see what I have in my answer sheet. Yeah, I also have 22,924. Just press enter in the search bar. All right, yeah, that's where it is. Very good. Total number of entries. See, I told you it was easy, and I thought it was on the overview page as well. The previous release actually had 31 genes more, so the total number of entries went down since the last time we did this: some genes are not classified as genes anymore. All right, next question: how many protein coding genes have been knocked out and have completed the phenotyping pipeline? This is something that we can actually see: 7,824 have been knocked out and have gone through the whole pipeline. That was 7,022 last year, just as a comparison. Next question: how many genes are associated with respiratory disease and have been completely phenotyped? We go to phenotypes, of course, and then to respiratory disease. Respiratory disease... it's not there anymore. Interesting. They actually removed the whole respiratory disease category. Oh, abnormal respiratory function. So they renamed it. That's a little bit annoying.
For abnormal respiratory function, which used to be called respiratory disease, there are now 27 genes. Last year it was 26. They tested 1,202 genes in total for this phenotype, and of these, 27 were significantly associated. Let me check the database, because I don't understand why respiratory disease is gone. If we search for respiratory, you also get things like heart rate variability, which doesn't really make sense. So I think they just renamed it. You could say that disease is a respiratory system phenotype, but abnormal respiratory function, I think, is what it was renamed to, and that includes things like respiratory failure. Interesting. So they just renamed it. That happens: things in the database change. And of course, we could go back to an older version of the database, because they provide the older data releases as well. So in theory, you could go to the older releases and look it up there. All right, but at least 27 genes have been associated with abnormal respiratory function, previously called respiratory disease. The next question is: what effect does an ALG13 gene knockout have on the fat mass of a mouse? We can just search for that gene. Imagine that ALG13 is my favorite gene of interest. The question is then what it significantly affects. We already know that it affects the fat mass of a mouse, because that was in the question. So we go to body size and these kinds of things, and it says decreased lean body mass and increased total body fat amount. So the answer is basically that knocking out ALG13 increases the fatness of your mouse. All right, so those were the IMPC questions.
Oh no, there are still two more questions. Imagine that you have measured heart weight in a knockout model. You knocked out a gene yourself, because you're doing a PhD and your PhD project is about knocking out a certain gene, and you did that successfully. How many animals should you contribute back to the IMPC database before they report your phenotype? Again, this is a little bit hidden. If you go to data, it says getting started, blah, blah, blah, the IMPC pipeline, all really nice, and here it says sample sizes: roughly 14 homozygous knockout mice, seven females and seven males, are phenotyped for each gene. So if you have your own knockout mouse and you want to contribute your data to the IMPC, you should provide the phenotyping data of at least seven females and seven males. The next question is: is it easy to contribute your own information back into the database? The answer to that is not really, because your university first needs to become a member of the consortium, and then you still need to follow their exact pipeline description. Plus, a lot of the information is kind of hidden. So it is not easy to join the IMPC club, but the data that they generate is freely available, so that's good. All right, the next question was about OMIM, Online Mendelian Inheritance in Man. The question is: which gene causes color blindness in humans? We search for color blindness, and of course we get different types of color blindness. We have the deutan series and the protan series: deutan affects the green cones and protan the red cones, and both are forms of red-green color blindness. I don't think I specified which type in the question, so let's look at green color blindness. You can see that green color blindness is controlled by a gene called OPN1MW. That's the gene/locus.
Mutations in this gene cause color blindness because it encodes the green-sensitive pigment of the cones. The back of your eye has cones and rods, and when you have a mutation in this gene, the green cones don't work properly, so you cannot see green well and green looks like red. All right, so this is the gene we were interested in. Then we go back to the IMPC, because of course this gene might have other effects, not just color blindness. Does the IMPC know anything about it? Actually, yes: they list the gene, and they have produced ES cells, but there is no phenotyping currently planned for a knockout strain. So they are aware that this gene exists, and they produced embryonic stem cells with a knockout mutation, but these embryonic stem cells have not been used to produce mice yet, and they are also not planning to do this, because no phenotyping center is interested in doing it. So, has this gene been phenotyped? No. If not, what is its status? Its status is that ES cells, embryonic stem cells containing the mutation, have been produced, but there is no plan for phenotyping at this point in time. All right, so I had just a couple of questions to get you guys to look into the IMPC database and the OMIM database. If you want to know more about green color blindness, there's a lot of information in OMIM, and that's one of the nice things about OMIM: it gives you a really good basis to start the rest of your research on. They have citations, and in this case there's a lot known about color blindness. You can see that some of the citations go back to 1845. So it gives you a good overview of what has been done in the last 100 to 200 years in relation to color blindness.
Of course, not every Mendelian disease goes back that far, but color blindness has been known for a long time. Good. So let's switch back to the presentation. Those were the answers for today. All right, the overview of today: I added two slides about the command line and the terminal. I wanted to show you guys what you can do with it and where things went wrong for some of the people trying to, for example, clone the repository. Some people still had issues generating their SSH key. And again, I want to reiterate: if you have any issues getting Git installed, getting your SSH key to work or cloning your own fork of my repository, then definitely contact me. Just send me an email; I generally respond within an hour and a half, depending on what time it is, and you just get an email back with a link to look at or something like that. I think it is very important that people learn how the command line works. And then we will do the rest: sequencing, genes, DNA, so more or less everything related to DNA. I think for a lot of people nowadays, the command line is more or less like that screen from The Matrix. People know that it exists: the command prompt in Windows or the terminal in Linux, although people using Linux generally use the terminal a lot more than people on Windows. On Windows, almost no one uses the command line, because it is not necessary for 99.9% of the things that you do. But of course in bioinformatics, a lot of things go via the command line. If I do sequence alignment, I have to call the program from the command line, because there is no graphical user interface; you could not even load in a file of that size. If you have a file which is 500 gigabytes big, then loading it in and displaying it is almost impossible, because you would need massive amounts of memory.
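As an illustration of why command-line tools can handle files far larger than memory, here is a small Python sketch that streams a file line by line instead of loading it all at once. The function name is hypothetical and the approach is a generic streaming pattern, not a specific bioinformatics tool.

```python
def count_lines(path):
    """Count the lines of a file by streaming it, so memory use stays small even for huge files."""
    n = 0
    with open(path, "rb") as handle:  # binary mode: no text decoding needed just to count
        for _ in handle:              # the file object yields one line at a time
            n += 1
    return n
```

For example, `count_lines("reads.fastq")` (a hypothetical file name) only keeps one line in memory at a time, whatever the size of the file.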
So let me just open up my command line. If I open up the command line on Windows, it looks like this. On the top, I have my command line, and on the bottom, I have my file explorer, which is currently looking at my OneDrive folder. The thing is that you can do everything in Windows using just this window. It used to be the case that you would only have a window like this. Your computer would boot up, and if it had booted up successfully, it would say something like C, colon, backslash and then the greater-than symbol: C:\>. And that was all you got. There was nothing to click on, nothing to double click; my first mouse actually didn't even have a right mouse button. But here you can do anything, and you can talk directly to the computer. So first off, there is of course a folder structure on your computer. If I go to My PC, you can see that it has a whole bunch of folders on the top, which we're not really interested in, because those are shortcuts for Windows. But here we have, for example, the C drive, the D drive, and my CD-ROM player. I still have a CD-ROM player, well, it's a DVD player, and a lot of people don't have that anymore. The terminal here is at the root of the C drive. If I want to see what is on my C drive, I can type the command dir. The dir command shows me the listing of the current directory. So it's just a listing: it shows that there's a directory called adb, there's a directory called bio info 2021, which is the GitHub repository, there's a GitHub folder and a whole bunch of other folders. And here you see that startmenu.ini is a file, because it doesn't say <DIR> and it has a size: it's 202 bytes big.
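For comparison, a rough Python equivalent of what dir reports, as a sketch: `os.scandir` lists the entries of a folder, and the hypothetical `dir_lines` function marks folders with `<DIR>` and shows the size of files in bytes. The formatting only loosely imitates cmd; dates and the free-space summary are left out.

```python
import os

def dir_lines(path="."):
    """Return a dir-style listing: '<DIR>' for folders, the size in bytes for files."""
    lines = []
    for entry in sorted(os.scandir(path), key=lambda e: e.name.lower()):
        if entry.is_dir():
            lines.append(f"{'<DIR>':>10}  {entry.name}")
        else:
            lines.append(f"{entry.stat().st_size:>10}  {entry.name}")
    return lines

for line in dir_lines("."):  # list the current working directory
    print(line)
```

Unlike dir, `os.scandir` does not list the `.` and `..` entries.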
So this is nothing more than what you see here in the explorer: the exact same folders, and also the startmenu.ini file. The same information is displayed graphically. Now if we want to move into one of these folders, for example the bio info 2021 folder, we can do that very easily. We type cd, which stands for change directory, then bio, and then I just press Tab, because you don't want to type everything. The Tab key is the autocomplete key: as soon as I press Tab, it looks for what is there that matches. If you have multiple matching folders, it will just cycle through them: I have a folder starting with a b called bio info 2021, and there's also a folder called brother. So if I say cd bio info 2021 and press enter, I enter this folder. That would be the same as double clicking on it in the explorer. Here I can do a dir again to see what's there, and you see that there's only one file, which is the README file. If I now want to go back, I can say cd, so change my directory, and then dot dot. You also see that every directory on the hard drive, like the bio info directory, has a directory called dot and a directory called dot dot. The dot directory is the directory that you're currently in, and the dot dot directory is the parent directory, the directory above. So by doing cd .., we go into the dot dot directory, which means going one level up, back to the C drive in this case. So we can change directories, but we can also make directories. I could go into the bio info folder and then do a mkdir; mkdir stands for make directory.
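The cd and cd .. behaviour can be modelled with Python's `ntpath` module, which applies Windows path rules on any operating system. This is a toy model under simplifying assumptions: the hypothetical `cd` function does not check whether the folder exists, and it ignores the per-drive bookkeeping that cmd does.

```python
import ntpath  # Windows path rules, importable on every platform

def cd(cwd, target):
    """Return the new working directory after 'cd target' (toy model, no existence check)."""
    if ntpath.isabs(target):
        return ntpath.normpath(target)
    # '.' is the current directory and '..' the parent; normpath resolves both
    return ntpath.normpath(ntpath.join(cwd, target))

here = cd("C:\\", "bio info 2021")
print(here)            # C:\bio info 2021
print(cd(here, ".."))  # C:\
print(cd(here, "."))   # C:\bio info 2021
```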
We can make a directory called code, or src for source code. As soon as we do this, you see that it made a new directory called src. Then we can go into src, and here in the explorer I have to double click to do the same thing. It's loading, loading. Now if I want to create an empty text file, I can do that in several ways. One of the ways is to type touch and then main.R. Now you see that it creates a new file called main.R, and this file is completely empty; there's nothing in there. It just says touch this thing, and by touching it, it creates the file if it doesn't exist. I could have done it in a different way: I could have used cat or echoed some information into this file. But this is the way that your command line works. You have to be aware that you cannot do everything from everywhere, because some of the folders on my hard drive are not owned by me. As a user, I am just Danny. But if I look at my C drive, then of course I also have folders which I did not make, like the Windows folder. Oh, you guys can't see that; let me make it a little bit smaller so that we can scroll down. So here you see, for example, the Windows folder, and of course the Windows folder is not owned by me. If I go into the Windows folder, I'm allowed to do that, because as a user I am allowed to read what is in the Windows folder. But I'm not allowed to make anything there. So if I try to make a directory, and I didn't test that, I just say mkdir test, then it says access is denied. Some people, when they were trying to clone the GitHub repository, got an access denied. That just means that they tried to execute git clone with the URL of the remote repository in a directory where they didn't have write access.
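The same steps (make a src folder, touch an empty main.R, and cope with "access is denied") look roughly like this with Python's pathlib. The function name `make_project` is hypothetical; `PermissionError` is the exception Python raises when the operating system refuses access, for example in a folder you don't own.

```python
from pathlib import Path

def make_project(base):
    """Like 'mkdir src' followed by 'touch src/main.R'; returns None if access is denied."""
    try:
        src = Path(base) / "src"
        src.mkdir(exist_ok=True)   # like: mkdir src (no error if it already exists)
        script = src / "main.R"
        script.touch()             # like: touch main.R (creates an empty file)
        return script
    except PermissionError:
        print(f"Access is denied: you cannot write in {base}")
        return None
```

Calling it on your own user folder (for example `make_project(Path.home())`) should work, while calling it on a system folder such as the Windows folder should hit the `PermissionError` branch for a normal user.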
So the way to solve that, of course, is to cd somewhere that I know I can write to. On a Mac we can do cd ~ (tilde), but the tilde doesn't work on the Windows command prompt. There you can just say cd .., then cd Users, and then cd Arends, because I'm called Arends; I'm using my last name as my username. And you always have the rights to do anything here, right? Because this is my own folder; Windows created it for me, and I'm basically the administrator of that folder, so I can always make stuff here. Where else you can make files really depends on your Windows setup, and on whether multiple people use the same Windows computer. But remember, when you want to clone a GitHub repository, you can always do that in C:\Users\ followed by your username, so for me that's C:\Users\Arends. All right. One other thing: I have a C drive and a D drive, so what if I want to move from the C drive to the D drive? I cannot simply use cd, because cd just changes the directory. I have to explicitly say that I want to change to a different hard drive, and you do that by typing the name of the drive followed by a colon. So I just type D: and it switches me to the D drive. And the nice thing is that it remembers where I was when I go back: when I type C:, you will see that it remembered that I was in the C:\Users\Arends folder. So the command prompt keeps a current folder for each drive, and the first time you switch to a drive, you land at its root. And here again, if I want to see what lives on my D drive, I can do dir, and then you see all kinds of files and folders that are there. So good, that's the thing that I wanted to tell you guys. There are a couple of good cheat sheets online.
For Windows, things are slightly different than, for example, for Linux, and Linux is again slightly different from Mac, because sometimes they use different names for the same commands. And of course you can make new files, and you can delete files as well. You can even access your printer if you want to, because, like I said, the command line can do anything that Windows can do. It's just a way of directly talking to your computer: it will always execute your commands, and it gives you feedback. All right, so let's close this down. So I hope the command line now feels less like something out of The Matrix. There's a one-to-one relationship between where you are on the command line and where you would be in your folders when clicking. So I made a little slide for you guys. cd and then a name will change the directory, so it goes into that directory. If you want to make a new directory, the command is mkdir, and then you have to give a name. dir will list the directory contents. D: will move to the D drive, C: will move to the C drive, and E: will move to the E drive; so it's just the name of the drive followed by a colon. If you want to remove something, you can use rm (on the plain Windows command prompt it is del). I would not do that, though. Generally, when I remove stuff, I use the Explorer window, just to make sure that I know what I'm deleting. cd .. is moving up: it goes into this magical hidden folder that every folder has, so it moves up one level, for example from C:\Users\Arends up to C:\Users. If I want to go all the way up, then I can use cd \ (a backslash on Windows; on Linux and Mac it is cd /), which moves to the topmost folder. So if I'm on the C drive, it goes to the root of the C drive, and if I'm on the D drive, it goes to the root of the D drive. So you don't have to type cd .. 20 times if you're 20 directories deep. All right.
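How cd .., cd \ and drive letters relate to a path can be checked with Python's `pathlib` without touching the disk. A small sketch, assuming a made-up current folder `C:\Users\Arends\bioinfo2021\src`; `PureWindowsPath` only manipulates the text of a path, so this runs on any operating system:

```python
from pathlib import Path, PureWindowsPath

# Hypothetical current folder, handled purely as text.
cwd = PureWindowsPath(r"C:\Users\Arends\bioinfo2021\src")

print(cwd.parent)          # cd ..      -> C:\Users\Arends\bioinfo2021
print(len(cwd.parents))    # how many cd .. steps it takes to reach the root
print(cwd.anchor)          # cd \       -> C:\
print(cwd.drive)           # typing C:  -> the C drive

# Your own home folder: what ~ expands to on Mac/Linux, and usually
# C:\Users\<username> on Windows. Normally a safe, writable place for git clone.
print(Path.home())
```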
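Typing the name of a program works because the shell searches a list of folders, given by the PATH environment variable, for a matching executable. Python's `shutil.which` performs a similar lookup, so it is a convenient way to check whether, for example, git is installed and reachable before cloning. A minimal sketch; `find_program` is just a hypothetical wrapper name:

```python
import shutil
import sys
from typing import Optional

def find_program(name: str) -> Optional[str]:
    """Return the full path of the executable found for `name`, or None.

    shutil.which searches the folders listed in PATH; on Windows it also
    tries the extensions listed in PATHEXT, such as .exe.
    """
    return shutil.which(name)

print(find_program("git"))             # a path to git, or None if it is not on PATH
print(find_program(sys.executable))    # the Python interpreter running this script
```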
So I hope that gives you guys a little bit more understanding of how the command line works. I looked up a couple of cheat sheets, and mostly these are the things that you need to know. There are a lot of commands, but these are the most basic ones, the ones you need if you want to move around on your hard drive. And of course, if you want to call a program, you can just type the name of the program, and that should work, right? When you type it, the command line looks for the executable in the folders listed in your PATH environment variable (on Windows, also in the current folder) and then executes it. All right, so let's start the lecture. Just for my understanding, since everyone here is a biology student: can you guys name the four biochemical parts of life? If you get all four of them, then we can do a small break. So just to get you guys a little bit involved, type in the chat: what are the four biochemical parts of life? And of course we're discussing one of them right now, right? The lecture is called DNA, so DNA is going to be one of the four biochemical parts of life, just as a tip. That means that there are three left for you guys. If you paid attention in the first lecture, we actually went through the different biochemical levels, and the lecture series is also structured that way, right? We start all the way at the top with the phenotypes, which are of course not a biochemical part; that's just something which is observable. Underneath, you have the four biochemical parts, and we start with the lowest level. The lowest level is the DNA, which holds the genetic information and encodes things like genes. But there are three different levels in between as you go from the DNA all the way up to the phenotype. All right. You can see everyone's still asleep; no answers in chat. Yeah, so I'm going to do the cricket sound for a little bit. Okay, Just Sophie, first answer. Not sure.
Carbohydrates, lipids, nucleic acids and proteins. All right, that is 100% correct, although I would call the carbohydrates polysaccharides. So you have proteins, right, which are the workhorses of the cell: everything that happens, happens because proteins make it work. Lipids, of course, are a very fundamental part, because they keep a cell a cell. Then we have polysaccharides, which can also be called carbohydrates; those are the sugars. The sugars are a very important part of life as well; think of glucose, starch or glycogen. They are not proteins and they are not lipids; they are more or less chains of sugar molecules. Besides that, of course, we have the nucleic acids, so the DNA and the RNA, and those are the four levels. So, Just Sophie, perfect. All right, I've been talking for 50 minutes, so let's just do one or two more slides and have a small break. So, sequencing, which we will be talking about a lot, right? I hope that everyone knows what DNA is, so the lecture will focus more on the bioinformatics side of DNA. And of course, as a bioinformatician, in many cases it is your task to figure out what the sequence is. DNA sequencing is the process of determining the order of the nucleotides within a DNA molecule, and the four nucleotides that make up DNA are adenine, guanine, cytosine and thymine. A DNA molecule is this double helix, and there is an order in which the base pairs occur; DNA sequencing is finding out that order. That's one of the core tasks of a bioinformatician. Sequencing itself is used almost everywhere nowadays. It's used in diagnostics, for example with SARS-CoV-2. If you think about COVID, right: if you test positive, then you do a qPCR test to confirm that you're positive.
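Because A always pairs with T and G with C, knowing the order of the bases on one strand also gives you the order on the other strand. A minimal sketch of that pairing rule in Python; the example sequence is arbitrary:

```python
# Base pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3' like the input."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError(f"unexpected characters in {seq!r}")
    return seq.translate(COMPLEMENT)[::-1]   # complement each base, then reverse

print(reverse_complement("ACTGGTGA"))   # TCACCAGT
```

The result is reversed as well as complemented because the two strands run in opposite directions.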
And then afterwards, if you end up in the hospital with a very severe case, the hospital will probably sequence the version of the virus that you have, so that they know whether you were infected by the Delta variant, by the original virus, or by one of the other variants, right? So in diagnostics, sequencing is used a lot. Of course, sequencing is also used in biotechnology, and by biotechnology I mean the production of chemicals and of medicine; there, too, sequencing is a very important part. Furthermore, we have forensic biology. If you ever plan to work for a national intelligence service or a police unit, they employ sequencing as well. If they find some traces of blood at a crime scene, the first thing they will do is make a DNA profile, or sequence part of the DNA, to figure out who is the perpetrator and who is the victim, right? So DNA is used a lot in forensic biology. And of course, sequencing is also used a lot in virology, because there are literally thousands and thousands of viruses, and also bacteria, and some of them look very much alike, right? If you took a photograph of the COVID virus and a photograph of one of the other coronaviruses, you could not tell them apart from the photographs; they both look like this. So to know which virus you are dealing with, whether it's a harmless coronavirus or a very dangerous one, you need to sequence it to figure out whether it carries genes that code for proteins that might be very dangerous, right? So in virology, and also in microbiology, sequencing is used a lot to determine which organism we're dealing with. And the same thing holds in the rainforest, right?
You have people that go into the rainforest and collect little samples of poop, and then they sequence the poop to see which animal made it, right? That way they check biodiversity and see whether animals are still alive and not extinct. So sequencing is a big field. Now some terminology about sequencing, because when we talk about sequencing, we always have something which is the reference sequence, and we always describe things relative to this reference sequence. The reference sequence for mouse is C57BL/6J, also called Black 6. This is a very specific mouse, which is an inbred mouse. This mouse was originally kept in the laboratory, and then they started mating brothers and sisters together, and in every generation afterwards they continued mating brothers and sisters, up until the point that the DNA stabilizes, right? Think about this inheritance diagram: when you have two animals which are genetically very related, because they're brother and sister, they already share half of their genome. If you mate them together, then their offspring will share even more of their genome, and if you take the offspring and mate them together, then every step makes the genome more homozygous. As a rough picture, you go from 50% equal to 75% equal, then 87.5%, and so on. After 20 generations of inbreeding, you have created animals which are more or less completely homozygous. So the two copies of each chromosome that they have in their cells are identical, and their children will again have exactly the same DNA as their parents. So once you sequence one of these animals, you have the sequence of the whole strain, right? These animals are kind of immortal, because their children are more or less clones of the parents, so you can just continue the population forever and ever, and the mouse won't change, because the genome has no variation left.
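The 50%, 75%, 87.5% figures are a simplified picture. A more precise sketch for strict brother-sister mating uses the standard population-genetics recurrence for the inbreeding coefficient F (the probability that the two copies at a position are identical by descent): F_t = (1 + 2F_{t-1} + F_{t-2}) / 4, assuming unrelated, non-inbred founders. Under that assumption F climbs from 0.25 to 0.375 to 0.5 and reaches roughly 0.986 after 20 generations, which is why 20 generations of sibling mating is the usual threshold for calling a strain inbred:

```python
def sib_mating_inbreeding(generations: int) -> list[float]:
    """Inbreeding coefficient F_t for t = 0..generations of brother-sister mating.

    Recurrence F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4, assuming unrelated,
    non-inbred founders (F_{-1} = F_0 = 0).
    """
    f = [0.0, 0.0]                          # F_{-1}, F_0
    for _ in range(generations):
        f.append((1 + 2 * f[-1] + f[-2]) / 4)
    return f[1:]                            # drop F_{-1}: index t now holds F_t

for t, value in enumerate(sib_mating_inbreeding(20)):
    print(f"generation {t:2d}: F = {value:.3f}")
```

1 - F is the fraction of heterozygosity that is left, and it shrinks by roughly 19% per generation here, more slowly than the halving in the rough picture.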
So one of these mice, the C57BL/6J, is the reference sequence for mouse. And for you guys, a small question in between: who is the human reference sequence? Every human is slightly different, so if you want to describe every variant, so every change in the genome sequence, you always have to do it relative to something. So who is the human reference? That's a sneaky question, and I don't think a lot of people know the answer. People often say Craig Venter, and his own DNA did make up most of the genome assembled by his company, Celera. But the public human reference genome that we use nowadays (currently GRCh38) is not a single person: it is a mosaic built from the DNA of several anonymous donors, with one donor contributing the largest part. So when we say you have a mutation, that means that at this position in your genome you are different from the reference, which is more or less our standard human. So when we talk about the reference sequence, we can also talk about differences from the reference in other individuals, right? So relative to the Black 6 mouse, or relative to the human reference genome. When we talk about variants, we have something which is called an SNP, a single nucleotide polymorphism, right? Here we have a small picture which shows you what an SNP is. Everyone has the same sequence, except for this single base pair in the genome, where some people might have an A, other people might have a G, and some people might have a T. This is called a single nucleotide polymorphism, because only a single nucleotide is polymorphic, so it varies between different humans. Furthermore, we can have things which are called indels, which are insertions and deletions, right? You could have a big chunk of DNA which is not in the reference.
So we call that an insertion. And you could be missing part of the sequence that the reference has, right? Then that is a deletion. Furthermore, we talk about reads a lot when we talk about DNA sequencing. By reads we mean the order of the base pairs, one after another, but also the quality of each base: the sequencer gives you a sequence, but it also gives you a quality score for each base in that read. So a read is the bases plus a quality score for each base. And furthermore, we have something which is called alignment, and that is the process of matching reads to the reference, right? So if I get a read, ACTGGTGA, and then I look up where this thing matches in the reference genome, then what I'm doing is aligning my read to the reference genome. All right, so I've been talking for an hour, so I will do a short break. Merida Walker, thank you for following me. Thank you. And good luck on building up the 2000 points to get a personalized drawing or a slide in a different language. All right, so I will stop the recording. If you're watching this on YouTube, we'll be right back, because on YouTube I generally post the parts one after another.
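Each base in a read comes with its own quality score. Sequencers commonly write reads in the FASTQ format, where a quality line holds one character per base. The sketch below assumes the widespread Phred+33 encoding, in which Q = ASCII code - 33 and the probability that the base call is wrong is 10^(-Q/10). The read and its quality string are made up for illustration:

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]

def error_probability(q: int) -> float:
    """Phred scale: Q = -10 * log10(P_error), hence P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

# Hypothetical read: the bases plus one quality character per base.
sequence = "ACTGGTGA"
quality = "IIII?5#I"
for base, q in zip(sequence, phred_scores(quality)):
    print(base, q, f"{error_probability(q):.4f}")
```

So Q = 20 means a 1 in 100 chance that the base is wrong, and Q = 30 means 1 in 1000.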
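In its simplest form, alignment means sliding a read along the reference and keeping the best-matching position; positions where the read and the reference disagree are candidate SNPs. This is a deliberately naive, ungapped sketch with a hypothetical reference fragment. Real aligners use indexes of the genome and allow gaps, which is how indels are found:

```python
def align_ungapped(read: str, reference: str) -> tuple[int, list[tuple[int, str, str]]]:
    """Return (offset, mismatches) for the offset with the fewest mismatches.

    Each mismatch is (reference_position, reference_base, read_base), i.e. a
    candidate SNP. No gaps are allowed, so indels are not detected.
    """
    if len(read) > len(reference):
        raise ValueError("read is longer than the reference")
    best_offset, best_mismatches = -1, None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = [(offset + i, ref_base, read_base)
                      for i, (ref_base, read_base) in enumerate(zip(window, read))
                      if ref_base != read_base]
        # Strict '<' keeps the leftmost position when several are equally good.
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches

reference = "TTACGACTGGTGACCAT"               # hypothetical reference fragment
print(align_ungapped("ACTGGTGA", reference))  # (5, []): exact match at position 5
print(align_ungapped("ACTAGTGA", reference))  # (5, [(8, 'G', 'A')]): one candidate SNP
```

Because no gaps are allowed, a read carrying an insertion or deletion would show up here as a run of mismatches rather than as an indel.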