So welcome back, everyone. We will do a quick microarray example. I told you about microarrays, and microarrays are one of those major inventions that actually forced bioinformatics to a new level. Don't forget... no, I already started recording. I said that when I was muted, so, no, sorry. Okay, so microarrays are a tool, and they are one of the main tools nowadays for the analysis of gene expression. We also have RNA sequencing nowadays, but microarrays are still very useful, and they kind of highlight... oh, you were watching a commercial due to a page refresh. Yeah, I don't like the commercials, but I can't really do anything about it. Twitch needs to make money as well, although they're owned by Amazon, so you would think they have more than enough money, but they kind of don't. And the nice thing is, if you're watching commercials, that means that you are not the product. On Zoom, which is more or less free, you are the product. Anything that you say on platforms like Zoom, they record that, even though they say they don't, but I don't trust Zoom at all. It's of Chinese origin, so. Anyway, that's why I do it on Twitch.
So, gene expression and microarrays. Let's get back into the flow of the lecture. Microarrays can measure the expression level of more than 20,000 genes in a single experiment, and that is a major advantage. At the beginning of the 1990s, we could only measure like five or ten genes at the same time. When I started doing my PhD, microarrays had already been miniaturized a lot, but they were shaky. Nowadays, microarrays are really, really good. There are two types of microarrays: one-color microarrays and two-color microarrays. One-color microarrays allow you to measure a single sample. Two-color microarrays are generally used in a kind of case-control experiment. So you put normal tissue on there and, for example, cancer tissue.
And then the microarray, with the color, shows you which gene is highly expressed in the one and lowly expressed in the other one. So how does this work? We use a fluorescent dye, Cy3 and/or Cy5, depending on whether you have a one- or two-color microarray. And then we have a signal intensity: the intensity of the dye is read out by a scanner.
So, a little bit of an overview of how this works. We have the annotated genomic structure, for example the human genome with its 20,000 genes. The first thing that we do is decide which parts of the genome we want to measure the activity of. To do that, we make something which is called PCR products, so we amplify probes. We define the regions that we want to target, and these regions are amplified using PCR. We take these regions out of the genome and put them on these little glass plates. Then we have two samples, for example normal lung tissue and lung cancer tissue. We extract the RNA from these, and we color the RNA using Cy3 or Cy5. Then we take both of the samples, we mix them together, and we put them on the microarray. There is then a laser, which is actually two lasers: one measures the red channel, the other one measures the green channel. In the end, we get raw images of each of the channels, which are then put together, using computer tools more or less, to create the overview picture of the microarray.
So on the microarray here, if a dot is green, the gene was highly expressed in sample one. If a dot is red, then the gene was highly expressed in sample two. If it is yellow, then it's not interesting, because there was no difference between the two samples. And if it's black, that means that the gene, or the probe, was not expressed in either of the two samples. So that's kind of how a microarray works. We define which areas of the genome we want to measure. We then take out parts of the genome using PCR and put them on little glass plates.
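The color reading described above (green, red, yellow, black) can be sketched as a small function over the two channel intensities. This is only an illustration: the `background` and `fold` thresholds, the function name, and the four example spots are assumptions made up for the sketch, not values from the lecture or from any scanner software.

```python
import math

def classify_spot(green, red, background=100.0, fold=2.0):
    """Label one spot from its two channel intensities.

    green: intensity for sample 1, red: intensity for sample 2.
    background and fold are illustrative thresholds (assumptions).
    """
    if green < background and red < background:
        return "black"   # not expressed in either sample
    # Add 1 so that an empty channel never produces log(0).
    log_ratio = math.log2((red + 1.0) / (green + 1.0))
    if log_ratio >= math.log2(fold):
        return "red"     # higher in sample 2
    if log_ratio <= -math.log2(fold):
        return "green"   # higher in sample 1
    return "yellow"      # similar in both samples

# Made-up (green, red) intensities for four hypothetical genes.
spots = {
    "geneA": (5000, 400),
    "geneB": (300, 4000),
    "geneC": (2500, 2600),
    "geneD": (20, 35),
}
labels = {name: classify_spot(g, r) for name, (g, r) in spots.items()}
print(labels)
```

Working on the log ratio rather than the raw ratio makes "twice as high in sample one" and "twice as high in sample two" symmetric around zero, which is why a single `fold` threshold can serve both directions.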
We then put our samples on. I always say that microarrays are kind of a fishing experiment, because here you're fishing with little DNA probes to see which sample has which genes expressed.
The workflow for a microarray is more or less the following, and all of the bold parts are parts where bioinformatics is involved, right? The first part is to create these oligo arrays, so to create these arrays with these little DNA probes on them. A DNA probe is also called an oligo. Of course there you need a bioinformatician to take the whole genome sequence, figure out where the genes are, and then cut them up and make a microarray. Even nowadays, custom microarrays still need to be made. We recently did an experiment, and in that experiment we did ice on microarrays, and we were really interested in a region on chromosome three. The commercially available microarrays did not have as many probes there as we wanted. So what we did is we kind of saturated our region of interest with hundreds and hundreds of additional probes. And of course, when you put more probes in one region, you have to remove them from another one. So it's always a cost-benefit analysis, right? Because we're only interested in this region, we saturate the region with probes.
The acquiring of samples is generally done either in a mouse house or in a greenhouse. That's where the biologists work: they do their experiment and they collect their samples. They extract and label the RNA from the samples that they collected. This is then hybridized to one of these microarrays that were designed. And then of course scanning occurs. Already at the scanning point bioinformatics starts becoming more important, because you need to know how the dyes react to the different lasers.
Not all dyes have the same dynamic range, so there, too, bioinformatics is already involved in how to build better scanners and miniaturize things even more. Of course, after you scan these microarrays, there's a lot of data that comes out of it, right? The machine just takes kind of a picture, and this picture is stored in something like a TIFF format, but these pictures are huge because the dots are tiny. So you have images which generally are like 5 million pixels by 5 million pixels, which generates images in the order of 100 to 200 MB. This data needs to be stored somewhere, because of course you have to do something with the data later on, and you want to keep it for later as well.
The next step is of course data normalization. Data normalization is something that we will talk about more, but there, too, statistics is used to get rid of unwanted variation. For example, there's a little air bubble on one of the microarrays, or there's a hair that got into the machine. The next step after data normalization is generally gene expression clustering, because we're measuring 20,000 genes, right? We're not interested in a single gene; we want to see if, for example, a whole pathway of genes comes up. If you think about plants, we might be interested in the glucosinolate pathway. If you think about mice, we might be interested in, for example, the glucose pathway in the liver, or in some pathway in the brain that is highly expressed in people with Alzheimer's and lowly expressed in people who don't have it. And then of course the next step is data interpretation. There bioinformatics is responsible for searching through all of the available literature, using automated literature scanning, to see if we can interpret our data and see how it fits into all the available knowledge that people have collected over the last decades.
So, just going through it step by step. Creating the arrays: a bioinformatician determines which sequences are measured. Humans have around 25,000 genes, a little bit more than was determined in the Human Genome Project, because we now know that there are also genes that do not code for protein but code for things like microRNAs or other types of RNA which are biologically active. Each gene in the human genome is like 10,000 to 15,000 base pairs long. If you multiply these numbers together, you get an idea of the number of base pairs that we could measure, but we only have probes which are like 20 to 40 base pairs long. So we definitely have to make a decision on what we want to measure and what we can't measure, because we can't measure everything. Even though microarrays are really good, they still have a limit. There are around 5 million probes that you can put on, but 5 million times 40 (200 million base pairs) is not the same as 25,000 times 15,000 (375 million base pairs), right? So we do have to make a decision on what goes on there. Using bioinformatics, we design these probes and we estimate their performance beforehand, because not every probe will work, because of temperature sensitivity or, say, too many A's and too few C's, so that the binding is not good enough. This estimation of the performance is something that a bioinformatician does together with the design of the probes.
After the microarrays are scanned, this is kind of what we get. This is a real picture of a two-color microarray, and this picture is really, really huge. It's tens of thousands of pixels by tens of thousands of pixels, if not a couple of million. But if you zoom in on a two-color microarray, you see that there are very, very tiny dots, and each of these dots has a certain color, right? I told you that green is high in one sample and red is high in the other. And here we see a one-color microarray.
That just has green, and the more intense the green, the more a gene was expressed, and you don't get a sample-one-versus-sample-two comparison. But, hey, microarrays are scanned, and then of course we get these pictures, and these pictures need to be converted into intensity values, because we can't just use the picture. We have to have a matrix with numbers. So how do we do that? Well, we go from this picture, using several software tools, to something which is called the gene expression matrix. Generally, on the rows you have the different genes, on the columns you have the different samples, and in the middle you have the gene expression level. Generally this fits in Excel, depending on how many samples you have. Like 10 years ago, Excel only allowed you to have like 16,000 rows, and that was always an issue, so you couldn't use Excel and had to find a different format. Generally these are big comma-separated files which, for every gene and for every sample, just have the gene expression level in a certain range, right?
The next step then is to normalize. Why do we need to normalize this data? If microarrays are so good, why are they not perfect? Well, there's always some kind of non-biological variation between samples, right? On one microarray we just pipetted a little bit more RNA onto the whole array compared to the other one. You can't pipette exactly 10 nanograms of RNA. Sometimes you pipette 10.1, sometimes you pipette 9.8, right? So this little variance in the total amount which is put on needs to be normalized away. There's also the problem that different scanners have a different dynamic range, so we can't just use two different machines and then expect the results from the first machine to exactly match the results of the second machine.
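To make the pipetting example concrete, here is a minimal sketch of the simplest possible correction: rescale every sample (column) of the gene expression matrix so that all samples have the same total intensity. It is an illustration under assumptions, not one of the normalization algorithms covered later in the course; the function name and the numbers are made up, and sample 2 is constructed to have 10% more signal everywhere, as if slightly more RNA had been pipetted.

```python
def scale_to_common_total(matrix):
    """Rescale each sample so that all samples share the same total signal.

    matrix: dict mapping gene name -> list of intensities, one per sample.
    Returns a new dict of the same shape.
    """
    genes = list(matrix)
    n_samples = len(matrix[genes[0]])
    # Total signal per sample (column sums of the expression matrix).
    totals = [sum(matrix[g][j] for g in genes) for j in range(n_samples)]
    target = sum(totals) / n_samples          # common total to scale towards
    factors = [target / t for t in totals]    # one scaling factor per sample
    return {g: [matrix[g][j] * factors[j] for j in range(n_samples)]
            for g in genes}

# Rows are genes, columns are samples (made-up values).
raw = {
    "gene1": [100.0, 110.0],
    "gene2": [200.0, 220.0],
    "gene3": [700.0, 770.0],
}
normalized = scale_to_common_total(raw)
for gene, values in normalized.items():
    print(gene, [round(v, 1) for v in values])
```

After scaling, the two columns agree for every gene, because the only difference between them was the overall amount. Real data also contains gene-specific differences, and a global rescaling like this leaves those untouched.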
And of course, if you're doing gene expression on like 300 or 500 individuals, then not all individuals are scanned on the exact same machine and on the exact same day, right? These machines are also sensitive to things like environmental temperature. If the temperature goes up by two degrees Celsius, it changes the ability of DNA to bind to other DNA. So the temperature, but also the humidity, has a massive influence on the intensity of the arrays, and all of these effects are something that we are not interested in. We are interested in which gene is different in cancer tissue versus normal tissue. We're not interested in how the temperature of the room affects our results. There are many different algorithms to normalize microarray data, and we'll be getting to that in later lectures. In later lectures we will talk in more detail about microarrays and about which tools and algorithms there are to analyze this type of data and to normalize away non-biological variation. And of course this also has a risk, right? There's the risk that we throw away real biological variation by normalizing. We will have an example of that, and you guys can then do your own analysis in R on a little bit of microarray data.
The next step is gene expression clustering. Like I told you, the microarray has five million probes, and they measure more than 10,000 genes, and we want to focus on the main result. We can't just say, well, there's this one gene which kind of follows this pattern. No, it's a holistic approach, right? We need to take into account all of the genes on the microarray, and we need a way to visualize this data. We can't make 10,000 box plots and look at the box plot of gene one and see, oh, it's different between our sample and our control, and then look at the second box plot, right? So the most common strategy here is to visualize the most differentially expressed genes.
But this is difficult when you have a lot of different groups that you're dealing with. We might not just have a case-control study; we might have like 10 varieties of grain. All of these varieties have a very similar genome, but of course all of these 10 varieties have their own unique gene expression patterns. We want to visualize this, and we want to show the biologist: yes, we did this experiment, and the high-yielding grain has this gene which is generally expressed higher, or has this cluster of genes which, expressed together, seems to increase the yield of the plant. And of course we again use algorithms to create these groups and cluster the data together. So the common strategy is not to focus on a single thing, like you do with qPCR, but really to look at all of the genes in the genome and then figure out: well, this group of genes is probably the group that I have to have a closer look at.
So how does this look? Well, I have two examples for you. One is a clustering of genes where we look at immune cells; the source of the paper it comes from is listed here. We see that we have CD20-minus cells, so those are cells which are not expressing CD20, and then we have CD20-high cells, so those are cells which are expressing a certain surface marker on the outside of the cell. We have different samples, right? Here we have like five samples that have CD20 and five samples that do not. And then in the rows, the names are not shown here, but there are literally thousands of rows, we can see that some genes are expressed lowly in CD20 cells, other genes are expressed highly in CD20 cells, and the opposite here. And we always focus on differences, right? Because if a gene is expressed highly in both cell types, it's probably not interesting.
But if it's high in one and low in the other one, then we can think, oh, there might be something going on there. We can also do something different, right? We can calculate correlation. I assume that most people know how correlation works, but here, for example, we look at different tissues. This research here is a preprint, and what they show is indeed that if you look, for example, at brain tissue... I hope it's readable. It's a little bit small, but I can just read it. If you look at brain tissue, you see that it doesn't really matter where you take your brain tissue from. If you take a vector of measurements from, for example, the hypothalamus, then of course the vector of genes in the amygdala is very highly correlated with it. But you can see very clearly that brain tissue has a very different gene expression pattern compared to, for example, things like skin or the different arteries. So correlation also helps us to more or less group these data, and correlation in this case allows you to look across different tissues and figure out what is unique to these tissues and whether there are any tissues which are very similar. For example, here we can see that the samples from the pituitary gland are, again, more like brain tissue and not so much like, for example, testis or cells that were taken from lymphocytes.
So these are different ways of visualization, and this is what a bioinformatician does, right? We get big matrices filled with intensity values, and then we visualize them in a way that lets people understand: oh, all of the genes that are expressed in CD20 cells but not in CD20-negative cells, all these genes have, for example, something in common. And that is the next step, because the next step would then be to see if there are any pathways that are overrepresented in this data.
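The tissue comparison described above boils down to computing a correlation between every pair of expression vectors. The sketch below does this with a plain Pearson correlation. The three tissues are taken from the lecture, but the five expression values per tissue are invented for illustration and are not from the preprint that was shown.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# One expression vector per tissue: the same five genes, in the same order.
# All values are invented for this example.
tissues = {
    "hypothalamus": [9.1, 2.0, 7.5, 1.2, 8.8],
    "amygdala":     [8.9, 2.3, 7.9, 1.0, 9.0],
    "skin":         [1.5, 8.7, 2.2, 9.4, 1.1],
}

names = list(tissues)
corr = {(a, b): pearson(tissues[a], tissues[b]) for a in names for b in names}

# Print the correlation matrix, one row per tissue.
for a in names:
    print(a.ljust(13), " ".join(f"{corr[(a, b)]:6.2f}" for b in names))
```

In this toy data the two brain regions correlate strongly with each other and negatively with skin, which is the kind of block structure a tissue-by-tissue correlation heatmap makes visible.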
So, gene expression clustering. There are many, many different ways to cluster gene expression. You can use different methods, like partitioning methods, hierarchical clustering, fuzzy clustering, or density-based clustering, but all of these methods depend on having a certain distance measure, because we have to define which samples look similar and which samples are different. This is not an easy thing to define, because what is different and what is similar is, of course, very dependent on how you think about differences, but all of these things can be captured mathematically in something which is called a distance measure. During the next lectures, we will talk about different distance measures that people came up with, like Euclidean distance, Manhattan distance, and Minkowski distance. So there are different distances, and all of them give you a score for how similar or how different two things are. You can then use these differences and similarities to cluster things together in trees, using a kind of tree visualization.
Of course, we also use a lot of statistics, because we need to determine which effects are significant, right? Significance is something that we have rules about. We generally say, hey, we want to have a false discovery rate of at most 5%, meaning that if we claim that these 100 genes are different between the two samples, we have kind of agreed that, well, probably not all 100 are different, but about 95 out of 100 are, right? So significance in a statistical sense means that the likelihood that the observed differences are real is high enough for us to trust the results. And of course, this comes back to the fact that we're testing tens of thousands of genes. So there we have to account for things like multiple testing.
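The three distance measures named above are closely related: Manhattan and Euclidean distance are the Minkowski distance with exponent 1 and 2. A minimal sketch, with two made-up expression vectors:

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equally long vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def manhattan(x, y):
    """Sum of absolute differences (Minkowski with p = 1)."""
    return minkowski(x, y, 1)

def euclidean(x, y):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(x, y, 2)

# Two made-up expression vectors over the same three genes.
sample_a = [1.0, 2.0, 3.0]
sample_b = [4.0, 6.0, 3.0]

d_manhattan = manhattan(sample_a, sample_b)   # 3 + 4 + 0
d_euclidean = euclidean(sample_a, sample_b)   # sqrt(9 + 16)
d_minkowski3 = minkowski(sample_a, sample_b, 3)

print(d_manhattan, d_euclidean, round(d_minkowski3, 3))
```

The larger the exponent, the more the single biggest per-gene difference dominates the score, so the choice of distance changes which samples end up looking similar and therefore how the clustering tree comes out.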
And we'll go into much more detail in the microarray lecture, to show you guys how you can still have these significant effects even though you might have a very low number of samples and you might have measured hundreds and hundreds of different genes.
Data interpretation generally occurs via something called Gene Ontology. Gene Ontology is kind of the Tower of Babel in biology. It is an up-to-date, comprehensive computational model of biological systems, which spans from the molecular level to larger pathways and the cellular and organism levels. We will be talking a lot about Gene Ontology. I'm not the biggest fan of Gene Ontology, but the thing that it does is provide you with a structured vocabulary so that you can talk to each other, right? In the past, we used to have like five names for the same gene. But not only that: when someone is talking about the insulin pathway, well, what is the insulin pathway? Which genes belong to this pathway? This is something that Gene Ontology defines for us. It is a structured way so that, using a computer, you can see if a certain pathway is overrepresented or underrepresented in the genes that you found differentially expressed. So that's the handiness of Gene Ontology: it's a structured vocabulary. Instead of just having words which are defined by the context, these are words that are defined by an ontology, which is something that a computer can understand and reason about.
So how does Gene Ontology look? Well, Gene Ontology is a tree, right? We have, for example, the biological process ontology. The biological process ontology is split into two different categories. All of the genes are in here, so all of the genes are involved in some kind of biological process. But some genes are involved in, for example, cellular processes, while other genes are involved in responses to stimuli.
Of course, if you think about responses to stimuli, these can be responses to endogenous stimuli, where endogenous means from within the cell, but they can also be responses to stress, right? The same goes for cellular processes. Cellular processes can be cell communication, so one cell talking to another cell, but they can also be cellular physiological processes, so processes that happen within a cell and have nothing to do with cell-to-cell communication. And if a gene belongs to a cellular process, it also belongs to a biological process, because it's a tree, right? So each gene has been assigned to one of these categories.
All right, that's one of the annoying things. So let's do a ban. Where's the ban hammer? Ban, all right. Yeah, "first one, wanna become famous". No, I don't wanna become famous. I just wanna do my lecture. Oh, my moderator was quicker. All right, so Gene Ontology is a structured vocabulary that allows us to talk about things which might have been defined differently in the past. I see that one message was deleted by a moderator just when I clicked the button to delete the message. But I did the ban, so that's good. So, hey, Gene Ontology is really, really useful. Interesting, "by a moderator, but not you". I only have one moderator as far as I know. Or did Daniel become a moderator again? And is he sneakily watching the lecture as well? I don't know. Anyway, we will be talking more about Gene Ontology, but remember, Gene Ontology is a tree, and it allows a computer to reason about concepts like cell death, what is transcription, what is DNA-dependent transcription, or what is regulation of transcription.
All right, so, a very small example. Bioinformatics is involved in the design of the oligonucleotides that go onto the array, so the array probes.
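The Gene Ontology tree described above, with biological process at the root and cellular processes and responses to stimuli below it, can be sketched as a child-to-parent lookup. The six terms are the ones named in the lecture. The gene names and their annotations are hypothetical, and the sketch follows the lecture's simplified tree: in the real Gene Ontology a term can have more than one parent, so a real implementation would need to follow several parent links per term.

```python
# Child -> parent links for the small tree from the lecture.
PARENT = {
    "cellular process": "biological process",
    "response to stimulus": "biological process",
    "cell communication": "cellular process",
    "cellular physiological process": "cellular process",
    "response to endogenous stimulus": "response to stimulus",
    "response to stress": "response to stimulus",
}

# Hypothetical annotations: which term each gene was assigned to.
ANNOTATION = {
    "geneX": "cell communication",
    "geneY": "response to stress",
}

def ancestors(term):
    """Walk from a term up to the root, collecting every parent on the way."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain

def terms_for(gene):
    """All terms a gene belongs to: its own term plus all ancestors."""
    term = ANNOTATION[gene]
    return [term] + ancestors(term)

print(terms_for("geneX"))
print(terms_for("geneY"))
```

This is what lets a computer reason over the vocabulary: a gene annotated to cell communication is automatically also counted under cellular process and biological process when checking which categories are overrepresented.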
Bioinformatics is involved in normalization of expression data, clustering of data, creating visualizations, interpreting the results, and also storage and management, right? And that's what people often forget: a big part of being a bioinformatician is to keep all the gigabytes or terabytes which are generated within a group of people working together on a single project or on a single biological entity. Hey, it happens a lot that the professor comes in and says, well, you know this experiment that we did like 15 years ago, how does the new data that we have relate to that data? And then you have to be able to go to your computer and fish out the data from 15 years ago, and of course you have to do this in a reasonable amount of time. You can't say, come back in a week and I will have sorted the data for you. So, hey, the management and the storage are a big part of bioinformatics as well.
Of course, programming is essential. And like I told you guys, the course will not teach you to program. Unless everyone wants that, right? We can have a poll on Moodle, and if everyone says, well, I want to learn how to program... but otherwise just go to my YouTube channel and watch the lectures that I did on programming in R. I did a whole programming course, and we will use some R for some basic analysis. And of course, when we have assignments where it is required that you program, these parts will be introduced. But the focus of the course is to get you to know which databases are out there that hold the data, things like GenBank and Ensembl, so where can I find my data, but also to give you an idea of which tools are out there. Because as a bioinformatician, you can't program everything that you want from scratch, right? You have to use tools developed by other people. So the homework for today, for you guys, is to prepare the computer that you will be doing the course on.
So things like, hey, you need to install at least a good text editor and install the R programming language, because all of the assignments will be related to R. But also install the Git version control software, create a GitHub account, and link your local computer to your own GitHub account so that you can collaborate with other people. Because we will be doing some assignments which will be kind of group assignments, and for those, people can work together. And I'm actually a big fan of that. When I do computer programming courses, I don't think it makes sense for one person to sit behind a computer and try to come up with an answer. It's much better to have two or three people looking at the same problem, coming up with different strategies, discussing these, and then implementing it together. I'm a big fan of pair programming, where people sit together behind a computer and figure out the best way of doing something. Because one person is just an island, and you can't be an island. You're always collaborating with other people, and the same thing holds for people who program.
So Git is more or less the most common version control system. It was developed by the guy who also wrote the Linux kernel. So if you've ever installed the Linux operating system, you have used his software, and Git is made by him as well. GitHub is a collaborative platform. It's one of the biggest, and I use it, and there are other providers of these things as well. The nice thing about GitHub is that since you guys are part of the university and you're students, you can get a free pro account, and it's always nice when you can get something for free. The homework for today we will be doing after this session. We will just take an hour, and I will guide you to where you can download the text editor, the R software, Git, and these kinds of things.
And we will be going through the steps needed to link your GitHub account to your local computer, so that you can send data or code from your computer to an online source and then work together online with other people to generate a solution.
All right, so, the coming lectures. Introduction to bioinformatics, that's the lecture that we did today, right? We did the course overview. I gave you a little bit of history and some definitions. Why do we need bioinformatics? I hope I have convinced you by now that bioinformatics is essential in modern-day biology. Some examples: well, I gave you one example on microarray data. The next lecture will be all about phenotypes, or traits, as people call them. We will be talking about Mendelian traits, about qualitative and quantitative traits, and about statistical analysis, multiple testing, sample size, and project planning, to get you guys familiar with everything surrounding phenotypes. And then we will be going through the levels that I showed you, right? I showed you the arrow with the different biomolecular levels. The first level that we will be discussing is DNA. There will again be a little bit of history about DNA and who did what, who invented what. We will be talking about different sequencing techniques for DNA. We will be talking about genes: how is a gene structured, and what are transposons, also called jumping genes. We will be... let me take a sip. We will also be talking about regulatory elements, so how is DNA expression regulated? And I will be showing you some examples of other types of DNA, like mitochondrial DNA and chloroplasts. If you're a plant biologist, then you need to know about chloroplasts, because they have their own DNA, which is separate from the genomic DNA. And of course we will be talking a little bit about biomarkers. The next level is RNA.
So there will be a whole lesson about RNA: about the history, about the different types of RNA, about RNA expression and expression analysis, and again about RNA sequencing and the structure of RNA, which is very important for its function. The next level is proteins. So: the history of proteins, structure prediction, function prediction, how do we define which proteins belong together? What are phylogenetic trees? These things will come up in lecture number four, right? So this is one, two, three, four, five. So lecture number five. Then we will be talking about the metabolome, so the metabolites: everything that you either put into your body or that your body creates which is not protein, RNA, or DNA. We will be talking about things like, what are metabolites? What is mass spectrometry? How do we identify different metabolites? Like, how do the police know that you've used some type of drug? And we will be talking about different databases which contain reactions, right? So how does cocaine break down in the body? Well, there are several databases that allow you to do that. Well, the database doesn't allow you to break down cocaine, but it allows you to see how this chemical substance is transformed in, for example, the liver or the kidneys into something that is not harmful for the body. We will be talking about KEGG and Reactome, which are more or less the two biggest databases. And we will also be talking about Cytoscape in this lecture. And then I think there is one lecture which is canceled because of Christmas, which is good because everyone likes Christmas. So there will be a holiday break; I think it will be after the metabolome and pathways lecture. And then we will be talking about phenotypes again, right? Because then we have gone full circle. We start at the highest level, the phenotypes, the things that are interesting to us, because that's what makes the money, right?
The milk yield of a cow is the thing that we're really interested in. And then we dive through all of these levels to get back to the same level. And then I want to talk to you guys about QTL mapping, because that's what I did my PhD on. My PhD is about high-throughput methodologies for QTL and genome-wide association studies. So I'm kind of an expert on that. Well, I wouldn't call myself an expert; I know a little bit about QTL mapping and genome-wide associations. So I just wanted to have one lecture to talk about that.
We will then switch to primer design. Primer design is actually a pretty interesting lecture because it is very applicable: in your master's project you are bound to have to measure a gene or something like that, right? In biology, we measure genes and proteins. So knowing how you design primers and knowing the basics of the polymerase chain reaction is good. Although it might actually be that all of the students following the course already have some experience with primer design. If that is the case, then do let me know, because I think in the bachelor's there's also a course which explains PCR and primer design. Although I always love talking about Kary Mullis. Kary Mullis is one of my favorite guys in biology because he's just a very interesting character. And when we have the lecture, I can talk more freely about Kary Mullis and his drug history and the fact that he claims that the method of the polymerase chain reaction was actually given to him by aliens, and that aliens look like multicolored ferrets. He's such an interesting figure. He got a Nobel Prize for his invention, and he's one of the smartest guys, but he's very, very strange and fun, right? I think that makes biology interesting, that you have all kinds of these figures, all kinds of people who have their own kind of handicaps in a way. So I love talking about Kary Mullis.
So if everyone knows primer design, we can skip the primer design lecture. If not, then I get to talk about Kary Mullis. The next lecture will be databases. So how are databases organized? What features are there? What are important databases? And then we will also be talking about biomaRt. So biomaRt is a software tool for the R programming language which allows you to connect your R session directly to Ensembl and directly pull data from the Ensembl database, which is really, really useful because then you don't have to go online to ensembl.org and click download. The computer can then automatically talk to the database, and biomaRt is a very smart tool. We will then be talking about sequence analysis. We will be doing alignments, alignments and alignments, and we will be talking about things like homology, what are sequence motifs, how do you find sequence motifs and what do they mean? And then of course we will have gene expression analysis. So we will be talking a little bit about the experimental design, the data organization and preparation, statistical analysis and visualization of differential expression results. So that's more or less the lecture about microarrays, because we will be using microarrays as an example. Of course gene expression can also be measured using qPCR or RNA sequencing nowadays. But the lecture is focusing on microarrays because the data is small enough for me to give you a data set. Otherwise I would have to give you an RNA-seq data set which is like 500 MB or something, and no one wants to download that. Then the last couple of lectures will be the standards for bioinformatics and statistical analysis. So we will be talking about different types of files, different file structures, and the naming conventions that are out there, like how do I name my files, headers and different variables. We will also be talking about documentation and testing. So how do I write proper documentation for software that I developed? 
What types of documentation are there? And testing of course is very important, because if you develop new algorithms then you have to have a test suite, right? Because you need something which is kind of a gold standard. So we will be talking about that. And then the last lecture that we have is about scientific literature mining and management. So that is where you get taught how to use PubMed Advanced Search, but also how you can more or less automatically churn through like 40 years of biological data. And I also have a part of the lecture which is about reference management software, which I think is very useful for people that write a master thesis or a bachelor thesis, because you shouldn't do scientific citations by hand. You use a computer program, right? Why would you spend time formatting your citations if there's just software out there which can do that for you? And then of course we will have a couple of your own choice lectures. So if you have a data set, or if you have something that you think is missing from this list, for example, oh, I want to learn everything about machine learning in the context of, for example, the new Google algorithm for protein structure prediction, I think it's called deep protein or something, then we can do that. So if you have something that really interests you and you don't see it on the list, send me an email or just throw it in chat and I will make a list and we can vote on which one we want to do. But I always like having a little bit of free space for you guys to decide what you want. All right, so the next lectures. So we decided we will have the lectures from one to four instead of from two to five. Assignments are necessary to pass the exam. So I will make sure that the lectures are around two hours, or three hours when we do the assignments together. But there will be questions on the exam about the assignments. And the assignments are the useful part, right? 
You can listen to me talk about databases and stuff. But if you don't go to the database and click on the links and see what's there, then you don't learn what's in there, right? Then you just get a very global overview. So the exam will contain questions about the lectures and the assignments. So just a tip from me: in the past, I've noticed that people who don't do the assignments generally don't pass the exam. There are exceptions to any rule, of course, but there is a direct correlation between the people that do the assignments and ask questions and the people that pass the exam. All right, any questions so far about microarrays or other things? Any people that already have a suggestion for a lecture that we wanna do, then throw it in chat. We will have, oh, I'm a little bit on time. So we will have like a 10, 15 minute break. So I will try and be back at like 4:10, 4:15. And then we will start getting your computer ready. So then we will start downloading some stuff. And we will go through it together and see that everyone gets Git installed and everyone gets their, oh, thank you, just to shout us for that bit. Thank you for cheering. It's good that you're here. We can set up your computer properly as well. So that will be after the break. So we will have a short break. I think this break will be birds. Yeah, Daniel's here. Welcome, Daniel. Oh yeah, I didn't tell you guys, but up here you can see something which is called the mood box. I actually made a button for that as well. So let me show you, these are the commands that you can type in chat and then you will get your own little avatar there. And then you can change it. So I made this thing for Twitch. So I can do stuff like monocle and then I'm in the list and I have a little monocle. I just came here to ask you something about R because I have problems with the date format. What do you mean the date format? You already asked that yesterday. 
Like I was confused yesterday whether you meant date or data. But dates are very simple, right? It's like the day, then the month, then the year, with a minus sign in between. I will ask you in the evening. All right, but stay for the rest of the lecture because it's good that you get Git installed and make a GitHub account as well. You probably have Notepad++ and R installed, so the first 10 minutes won't be that interesting. All right, so I will stop the recording.