 start the recording. So for everyone watching it later, welcome back. We missed a couple of slides, but the slides will be available on Moodle, so you just have to look through the slides and do without the commentary that I'm currently doing. Of course, you can watch back the video as well, so then you will, no, then you won't see the slide. So annoying. So if anyone in chat would just shout every time that we come back from a break, shout, start the recording, don't forget to start the recording, then that would be really nice because like I'm having multiple screens open, so and it's a very little button. It's like a button which is this big, saying that start recording. All right, thanks. So Skorita, you will be my like warning for recording message, so that would be cool. All right, so data needs to be stored and managed. I don't know how many people are watching this on a laptop. Go to your desktop and see how well your desktop is managed. This is a screenshot that I got from Business Insider, and many times when Florian says, isn't it on Twitch? Yeah, it's on Twitch, but Twitch is only available for like two weeks unless I get to 50 followers. So if you're not subscribed yet, subscribe because when I get to 50 followers, I can set it so that it will be available forever and ever and ever, but when I have less than 50 followers, then I can't do that. Subscribe or both. Subscribe, follow, put a little heart there, put stuff in chat, like every attention helps. It's kind of like YouTube in a way, so also in YouTube it's good to subscribe to a channel and these kinds of things, but it's also important to comment. So throwing things in chat helps the algorithm determine that this is a very interesting stream to watch and that everyone should follow it and like it. All right, so I see a couple of people subscribing. Thank you for the subscriptions. Like I said, data needs to be stored and managed and this is one of the key tasks in bioinformatics. 
It is also the thing that I'm kind of bad at. I do server management for our server and it's always really hard. We have a storage machine which is like almost 100 terabytes big and I'm not happy with the structure that we have there. I've never really been, but that's also because like you inherit things from, I think it's only possible to follow you. Let me see, are you already followed? Yeah, you already followed three days ago, so that's fine. You're all good. You're all good. If you follow, you actually get an automatic email when I go live. So if during the weekend I feel like streaming like some of my computer games that I generally played and you could also watch me get slaughtered in RimWorld or get eaten by zombies when I'm playing Project Zomboid. I don't stream games often, almost never, but yeah, if you follow, then you get a mail about that as well. All right, storage and management, key of bioinformatics. One of the things that I'm not really good at, but I will do my best and convince you that it's really necessary to have data properly stored, properly managed. And why is this? This is because in science we are looking for reproducible research, which means that if you publish a paper and you have data on which your results are based, that people can take your data, people can read your paper, see what you did exactly, and then to reproduce your research. It is also very important to quickly find back any relevant data. For example, when the professor comes walking in and says, you know that experiment that we did in 2015, then I have to be able to find that experiment, of course, and to redo the analysis because people come up with new ideas, new ways to analyze data. So it's very important. 
Also, storage and management, if you do it properly, it allows you to get more out of the data that you have, because you can make connections between different sources of data, but also between data which has been gathered on different time points during the lifetime of more or less a research group. Here I always like talking about the bus factor, so the bus factor is a term which comes from computer science, and it's just basically something that says how many people have to get hit by a bus before the experiment or the data is more or less gone, so the knowledge is gone. And so if you think about a bakery, something which makes bread, and you have 10 people working in that bakery, but all 10 people know how to knead dough and how to make like the sandwiches, but there's only two people who know how to operate the furnace, then the bus factor for this bakery is just two, because if there's two people who know how to operate the furnace, if they get hit by a bus, then they cannot make bread anymore. So the bus factor is the minimum amount of people that have to get hit by a bus before the project ends. So if you think about something like the Linux kernel, which is a massive open source project, then of course the bus factor of something like that is relatively high, because there's a lot of developers that are working on something like the Linux kernel, and for other projects it might be really, really small. And generally if you look at biology and how biology is done, the bus factor for many projects is two or three at the maximum. So the PhD student working on the project knows what's going on. There might be a master student that is helping the PhD student who knows what is going on, where the data is stored, where the samples are in the freezer, and generally the professor has a vague idea of what's being done, where stuff is in the freezer, but not exactly. 
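The bakery example above can be turned into a small calculation. This is only a sketch: the function name `bus_factor`, the task names, and the staff names are made up for illustration, and it assumes the project ends as soon as one task has nobody left who can do it. Under that assumption, the bus factor is simply the size of the smallest group of people covering any single task.

```python
def bus_factor(task_holders):
    """Return the size of the smallest group whose loss leaves a task uncovered.

    task_holders maps each task to the people who know how to do it.
    Assumption: the project stops once any single task has nobody left.
    """
    return min(len(set(holders)) for holders in task_holders.values())


# Hypothetical bakery from the example: ten people, two can run the furnace.
staff = [f"baker{i}" for i in range(1, 11)]
bakery = {
    "knead dough": staff,
    "make sandwiches": staff,
    "operate furnace": staff[:2],
}

print(bus_factor(bakery))  # 2
```

Writing down who knows where the data is stored and how it was analyzed, in the same way, makes it easy to see which task is holding the number down.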
But if the postdoc or the PhD student gets hit by a bus, then there's a massive issue for projects, because projects cannot be finished in time. So when you structure your data and you manage your data properly, you can kind of increase the bus factor, because multiple people know where data is stored and how this has been analyzed. So an interesting term, and it's very relevant in biology as well, it's just that it's difficult to quantify. But if you think about your own work, if you think about your own research group in which you are participating, or in which you are active, then think about how many people are really aware of what's going on in my project. And sometimes I see projects where hundreds of thousands of dollars are there for funding, or hundreds of thousands of euros, and the bus factor of these projects is only one or two, and that's of course something that you want to avoid. You want to be able to have your data structures in such a way that there's many people that know exactly what's going on. Alright, so here we have a statement from my PhD thesis, and the statement is, we have propositions in PhD thesis in Holland, where I did my PhD thesis, and you have to have 10 propositions, which you get grilled on during your defense of your thesis. And this is one of my propositions that I had, is that there is no good communication on big data without proper visualizations. 
So visualizations, so going from data in text format on a hard drive to creating a figure, which you can sit down and talk to a biologist about this figure, about like, I think that this, that the liver is influencing glucose, which is then influencing the storage of things, and that if you can visualize it, then you can talk about it, not only with biologists, but also with computer scientists who speak a completely different language than biologists in general, and also to the general public, because in the end, the goal of science is to inform the general public, so normal people. And so having like a big table with numbers generally doesn't really help people from the general audience understand what you're doing, but making a really nice figure, a nice visualization of that data can generally mean a lot for people on understanding what is, what is going on. All right, so this was where I was expecting the break to be, so we will continue with the example. And I think I should hurry up a little bit, because I do want to finish at 11, because that was kind of the plan. All right, so I wanted to show you something where bioinformatics is involved at many different points, so and one of the, one of the things that I came up with was microarray analysis, because it's one of the analysis that I've been doing a lot in the past, and still do on, well, not a daily basis, but on a regular basis. So a microarray measures the expression level of a gene, it measures the activity of a certain gene, and with nowadays, with new microarrays, we can measure the expression level of more than 20,000 genes in a single experiment. That means that we can kind of tag every gene in the human genome, and at a certain point in time, get a snapshot of how active a certain gene is. Microarrays come in two different types, so you have one color microarrays on which you put a single sample. 
For example, you put a sample from a single human on a one-color microarray to see the activity of their genes, and you have two-color microarrays, which are used when you are comparing groups. So imagine that you have tissue from someone who has lung cancer, and you have tissue from someone who doesn't have lung cancer, and then you can color these two tissues with different colors of fluorescent dye, like Cy3 and Cy5. So you color the lung tissue of the healthy person green, the other tissue, the cancer tissue, you color red, and then you put both of these on a single microarray, and then the signal intensities will tell you how a gene is expressed, whether it's more highly expressed in the one group or in the other group. So how does it work? This is a slide that I took from, well, cmr.asm.org, and it's showing you how a microarray is designed and how a microarray experiment goes. So we start here with an annotated genomic structure, so we have a DNA sequence with different genes on there. The first thing that we do is use polymerase chain reaction to PCR out pieces of that, and these little pieces of DNA get spotted onto a little glass plate, into these little holes. This is a two-color microarray, so we spot these little pieces of DNA on here, and then we go more or less on a fishing expedition. So what we do is we take a sample, for example healthy tissue, and we take tissue which is cancerous. We extract the RNA from this tissue using standard biological methods, then we label this RNA using different fluorescent dyes, and then we hybridize both of these samples together on the microarray. We then take the microarray and put it in a machine which uses lasers to scan the whole array. 
What we get is two pictures: a picture from the green channel, which reads the intensity of the healthy sample, and another picture which gives the intensity of the different probes for the cancerous sample. And then these two images are put on top of each other, and then we get an image which more or less looks like this. I think I have a better picture of a real microarray, but on the microarray you have different colors: when it's green it means that it's highly expressed in sample one, when it's red it means that it's highly expressed in sample two, when it's yellow it means that there's no difference between the two, and when it's black it means that it didn't work, so this probe is either not present in the two samples or something else went wrong. Alright, so when we look at the workflow of how to create a microarray, then of course the first thing is to select or to create these oligo arrays, to select which parts of the genome we want to have on our microarray. Of course this is a typical field where bioinformatics is involved, because you take the genome sequence of the animal that you're interested in, then you look to see where the genes are, and you determine which parts of the gene you want to measure. Then biologists take over, so they acquire the samples. You need someone who has an animal permit to work with animals, to do the experiment, to for example inject animals, or to grow plants under different conditions. You then have to extract and label the DNA, which generally happens in a lab, then you do the hybridization to the array, which is generally also done by a biologist, and then after that it's back to the bioinformatician. So the bioinformatician operates the machine and makes the scans using the different colors of lasers, and then we do data storage, because we get two images which are very high resolution images, and these images need to be stored. 
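The green, red, yellow, and black reading of a two-color spot described above can be sketched in a few lines. This is a minimal illustration, not how any particular scanner software works: the function name `classify_spot`, the background cutoff of 100, and the two-fold threshold are all assumptions chosen for the example.

```python
import math


def classify_spot(green, red, background=100.0, fold=2.0):
    """Label one spot from its two channel intensities.

    green: intensity of sample one (e.g. healthy tissue)
    red:   intensity of sample two (e.g. cancerous tissue)
    background and fold are illustrative cutoffs, not standard values.
    """
    # Both channels near background: the probe did not give a usable signal.
    if green < background and red < background:
        return "black"
    # Log2 ratio of the two channels; +1 avoids division by zero.
    log_ratio = math.log2((red + 1.0) / (green + 1.0))
    threshold = math.log2(fold)
    if log_ratio >= threshold:
        return "red"      # higher in sample two
    if log_ratio <= -threshold:
        return "green"    # higher in sample one
    return "yellow"       # no clear difference


for green, red in [(5000, 400), (300, 4000), (2000, 2100), (20, 30)]:
    print(green, red, classify_spot(green, red))
```

In a real analysis you would not use a fixed cutoff like this; you would use statistics on replicated measurements to decide which differences are significant.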
Of course we then need to go from the pictures that we have to data, which is more or less in an Excel kind of table, so we have to have data storage, but then after we have the Excel table we need to normalize the data. We then need to do something which is called gene expression clustering, and then afterwards we have data interpretation. Data interpretation is not something that a biologist does alone, it is something that a biologist and a bioinformatician generally do together. So they sit together, they look at the data, they look at which genes are differentially expressed, so for example highly expressed in the cancer tissue, and then the biologist says, oh wait, I read a paper about that, and this gene is known to be involved in cancer, or it's known to be involved in growth, or in other things. So you can see that most of the steps in a standard microarray experiment are more or less assigned to a bioinformatician, and there's only a minority of steps which are done by biologists working in a lab. 
So creating the arrays means that we have to determine which sequences we want to have measured. Humans have, well, 25,000 genes, and each gene is on average 10,000 to 15,000 base pairs long, so which sequences do we measure? The probes on an array are generally in the order of 20 to 40 base pairs, so we cannot measure a whole gene with a single probe, we have to select which parts we want to measure. And of course these probes that we create have to adhere to certain rules, for example the hybridization temperature needs to be very similar between the probes. So we have to use bioinformatics to design probes, and to estimate beforehand how these probes are going to work. So there's a big in silico analysis when you are trying to make a microarray, which is done by a bioinformatician, and this is even before you generate your samples, because you have to first make your microarrays. Like I said, at my old university we used to have a microarray spotter in the basement, so we could just do the design, spot the microarray, and then directly test if it works. Nowadays, if you design these new types of microarrays which measure 20,000 to sometimes up to 200,000 probes at the same time, then of course you have a company which makes these for you, so it becomes even more important to have this estimation beforehand, because you can't all of a sudden decide, oh, this probe isn't working, I want to swap it out, because that means that you have to reorder microarrays, which is relatively expensive. So you want to have beforehand a very good in silico analysis of how the microarray will function, even before you are putting samples on your microarray, because it's not a trial and error experiment anymore, like it used to be 10 to 15 years ago. 
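To make the probe design step a bit more concrete, here is a minimal sketch of selecting candidate probes with similar hybridization temperatures. Everything specific in it is an assumption for illustration: the toy sequence `gene`, the probe length of 25, the target of 60 degrees, and the simple GC-content formula for the melting temperature. Real probe design uses more careful thermodynamic models and also checks that a probe is unique in the genome.

```python
def melting_temp(probe):
    """Rough melting temperature (degrees C) from GC content.

    Uses the basic approximation Tm = 64.9 + 41 * (GC - 16.4) / N,
    which is only a crude estimate for oligos longer than about 13 bases.
    """
    probe = probe.upper()
    gc = probe.count("G") + probe.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(probe)


def candidate_probes(sequence, length=25, target_tm=60.0, tolerance=2.0):
    """Slide a window over the sequence and keep probes near the target Tm."""
    hits = []
    for start in range(len(sequence) - length + 1):
        probe = sequence[start:start + length]
        tm = melting_temp(probe)
        if abs(tm - target_tm) <= tolerance:
            hits.append((start, probe, tm))
    return hits


# Toy sequence, made up for this example (not a real gene).
gene = "ATGGCGTACGCTAGCTAGGATCCGATTACGCGTTAGCCATGCAAGTCGATCGGATAACGT"

for start, probe, tm in candidate_probes(gene):
    print(start, probe, round(tm, 1))
```

The point of the filter is the rule mentioned above: all probes on one array are hybridized under the same conditions, so their melting temperatures should be close to each other.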
So microarrays are scanned, after which bioinformatics is used to convert the intensities to values. Here you see a little picture of a one-color microarray, so this is a microarray which is done with Cy3, the green color, and here you see a picture of a two-color microarray with a much higher density. If you zoom in, you see here that most of these probes are not really different, but you see some probes which are highly expressed in the one sample, and you see some probes which are highly expressed in the other sample, and of course you then need to use statistics to figure out if this is significantly different. But this is more or less how the raw data looks when you get it from a scanner. You get these big files, which are generally not PNG files or JPEG files, no, they are generally these TIFF files which are huge, sometimes up to 100 megabytes per microarray, and these then need to be processed into data which looks like this. Because in the end we want to have a nice table in which on one axis we have all of the genes or the probes that we have measured, and on the other axis we have all of the samples, so for each sample and for each probe on the array we get a measurement value. So we compress this whole big data set, which used to be in an image format, into a matrix of numerical values which we can then use to do statistics on. 
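The table described above, probes on one axis and samples on the other, can be sketched as follows. This assumes the image analysis has already produced one intensity per probe per sample; the function name `build_matrix`, the sample and probe names, and the numbers are made up, and the log2 transform is a common convention rather than something stated in the lecture.

```python
import math


def build_matrix(spot_intensities):
    """Turn per-sample probe intensities into a probes x samples matrix.

    spot_intensities maps sample name -> {probe name: intensity}.
    Only probes measured in every sample are kept, and intensities are
    log2-transformed (an assumption; intensities must be positive).
    """
    samples = sorted(spot_intensities)
    shared = set.intersection(*(set(v) for v in spot_intensities.values()))
    probes = sorted(shared)
    matrix = [
        [math.log2(spot_intensities[s][p]) for s in samples]
        for p in probes
    ]
    return probes, samples, matrix


# Hypothetical intensities as they might come out of the image analysis.
scans = {
    "sample_A": {"probe1": 1024, "probe2": 256},
    "sample_B": {"probe1": 512, "probe2": 2048},
}

probes, samples, matrix = build_matrix(scans)
print(samples)
for probe, row in zip(probes, matrix):
    print(probe, row)
```

Each row is one probe and each column is one sample, which is the shape that the normalization and clustering steps work on.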
Of course there's a step where we need to normalize, and just a little bit of information on why we need to normalize. This is because there's a lot of non-biological variation per sample. These microarrays are physical things, and things in the environment can change. For example, some of the microarrays are done on Wednesday, and on Wednesday it happened to rain so the humidity is up, and other microarrays have been done on Friday, and on Friday it was very dry, so the conditions are different. That also impacts how a microarray works, and you want to get rid of that. There might also be differences in the RNA that you put on the array, and there are of course differences in the dynamic range of the scanner, because the scanner doesn't work as well every day. Things get old, even electronic equipment is subject to wear and tear, so you have to compensate for these kinds of variation which are not of biological interest, but which do occur. Fortunately there are many different algorithms out there that you can use to normalize microarray data, and we will have a lecture in which we go into more detail on how to do a microarray analysis, and there will then be a little assignment for you guys to analyze a small microarray experiment. 
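As one example of such an algorithm, here is a minimal sketch of quantile normalization, which forces every sample to have the same distribution of values. It is only one of the many options and not necessarily the one used later in the course; the function name `quantile_normalize` and the toy numbers are made up, and ties are not treated specially.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix (list of rows).

    Each column (sample) gets the same set of values: the mean of the
    sorted columns, assigned back according to each value's rank.
    Ties are broken by row order, which is a simplification.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean value at each rank across all samples.
    rank_means = [
        sum(col[i] for col in sorted_cols) / n_cols for i in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized


# Toy matrix: three probes (rows) measured in two samples (columns).
raw = [[5.0, 4.0], [2.0, 1.0], [3.0, 6.0]]
for row in quantile_normalize(raw):
    print(row)
```

After this step the two columns contain exactly the same values, only in a different order, so a difference between samples for one probe reflects its rank within the sample rather than how bright the whole array happened to be that day.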
The next step is then, because then we have this big matrix, let me go, so when we have this massive matrix, when we have normalized this matrix, then of course we need to figure out which genes are different. Generally, we measure tens of thousands or hundreds of thousands of genes and we want to focus on the main result, right, which gene is really different, which gene is driving it. So we need a way to visualize this data, because you can't just publish a paper with a massive table with 200 columns and 200,000 rows, no journal will publish that. So one of the common strategies is to visualize only the most differentially expressed genes. This is kind of difficult when you have a lot of groups, but you can use algorithms to create these groups for you and to cluster data, and to go from having raw data to having data which is more or less similar versus different. So here I show a little example that I just took from two papers that I recently read. This is dealing with CD20 cells, so CD20 negative cells and CD20 high cells, and here on the one axis you see the different genes, well, not all of them, but a selection of genes which are different. What you see here is that some genes are very lowly expressed in cells which are CD20 negative, and these genes are then highly expressed in cells which are CD20 positive. And the colors here: red means lowly expressed, green means highly expressed. You can see that in this case there are only two groups of cells, so you see that there are also two groups of genes: genes which are highly expressed in the one and lowly expressed in the other, and of course the other way around, where they are highly expressed in the negative cells and lowly expressed in the positive cells. So this is one way of clustering your data, and it gives you an idea of, okay, how many groups are there in my data, and this is especially handy when you 
have not two groups but, for example, five or six different groups. You could imagine that you have two different types of corn, and these two different types of corn are grown under five different conditions, and then you end up with 10 different groups, and then you want to see, for example, are certain types of corn very similar to each other or are they very different, or you might be interested in the different conditions that you have. One of the things that you see in the B figure, so in the other figure, is a different way of doing clustering and visualization. Here what they did is take all kinds of different tissues and measure the expression of these tissues with different microarrays, and then what they did is not just cluster the genes on the microarrays but cluster the different tissues against each other. So they use correlation to correlate the expression of one tissue to the other tissue, and of course when you correlate brain, the cerebral hemisphere, with brain, the cerebral hemisphere, the correlation is one, right, because these two tissues are the same. But if you look at different tissues you can see that there are very clear patterns. You see that all the samples that they took from brain kind of cluster together, because they all have a very similar gene expression pattern, while if you look at the other tissues you see that that's not really clear. One of the things that you can see here which is relatively interesting, I hope that it's big enough, is that samples taken from heart are very similar to samples taken from muscle. Of course the heart is one big muscle, so you would expect this, but it's nice to see how different tissues cluster together. So here you're not clustering across the gene expression level like in A, but you're clustering based on the gene expression between different tissues, and you get an idea of which 
tissues are related to each other or which tissues might be very different from each other. So when we talk about clustering, there are different methods to do this. You can do partitioning methods, there are hierarchical clustering methods, there's fuzzy clustering, you have density-based and model-based methods, but most of these methods are based on a kind of distance measurement: how different or how similar are these genes, are they expressed in a very similar way or are they expressed in a very different way from each other? We will come back to this in the next lecture and we will explain how all of these methods work so that you get more insight into this, but just remember for today that clustering of genes, and saying these things are different or these things are similar, is not a solved issue in bioinformatics, and it's an active field of research how to best cluster things to get the best result. Of course when you do clustering you also use a lot of statistics. Statistics are used to determine which effects are significant, and significance means that there is a certain likelihood, a certain probability that you assign, that the differences that you observe are real and that they are not based on chance. So also here statistics play a major role in the analysis of microarrays. After that we usually find these groups, right, if you compare the CD20 negative cells versus the CD20 high cells you see that some genes function together and they seem to be related to each other. So generally you then go from doing clustering to something which is called gene ontology. You take for example one of these clusters and then you look to see if there is a common denominator within these genes. Every gene in the genome has gene ontology terms associated with it. Gene ontology, I will just read the definition, is an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular and 
organism-level systems. So it provides you with a structured vocabulary which you can use to do analysis on, and you can see if a certain ontology term is overrepresented. So how does this work? Well, it works kind of like this. Gene ontology for example has a group, and this group is called biological process. The biological process group contains all the genes in the genome, because every gene in the genome is associated with a certain biological process, but then they subdivide this group into two new groups. You have a group called cellular process, so this is a process which happens inside of a cell, and then you have another group which is called response to stimulus, so this is generally something that happens on kind of the outer layer of the cell, right? So you divide biological processes into biological processes which happen inside of a cell versus biological processes which happen on the interface between the cell and the outside world. Then these groups are divided again and again and again into smaller and smaller groups. So in the end, when you do a gene ontology clustering, you can for example see here, based on the color of the circle, which process is overrepresented in the data that you give it. So you give it the 50 genes which are highly expressed in the one tissue and lowly expressed in the other tissue, and then it will tell you, well, all of these genes that you are currently looking at have something to do with, for example, transcription from the RNA polymerase II promoter, and that will help you to figure out what's going on. And not only that, but there's also an overrepresentation when you look for example at the negative regulation of cell cycle progression. So this gene ontology will give you an idea of what is going on in your sample on kind of a holistic level. It will tell you that, for example, well, all the genes that you gave me are involved in either 
insulin secretion, or all of the genes that you gave me are involved in growth, or they are involved in degradation of proteins. So it allows you to zoom in on which part of a cell or which part of an organism is very different in the group that you are looking at. It's a statistical method which uses overrepresentation or underrepresentation to figure out, well, all of these genes that I now find to be different, what do they have in common, and what is the common thing which is affected when I move my sample from a normal condition into a condition with increased heat or increased salt levels? So it helps you to zoom in on which pathway or which ontology is very different. We will come back to gene ontology, because it's not just biological processes, you also have molecular function and localization, so you can also see if the genes that you are giving to the algorithm are mostly expressed or mostly active in the nucleus or in the endoplasmic reticulum or in other parts of the cell, which gives you an idea where in the cell things are going wrong or where in the cell your process is being adjusted. All right, so this was a quick example of why you need bioinformatics when you are doing microarray experiments. I talked about where bioinformatics is used: bioinformatics is used in the design of these microarrays, so deciding which parts of the genome we are going to put on there. Bioinformatics is a big part in extracting the data from the pictures, and then normalizing the data, clustering the data, making visualizations, and interpreting the results based on things like gene ontology, although there are many ways to do this. But bioinformatics is also very important in the storage and the management of the data, so that people at a later point in time can redo the analysis and can change things, like swap out a certain algorithm for a newer algorithm which is better. So storage and 
management of the data is a fundamental part of bioinformatics. Again, I also want to stress that programming is a very big part of bioinformatics, and I also want to stress that this course is not here to teach you how to program. If you want to learn how to program, then I will advise you to do the R course, which is in the summer semester, so we will start with the R course somewhere in February or March of 2021. However, we will learn some basic R skills, but at the end of the course you really shouldn't call yourself a programmer. So there is some R which we are going to use to do some basic assignments, and of course one of the people had the idea that they wanted to get information from Ensembl, so there we can look at different packages for R that allow you to automatically connect to Ensembl and download, for example, all of the proteins of a certain animal, or download the DNA sequence. But the focus here is not on programming, although it is a bioinformatics course. The focus here is to know which databases exist, which tools exist, and how you can use them to answer biological questions by compressing a whole bunch of available data into one nice figure which tells a story, which you can then explain for example to your grandmother, or which can help you talk to biologists or to people doing computer science. So the homework for today is very easy, it's just to install R on your computer. On the last slide I will have a link where you can download R, and if you're on Windows it's just downloading the executable, running it, and then clicking next, next, next, next, finish, and then R is installed on your computer. Some people like to use RStudio, which is kind of a more fancy environment, so if you want to use RStudio, also install RStudio on your computer. We will need this in the coming lectures, but don't worry, I'm not here to teach you how to program. If you want to learn how to program 
follow the R course. All right, so then we're at the last part of the lecture, and that's just an overview of what's coming next. Today we had the introduction to bioinformatics, so a little bit of the course overview, the history, some definitions, some whys, some examples, so a very coarse introduction to what you can expect from the next lectures. Like I said, I structured the course along the biological dogma, but I first want to talk about phenotypes, because phenotypes are the things that matter to us. Phenotypes are things like how much does an animal weigh, how quickly does an animal grow, how much milk does it give, and when you think about plants, how much yield do we get from a plant, how resistant is a plant to certain pests or weather conditions. So during the phenotype lecture the topics will be: what are Mendelian traits, what are quantitative traits, statistical analysis, multiple testing, some sample size, and a little bit of project planning to obtain phenotypes that are useful in biology. Then we will start at the lowest level, so we will start at the DNA level. flagpole 33: are these lectures going to be archived? Yes, they will be on Twitch for two weeks and I will also put them on the Moodle. If you don't have access to the Moodle, I will probably also put them on my website, but on my website I will probably only put the PowerPoint slash PDFs of the different lectures. But the lectures themselves will be archived and they will be available on Moodle. Is that good enough or do you want to have more? I don't know if everyone who's watching has access to the Moodle. I saw from my list of subscribers that at least one or two people do not have access to the Moodle yet, because I couldn't automatically register myself. That is generally the case when you have a middle name, so people say I am Denny Arendt's, but then they have some name in the middle which they don't tell me, and then I can't find them via Moodle. So if I think they're 
the Moodle is still empty, so after this lecture I will upload the lecture, I will upload the slides, and that's it for today, because there are no real exercises except for installing R. For the next lectures I will kind of do the same thing. I'm still a little bit doubtful whether I should put the PowerPoint online before the lecture, because I noticed that when I do that, people read through the PowerPoint and say, well, this is not interesting for me, so I will not follow this lecture, which I don't really like. So I might just do it all retrospectively, but there are different opinions on that. So if you think, no, I really want to have the PowerPoint beforehand so that I can make notes on it, then just let me know. Sandra, yeah, I know you don't have Moodle. The presentations themselves you will probably have to watch on Twitch, because I think they're too big to put on my website, but the PDFs will all be available on my website as well. I will send around an email to everyone so that they know where to download them in case they don't have Moodle. I have an overview of everyone who has Moodle and everyone who hasn't, so for you I will put them at least on my website, where I normally put them as well. All right, so then on the 19th, so that will be in two weeks, we will start with things like DNA: the history of DNA, some sequencing, what is a gene and its structure, what are transposons, what are regulatory elements, other types of DNA, and then how to use DNA as a biomarker. And of course here we will start introducing things like the DNA databases, like Ensembl, and how to use Ensembl. Then we will have a lecture about RNA: a little bit about the history of RNA, which types of RNA there are, and expression analysis, so here we will talk about microarrays again and how to analyze microarrays. We will talk about RNA sequencing, the difference between DNA and RNA sequencing, and then we will also talk about the structure of
RNA, since when you are talking about RNA it's not just about the sequence. RNA functions more or less like proteins, in that the effect that RNA has is driven by the 3D structure of the RNA. Then we will talk about proteins, again with the same kind of structure for the lecture: history of proteins, prediction of protein structure, functional prediction, and then families and phylogenetic trees, so how are proteins related to each other. We will then talk about metabolomes and pathways on the 10th of December: what is a metabolite, how do you do a mass spec experiment, how do you identify metabolites from a mass spec experiment. Here we will look at some databases which are relatively unknown to most people, since I don't think many people will do mass spectrometry experiments. But I think mass spectrometry is one of these fields which is very, very useful but underappreciated in a way. So I really want people to have an idea of how to analyze mass spectrometry data, so that when you are faced with these types of data in the future you have a basic understanding of what's going on physically, but also of how to analyze the data coming out of it. And we will talk about different databases which have metabolome and pathway information, databases like KEGG and Reactome, and how to use Cytoscape to make your own biological networks. On the 17th of December there won't be a lecture because I will be on holiday, so that's lucky for you guys. Okay, yeah, very good. Yeah, I agree, I like mass spectrometry a lot. It's one of these tools which is becoming more and more useful, and the data that comes out of it is really interesting because it follows a completely different structure. You can't use normal statistics, because metabolites usually are either there or not there, and when they're there they behave more like quantitative traits. So metabolites are a very interesting trait to look at, and how you should analyze
them, especially statistically. They're a lot different from things like microarrays. So the 17th is free, and then it's already January, and then we will again talk about phenotypes a little bit, and then I will talk to you about QTL mapping and genome-wide association studies. This kind of couples the DNA level or the RNA level with the phenotype level that we are interested in, right? We're interested in phenotypes, and we want to know which part of the DNA is responsible for generating more milk or generating more yield from your plants. QTL mapping is also what I did my PhD thesis on, so it's one of my favorite lectures to give. We will have a lecture about primer design, which kind of ties in with microarray design as well, but I think that it's good to have a dedicated lecture about how to design primers when you are going to do PCR reactions. And of course here we could talk about things like coronavirus detection as well, because that's also done by PCR at the moment. After the lecture you should be able to make your own coronavirus detection primers, in case you're interested in doing things like that. The primer design lecture is also really fun because we get to talk about Kary Mullis, one of my favorite scientists, who won a Nobel Prize. It's kind of crazy, it's an interesting story, so we will have a whole story about his life and what he believes in and why he is so famous. Because the PCR reaction is really, like genome sequencing technology, one of these things that have opened up a whole world for biology, a whole new world which wasn't available before. On the 21st of January we will talk about databases: how is a database organized, features, important databases that are out there, and of course we will talk about biomaRt, so how to use R to automatically download data from these databases, so that you don't have to go to the web interface and point and click and have
to collect data by hand. The 28th of January will be sequence analysis, so it will be about alignment, alignment, alignment. I will talk about homology and I will talk about sequence motifs, so things like transcription factor binding sites or microRNA binding sites. Then there will be a lecture about gene expression analysis, which is again, wait, is that a double lecture? No, so this is the lecture which focuses on microarrays, in February. And then we're almost there, because on the 11th of February we will talk about standards for bioinformatics analysis: what types of files are there, how are they structured, things like naming conventions. We will talk about documentation and about testing, which is generally overlooked in bioinformatics. A lot of people do bioinformatics and write software, but they never write documentation, and I think that's a shame. And testing is there to make sure that when you have an algorithm which is supposed to compute something, it really computes what you think it computes. Then on the 18th we will have scientific literature mining and management. Here we will talk about PubMed, about how to do advanced searching, and also about methods which do literature over-representation analysis or literature mining using a computer. There's also a little part about reference manager software, because I've noticed that especially people who are in the process of writing their master's thesis, or their bachelor's thesis for that matter, have problems using references correctly. So we will talk about different software tools which can help you manage references and will save you a lot of time compared to doing it by hand. Of course, contact me if there is something that you really want to have a lecture about. I already have four things written down, like Ensembl and automatic information, and breeding. I think the breeding part I will throw in here, into the QTL and
GWAS analysis, because here it's a good time to talk about how you select animals to generate a new generation which is better than the previous generations. Proteins were mentioned, and we already have a lecture about proteins. If you have very specific topics that you want to know about proteins, like how do I define a protein domain, I think you all have my email address, so just send me an email saying this is something that I want to learn. And ChIP-seq, I will see where I will put that. Probably I will put that in the DNA lecture next week. I also might put it here, into sequence analysis on the 28th of January, in case I don't have enough time to really update the whole presentation before next week. All right, and then the exam. I don't know yet when the exam is going to be. That's because of the corona pandemic moving the winter semester a little bit. But the exam is generally in the last week of lectures. Generally you have two weeks of exams after the lectures, but I like to do the exam at the last lecture, because then you have two weeks off and you don't have to do anything after the lectures stop. I think that people would agree that that's one of the nice things to do, to have the exam as quickly as possible. All right, and your own choice. Like I said, if you come up with something and you think, this is the reason why I subscribed to this course, this is what I want to learn, then definitely send me an email and explain what you want to learn, and we can weave that in with the things that I've currently planned. If it takes too much time, then we might make a whole lecture about it and vote on which lecture we are going to swap out, and that should be fine. All right, so the next lecture will take around two hours. Assignments are necessary to pass the exam, just so that people do the assignments, and they focus on learning different databases and different techniques. The exam
will contain questions about the lectures, of course, and some exam questions will also come from the assignments, so doing the assignments will give you an advantage when you do the exam. All right, so that was it for today. The homework for today is: install R on your computer at home, or the computer that you're using. For Windows you can install it here. I will upload the PDF and then you can just click the link. Then it says Download R, you click the button, and after you've downloaded it you just install the software like any normal software on Windows. All right, so for me that's it for today. If you have any questions right now, just throw them in the chat and I can still answer them. Otherwise, if you have any questions coming up later, just send me an email and I will answer the questions by email. All right, good, I think that's more or less it. One question: next week at 9 a.m.? No, next week we will be from 14 to 17, so we will start at 2 p.m. and we will finish at 5, because a lot of the people that are from fishery sciences actually have a lecture from 9 to 12 in the morning, so that's why we're moving the lecture. Of course, if you follow me on Twitch you will get an email four or five minutes before the lecture starts, informing you that there will be a lecture. The new time I will also put on Moodle explicitly, so that people know that the next lecture is not at 8 in the morning, like on Agnes, but at 2 in the afternoon. All right, I hope 2 in the afternoon is okay for everyone. If you have really big issues with 2 in the afternoon, then let me know as well. Of course you can always watch the sessions back; it's just annoying that you can't directly ask questions. All right, then if there are no further questions at the moment, I will say goodbye and enjoy the weekend already, since it's only Friday left and then it's weekend. Thank you all for all the questions and for being active in the chat, also for the
people who followed, I greatly appreciate it. We're at 46 followers now, so we only need four more and then I'm at 50, and I can store the lectures for longer on Twitch, which might be useful for people. It will also allow me to show you real commercials during the commercial break, which I'm not going to do, but I could if I wanted to, so to make some money out of the course instead of just getting paid for it by the HU. All right, thank you guys for the nice words, and for me this is everything, so I will see you guys next week at 2 p.m. All right, then stop the record