so that we can put it on Moodle later. All right, so welcome back everyone, including the people who are watching the recording.
So what Morgan was doing was creating an F1 intercross with X-linked genes, right? The two phenotypes that they were studying were located on the X chromosome. And that means that when you are talking about males, males are called hemizygous. They are not heterozygous, they are hemizygous, which means that they only have one copy of a gene or a DNA sequence in a diploid cell. And of course, nowadays we know that males have only one X chromosome and a Y chromosome. That means that males pass on their X-linked alleles, including the recessive ones, to all of their daughters, and they do not pass on any X-linked alleles to their sons at all, because they pass on the Y chromosome to their sons. And this is also why, when we're talking about Mendelian disorders which are located on the X chromosome, you see that males are more susceptible: they have no backup. They only have one single X chromosome, so if they have a broken gene on the X chromosome, they are always affected. A daughter might be lucky and have one broken copy, for example from the father, but a functional copy from the mother. So when we are talking about genetic disorders or diseases, especially X-linked ones, males are affected much more often than females. All right, so hemizygous. I hope that that's clear.
So this is more or less the crossing scheme that they did, using an F2 crossing scheme. What you see here is a female with white eyes and miniature wings and a wild-type male, and the male of course has a Y chromosome, which is denoted by this little arrow. So here you see white eyes and miniature wings, and here you have wild-type eyes and wild-type wings. For some reason, the W and the M are a little bit confusing.
And then when you cross these, of course all the females that you get will be wild-type and all the males that you get will have white eyes and miniature wings. If you then cross these two, what do you see? You see that in the F2 there are now different phenotypes that can occur. Of course, we can get back the parental phenotypes, which means individuals with white eyes and miniature wings, so they look like the mother. We also get individuals back which have a wild-type appearance, so normal wings and normal eyes. And something interesting occurs in the F2, because you also get individuals which have white eyes but normal wild-type wings, and you get individuals which have wild-type eyes and miniature wings. So in these individuals there was a recombination. Because the F1 females are heterozygous, they carry both the w and the m allele plus the w+ and the m+ allele, and the chromosome can break between those two genes.
What we can then do is count up in the F2 how many individuals we get of each type. In this case, we get 750 which have the phenotype of the mother, not their own mother, but the original mother. We have 791 which have the phenotype of the original father. Then we have a certain number of recombinants with white eyes and wild-type wings, and a number of recombinants which have wild-type eyes and miniature wings. So if you do the experiment, you see that there are 1541 individuals which inherited the original parental genotypes, or phenotypes in this case, and there are 900 individuals who are recombinant. So from this experiment, they found that out of the total offspring that they had, so 2441, there were 900 recombinants.
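As a quick check on these counts, here is a minimal Python sketch that turns them into a percentage of recombinants. The function name `recombination_frequency` is just an illustrative choice; the numbers 750, 791 and 900 are the ones from the cross above.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Return the recombinant offspring as a percentage of all offspring."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must lie between 0 and total")
    return 100.0 * recombinants / total


# Counts from the F2 described above.
parental = 750 + 791   # white/miniature plus wild-type flies
recombinant = 900      # both recombinant classes together
total = parental + recombinant

rf = recombination_frequency(recombinant, total)
print(f"{recombinant} / {total} = {rf:.1f}% recombinants")
# prints "900 / 2441 = 36.9% recombinants"
```

Read as a map distance, one percent of recombinants corresponds to one map unit, so these two genes come out at roughly 36.9 map units apart.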
And then the idea they had is that if you calculate the ratio between the number of recombinants and the total number of individuals, you get a kind of distance measure. So how far are these two genes apart? You can very easily calculate that as a percentage, and when you do, you see that they are 36.9% away from each other. So the distance on the X chromosome between the eye phenotype and the wing phenotype is 36.9 centimorgans. That's what they came up with. And they did these types of experiments not just for these two phenotypes, but for a lot of different phenotypes.
So if you look at the F2, we have some non-parental phenotypes. In the F2, the most frequent phenotypes for both sexes were the phenotypes of the parents in the original cross, so white eyes with miniature wings, and red eyes with normal wings. The non-parental phenotypes occurred in around 37% of the F2 flies. And this is well below the 50% which is predicted if these two phenotypes were independent, because if they were independent, you would expect half of the individuals to be recombinants, so 50-50 recombinant versus non-recombinant. Because it's only 37%, they concluded that these two genes must be located on the same chromosome, and that they must be located on the sex chromosome because of their observation in the F1, where all females are unaffected while all males are affected.
So they did this for a whole bunch of phenotypes, and then Thomas Hunt Morgan made his proposition: during meiosis, alleles of some genes assort together because they are near each other on the same chromosome. Recombination occurs when genes are exchanged between the X chromosomes of the F1 females, and this crossing-over event occurs at the four-chromatid stage of prophase I in meiosis. Of course, this was not his proposition.
This is something that was figured out later. But each crossover event involves two of the four chromatids, and all chromatids may be involved in crossing over as chiasmata form along the aligned chromosomes. We will have a nice picture of that to visualize it better. However, this was his proposition, and this is why 100 years later we still talk about Thomas Hunt Morgan and about his experiments with Drosophila.
So imagine two Mendelian traits. On the chromosome, they can be very far apart, and then there's a low chance that the offspring will get both phenotypes together, because there's a high chance that when you have two chromosomes next to each other there will be a recombination between them. So if they are far apart, the percentage, the distance between them, will be very close to 50%. It will be less than 50%, because if it is 50%, the two genes behave as if they are located on different chromosomes. But if it is lower than 50%, but not very much lower, like 37%, it means that these genes are relatively far apart. If they are located very close together on the chromosome, then of course there is a high chance that the offspring will get both phenotypes together. That means that the number of recombinants that you observe in an F2 will go down, so the percentage, the recombinants divided by the total number, will be smaller as well, and the distance will be shorter.
So this is called a two-point cross. A two-point cross is when you take a heterozygous individual, an AaBb, and you cross it with a homozygous individual, which is aabb. And like I showed you, this can be used to determine if genes are linked or if they are independent.
That is, whether two genes or phenotypes are on the same chromosome or on different chromosomes. And when they are on the same chromosome, you can estimate the distance between these two phenotypes by doing the computation where you take the number of offspring which have recombinant phenotypes, divide by the total number of offspring, and then multiply by 100. So in this example here, you get that there are 17 map units between the two genes. That means 17% of the individuals in the F2 are recombinant out of the total population. When we talk about genes in point crosses, we mean phenotypes which are Mendelian. So we're not talking about molecular genetics yet. All of this is theory which was developed way, way before we knew that DNA existed, and these things occur in many different crosses.
The point cross that you see here is not a two-point cross but a three-point cross, because there are three different genes, so to speak. So you have an A, a B, and a C gene. And when you use a three-point cross, you can also determine the order of the genes: whether the order is A, B, C, or A, C, B, or B, A, C. But we will come back to the three-point cross.
If we talk about these Mendelian phenotypes and we want to do this type of mapping, then an individual which is heterozygous needs to be very clearly different from an individual that is homozygous. That is the case for a Mendelian phenotype. For complex phenotypes, these point crosses don't really work, because there's not just a single gene involved; there might be two or more genes involved in creating your phenotype. And then, of course, this whole theory of Mendelian genetics and Mendelian inheritance doesn't work to determine the order of things on a chromosome.
So again, the two-point cross example. We set up the test cross, we look at the individuals that come out, and we cross again with a double mutant. So we have a wild type and we have a mutant. The offspring of these are of course heterozygous, and then we cross them with a homozygous double mutant and look at how many recombinants we get compared to the total number of individuals that we get in the F2 cross. So, of course, you literally have to do hundreds or thousands of animals to figure out the distance between two of these genes.
In a three-point cross, you can determine if genes are linked or independent, just like in a two-point cross. You get the distance between gene A and B, you get a distance between B and C, but you also get a distance between A and C. And because you get three distances, you can then infer the order on the chromosome: whether the order is A, B, C, or B, A, C, or A, C, B. So you can see that if the distance from A to C is small, the distance from A to B is big, and the distance from B to C is small again, then of course the order on the genome would be: first you have A at the beginning, then B at the end, and C is somewhere in the middle, because the distance from A to C is smaller than the distance from A to B. So you can then figure out the order on the genome. And three-point crosses are almost always done between a heterozygous individual, so an individual which carries more or less three mutations, and an individual which is wild-type, which has no broken alleles.
All right, so when you design an experiment, you want to collect data on as many phenotypes or traits as possible. An example would be that you take an individual which has genes P, R, and J, some with mutations, so a heterozygous individual, and you cross it with a homozygous one. In the progeny, each gene has two possibilities.
So for the three genes, there are eight expected phenotypic classes in the progeny, so two to the power of three, because every gene can be in two forms. You will get eight different types of individuals, each with a different combination of these three phenotypes.
So here's a little bit of an example, and this is the example of Mendel. Here you can clearly see that in this case P, R, and J are, for example, yellow, elongated, and dry, which is the wild-type pepper, or the wild-type thingy here. And then you have, for example, purple, round, and juicy. These are very clear phenotypes which you can distinguish from each other. Then you cross them, and in the offspring you see the following phenotypes: you see the wild type and you see the purple, round, and juicy, so these are the parentals, and here we see the other six possibilities that can occur. Then you look at the numbers, and based on the numbers you can calculate the distance between the yellow and the round, the distance between the yellow and the dry, and the distance between the dry and the elongated, so you get different distance measurements. And it's the same thing every time: you take the number of recombinants and you divide it by the total number, and you do that for each pair. So you add up these two classes for one distance measurement, these two for the other distance measurement, and these two for the third distance measurement, and then based on that you can figure out the order of the genes on the genome.
So one of the assignments for today will be to analyze one of these test crosses, to get a little bit of a hang of how to do a two-point cross and how to do a three-point cross.
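To make the three-point analysis concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the eight class counts are made up and are not the numbers from the slide, and each class is encoded as three characters for P, R and J, with `+` for the allele of one original parent and `m` for the allele of the other. Two genes count as recombined in a class whenever their two characters differ.

```python
from itertools import combinations

GENES = ("P", "R", "J")

# Hypothetical progeny counts for the 2**3 = 8 phenotypic classes.
# "+++" and "mmm" are the two parental classes.
counts = {
    "+++": 400, "mmm": 390,   # parental
    "+mm": 50,  "m++": 60,    # P separated from R and J
    "++m": 45,  "mm+": 45,    # J separated from P and R
    "+m+": 5,   "m+m": 5,     # R separated from P and J (rarest classes)
}


def pairwise_rf(class_counts: dict[str, int], i: int, j: int) -> float:
    """Percentage of progeny in which genes i and j carry different parental alleles."""
    total = sum(class_counts.values())
    recombinant = sum(n for cls, n in class_counts.items() if cls[i] != cls[j])
    return 100.0 * recombinant / total


distances = {
    (GENES[i], GENES[j]): pairwise_rf(counts, i, j)
    for i, j in combinations(range(len(GENES)), 2)
}

# The pair with the largest distance sits at the two ends; the third gene is in the middle.
outer = max(distances, key=distances.get)
middle = next(g for g in GENES if g not in outer)
order = (outer[0], middle, outer[1])

for pair, rf in distances.items():
    print(f"{pair[0]}-{pair[1]}: {rf:.1f} map units")
print("inferred order:", "-".join(order))
```

With these made-up counts, the P-J distance is the largest, so R is placed in the middle. The P-J distance also comes out a bit smaller than the sum of the two shorter distances, because the rare double-crossover classes look non-recombinant for the two outer genes.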
All right, so the recombination frequency is the ratio of non-parental phenotypes to the total number of individuals. It is expressed as a percentage, which is equivalent to the number of map units or centimorgans between two genes. So if a hundred out of a thousand individuals display the phenotype resulting from a crossover, then the recombination frequency is 10%, and that means that A and B are 10 map units apart on the genome.
All right, so Thomas Hunt Morgan did this for a lot of phenotypes, and then they ordered these phenotypes on a genome. They looked at all kinds of different phenotypes, for example the bristles on the fly, the wings, the shape of the leg, and in the end, in 1917, they came up with the following genetic map. And this genetic map is still accurate today. Even with all the research that we did on DNA and mapping genes and knowing where genes are, the genetic map that was drawn by Morgan in 1917, based on his observation of Mendelian phenotypes, is still accurate to within about five centimorgans today. The quote here, I don't know whose quote it is actually, is that Morgan's theory of the chromosome represents a great leap of imagination comparable with Galileo or Newton. It's a massive step in genetics, going from having just a theory about genes being on a string to having the ability to create these maps of where on the genome these genes are located. And this is all 50 years before they even knew that DNA was the carrier of genetic information.
They didn't know that DNA existed or what the thing was that was inherited, but they could already draw maps of how far certain phenotypes were from each other. You can see that literally tens of thousands of little Drosophila flies were used, all of them crossed in a three-point crossing structure, and you can see here all the different phenotypes that they used in 1917, like minute bristles, rough eyes, cardinal eyes, javelin bristles. So there are a lot of very interesting phenotypes. They had a lot of these Drosophila mutants, and every time they found a new mutant, they would do a two-point or a three-point cross and then put the new phenotype on the genetic map that they were filling in. It's a massive, massive advantage that we have this nowadays for Drosophila. It also made Drosophila one of the most popular animals in genetics, and it was one of the biggest leaps forward in genetics.
All right, so when we look at genetic maps today, we have high-resolution maps, meaning that we have a marker probably every tenth of a centimorgan. These are genetic markers, so real markers that we obtain by things like PCR, but there are also still phenotypic markers in a lot of the genetic maps. If you look at mice, mice also have a lot of Mendelian phenotypes, and the genetic map of mice has real DNA markers in there, where you have to do a PCR test to determine if something is the short fragment or the long fragment, but there are a lot of classical phenotypes still on the map. And you can only put a phenotype on a map if there's a clear difference between the phenotypic observations. Of course, nowadays we can do whole-genome sequencing and determine the exact position of a gene down to a single base pair.
And these physical maps use molecular tools rather than data from crossover studies, but a lot of geneticists still like working on the original genetic map. Instead of using a base-pair map, they still use a genetic map which is supplemented with the phenotypic information. And of course, this phenotypic information can be used to validate that you did not mix up your individuals. If you look at an individual that should have gotten the red-eye phenotype, but your individual does not have red eyes, then of course you know that somewhere some individual got into the wrong test tube. These things you can't really see when you are using genomic sequencing, because then you just see base pairs, or you see long fragment, short fragment, and there's no expectation of what you should get from a certain cross. So nowadays we can use molecular technologies, but the genetic maps created from the phenotypes are still very useful, especially when it comes to detecting samples that have been mixed up.
All right, so that was all that I wanted to say about genetic maps and Thomas Hunt Morgan, and now we are coming to the complex phenotype part. Complex phenotypes are phenotypes which are affected by differences in many genes, and each of these differences contributes just a very little bit to the whole phenotype that you observe. So human stature, what's your height, is something in which hundreds and hundreds of genes are involved. The same thing goes for a phenotype like obesity, or flowering time, or milk yield. These are all phenotypes in which there's no single gene which causes a massive difference. No, there are dozens of genes which all cause a little bit of a difference. So in that case, when you are working in genetics, you are trying to determine which locus, or which part of the genome, has an influence on your phenotype, be it human stature or be it obesity.
And you try to assign the variance that you see in the phenotype to these little parts of the genome, to get an overview of where there are genes which are involved in regulating milk yield, for example.
All right, so when we talk about complex phenotypes, we talk about the genetic architecture underlying a phenotype. So which genes are involved, and how much do they contribute to the eventual phenotype? There are two major methods to deal with these kinds of complex phenotypes. One of them is QTL mapping. In QTL mapping, you use an inbred population to study complex traits. So you take two or three or four or five founder animals, and then you cross them in a certain way so that you determine the structure of the genome, so that you know, for example, that every individual has 25% from the original mother and 75% from the original father. Because we can do this in, for example, mice or Drosophila or cows, well, not so much in cows, but in many of these populations, we can design what the genetics of our population should look like, more or less based on the theory that we have from Mendel and Thomas Hunt Morgan.
The other side is to do a genome-wide association study. A genome-wide association study has the same goal as a QTL mapping experiment: find genes which are contributing to a complex trait. But in a genome-wide association study, you use a natural population. So for example, in humans, we cannot decide who is going to procreate with whom. We just have to make do with what we've got, right? We can't interfere in human reproduction, and you cannot set up such experiments, that's highly illegal. So then we're forced to do a genome-wide association study. Both of these try to do the same thing, but QTL mapping is based on the original theory of Mendel and Thomas Hunt Morgan, where you're setting up a structured population, also called an inbred population.
And in a GWAS, you are looking at a natural population, and you have no influence on the genetics of that population. But you can use both of these different populations to study which genes are underlying a complex trait.
All right, and this was originally the place where I decided to have the break, but we'll just continue, since I've only been recording for 25 minutes. So before the break, we talked about Mendelian and complex phenotypes. I told you what linkage is: linkage is more or less how far apart two things are on a genome, and you can make genetic maps from Mendelian phenotypes using a two-point cross to determine the distance and a three-point cross to determine the order in which things are located on the genome. After the break, which we are skipping, we will look at some databases with different phenotypic information, and I want to say some words about the statistical analysis of different phenotypes. Of course, the statistics will come back, and the databases are something that we will need for the assignments, because the assignments will have you take a look into a database and try to figure out, for example, which gene in humans is causing color blindness in males.
All right, so a little bit of a definition. What is a database? A database is an organized collection of data. It is the collection of schemas, tables, queries, reports, views and other objects. In a way, when I think about a database, I always think about a big Excel table which is storing data and has references from one sheet to another sheet. So it is of course very clear what the tables are. The schema is more or less what the different columns are. The queries are, well, how would you query it? In Excel you have filtering options to say: only show me elements for which column number two has a value larger than five.
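To make the Excel comparison a bit more tangible, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names and measurements are all made up for illustration and do not come from any of the real databases in this lecture. It shows a schema with two tables, a reference from one table to the other, and a query that only returns rows with a value larger than five.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
con = sqlite3.connect(":memory:")

# Schema: which tables exist and which columns they have (hypothetical names).
con.execute("CREATE TABLE phenotype (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
con.execute(
    "CREATE TABLE measurement ("
    " animal TEXT NOT NULL,"
    " phenotype_id INTEGER NOT NULL REFERENCES phenotype(id),"  # link to the other 'sheet'
    " value REAL NOT NULL)"
)

# Made-up example data.
con.executemany(
    "INSERT INTO phenotype VALUES (?, ?)",
    [(1, "tail length (cm)"), (2, "body weight (g)")],
)
con.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [("m1", 1, 7.9), ("m2", 1, 4.2), ("m1", 2, 24.5), ("m2", 2, 31.0)],
)

# Query: like an Excel filter, only keep measurements with a value larger than five,
# and follow the reference to look up the phenotype name.
rows = con.execute(
    "SELECT m.animal, p.name, m.value"
    " FROM measurement AS m JOIN phenotype AS p ON p.id = m.phenotype_id"
    " WHERE m.value > 5"
    " ORDER BY m.animal, p.name"
).fetchall()

for animal, name, value in rows:
    print(animal, name, value)
con.close()
```

The real phenotype databases are of course much bigger and sit behind a website, but the idea of tables, columns, links between tables and filtered queries is the same.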
And then a report and a view are more or less how these things are linked to each other. So how is one table related to the other tables? And of course, physically, when you talk about what a database is, it's nothing more than a dedicated computer which holds the actual database and has database software running on it, and things like a communications API to talk to the database. So spiritually, a database is more or less a collection of organized data. Physically, it's a server. It's just a computer somewhere which has a database program on it.
All right, so one of the phenotypic databases that I want you to know and want you to look at is the IMPC, which is the International Mouse Phenotyping Consortium. This is a very interesting consortium where they knock out all of the genes in mice one by one. I told you that there are around 20,000 genes in mice, so what they are doing is using CRISPR-Cas and other molecular technologies to knock a certain gene out of the genome, and then they make a cross. So they breed animals which have this gene knocked out, and then they look at how their classical phenotypes change depending on whether this gene is there or deleted from the genome. This is a massive undertaking. There are literally hundreds of universities involved, and it is a really good database if you want to know what your gene is doing in mice. So if you are working on your PhD thesis and you are interested in a certain gene, like retinol saturase or something like that, then you can go to this database and see what happens to a mouse when this gene is removed from the genome.
The OMIM database is the Online Mendelian Inheritance in Man. It is the place to go if you want to know about Mendelian phenotypes or Mendelian diseases in humans. It's a big database and it has more or less information on any Mendelian phenotype that we know of. For example, where is it located?
Who has studied it? Which papers have been published about this phenotype? It's a really, really, really good resource if you're interested in Mendelian phenotypes.
Then there's the Gen2Phen database, the genotype-to-phenotype database, which is more or less the same thing as the OMIM database, but it is more focused on complex phenotypes. They have Mendelian phenotypes in there as well, but it is a collection of all the relationships between genes and phenotypes that people have found using genome-wide association in humans.
And then there's GeneNetwork. I always want to show GeneNetwork because I've been working with the people who develop it. GeneNetwork is developed by Rob Williams, who works at the University of Tennessee in Memphis. They have a big phenotype database on mice, not just any mice, but BXD mice. This is a cross between two inbred mouse strains, and there are around 150 to 200 immortalized mouse lines that came out of this. They have a massive, massive amount of phenotype information, but also endophenotypes, so things like gene expression data, protein measurements, metabolite measurements. So if you're ever wondering, well, I have this gene or this phenotype that I'm interested in, which genes in the genome are correlated with my phenotype of interest, then you can go to this database.
Of course, there are hundreds and hundreds more phenotypic databases, but you have to make a cutoff somewhere. So the OMIM database is really good when you look at Mendelian phenotypes. The IMPC is also really fun because they are knocking out genes, which is kind of a Mendelian thing, right? You knock out a single gene and then you see what changes in the mouse. But it's not really Mendelian, because they're meddling with the genome and not so much with the phenotype.
And then the other two databases are more or less for really complex diseases, so complex phenotypes, if you want to look into that.
All right, so IMPC, this is more or less what it looks like. I wanted to do more or less a live demo at this point, so let me see if the database still looks the same. Not so much. So at the IMPC, their goal is to produce germline transmission of targeted knockout mutations in embryonic stem cells for 20,000 known and predicted mouse genes. They test each mutant mouse line through a broad-based primary phenotyping pipeline covering all major adult organ systems and most areas of major human disease, and they provide a centralized data service and portal for free, which is really, really good. Currently they have knocked out around 7,022 of the 20,000 genes that are in mice.
So let's take a look at the website. Let's see if this works. Here we see the current IMPC website. There are two ways of searching: you can search for genes and you can search for phenotypes. Yes, just accept the cookies. So does anyone have a gene of interest? Many of you are probably in the master's phase of your study, so you are either doing a master's project or thinking about doing one. Or of course, when you have a phenotype of interest, just throw it in the chat and we can have a quick look in the IMPC database. I didn't really prepare, I have some of my own favorite genes so we can look at those, but I think it's more fun if you have genes.
All right, so SOX21, first one on the list. SOX21 is one of the SRY-box genes, where SRY is the sex-determining region on Y. You can see here what they have on it, so we can just click on the link. Here you see the MGI ID, and you see that they have not assessed viability yet. There is no embryo viewer yet, and it is currently not registered for phenotyping. So phenotyping is currently not planned.
So unfortunately SOX21 is probably one of these genes which proved to be lethal when they tried knocking it out. It happens sometimes. Sometimes you knock out a gene and then you figure out that this gene is essential for living. But you can still order mice which have a mutation in here. You see that they have produced two mice: in one of them the whole gene was deleted, so there's a deletion, and the other one is a vector knockout, so they put something like a lacZ cassette into the gene. But unfortunately these have not been phenotyped, so you can't really see the nice things about this gene.
So one of the genes that we study in our group is BBS7, just to show you how the website looks when there is actually phenotypic data. Here you can see that they produced embryonic stem cells, then they produced mice, and then they did push all of these mice through the phenotyping pipeline. And then we get the newsletter. No, we just have to click on it. When you click on it, you see an overview of more or less the things which are significantly changed and which are not. So you can see, for example, that the cardiovascular system of the mice has no significant changes when you knock out the BBS7 gene. However, their behavior, or neurological behavior, is significantly changed, and one of the things that you see here is that there is a significant change in the metabolism of these mice and in the adipose tissue. In our group, we study something called the Berlin Fat Mouse, and we identified BBS7 as one of the top candidate genes to explain why the Berlin Fat Mouse is fat, because there's a mutation in this gene, and this mutation causes these mice to be, well, more or less extremely obese. And then you can see that there are four phenotypes which are significantly different in mice which have a knockout, so mice which are either heterozygously or homozygously knocked out.
So you see that when the gene is knocked out heterozygously, there is abnormal lens morphology, which means that there are some issues with the eyes, there is abnormal prepulse inhibition, and they have an increase in sodium, so there's more salt in their blood. You can then click on it and get more information. They looked, for example, at all of the different anatomies, so they have images as well when you want to see where a certain gene is expressed, and for each of the mice that they produce they do all kinds of X-ray images to show you what is different in these mice. So when you, for example, go to the abnormal lens morphology, you get an overview of what it is, and in total they had 3984 mice that they assayed, and then you see where the differences are actually coming from.
So you can just look at a gene, and you can look at measurements, so you can get the data for different phenotypes. Here you see, for example, grip strength at the bottom, and you see that heterozygous animals have a slight difference in grip strength, but it is not a significant difference. But you can see here the prepulse inhibition, which is an assay in which you startle mice, and you can see that they are more easily scared than normal wild-type mice. And you can look at, for example, the metabolism, and here you can also see that there are no massively significant differences, but on average the effect size is there: the fat-to-lean mass is different in the individuals which have a knockout compared to the individuals that don't. So it's a very good first start to get an idea of what your gene might be doing. So if you are working on a gene in the future, then it's always
worth putting the gene into the IMPC database to see if there's a mouse knockout and to get an idea of what this gene might be involved in. And so in this case we learned that it is significantly associated with mortality, because if you knock out both gene copies, so you make a mouse which doesn't have this gene at all, then it dies before it's born. If you look at the metabolism, you see that there's a significant change in the metabolism of these mice, they have some behavioral issues, and there is a significant difference in the morphology of the eyes. And so if you are interested in, like, what would be the best tissue to study this gene in, then you could say: well, probably brain or eye tissue or fat tissue would be very good candidate tissues to begin with. And like I said, they currently have 722 knocked out genes, so it is almost half of the genes in the genome that they were able to knock out at this point and get some phenotype data on. And you can search by human disease, and you can search by phenotypes. So if you are interested in, for example, length or something like that... length in the mouse is not there. So BMI, for example? That doesn't find it. There's homeostasis/metabolism, so you can just click through, and then there's two and a half thousand genes which have an influence on that. And then you see something like length of tail, maybe? Yeah, that would be a good one. So that would be tail length: abnormal tail length, it's called. So when you look at abnormal tail length, you see that 9 genes have a significant effect on the length of the tail, so it's the CANT1 gene, SIP561, DNA1, FGF7, HOC C12. So you get a list of genes which they found to significantly affect the length of the tail. And of course this is a very good resource to put next to your own data that you have collected, to see if there might be an overlap or to figure out which tissue you might want to look into. So there's a lot of genes that people might be interested in, and if you're lucky they have a knockout, if
you're unlucky then they don't have a knockout. But it's a very good database for a first start, to get an idea of what your gene might be doing, or which genes might be interesting to study when you're interested in a certain phenotype. So also human diseases. I don't know if they have Alzheimer's... no. So when we go to human diseases, rare diseases... do I have a good example of a human disease which they might have? Color blindness? No, you can't test mice for whether they're color blind. So things like stroke or something would probably be phenotypes they have: abnormal blood circulation, decreased cardiac stroke volume. Yeah, that's not really what I wanted. So obesity will get you like hundreds of genes, so abnormal body weight, abnormal body size. All right, I'm just going to copy-paste that one into the search; I'm interested whether they... no, there's none. Perhaps if I search for it like this? No, they don't have a phenotype called that. But that's probably something with blood, right? So blood disease, or blue skin disease, so that would be skin: skin hemorrhage, scaly skin, flaky skin, abnormal skin, loose skin, dry skin, reddish skin, cyanosis, abnormal skin pigmentation. That might be the one: abnormal skin coloration. So it might be that it falls under this category. And of course you have images for all of them, so if you're interested you can go through all the images of the different skin types of the mice to see if one of them might have had blue skin. So it's a nice explorative database to get an idea of what your gene might be doing. So you can also see that this one is the tail color, so the skin color of the tail; there's also the footpad skin color, underneath the foot of the mice. And you see that there are literally like 199, 112 genes which are significantly associated with an abnormal skin coloring that they found in their pipeline. So a very interesting database: if you have a gene or if you have a phenotype, throw it in, see if something comes up; if
something comes up, then you can investigate further. But they are only halfway through, and of course there are phenotypes which only occur when you have two broken genes or three broken genes together. Still, this gives you an idea of what every gene individually is doing in the genome and whether it is important to have this gene. All right, so far the IMPC database. There's more information on the IMPC website, and it's a good starting point if you have to write, for example... "It's an inhibition of hemoglobin, like through nitrite toxicity, not actually skin coloration." Well, probably if you would just search for hemoglobin... that's a gene, right? Is it hemo- or hemaglobin? I think it's hemoglobin. We should figure out what that gene is. There is probably a hemoglobin kind of gene. Hemoglobin, it's a protein; what's the identifier for it? HBB, so it's just called HBB. All right, so they have phenotype data available, so we can look at this one to see if they have any skin colorations. Come on, go, internet. All right, so here we see that there are significant differences in the immune system, so abnormal red blood cell distribution, abnormal mean corpuscular hemoglobin, increased red blood cells. So these are the phenotypes which are very significantly affected when you knock out hemoglobin. You can see that you can actually knock it out homozygously and it's not lethal, which surprises me, because I would think that you would need hemoglobin. But... "Anna Margareta enabled emote only mode for this room." Okay, thank you for disabling that, otherwise people would only be able to talk in emojis, which would make it a little bit difficult. That's okay, you can play around with all the things. Like, I have this nice button on my stream manager which says "start watch party". I have no idea what it does; if you click it we will be in a world of hurt. All right, so far the IMPC database. It's just a very straightforward database, it's interesting to search through it, and if you're very interested in a very specific phenotype you
can always see if there's anything that they have there as a starting point. So you can go from a phenotype to a gene and you can go from a gene to a phenotype, and the gene-to-phenotype direction is very broad: you can see that they measure literally anything that you can measure on a mouse, so they do more or less all the developmental tests and all of the other tests. So that's a good database to have a look at.
All right, so back to the PowerPoint: OMIM. So OMIM is a very similar database, but it's not based on people knocking out things in a mouse or in another organism. It is the Online Mendelian Inheritance in Man, so it is the database to use when you are interested in Mendelian diseases. There's one assignment, or two assignments, for today where you have to go into OMIM and figure out what is causing color blindness. But let us first look at the definition, or the way that they describe themselves. So OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype on the one hand and genotype on the other hand. So, a quick look at the database. When we look here at OMIM, this is how it looks, and here we could actually probably look for your methemoglobin, so let's just try that. So here you see that there are 17 different entries for your phenotype of interest, testosterone, and the first four are methemoglobinemia due to deficiency of methemoglobin reductase. So this actually is a Mendelian phenotype. When you click on it, they have all kinds of different names and different descriptions. Then you can go down and it shows you where the gene, where the phenotype is located. Because it's a Mendelian phenotype, it has a location on the genome, like a phenotype in Drosophila, and they put it at location 22q13.2, which means that it's on chromosome 22, and then
q13 has to do with the banding in humans. Of course you can figure out which location it is exactly by just clicking on it, and it should give you an overview of the whole chromosome. So it's this cytochrome b5 reductase 3 which is causing the methemoglobinemia. It is a difficult term. And you see that it's next to this alpha-1,4-galactosyltransferase, which is actually causing different blood group types, so it's in the neighborhood of other genes which have different phenotypes. But if we go back to the overview, you can see that it has a nice little bit of text describing the phenotype: it's autosomal dominant, it's referred to as the M type, it is caused by a variation in hemoglobin A or hemoglobin B. And then it has some description of the phenotype that we're interested in. It's also supposed to be associated with heart disease. So we can see Gibson and Barcroft, so the phenotype was first described in 1948, 1945. They correctly concluded that erythrocytes from affected individuals were unable to reduce the methemoglobin that forms continuously at a normal rate, and are credited with the identification of an enzymatic defect in erythrocytes. Increased circulating levels of methemoglobin, which is brown, give the skin a bluish color. In the normal state about 1% of hemoglobin exists as methemoglobin. Individuals become symptomatic when levels rise above 25%. Vascular collapse, coma and death can occur when methemoglobin approaches 70% of total hemoglobin. So it is actually a pretty dangerous disease if it's not... And then you have the different subtypes which they describe, and the nice thing is that all of these are link-outs to real papers, so you can just click on them and read the original paper. But it's a very, very good entry point if you want to learn something about a certain Mendelian disease, and it gives you an overview, in this case an overview of around 80 years of research, in just a single web page. It describes the different types, who has worked on it, when they've
worked on it. It's even very detailed on biochemical features, the diagnosis, how to manage it clinically, the molecular genetics, the population genetics. They have a really, really good overview of a disease. So if you're interested in a Mendelian disease and you want to learn more about it, then definitely check it out in OMIM, because OMIM is the authoritative database when it comes to Mendelian diseases. And you can see why, because this is just one of four pages that they have about this disease. They have a very good overview of citations and it is all curated. So that means that real scientists, well, "real scientists" sounds a little bit weird, but it means that real people have checked the data and made sure that everything is correct, which is different from a computer searching through PubMed and trying to find out certain things. So a lot of citations, a lot of information about the disease that you are interested in. It might teach you something that you didn't know, it might confirm stuff that you already knew, and it will give you a whole bunch of citations to start working on. And especially the latest citations are interesting, because they show you how far people have gotten, so what the latest research on your disease of interest is. Of course OMIM is very focused on disease, and that's the only real problem that I have with it, because in many cases, especially in biology or in genetics, you're not so much interested in diseases but in phenotypes. What we do, for example, is look for genes that have an influence on milk production in cows, or we look for genes which increase yield in plants, and then of course OMIM is not the right source to start with. But it's a very good database in case you ever get to work with human genetics, and there are many, many different diseases that you can look into. So just "color blindness" will give you a whole bunch of different types of color blindness, like partial, protan, deutan series,
and you can click on them, and then you can see that again, they have in this case a slightly shorter history. But it's a very good starting point if you want to learn about or get into a certain field. If you're ever applying for a job and you know that these people are investigating methemoglobinemia, then read the OMIM page on that disease, and then you have a very broad understanding of what people in this field have been doing in the last 20 to 80 years, which of course increases your chances of getting the job. So a very interesting database, with very, very interesting phenotypes in there. And of course things like earwax should be in the database as well, so apocrine gland secretion. So here you have the earwax, wet and dry, which is located on chromosome 16, at 16q12.1, and then you can click on this and there's a whole description of earwax and how it looks population-genetically: the original Ainu population of the island of Hokkaido has an exceptionally high frequency of the dominant wet earwax phenotype compared to those of neighboring Asian populations. So they really give you a good overview of current knowledge in the field of this phenotype. All right, so enough of that, or you can open up the website yourself and throw in a couple of phenotypes. I just wanted to show you guys that it's a really useful database if you're interested in Mendelian diseases.
All right, so the next database that I wanted to show you guys, and this is actually a screenshot of the old version, I think the new version is live. So this is GeneNetwork. It's a database which is a group of linked datasets and tools used to study complex networks of genes, molecules, and higher-order functions and phenotypes. It contains more than 25 years of legacy data generated by hundreds of scientists, together with sequencing data and massive transcriptome datasets, such as expression genetics or eQTL datasets. And again this is a very mouse-centric database. There are databases very similar to this for plants like
Arabidopsis or for, say, C. elegans. So if you are working on a different species, then just googling around a little bit will give you a phenotype database unique to your species. GeneNetwork is again a database based on mice, but that's just because I have a mouse genetics background. So let's take a look at the database. So GeneNetwork is genenetwork.org, and this is the new version that we are currently working on. You can select your species, so you see that they have mouse data but also human data, data on rats, on monkeys, Drosophila, barley, Arabidopsis, poplar, which is a tree, soybean and tomatoes, and I think they are working on switchgrass. Oh, you can't see my... oh, that's interesting, that it doesn't capture the drop-down menu. So when I click the drop-down menu, it tells me which different species I can select, and then when I have selected a certain species... that's bad that it doesn't capture that. So for mice, the BXD family is the main dataset that it originally started with, and they have many, many different families of mice. So for example the new Collaborative Cross mice are in there, which are mice generated from eight different founder strains, but also the BXH family, and they have the Mouse Diversity Panel. And then they, for example, have aged BXDs, they have longevity studies, and they have the cannabis pilot that they did. So if you're interested in which genes are correlated with cannabis use in mice, then they have a dataset specially set up for that, where you can then cross-reference these genes: do these genes that are correlated with cannabis use also show, for example, an increase in fat mass? You can imagine the mice might get the munchies, so they want to eat a lot of stuff, so it might be that genes which come up when you give cannabis to mice also come up in obesity research. But they have a lot of different families and a lot of different datasets, so there's a lot of different information in here. And again you can just get any gene, so the gene that
is actually the standard gene that I always search for is SHH, so the Sonic Hedgehog gene, which is one of these genes which is kind of famous, and I can show you a little bit of how the database looks. So here you see that we're looking at a single probe which is located in the Sonic Hedgehog gene, distal to the 3' UTR. And then you can see here, for example, that you can do some statistics: they have measured 71 BXD mice, the mean expression of this gene is 9.2, and you see the range. But the nice thing that you can do here is, for example, calculate the correlations, and then you could ask: which genes does this Sonic Hedgehog gene correlate with, what might be the interaction partners of this gene? There are different mapping tools, so you can just do a QTL scan: you take the expression of this gene and then go across the genome and see if there's a locus on the genome somewhere which is correlated with the expression of the gene. And if you are looking at your data, you can also live-edit the data if you want to, and then use the data that you edited to do QTL scans or correlations. So they have a whole bunch of tools in here. You can, for example, look at different SNPs in the mice or the humans or the barley or whatever you selected, you can do a PheWAS, which is a different type of genome-wide association, you can browse the genome, and there's a lot of additional stuff in there. But I just wanted to show you the database, especially since I've worked on the database as well and I've been contributing back data, so a lot of the data on the Berlin Fat Mouse is also found in this database. And it's not just the BXD mice: there are other mouse strains, and there's humans, there's barley and there are poplar trees, so the database is getting bigger and bigger. And here above you actually see more analyses that you can do when you select a group of genes, so you can for example do weighted co-expression gene
networks, or CTL maps, or other types of analysis like network analysis. So one of the nice things is that all of this data is of course free, and you can just download large datasets if you want to, which of course is very useful. There are different types of searching: you can search by different chromosomes and positions. And I always like this visualization where you can see who is currently using it. So of course I'm the one in Berlin using it, but there's also someone in Atlanta, Georgia, currently on the website. So a very interesting database with a lot of free information, and there's probably a couple of Nature publications in the database which have never been discovered, because there's so much data that it takes like hundreds of years to do all kinds of analysis to figure out what you want. All right, so that's GeneNetwork: very mouse-centric in that sense, but it's becoming broader and broader.
"Could you check for a relationship between Sonic Hedgehog and Sox21?" Yeah, sure, you can do that. So let's go to Firefox. We would search for Sonic Hedgehog first, and we would add the two probes to my shopping cart, so I will just say "create a new collection", so just create a collection like that. So then I have the two Sonic Hedgehog probes in there. Then I can go back and look for Sox21 and search here. So Sox21 has three probes, so I take the three and add them to my collection. So now I have a collection of the different genes, and now I could do, for example, correlations, and then it will do the correlations between them. So here we see Sox21, the first three, so there is actually a significant correlation between Sox21 and Sonic Hedgehog. When you click on the correlation, I think it will give you a scatter plot. So here on the one axis you see the 509, which is the hippocampus expression of Sox21, and on the other axis you see the expression of Sonic Hedgehog. So you see that the higher your expression
level of Sonic Hedgehog, the lower your expression level of Sox21. So there seems to be a relationship, a negative correlation, between the two, which might mean that they are part of the same pathway or that they are co-regulated in some way. You see that this relationship is not perfect, but you can look at the individual points and see what their relationship is. And of course, because these are probes on a microarray, they are targeting very specific parts of the gene. So if we go back, then you see that only for this probe here, together with the fourth probe, is there a relationship which is significant and strong. There's of course a significant relationship between Sonic Hedgehog and Sonic Hedgehog, which makes sense, but then the first probe, for example, of Sox21 with Sonic Hedgehog... But it's a very good database, and this is just a single tissue, right? We're only looking here at the hippocampus, which is a very small part of the brain. You could look at the same relationship in, for example, fat tissue or heart muscle or liver, to see if there is also a correlation or if there's a stronger correlation, because it might be that in the brain they are co-regulated but in fat tissue they are not. So there are literally hundreds and hundreds of different tissues, and you could for example also take the young BXD mice, so the ones which are younger than a hundred days, and compare them to the ones which are over 200 days old, to see if this relationship changes over time. And of course all of this data you can just download; you can put it in R and then do your own analysis on it. But it's a lot of data: I've looked at the database once, and I think the whole database is like 17 terabytes in total, which is 17,000 gigabytes of data currently stored in this one database. And you wouldn't say it from the web design, but it is a very, very useful database, with like 25 years of data available for free, and there's
probably, like I always say, a couple of Nature publications in there just waiting to be discovered. All right, so if anyone has any other questions, then just click around the website for yourself. Oh, break time? Yes, yes, definitely break time. Let's do a break first: a 10-minute break. I will stop the recording now before the lecture becomes too long.
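As a rough illustration of the "download the data and do your own analysis" step mentioned for the Sonic Hedgehog and Sox21 probes, here is a minimal Python sketch of a Pearson correlation between two expression vectors. The expression values are made up for illustration and are not GeneNetwork data, and `pearson_r` is a hypothetical helper written for this example, not part of any GeneNetwork tool.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical log2 expression values for six mouse strains (illustration only)
shh_expression = [8.9, 9.0, 9.1, 9.2, 9.3, 9.5]
sox21_expression = [10.4, 10.1, 10.2, 9.8, 9.9, 9.5]

r = pearson_r(shh_expression, sox21_expression)
print(f"Pearson r between Shh and Sox21 probes: {r:.2f}")
```

A negative `r` corresponds to the pattern in the scatter plot: strains with higher Sonic Hedgehog expression tend to have lower Sox21 expression. Running the same calculation per tissue or per age group would show whether the relationship holds elsewhere.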