So now I will present the tools which allow you to take advantage of the data that we integrate and structure in Bgee to, basically, find biology. The first thing we have is expression calls, as you've seen. One entry point is our gene page. Frida has shown you some highlights of it before: for a given gene, you see all the expression conditions, defined by anatomy, development, sex, or strain. You can restrict the data to see only the information from RNA-Seq, single-cell RNA-Seq, in situ hybridization, and so on. And you have separate lists of the anatomical structures in which the gene is significantly present or significantly absent, both from direct observation and from the propagation that Frida showed you. So you can call a gene present in the brain because there was an experiment done on the brain, or because an experiment was done on the cerebellum: because it was present in the cerebellum, it is present in the brain.
We also have, directly on each gene page, orthologs and paralogs, so homologous genes, and you can click and it takes you to a tool I'll show you a bit further on, for comparison of expression between genes. And of course we have cross-references to UniProt, Ensembl, GeneCards, and so on.
Then, similar to how you can browse the processed data and the raw data that they just showed you, you can also browse our expression calls. Here I have looked for the expression calls of a gene [name unclear, possibly hoxd8], in zebrafish I think, and you have here where the information comes from. For each call you can click and find where we got this information from, what the exact values are, and so on, for all the calls per gene.
So I'm going to run a small Wooclap. Okay. I'm going to ask you how you want to use the gene page, so I'm starting the Wooclap.
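The propagation of presence calls described above (present in the cerebellum, therefore present in the brain) can be sketched as a walk up a "part of" hierarchy. This is only a minimal illustration: the tiny hierarchy, the `PART_OF` mapping, and the function name `propagate_present` are hypothetical and are not Bgee's actual data model or code.

```python
from collections import deque

# Hypothetical, tiny "part of" hierarchy: structure -> parent structures.
PART_OF = {
    "cerebellum": ["brain"],
    "frontal pole": ["brain"],
    "brain": ["head"],
    "head": [],
}

def propagate_present(direct_calls, part_of):
    """Map every structure where the gene is called present to the origin
    of the call: "direct" (observed) or "propagated" (from a substructure)."""
    calls = {structure: "direct" for structure in direct_calls}
    queue = deque(direct_calls)
    while queue:
        structure = queue.popleft()
        for parent in part_of.get(structure, []):
            if parent not in calls:
                calls[parent] = "propagated"
                queue.append(parent)
    return calls

# A gene observed only in the cerebellum is also present in brain and head.
calls = propagate_present({"cerebellum"}, PART_OF)
for structure, origin in sorted(calls.items()):
    print(f"{structure}: present ({origin})")
```

Keeping the origin of each call mirrors the separation between direct observation and propagation that the gene page shows.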
So, you've seen all the types of information we can have, and you can configure the gene page to prioritize your favorite information. Here in this Wooclap you can choose whether you want to see only anatomical entities, only development stage, only sex, only strain, or some combination. I didn't put all the possible combinations, so if I didn't put your favorite one you can just click "other". The votes are very spread out. Most of you want to see only anatomy, which is what we show by default if you go to our gene page. So if I click here on a gene, you see here by default (it's loading) anatomical entity only: you see the anatomical entities where this gene is expressed, by priority. But many of you would also like to see anatomy and development stage, so age or embryonic stage, and some of you other features as well. That's why we have all these options: if you want to see primarily strain, or sex, or some combination, you can. Back to my PowerPoint.
The next tool I'm going to present is, I think, very original to Bgee. We call it TopAnat. Most of you probably know gene ontology enrichment: you have a list of genes that you got from some analysis or some experiment, you wonder what these genes do, you plug them into some website or R package, and it tells you, for example, that you have more kinases and transcription factors than expected by chance given the background genome, and you learn something about the function. You can also learn something about the function of your genes if you look at where they are expressed, so we have implemented exactly the same type of test; the mathematics is the same. Instead of testing whether we have over-representation of genes annotated to specific gene ontology terms, we look at whether we have over-representation of genes annotated to specific anatomical terms. And so we can compare these lists.
For example, I have a list of genes, and 46 of the genes in this list are expressed in the frontal pole, which is a part of the brain. Is this more than I expect by chance? For this I'm going to build a contingency table. I have 46 genes from my list which are expressed in the frontal pole (identified by this UBERON ID), and 56 from my list which are not expressed in the frontal pole. Among the other genes, 3,464 are expressed in the frontal pole; some are, but most are not. So I have 46 out of a total of 46 plus 56 which are expressed in the frontal pole from my list, whereas from the whole genome I have 4,664 out of more than 35,000. By chance I would expect about six genes from my gene list to be expressed in the frontal pole. I have 46, that's 7.68 times more than expected by chance, and this is highly significant by Fisher's exact test. So my gene list is over-represented in the frontal pole, and you can actually reproduce this exact analysis here.
This example, from the TopAnat page, is genes which were found by various genetic studies to be associated with autism. The frontal pole is a part of the brain known to be associated with autism, and these genes are much more expressed in this part of the brain than you would expect by chance.
So we do this for every anatomical structure where you have expression: we take our gene list, look at how many genes are expressed in this anatomical structure and how many are not, and do the same for the background, the other genes. We do a Fisher test (we could also do a hypergeometric test), and we have the possibility to use the code from topGO.
topGO is a package for gene ontology enrichment which allows us to take the graph into account, because obviously if my genes are more expressed than expected by chance in the frontal pole, they're also more expressed than expected by chance in the brain in general. I don't want to repeat the same information ten times because there are all these levels: they are expressed in the frontal pole, in the brain, and in the head. This is redundant information, so we can deconvolute the ontology graph, as it's called, by different methods. We've basically modified the code from the topGO package, which does this for gene ontology, so we can do it for anatomical enrichment.
Now I want to draw your attention to something: if you do these enrichment analyses, whether for gene ontology or with TopAnat, they're potentially very powerful and very interesting, but they have pitfalls that you should be aware of. The number one pitfall is the background. If I study genes which were found by whole-genome scanning, my background should be the whole genome, so that's pretty easy. But if, for example, I have differentially expressed genes from an experiment, and I want to know the gene ontology enrichment and the TopAnat enrichment, the background is only the genes for which I could do the differential expression analysis. That means the types of genes I could access with my protocol (maybe I only took poly-A transcripts, maybe I only took coding genes, and so on) and the ones for which I had sufficient data. Genes which I filtered out in the first step as not expressed, for example with some arbitrary TPM cutoff or with the Bgee tools, never went through differential expression analysis, so they cannot be studied. I want to remove them, to have as a background, as my expectation, only the genes which I could study, which could have been in my gene list. This is very important, because if you do this wrongly, your results will be misleading.
You're going to think an organ is over-represented in your gene list when actually it's over-represented in your experiment in general. For example, GWAS results tend to find long genes more easily, so your sample will be enriched for long genes. If you look at genes which were duplicated in one species relative to another, you can only study the genes for which you find homologs between the two species, and so on. So you want to take the right background.
The second pitfall is simply multiple testing. We implement FDR correction, and there are different tools to correct for it, but keep in mind that because you're doing this over every term of your ontology, and we have tens of thousands of terms, you're going to have a lot of multiple testing. And the terms are not independent: I told you the frontal pole is not independent of the brain. So we use the algorithms from topGO to correct for this.
The test is available both on our website and in our R package, BgeeDB. On the website, not all the possible options are available, but it's very easy to use. As with the gene ontology enrichment tools most of you are used to, you paste your gene list, you choose a background or take the whole genome by default, and you just click analyze, and it gives you the result. If you want to use many more parameters, or simply to include it within your R pipeline, we have an R package where you can not only do the same things but also specify, for example, that you only want to see expression at a certain stage of development, only in old individuals, only in mid-development, and so on. So you can be much more specific.
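The enrichment test walked through above (a contingency table per anatomical structure, a one-sided Fisher exact test, then FDR correction over all structures) can be sketched in a few lines. This is a minimal sketch and not the TopAnat or topGO implementation: the function names are hypothetical, the counts are illustrative and not meant to reproduce the figures quoted in the talk, the correction shown is plain Benjamini-Hochberg, and the graph-aware algorithms from topGO are not reproduced. The background counts are assumed to include only genes that could have ended up in the gene list.

```python
from math import comb

def fisher_over_representation(list_in, list_out, bg_in, bg_out):
    """One-sided Fisher exact test for over-representation.

    list_in / list_out: genes of the list expressed / not expressed in the structure.
    bg_in / bg_out: the same counts for the other (background) genes.
    Returns (p_value, expected_count, fold_enrichment).
    """
    n_list = list_in + list_out
    n_in = list_in + bg_in
    total = n_list + bg_in + bg_out
    # Hypergeometric upper tail: P(X >= list_in).
    tail = sum(
        comb(n_in, k) * comb(total - n_in, n_list - k)
        for k in range(list_in, min(n_list, n_in) + 1)
    )
    p_value = tail / comb(total, n_list)
    expected = n_list * n_in / total
    return p_value, expected, list_in / expected

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative counts only: (list_in, list_out, bg_in, bg_out) per structure.
COUNTS = {
    "frontal pole": (46, 56, 3464, 31434),
    "liver": (5, 97, 3000, 31898),
    "heart": (12, 90, 3500, 31398),
}

names = list(COUNTS)
stats = [fisher_over_representation(*COUNTS[name]) for name in names]
fdr = benjamini_hochberg([p for p, _, _ in stats])
for name, (p, expected, fold), q in zip(names, stats, fdr):
    print(f"{name}: expected={expected:.1f} fold={fold:.2f} p={p:.3g} FDR={q:.3g}")
```

Changing the background counts changes the expected number and therefore the fold enrichment, which is why the choice of background matters so much.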
So I'm going to ask you, in the Google Doc, to write me a description of a gene list that you have tested or would like to test in TopAnat: a gene list which comes from your work or your interests, for which you would like to see where these genes are expressed. It could come from a medical question, an evolutionary question, a gene family, anything which is of interest to you.
Now, something else that is quite unique to us is homology. We didn't really develop this today because we don't have time to show you everything, but in our curation work we also do manual curation of homology relations at the anatomical level. Most of you are probably used to finding orthologous genes by various methods, but you also have homology at the anatomy level. The human brain is homologous to the mouse brain; that's kind of trivial. But when you get to more specific substructures it's less trivial, and when you get to more distant species, say comparing a human not to a mouse but to a zebrafish or to a fly, it becomes less and less trivial. This homology at the anatomical level is curated by reading specialized literature (evolutionary developmental biology, comparative zoology, paleontology, all this literature). We have annotated this homology at the anatomical level, and we have a tool which allows you to access it: the anatomical homology browser. You put in anatomical terms from the ontology, you choose species, and it gives you the homology.
Here, for example, I put all the tissues which I found in the human GTEx large dataset, which you've heard about several times, and asked for homology between human and zebrafish. So I find which of these tissues, which of these samples that have been studied in this very large human dataset, I could also study in zebrafish. I see there are several here which don't have homologs.
And I also find several which have clear homologs. The hypothalamus of human and of zebrafish: it's the same word for the same thing, for homology. I also have cases where the terms differ, such as the heart left ventricle and the primary heart field here, or here the lung and the swim bladder: the mammalian, or tetrapod, lung is homologous to the swim bladder of teleost fishes. So if you wanted to compare gene expression between homologous organs, these are the samples you should compare.
That's purely at the anatomical level, but now we can leverage this anatomical homology to compare gene expression between species. As you know, many resources give you orthologous and paralogous genes that you can compare between and within species. If you want to compare not just the gene sequences but the expression between species, you need to say what you are comparing. What does it mean to compare gene expression between a human and a zebrafish? I need to have the structures in which I can compare. I cannot just say this is the level of expression in a human or in a zebrafish; I want to say the level of expression in the lung and in the swim bladder, in the hypothalamus of each species, and so on. So we combine these two kinds of information.
If you go to this page called expression comparison, you can put in a gene list from several species (unlike TopAnat, where we could put only one species), and it gives you the structures which are homologous between the species represented and which have expression for these genes. By default it ranks them on the expression score, but you can change the ranking by clicking on these columns here. Here I have clicked on the example of SRRM4, which is a vertebrate brain-specific gene from the literature, and it gives me the orthologs of this gene in various vertebrates.
So here I have three species with data in the cerebellar cortex (not all species have such detailed data), and that's the highest expression. There are nine species among these which have data in the cerebellum, and that's the second highest, and so on. You see that brain parts come up in which all the genes are expressed, I don't have any which are absent, and they have very high scores, which are shared between these species.
We can compare any gene expression between species very easily like this. On every gene page there's a link to the orthologs and the paralogs, which are predefined by another resource, and which brings you directly here so you can compare directly. So you have your gene of interest in zebrafish and you wonder how conserved this expression is in other fishes: you click on orthologs, you click on the taxonomic level of fishes, you click comparison, and it brings you here and gives you the answer. We are the only ones to give you this answer, because we're the only ones who annotate the homology of anatomy and who integrate all this information. So here, maybe in one species it's RNA-Seq, in another species microarray, and in another species single-cell RNA-Seq. It's all brought together to give you this answer, which is the biology that we need: how conserved is the gene expression between these species.
So that's the end of this course. I hope that we've shown you how Bgee could help you understand gene expression, which, as you've seen over the morning I think, is a rather complex concept with many different parameters which intervene, and help you do biology, which is really our goal. As I said, I will have to leave a bit early, but we can already start answering your questions now, and then Frederick will stay a bit later than me, I think.
Before that, considering the interest in the expression comparison and the question about UNG, maybe I would just like to show briefly that you can actually do the comparison. So, yeah, if you go to the gene search, you enter UNG and you can click on the first link here. Then you have a link here to orthologs or paralogs, and you get the orthologs at different taxonomic levels, and you can automatically compare expression, so here at the Chordata level. It's a bit slow, so I already did it. So those are all the orthologs of the UNG gene. And what you can see is that apparently there is a high conservation in ovary and testis, so the reproductive system generally, and in the endocrine system, the digestive system, and the renal system. So apparently it's a lot of reproductive, digestive, endocrine, and renal. I don't know if it makes sense to you. I have a couple more questions about UNG genes. But yeah, I just wanted to show that it's not an enrichment as in TopAnat, but you can look for expression conservation between species.
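The idea behind the expression comparison (group the calls of orthologous genes by homologous anatomical structure, count the species with presence or absence, and rank by expression score) can be sketched as follows. This is only an illustration under stated assumptions: the `HOMOLOGY` mapping, the `CALLS` values, the function name `compare_expression`, and the choice of ranking by the maximum score in each group are all hypothetical, and this is not how the Bgee page is actually implemented.

```python
# Hypothetical homology annotation: (species, structure) -> homology group.
HOMOLOGY = {
    ("human", "lung"): "lung / swim bladder",
    ("zebrafish", "swim bladder"): "lung / swim bladder",
    ("human", "hypothalamus"): "hypothalamus",
    ("zebrafish", "hypothalamus"): "hypothalamus",
    ("human", "cerebellum"): "cerebellum",
    ("zebrafish", "cerebellum"): "cerebellum",
}

# Hypothetical calls for one pair of orthologs:
# (species, structure, present?, expression score on an illustrative 0-100 scale).
CALLS = [
    ("human", "cerebellum", True, 95.0),
    ("zebrafish", "cerebellum", True, 90.0),
    ("human", "hypothalamus", True, 80.0),
    ("zebrafish", "hypothalamus", True, 70.0),
    ("human", "lung", False, 10.0),
    ("zebrafish", "swim bladder", True, 30.0),
    ("human", "prostate gland", True, 20.0),  # no annotated homolog: skipped
]

def compare_expression(calls, homology):
    """Return rows (group, n_species_present, n_species_absent, max_score),
    ranked by decreasing expression score."""
    groups = {}
    for species, structure, present, score in calls:
        group = homology.get((species, structure))
        if group is None:
            continue  # structures without a homology annotation cannot be compared
        entry = groups.setdefault(group, {"present": set(), "absent": set(), "scores": []})
        entry["present" if present else "absent"].add(species)
        entry["scores"].append(score)
    rows = [
        (group, len(e["present"]), len(e["absent"]), max(e["scores"]))
        for group, e in groups.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)

rows = compare_expression(CALLS, HOMOLOGY)
for group, n_present, n_absent, score in rows:
    print(f"{group}: present in {n_present} species, absent in {n_absent}, score {score}")
```

The data type behind each call (RNA-Seq, microarray, single-cell RNA-Seq) does not appear here because the comparison works on the integrated calls.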