So, you've heard a lot about networks, and a lot about gene function and Gene Ontology annotations. Today I'm going to talk about how to use networks in your own research to say something about gene function: either about genes with no assigned function that have come up in an assay, or about a gene list that has come up in your own work, where you want more information about the functional annotation in that list and about how the genes within it interact with one another. I'm going to focus largely on concepts. There's a whole field of gene function prediction that I'll try to cover, but what matters most from your point of view, given the interfaces you're using, is to understand some basic concepts in the field. I'll illustrate those concepts with what I think is a very nice web interface that Gary Bader's lab and mine developed about seven years ago, called GeneMANIA. It's not the only such interface, and we'll go over at least one other that provides similar functionality.
I like to start with some learning objectives. Do we have a pointer? I guess I have a pointer. Okay, I'll start on this side of the room and then go over there, so everybody gets some time with me. The first main concept is what's called a functional interaction network. I don't know if you've heard that term before during your time here. The Lincoln and Robin stuff is a functional interaction network. Did they use that term? Yeah? Okay, great, so you're already part of the way there. The second is the idea of guilt by association, which is a way of inferring gene function. It's the standard approach that people use.
The systems I'm going to show you, I call gene recommender systems, because there are a lot of other recommender systems in your life: Amazon recommends you books, whatever radio app you use recommends songs, that sort of thing. I guess Facebook has started to recommend you stories. These systems basically say, oh, you like these genes, well, here are some other genes you might like as well. That's one good way to think about how they work.
To have these gene recommender systems work for you (I'm going to go over here now), you need the concept of combining networks together in different proportions. That's a context-specific network weighting scheme. It's kind of a long phrase, but it'll become clear what it means during my presentation. There are also multiple ways to implement guilt by association, which is how we figure out what a gene's function is given the functions of its neighbors in the network. One is called direct interaction, the other is called label propagation. These algorithms appear in different forms in a whole lot of different fields, so it's a good idea to understand, at least conceptually, what the differences are.
Then we're going to be able to use these gene recommender systems. Barani has made a very nice assignment using GeneMANIA for you to become comfortable with using them in your own work. Really, they answer one of two types of questions. The first is: here's a gene, what does it do? What functional information can I collect about it based on how it interacts with other genes? The other is: here's a list of genes that I like, give me more genes like this. Along the way, we're also going to be able to answer questions like: here's a list of genes, how are they all connected to one another? Are they co-expressed?
Are they all members of the same protein complex? GeneMANIA and other gene recommender systems allow you to make those types of queries as well. The last thing, at least for GeneMANIA, is that it allows you to select different network weighting schemes. There's some art in deciding which network weighting scheme is appropriate, and that depends on the type of question you want answered.
Great. Okay, maybe I have a little mouse that... yeah, there we go. This is the type of Mac where that thing gets big if I move it. So here's the outline. I've talked about everything so far, so you don't need me to go through it.
Here's the idea, and this is where we start from. You've heard a lot about all these different types of networks, and different ways of measuring gene expression or profiles of genes. Now you want to incorporate all of these in your own research. How do you do that? I think that's an important question. Ideally, when you start asking questions about gene function, you should be asking them using all the available data, and there's a lot of available data. So how do you work your way through all of it and say something interesting by incorporating all these different sources together? The way we thought about this, and a lot of other people in the field have too, is the concept of the functional interaction network. Because you've seen it before, I don't need to spend very much time on this slide, but I'd like to review it anyway because it's here.
I think of the concept of the functional interaction network as coming from some of the early papers in the genomics field, and this is the famous Mike Eisen paper. What was done in Pat Brown's lab is that they had a microarray that measured gene expression for a large number of genes. They put yeast in a variety of different conditions, for example under nutrient stress, they profiled a bunch of different yeast deletion mutants, and they looked at the relative expression of each gene in that population under all these different perturbations or conditions. The gene expression profiles are shown here as a heat map, where the rows correspond to the profiles and the columns correspond to the different conditions. The profiles are clustered by their similarity, and in this nice figure all they did is zoom in on specific regions. When you do that, you can see that the annotations those genes had (I wish there was a way I could be in two places at once) were all basically similar to one another.
The idea is simply that if you have a similar gene expression profile, you often have a similar function. Not always, but often. And you can expand that idea: if you interact with another gene, you often share at least some aspects of your function. So you can use that to make an initial guess at the function of genes you don't know, and people do this every day anyway.
The formalism we're going to use to do this (I'm going to move over here now) is to make a network out of it. The nodes represent genes. The links between the nodes, which I'm sometimes going to call edges because I'm a computer scientist, have weights that represent how correlated the genes are. If you lay the network out using some of the network layout algorithms that Gary told you about, you'll find that you have two genes called UNK1 and UNK2. I know UNC means something different, but this is UNK with a K. If the unknown genes group with genes of known function, this gives you some evidence that the unknown genes also have that function.
This type of network is called a functional interaction network. The way we interpret it is that the weight, or strength, of the link between two genes tells us something about whether or not they share at least some aspect of their function. That's the functional interaction network concept. There are various types of real interaction networks, like the physical interaction networks that people have talked about, or the genetic interaction networks, which I've talked about. A functional interaction network is an expansion of that concept that just says the strength of the link between a pair of genes is some measure of the likelihood that they share at least some aspect of their function.
There are a variety of different types of functional interaction networks, and if you look at the GeneMANIA website, we have thousands. You can directly measure interactions. For example, you can come up with a network that tells you something about the pairwise information, or about the fact that two genes are co-complexed, using various protein assays.
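To make the network formalism concrete, here is a minimal Python sketch of a co-expression network. It assumes Pearson correlation as the similarity measure and an arbitrary cutoff of 0.8; the gene names, the expression values, and the function names `pearson` and `coexpression_network` are all made up for illustration and are not taken from GeneMANIA or the Eisen paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coexpression_network(profiles, threshold=0.8):
    """Link two genes when their profiles correlate at or above the threshold.

    Edges are keyed by a sorted (gene, gene) tuple; the weight is the correlation.
    """
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges[(g1, g2)] = r
    return edges


# Toy data: one row per gene, one column per condition (values are invented).
profiles = {
    "KNOWN1": [1.0, 2.0, 3.0, 4.0],
    "KNOWN2": [1.1, 2.1, 2.9, 4.2],
    "UNK1": [0.9, 2.0, 3.1, 3.9],
    "OTHER1": [4.0, 3.0, 2.0, 1.0],
}
network = coexpression_network(profiles)
for (g1, g2), weight in network.items():
    print(g1, g2, round(weight, 3))
```

In this toy input the unknown gene ends up linked to the two genes of known function and not to the anti-correlated one, which is the kind of grouping that would count as evidence about its function.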
You can directly measure whether or not two genes have some sort of epistatic genetic interaction, and make a genetic interaction network. You can infer interactions, like in the previous slide, where you link two genes together if they're highly co-expressed across a bunch of different samples. Or you can try to infer interactions from multiple data sources by taking a bunch of these networks together and summarizing all the interactions that you see. The easiest way to do this is to put all the interaction strengths on some common scale, something like between zero and one, where one is the strongest interaction, and then just sum those weights together across all the different networks. That pretty much describes most of the algorithms that combine these data into one network.
That, or something slightly more intelligent, is what I'm calling a context-independent inferred functional interaction. Combining interactions from multiple networks this way is context independent because it doesn't depend on what question you're asking: which networks you use to provide information about gene function, and with what weights, doesn't change based on the functional question. Okay, so that makes sense.
There's another way you could do it. You can make the contributions of different types of data depend on what question you're asking. For example, if we want to ask whether or not two genes are co-complexed, or in the same cellular compartment, we would care a lot more about their protein interactions and maybe a little bit less about their tissue expression profiles. That makes sense?
Okay, so that's a functional interaction network. Now, there are two main questions you might want to ask when you're doing what we're going to call function prediction.
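The two ways of combining networks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each network is rescaled by dividing by its largest weight (one of several possible ways to reach a common zero-to-one scale), and the network names, edge values, per-network weights, and the functions `rescale` and `combine` are hypothetical.

```python
def rescale(network):
    """Put edge weights on a common 0..1 scale by dividing by the largest weight."""
    top = max(network.values(), default=0.0)
    if top <= 0:
        return {edge: 0.0 for edge in network}
    return {edge: w / top for edge, w in network.items()}


def combine(networks, weights=None):
    """Sum rescaled edge weights across networks.

    With weights=None every network counts equally (context independent).
    Passing per-network weights makes the combination depend on the question.
    """
    if weights is None:
        weights = {name: 1.0 for name in networks}
    combined = {}
    for name, net in networks.items():
        for edge, w in rescale(net).items():
            combined[edge] = combined.get(edge, 0.0) + weights[name] * w
    return combined


# Toy networks on different raw scales (all values are invented).
networks = {
    "coexpression": {("A", "B"): 0.9, ("B", "C"): 0.45},
    "physical": {("A", "B"): 2.0, ("A", "C"): 4.0},
}

# Context independent: plain sum of rescaled weights.
flat = combine(networks)

# Context dependent: a question about complexes might favor physical interactions.
complex_question = combine(networks, {"coexpression": 0.2, "physical": 0.8})

print(flat)
print(complex_question)
```

With equal weights the edge seen in both networks comes out strongest; once physical interactions are weighted up, the edge that is strongest in the physical network takes over.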
One of those questions is: what does my gene do? Here, this gene came up. Actually, the way I use this is that I'm sitting in a seminar, someone says a gene name, and I open up GeneMANIA, type the name in, and see what I can figure out about that gene. Basically: can I say something about the gene's function by looking at all the other genes it interacts with? The other question is: give me more genes like this. This one may make a little more sense. Say you're interested in the Wnt signaling pathway and you want to find more genes that are likely involved in Wnt signaling. You would just put in the list of genes you know are involved in Wnt signaling and ask for more genes like those. Or find more kinases, or find more members of a protein complex. You can come up with various different ways to ask this type of question.
Okay, so question number one: what does my gene do? The input here is all the data you can get your hands on, ideally in a convenient format, and then you have a query list. So what does CDC42 do? Yeah, probably a volume of cells. You put these two things into the gene recommender system, and what the gene recommender system does is find you other genes like CDC42. What I've shown on the slide is a little network where the links between genes represent what evidence says that those two genes interact. Then you just do an enrichment analysis, like the type that Gary talked about yesterday, asking which gene functions are enriched within the neighborhood. I've just shown those gene functions there. It's a very simple thing to do, and it works surprisingly well. In fact, very well. It's a good place to start from.
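The single-gene workflow described here (collect the gene's neighbors, then ask which annotations are enriched among them) can be sketched as follows. This assumes a one-sided hypergeometric test as the enrichment statistic, which is a common choice but not necessarily what any particular tool uses. The gene and term names, the background size of 20, and the function names are invented for the example, and no multiple-testing correction is applied.

```python
from math import comb


def neighbors(network, gene):
    """Genes directly linked to the query gene (edges are (gene, gene) tuples)."""
    found = set()
    for a, b in network:
        if a == gene:
            found.add(b)
        elif b == gene:
            found.add(a)
    return found


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def enriched_functions(network, annotations, gene, background_size):
    """Rank annotation terms by enrichment among the query gene's neighbors."""
    hood = neighbors(network, gene)
    results = {}
    for term, members in annotations.items():
        overlap = len(hood & members)
        if overlap:
            results[term] = hypergeom_pvalue(overlap, len(hood), len(members), background_size)
    return sorted(results.items(), key=lambda item: item[1])


# Toy inputs (all invented): a small network and two annotation terms.
network = {("QUERY", "G1"): 1.0, ("QUERY", "G2"): 0.8, ("G3", "QUERY"): 0.5, ("G5", "G6"): 0.9}
annotations = {
    "term_A": {"G1", "G2", "G3", "G4"},
    "term_B": {"G3", "G5", "G6", "G7", "G8", "G9", "G10", "G11"},
}
results = enriched_functions(network, annotations, "QUERY", background_size=20)
for term, p in results:
    print(term, round(p, 4))
```

Here all three neighbors of the query carry `term_A`, so it ranks first with a small p-value, while `term_B` overlaps by only one gene and is not enriched.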
When you do this, you can use all the types of networks, which I've indicated in blue here, except the context-dependent ones, because when you put a single gene in, I don't know what question you're asking. It's just a single gene. If you ask me what p53 does, my answer depends a lot on who you are and what question I think you're actually asking. Are you concerned about its biochemical function? Are you concerned about the fact that it's a tumor suppressor? Are you concerned about where it lives in the cell? Those are all different questions you can ask about the same gene, because gene function has a lot of different aspects.
One of the ways to capture different aspects of gene function is just to change the interaction networks you're paying attention to when you're looking at guilt by association. You could say, what does p53 do? Here, I'm going to give you all the networks that tell you about protein complexes. Then the question becomes, what proteins does p53 interact with? And that might also tell you something about its subcellular localization.
But if you want to answer a specific question about gene function rather than a more general one, you need some sort of context dependence. I guess I just said that, so I'll say it again: if you want to know where p53 lives in the cell, you want to focus your attention on certain types of networks. So how would you determine those networks? The networks themselves are a little bit mysterious. Not everybody knows what every network means, and it's not even clear that the people who generated the networks in the first place can say a lot about what type of information a network can give you.
So one of the ways you can define what question you're asking is to provide some context. The first time I came up with this idea, I was giving a talk in Memphis. You probably know which Memphis I mean, but in case you don't, here are a couple of other cities like Memphis: Knoxville, Nashville. Now you know that Memphis is in Tennessee, and not the ancient Egyptian city; those are listed afterwards. So the context itself is what tells you what the question is. If you want to answer the question, give me more genes like this, and you provide a gene list, what you might hope is that the gene list could tell the system what question you're asking. That makes sense?
So the way GeneMANIA, and other people who do gene function prediction, use that gene list is to determine which networks are important. An easy way to figure out which networks are important for the question posed by this gene list is to ask which networks connect together members of the list but don't really connect together genes outside the list. It's not going to be perfect. But if there's a general trend of networks having higher connectivity among genes in the list and relatively lower connectivity outside the list, you can use that as a signal to tell you what weight to assign to each network, that is, which networks are important for that gene list. That's what we do in GeneMANIA.
This is, obviously, a screen capture of the GeneMANIA web interface, where I put in the list that I had on the previous slide, and this is what GeneMANIA spits out. Let's go through this a little bit. The next slide tells you a few of the things that are in here, and you can play around with it on your computer as well. I've got the web interface loaded up too.
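The connectivity signal described above can be turned into a rough weighting heuristic. The sketch below is a deliberately simplified stand-in and should not be read as GeneMANIA's actual weighting procedure. It assumes that "connectivity outside the list" means edges running between list members and non-members, scores each network by the density of within-list edges minus the density of those crossing edges, and normalizes the scores to sum to one. All names and values are invented.

```python
def connectivity_score(network, query, all_genes):
    """Within-list edge density minus list-to-outside edge density (floored at 0)."""
    query = set(query)
    others = set(all_genes) - query
    inside = sum(w for (a, b), w in network.items() if a in query and b in query)
    across = sum(w for (a, b), w in network.items() if (a in query) != (b in query))
    possible_inside = len(query) * (len(query) - 1) / 2
    possible_across = len(query) * len(others)
    density_inside = inside / possible_inside if possible_inside else 0.0
    density_across = across / possible_across if possible_across else 0.0
    return max(density_inside - density_across, 0.0)


def network_weights(networks, query, all_genes):
    """Normalize the per-network scores so they sum to 1."""
    scores = {name: connectivity_score(net, query, all_genes) for name, net in networks.items()}
    total = sum(scores.values())
    if total == 0:
        # No network is informative for this list: fall back to equal weights.
        return {name: 1.0 / len(networks) for name in networks}
    return {name: score / total for name, score in scores.items()}


# Toy example (invented): the query list is A, B, C out of six genes.
all_genes = ["A", "B", "C", "D", "E", "F"]
query = ["A", "B", "C"]
networks = {
    # Connects the list tightly and never links it to outside genes.
    "pathway": {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 1.0, ("D", "E"): 1.0},
    # Connects list members to each other and to outsiders about equally often.
    "coexpression": {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "D"): 1.0, ("B", "E"): 1.0, ("C", "F"): 1.0},
}
weights = network_weights(networks, query, all_genes)
print(weights)
```

In this toy case the pathway network receives three quarters of the total weight, because it links the query genes to each other without also linking them to the rest of the genes.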
So let's try going through it this way, while people start to look for it, and I'll show you. Each of these little dark nodes represents a gene. Obviously, you've seen this before: the links between genes indicate interactions, and they're colored according to the source of the interaction. Over here on the right-hand side is a list of all the network types that were involved in this picture. There are little percentages by those network types, and those are the percentage of total weight assigned to each type. For that gene list from before, the networks that were most important were the consolidated pathways. What we did is create networks out of pathways that were put into databases, so if we want to find genes like the ones in that list, we look at whether or not the genes are in the same pathway. Yes, all these genes are in the same context.
The last couple of things I'm going to show you, at least on this slide, are down here in the lower left-hand corner. I have listed a bunch of different Gene Ontology biological process names, and the nodes are colored according to which Gene Ontology biological processes they're annotated to.