It's my great pleasure to introduce Samantha Anderson, Sam, at her thesis defense. This is an important day for Sam. It's an important day for the lab as well. In particular, it's been a while since we graduated a student. It was 2015. She's the first of the second group of grad students. And actually, she took the- Okay. I'll bring my microphone closer. Is that better? No? It's my accent. So she took the baton from a student who was graduating, Ben Moeller, in 2015. And she's now doing the same with the next generation. Kind of an exciting time. Two more graduations are coming.
For me, this is kind of a special time because a PhD, and I'm telling this mainly for the family, I'm not sure if they know it, is a very transformative time. These students come. They're great, brilliant, out of college, very well studied, with some research experience. But now they come to a PhD. They're given five years and a difficult problem. And it's all kind of, well, go and figure this out. And when it ends, you can tell how much progress they've made. So Sam took a first project from Ben. It was sort of already ongoing, so maybe that was easier. But she really put her heart into it and collected some really good data right away, made the project better, got a good publication, and combined some experimental work and computational work. The computation was high throughput. The experiments were kind of low throughput. And we thought, well, if we could only increase the throughput of the experiment, imagine what we could do. And at the time, Vatsan Raman was joining the department as a new professor. We decided to go in a new direction. And she took it with courage, head on, and developed a whole new method. This is going to be the second part of her talk. The other day, when she gave her presentation in group meeting, I told her, well, you're making something look too easy. Don't sell yourself short, because there were a lot of things to figure out. It was a new thing. And she's starting to see that; she will show you a little bit of data. Now finally, she's gotten there and she's starting to actually use this method to collect data, but this will happen in the coming months. What I really appreciate about Samantha is the can-do attitude. She's a doer. She knows how to do things very well. But she knows not to make the perfect the enemy of the good. She is motivated by challenges. I even remember one time receiving a text: sorry Alessandro, I can't come to the meeting tomorrow. I'm here with a broken neck.
My God, what's going on? Then I called her, and: I just fell on my back and broke my neck in two vertebrae. It's okay. And then one week after, she was on a plane traveling to a conference. So yes, I guess that's always been a major strength. The other thing that I want to highlight is that from day one she knew what she wanted to do in life, which is a good thing. She wanted to go into science policy and she spent a lot of effort on that. And without that distraction, I'm going to put it in quotes, I think she would have achieved even more. But in retrospect, that distraction actually was, I believe, her main motivator, because that's why she wanted to do what she was doing. So I think she managed to handle two important professional demands in a great way. So that's kind of an example for everybody. Well, without taking much more time from her, it's my pleasure to introduce Sam.
Well, thank you, Alessandro, for that incredibly kind introduction. It was very nice to hear. Before I get started today, I just wanted to set out my dedication for this thesis and my thesis work. It is to my grandparents, Rose and Ralph Newman, who were always very supportive of my time in education, and who both passed away while I was in graduate school. So this is dedicated to them.
So let's just jump in and start talking about what we do in the Senus lab. The Senus lab studies membrane proteins, and in general membrane proteins serve a variety of different functions in biology. For example, they act as transporters and ion channels, which pass things into and out of the cell. They act as enzymes that catalyze important biological reactions. They also exist as adhesion molecules, which help create extracellular structure. And most importantly for today's talk, they also act as receptors, where they sense what is going on outside of the cell and transmit the signal through the membrane to perform actions inside of the cell.
And now in particular in the Senus lab, we study membrane protein folding. Membrane protein folding occurs in two stages. In the first stage, individual helices insert into the membrane, and then they associate together to form various oligomers. In this picture, we see a dimer. Now this is true for the single-pass proteins that I show here, proteins that have one helix that goes through the membrane. But it's also true for multi-pass proteins, where you have 6, 7, 10, 12 helices that come together independently and then associate into a globular form. This process is incredibly difficult to study because it involves not only these protein-protein interactions but also interactions with lipids, with solutes, with the solvent around. And so it becomes both a biological and a biophysical challenge. In order to tackle this problem, what the Senus lab does, and what I've done, is break it into digestible segments, and these segments are called motifs.
So what is a motif? In general, a motif is something that lends structural and functional insight into entire groups of proteins. In our case, they're characterized by a distinct structure, so they look one particular way, and they're characterized by a signature sequence: amino acids form the same patterns over and over. They're also commonly found throughout nature and they're suited for multiple functions in very different contexts. One example of a motif that the structural biologists in the room might have heard of is the coiled coil. This is a motif where you have two protein alpha helices interacting together, and these were studied very well in the early 1990s. Because of those studies, researchers found that coiled coils are characterized by heptad repeats. So you have A, B, C, D, E, F, G that repeats over and over and over throughout these helices.
At the A and D positions you have hydrophobic amino acids, and at the rest of the positions you have polar ones. What this does is allow the hydrophobic effect to bring these helices together in a coiled fashion. Just as an example of why studying this motif, and motifs in general, is important: we can look at biological systems in different ways. This is just one example that uses the coiled coil, but there are many others. In this system you have a coiled coil that connects a hinge to these various heads. This is actually the set of proteins I studied as an undergrad. What I did was stare at these micrographs over and over, and I was able to measure the length of these coiled coils in all the different types of SMC proteins. But even more than understanding biological systems, we can actually use this information to design new proteins. In this particular design, researchers used the ability to design these coiled coils and make them come together in different shapes for bionanotechnology purposes. Like I said, this is just one example that I'm not going to talk about anymore today, but motifs can be used to understand wide ranges of proteins and functions.
In the Senus lab we study a particular motif, and this motif is called the GASright motif. It's characterized, like all other motifs, by a distinct structure. Very similar to the coiled coil, it's two alpha helices coming together, but in this case it's in a membrane. It's also characterized by a negative 40 degree right-handed crossing angle, so this helix in front is pointing up and to the right. Energetically, the GASright motif is characterized by an alpha-carbon hydrogen bond network and tight van der Waals packing. So I'm going to take an aside here for a moment to talk about these hydrogen bonds.
Typically when someone thinks of a hydrogen bond, they think of an oxygen or a nitrogen donating a hydrogen to another electronegative element. However, in this case we have a carbon donating its hydrogen. The reason this is possible is that inside the peptide backbone, the nitrogen and oxygen groups surrounding it pull electron density away from the carbon, allowing it to donate its hydrogen across the aisle to the carbonyl on the opposing helix. Now this type of hydrogen bond is weaker than the canonical nitrogen to oxygen hydrogen bond. It's about half the strength. But I said networks when I said that this is what characterizes the motif. Four to eight of these bonds form between two helices, making them a powerful driving force for protein-protein association.
Now back to the motif. The second energetic part I mentioned is tight van der Waals packing, and this occurs because of the signature sequence of the motif. You have the amino acids glycine, alanine and serine forming a small-xxx-small sequence motif, which usually has at least one glycine in there. What this spacing does is put both of these small amino acids on the same face of the alpha helix, and when you put two helices together it allows for very close contact between them and therefore very close packing. Finally, a motif means it's very commonly found throughout nature. There are 21 solved single-pass dimers that look generally like this, and half of them use this GASright motif, meaning that they have this negative 40 crossing angle and these small amino acids. So we know that they're very common, but what I haven't told you yet is how they play a functional role in biology. Today I'm just going to tell you about one particular example of an important GASright motif, and that is the Epidermal Growth Factor Receptor, EGFR.
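As a rough illustration of the small-xxx-small signature described above, here is a minimal Python sketch that scans a sequence for two small residues (G, A or S) spaced four positions apart, optionally requiring that at least one of them is a glycine. The function name, the `require_glycine` flag and the example sequence are made up for illustration; this is not the lab's own code.

```python
import re

# Small residues named in the talk: glycine, alanine, serine.
SMALL = "GAS"

# A lookahead is used so that overlapping hits are all reported.
PATTERN = re.compile(rf"(?=([{SMALL}][A-Z]{{3}}[{SMALL}]))")


def find_small_xxx_small(seq, require_glycine=True):
    """Return (start_index, five_residue_window) for each small-xxx-small hit."""
    hits = []
    for match in PATTERN.finditer(seq.upper()):
        window = match.group(1)
        small_pair = window[0] + window[4]
        # Optionally keep only hits where one of the two small residues is G.
        if require_glycine and "G" not in small_pair:
            continue
        hits.append((match.start(), window))
    return hits


# Hypothetical transmembrane-like sequence, invented for this example.
example = "LLAGLLLGLLSLLLALL"
print(find_small_xxx_small(example))
print(find_small_xxx_small(example, require_glycine=False))
```

Indices are zero-based, so a hit at index 3 means the first small residue is the fourth residue of the sequence.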
This study by Pehugia in 2018 genotyped over 100,000 different tumors of lung cancer, breast cancer, gastric cancer, all these different types of cancers, and what they found is that in the EGFR receptor these four mutations seem to be very important, meaning they're found very often in tumors. What these four mutations do is turn a wild-type protein, where the helices are interacting together, into an almost permanent dimer, meaning that the bottom half of this receptor is always on, constitutively active, leading to uncontrolled cell proliferation, which is probably the most characteristic part of cancer. Now you might be thinking, okay, great, the EGFR receptor is important, but what about the GASright motif is important? If we look at this particular mutation, one of the four most common, you see this G660 aspartate. This glycine is inside the GASright motif, meaning that the mutation turns a weak hydrogen bond, which I just described, into a very strong hydrogen bond, forcing these helices to come together permanently. So I've shown you that these GASright proteins are both common and biologically important, and so it's very critical for us as researchers to understand and study this motif in order to gain information about biological and disease phenotypes.
However, studying these proteins is incredibly difficult. And why is that? I'm going to show you just one example of how these proteins are difficult. If we look at genomes, one third to one quarter of all proteins in every genome we've genotyped are membrane proteins. They contain these types. Furthermore, 50% of the drug targets currently on the market are also membrane proteins. However, if we look at the Protein Data Bank, only 3% of the structures there are actually membrane proteins.
What this means is that there's a giant discrepancy between what we know is important and the information that we actually have to study it. And so in our lab we take a different approach to studying these membrane proteins in order to tackle this challenge. As a metaphor for that, I'm going to show one of our lab's favorite paintings, by René Magritte, called Clairvoyance. In this painting he's looking at an egg and painting what it will become. In a similar way, in our lab we look at the amino acid sequence of a protein and we want to see what it will become and also how it will get there.
Previously in our lab, students took a computational approach to study this GASright motif. They used a computational algorithm, which I'll refer to as CATM from here on out, which explores a very limited conformational space. We look at the distance between the helices, the crossing angle and various other small conformational changes. In order to evaluate these predicted structures we use a combination of van der Waals and hydrogen bonding, because as I told you before we think those are the most important energetic characteristics of this GASright motif. By using this simple algorithm, simple because there are a lot of other forces that we could consider, the students in the lab were able to predict the structure of known GASright dimers very well. In yellow here we have the experimentally determined NMR structures and in blue we have our predictions. If you look specifically at the crossing point of all of these proteins, you'll notice that the structures overlay very, very well. So we know based on this data that a combination of alpha-carbon hydrogen bonds and van der Waals forces is sufficient to predict the structure of known proteins. But I thought I could do more. We know that the algorithm can predict structures, but is it also predicting the stability of these proteins?
I took both an experimental and a computational approach to address this question, and first I had to get some proteins to study. So we ran all the single-pass proteins of the human genome through our CATM algorithm, and they all produced, well, not all, 50% of them produced models that we could study. That's about 1100 different proteins. You can see here that there's a wide distribution of energy scores. This is the score that our CATM algorithm puts out. Negative 80 means these are very strong dimers. Zero over here means these are very weak. I wanted to study all of these proteins because I wanted to understand the whole GASright motif and what that looks like. However, the experimental challenges to that are very high. If I really wanted to understand the specifics of the GASright motif, I could take very quantitative approaches like FRET or NMR or analytical ultracentrifugation to really tease out what each individual force is doing. However, those types of experiments would take my whole Ph.D. to characterize maybe a handful of proteins, and I didn't want to know about one protein. I wanted to learn about the whole motif.
So I took a different approach rather than a highly quantitative one. I used a genetic reporter assay. This genetic reporter assay, called TOXCAT, takes our transmembrane domains of interest, and we fuse them to a transcription factor and to maltose binding protein. When the helices dimerize, so do the transcription factors, which then turn on a reporter gene. In this way, the reporter gene reports on dimerization, but the maltose binding protein up here is what evaluates insertion. Just to show you how the second part works: maltose binding protein binds a sugar called maltose and carries the maltose to the transporter, which then pushes the maltose into the cell, and then the cell can survive on this maltose.
In our system, what this looks like is we have an E. coli cell here. In purple, I have these little transporters, and here I have our constructs. What the maltose binding protein does is bind a sugar, these little disaccharides here, and carry it over to the transporter, and now the cell can live when maltose is the only carbon source. But if we have transmembrane proteins that don't insert into the membrane and all get aggregated over here, or the maltose binding protein gets cleaved off for some reason, then only very little maltose can get to the transporters via diffusion. What happens is these cells start to starve and then they die. What this looks like experimentally, rather than schematically, is that we have our individual maltose plates here, made out of minimal media where maltose is the only carbon source. When we plate out a transmembrane domain in E. coli, it grows, but if the protein's not inserting it doesn't grow anymore. In this way, we can exclude constructs where we can't understand what's happening with insertion. And so again, I'll take you back here. In TOXCAT, the reporter gene measures dimerization and the maltose binding protein evaluates insertion levels.
I performed this assay for a number of constructs that I will populate in a second. On the x-axis I have the energy scores predicted by our algorithm, where again weak dimers are over here and strong dimers are over here. Then on the y-axis I have the output of the dimerization reporter gene, where strong dimers are going to end up over here and weak ones are going to end up over here. In an ideal world, we would have a correlation that looks like this. And when we actually look at our data, we find a pretty good correlation between the CATM energy score and the reporter gene. So what does this actually tell us about the GASright motif?
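One simple way to put a number on how well a predicted energy score tracks a reporter readout is a Pearson correlation coefficient. The sketch below is only an assumption about how such a comparison could be computed; the talk does not say which statistic was used, and the values are invented placeholders, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Placeholder numbers only: more negative energy score = stronger predicted dimer.
energy_scores = [-70.0, -55.0, -40.0, -25.0, -10.0]
reporter_signal = [95.0, 80.0, 60.0, 35.0, 20.0]

# With this sign convention, agreement shows up as a negative coefficient.
print(round(pearson_r(energy_scores, reporter_signal), 3))
```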
It tells us that our CATM algorithm, which only uses van der Waals and hydrogen bonding energy, is correlated with the actual experimental ability to dimerize. Now in order to gain some more information about the GASright motif, I divided this data up into various bins to evaluate various trends. If we look at all the sequences here, which you don't have to really pay attention to, what you might notice is that there's a glycine all the way down. We designed it this way so that one G of the GxxxG motif is in the same position for all of these proteins. At the bottom we have the weak ones, which is this bottom row, and in the top row we have the strong ones over here. If we look at the occurrence of GxxxG, rather than alanines or serines, we find that the majority of them occur at the top. So the stronger an association gets, the more GxxxGs there are. If we plot this on a graph, we see the less stable pink group over here, the more stable blue group over here, and it goes up to 100%. Now if we think about GxxxG, and if a glycine is required here, there are two different places the other glycine can go. It can be either four behind or four after. Four behind would be over here and four after would be over here, and what we find is that the stronger dimers have glycines in this i-4 position rather than the other one. This means that this is the important place for the two glycines to be, rather than a different place.
So we saw that there were sequence trends for stability. And because I had the computational models to compare to, I also looked at the structural and energetic trends that we could analyze with the models of these proteins. We have these two helices here, and we saw that as the stability increases the helices become closer together.
It goes from about 7.1 to 6.5, so the helices become closer together, and the crossing also becomes more narrow, meaning that the angle between these two helices goes from about negative 52 to negative 39 degrees. So the helices are coming closer together and they're getting narrower. These geometric trends lead to energetic trends, which is that the number of hydrogen bonds increases as well, meaning that the strongest ones have exactly eight hydrogen bonds; there's not even an error bar there. And because the number of hydrogen bonds increases, so does the total hydrogen bonding energy. I said hydrogen bonds and van der Waals were both important, so now moving to van der Waals: as expected, the van der Waals energy increases as well. Now there are two different ways this can happen. The first way is that there are more contacts, meaning that there's a greater surface area between the two proteins, and as you might imagine, if the helices are coming closer together and the crossing is getting narrower, we would expect to see an increased surface area. However, we find that that's not true. Actually, the surface area remains constant throughout all the different groups. What that actually means is that there aren't more van der Waals contacts, they're just better: they have better energies, and the distances between atoms are approaching a more optimal position, so the packing efficiency of the dimer increases with stability. Now what does this tell us overall about the GASright motif?
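The point that contacts can get better without getting more numerous can be illustrated with a generic 12-6 Lennard-Jones term, where the energy of an atom pair is lowest at one optimal distance. This is a textbook functional form used here as an assumption; the talk does not spell out the energy function inside CATM, and `eps`, `r_min` and the distance lists below are arbitrary illustrative values.

```python
def lennard_jones(r, eps=0.1, r_min=4.0):
    """12-6 energy of one atom pair; the minimum is -eps at r == r_min."""
    if r <= 0:
        raise ValueError("distance must be positive")
    ratio6 = (r_min / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)


def packing_energy(distances, eps=0.1, r_min=4.0):
    """Sum of pair energies over a list of interatomic distances."""
    return sum(lennard_jones(r, eps, r_min) for r in distances)


# Same number of contacts in both cases (hypothetical distances).
loose_contacts = [4.6, 4.7, 4.5, 4.8]   # further from the optimum
tight_contacts = [4.1, 4.0, 4.2, 4.1]   # close to the optimum

print(packing_energy(loose_contacts))
print(packing_energy(tight_contacts))   # more negative, i.e. more favorable
```

Both lists contain four contacts, so the difference in total energy comes only from how close each distance is to the optimum, which mirrors the constant surface area with improving packing described above.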
It means that we know that glycines in very specific positions in the membrane are important for association, and those glycines lead to a structural change of closer helices and narrower angles. These structural changes lead to energetic changes, and these energetic changes lead to an overall increase in experimental stability. This was really the conclusion of my first paper, and I was really excited about it, and we learned a lot about the GASright motif.
But I'm going to bring up the data here for one more second, because the data collection for this y-axis took me over two years while the data collection for this x-axis took two days. So there's a big discrepancy between what we can do experimentally and what we can do computationally. Just so you can understand why it might have taken so long to get this y-axis, I staged this picture earlier this week, because this is what doing TOXCAT looks like. What would happen is I'd wake up in the morning, I'd come here, I'd start some day cultures, and then I'd go swimming for two hours, and then I'd come back and sit in this room for six hours. Those of you who look closely can notice I have two pipettes here, two pipettes here, ready to go, I have two ice buckets, and that room is very cold. I had to take a measurement every two minutes, so sometimes I would have to run to the bathroom and run back, or run and go spin down some cells or do something else, but I had to be in this room every two minutes. Sometimes, actually, Zach, when the Pag Lab was still on our floor, would sit on a stool here and eat lunch with me so I wouldn't be alone all day, so that was really nice. A responsible experimenter doing this assay can do maybe eight samples in a day. I mean, if I wanted to work 48 hours straight I could do a lot more, but responsibly I did eight samples a day, and then you have to do replicates and all the other
types of things. I was frustrated because, with this experimental constraint, I had to be very specific and very precise about what constructs I wanted to test, and so I had to make various controls that maybe don't represent the natural world. So in the second half of this talk, and the second half of my PhD, I really spent the majority of my time developing a way to make the experimental work move as fast as the computational work. For that I formed a collaboration with Vatsan Raman, who again had just joined the department, as Alessandro said earlier, and I'm really thankful to all the people in his lab for their support and for training me in something that my lab and many other labs didn't have experience in.
This experiment is called sort-seq. The sort part comes from cell sorting and the seq part comes from next-generation sequencing. It begins by ordering a gene chip from a company with a library of transmembrane constructs, and these libraries contain 10,000 to 20,000 different DNA oligos. I'm going to tell a story about when Vatsan joined the department. When he did his interview here there was a student lunch, and we were all at the student lunch, and he just glossed over the existence of this. Everyone in the lab, in particular I remember Ellen Crummy, we were all sitting there and we were like, wait, wait, wait, we don't care about anything else you're about to say, we need you to explain what this is, because we had never heard of it and it was very new technology. We were excited to learn that we can order lots and lots of different DNA to be exactly what we want. What that means is there's no more error-prone PCR, there are no more degenerate primers. You can order 10,000 different sequences to be exactly what you want them to be, so I can be very specific about what experiments I want to do next. So we ordered the gene chip, and what you do is you
transform this library of constructs into the same assay that I explained earlier, where the reporter gene is green fluorescent protein, GFP. So then all of these are in cells, and the cells express an amount of GFP based on the dimerization propensity. You put them through a flow cytometer, and the laser shoots out and tells you how much GFP is in each individual cell, and then the cell is sorted into the population in which it belongs. That's the cell sorting part. The next-generation sequencing part is that you take each one of these bins, you send it for sequencing, and you get a count of the number of reads for each individual construct in each bin. So these yellow ones are obviously weaker than these pink ones over here. Then I take that data and I have to use some statistical inference to reconstruct these histograms and the overall fluorescence profiles of the proteins.
Now, as Alessandro mentioned, this makes it sound really easy, but the molecular biologists in the room might imagine that this is a very difficult part. Any of you who have had to cut, ligate and transform DNA know that sometimes you're just really happy to get one colony to make things work, because that's all you need. But in my case I need 10,000, so it was a very big challenge to get the cloning to work properly. Also the bin choice here, how many different bins you have and what size they are, is a very important characteristic: it determines how coarse these histograms are. You also need a good next-generation sequencing prep with sufficient read quality, which is expensive, and then you also need good statistical parameters to reconstruct these profiles. All of these are very big challenges, but today I'm just going to talk about this one because I find it the most fun and also the most intuitive. I'm going to take you through a very simple example of how this process works. In this image I have the fluorescence profiles of two
constructs that I designed. This light blue one is more or less a monomer and this dark blue one is a very strong dimer. When you combine these two together you get this orange profile here. Then I split this up with the cell sorter into different fluorescence bins, I send them for next-generation sequencing, and I use a variant of a weighted average to calculate the association. I'm going to take you through this very slowly. You start with the sequencing reads that you get. On this left side, in this weak bin over here, I have mostly the monomer construct, and on this right side over here I have mostly the dimer construct. Then you have to understand what each one of these populations contributes. For example, this first bin over here is about 11% of the total population, which means that this light green portion is 11% of the total. Then you normalize so that all the orange equals one. If you look at the monomer here, there's a distribution that matches what happens over here, and if you look at the dimer, the distribution again matches what it looks like over here. Then to get the fluorescence value you multiply by the median of each one of these bins, so each one of these bars is the contribution each bin makes to the total fluorescence. If I reconstruct that and add these all together, what I find is that this is the calculated value of the monomer, this is the calculated value of the dimer, and they overlay very well with the original populations.
Now it's great that this worked for two constructs, but like I said, I don't want to do two, I want to do 10,000. So I thought I needed a better validation, and I moved up to using 100 constructs in a procedure that I'm going to call the spike-in procedure. I take the library that I cloned, I plate it, and I pick 100 different colonies out of it. I pick each one of those colonies, each a different sequence, into a different tube, and then I
run them on the flow cytometer to see what the fluorescence profile, or the dimerization propensity, of each construct is. So you have these all validated right here. Then I combine them all into one single tube and do flow cytometry on that, and I was able to get a distribution that looks something like this. Then I do the whole sort-seq procedure, where I sort it, then I do the calibration sequencing, and then I do the reconstruction, and I have that data here. On the x-axis I have the flow median of each individual construct that I characterized, and on the y-axis I have the reconstructed values. I am super excited about this data. This shows almost a y equals x line, and the correlation is great. So I was really proud at this point that I was able to get the sort-seq experiment to actually capture dimerization very well.
But if you remember from the last paper, insertion is also an important part to measure, and so how was I going to do this for sort-seq? I showed these plates because this is how you do it individually. Now each one of these plates grows for 72 hours, you have to streak out every single one, and there's no way I was going to do that again. Way too much work. So what we did was a variation of a growth curve. Again I picked 100 different colonies and mixed them all together. I split that mixture into one culture that has normal LB media, so the cells can grow and are happy, and on the right side I put it into that minimal media that only has maltose as a carbon source. I grow them for a certain amount of time, send them for next-gen sequencing, and compare the sequence populations of each construct in the LB versus the M9 media. Just looking at a couple of controls to start, I did this time course for 72 hours, which is how long the plates went for, and I want to thank Josh, wherever he is, for helping me take some of these time points because
it's a lot. In blue I have some constructs that I know grow on those plates very well, and they stay in the population throughout, though at varying levels. But the ones that I know don't grow on the plates had all disappeared after 36 hours. I was grateful it wasn't 72, because 36 is much shorter. So they were all gone after 36 hours. And if I plot this difference for all 100 proteins that were in that group, you can see that all of the negative ones disappeared at some point. Now, there's a whole range of insertion ability for these different constructs, which means some are more hydrophobic than others, which is expected. What's not necessarily known is where I draw this cutoff here, but that's an experimental question for the future.
So I was able to make sort-seq work for both insertion and dimerization. It was optimized over the course of a year or two, maybe two or three. But I wanted to actually ask biological questions with this, which is how this section of the talk connects with the first half. So I'll bring back that image of the distribution of human genome single-pass proteins from our CATM algorithm. Previously I studied 26 constructs, but now I've pulled it up to 100. You might think, okay, 26 to 100, really not that many more constructs. However, I also tried them at three different lengths, because we know that this assay is very sensitive to the length of a transmembrane domain. As annotated by UniProt, they all started out as 21 amino acids, but we also truncated them to 19 and 17 in order to figure out which one would be best, because for some proteins 17 might be the best length and 21 might be the best for others. So, a hundred sequences at three different lengths. In addition, I did three mutations for every single residue for every length, and what this will do is show us what the mutational profile of each protein is. If a mutation disrupts dimerization, it's clearly important and probably at the
interface. If a mutation does not disrupt dimerization, then it's probably not at the interface. So I performed the whole sort-seq procedure for this library of 18,000 different proteins, and here I'm just going to show you the distribution of the various libraries. The 17-length one is in orange, the 19 is in blue, and the 21 is in red. There's clearly a large proportion of these sequences that are anywhere from a weak dimer, between these two lines, to a very strong dimer, over here, that we can characterize.
So let's look first at the wild types of all these proteins. Here I've only plotted the 21-length ones, and there's a range basically from zero all the way up to 250% of our control. Our control is glycophorin A, which is a very strong and well-characterized dimer, and this is where glycophorin A usually is. So all of these are very strong dimers, everything in the middle is an intermediate dimer, and what we consider below 40% is likely to be a transient dimer or even a monomer. For all of these we're able to not only look at the wild types but also characterize their mutational profiles, and I'm going to bring up just one example today, I think it's around here somewhere, called semaphorin 5A. Semaphorin is a protein that's important for neural development, and it's also connected to autism in some cases. In this graph, on the x-axis we have the wild-type transmembrane domain sequence, on the y-axis we have the three or four mutations that I tried for each position, and this last row here is the average of the column. Something that is white is characterized as very much like wild type, and something that's dark red disrupts dimerization very strongly. If I plot out this average and fit it to a sine wave, what we can see is that the periodicity is 3.6, the same as an alpha helix, meaning that all of these mutations are occurring on one face of the
helix. And this is it plotted on a helical wheel. Now, if we look at our model from the CATM algorithm and we plot these mutations on it, what we can see is a very clear interface where these two helices come together in our model. Furthermore, you can see that these dark reds, the very severe mutations, actually correlate with previous data I have not talked about today, which is that the right side of this interface is actually much more important than the left side. So I was really excited about this data. Furthermore, if we look at the different lengths: here I'm showing you the 17-length dimer, plotted here on the top, with this profile I already showed you. But if you look at the 19 and the 21, the profile remains the same, meaning that this is a robust interface that is likely biologically important. And as I told you before, length is important, so the disruption decreases with length, but the profile stays the same throughout.
So now I'm just going to move to the conclusions here. In the first half of the talk I explained the GASright energetics. Again, we are trying to look at sequences and predict how they get to their final position, and we did that using the CATM algorithm, and it correlated very well with actual stability measurements. This gave us insight into the GASright motif, in that sequence, structure, and energetics all contribute to total stability. In the second half of the talk I showed you how I created the sort-seq method, and furthermore that I was able to characterize not only new dimers but entire mutational interfaces.
So where do we go from here? I've been really excited about this work. I think it's very important for our lab's projects, so I'm just going to tell you a little bit, chronologically, about where we're going from here. Gladys in our lab is using the constructs that I characterized in the first half for more intense thermodynamic studies. Basically, I did the
screening and looked at the trends, and she's going to tease out the exact energetic contributions of these various proteins. Over the next six months I'm going to evaluate the CATM algorithm based on all that data that I gathered, because I showed you one example but there are many more there, and I'd like to compare it to our algorithm. The newest student in our lab, Josh, is going to take the sort-seq data, as well as hopefully the data from Gladys, and use it to train the CATM algorithm, which will give us insight into how each force is contributing to association. And just to show you that sort-seq is not limited to what I want to study, Gilbert in our lab is hopefully going to use sort-seq for membrane protein design.
And with that I would like to start my acknowledgments. There are many. Let's see if I can not cry. I'd like to start out by thanking Alessandro. He has been a phenomenal mentor. He has really helped me grow as both a researcher and a person. He's trained me in writing, in talking, in graphic design, and in experimental design. He's been very helpful in making me into a better scientist. He's also been very supportive of all my science, not to mention never nixing the conferences that I wanted to go to or whatever else. So I'd really like to thank him. I'd like to thank my committee members. Katie has been the membrane protein person on my committee. She's always really challenged me to ask hard questions. Julie has been great and always very supportive of me. Vatsan was obviously my collaborator, and in particular in the Raman lab I'd like to thank her help and ability to complain. I'd also like to thank Shushmita, the newest member on my committee, for agreeing to jump in halfway through and ask me all the statistical questions I need. I'd also like to thank former committee members QC and Doug, who began and later left the university. I'd like to thank the CIBM training grant. It has been critical to my success, both in giving me a foundation of
courses and school and statistics, as well as providing me the funds to pursue all my professional development goals, as well as the Wisconsin Distinguished Graduate Fellowship from the department.
Most importantly, I'd like to thank my undergraduates. I'll probably say "most importantly" for every slide. So I trained a number of undergrads... it died again. I trained a number of undergraduates throughout my time here, and they've all been really important, either helping me with papers, as Evan did, he's on one of our papers, or really just teaching me to be a better communicator, a better teacher, and really pushing the edges of my knowledge for communicating what I actually know versus "wait, why have I been doing that for five years?" So I'd like to thank them very much. The Senes lab has been great. They've been very supportive. We've had foosball tournaments every year. In particular I'd like to thank Samson, who's been my bay mate for five and a half years and has constantly put up with all of my complaining. We have gone through some very similar personal circumstances, so it's been really great to have him as a comfort and a friend, and he always pushes me to not take the shortcuts.
I'd also like to thank my sci-pol family, all of you guys here today, as part of Catalyst for Science Policy. I started with these four people, who are all in the room today, so thank you. CPU is really a person who I learned a lot from when it comes to science policy, really my foundation. I'd also like to thank Caitlin, who was the co-president after me, and she's really taken science policy and CASP to places I never could have imagined when I started. I'd also like to thank my network, and Avatar in particular, who tapped me to lead the Midwest region, and so his confidence in me as a leader was very, very helpful. I'd like to thank what I'm going to call today my forever friends. Neha and Madison are in the audience today. Neha and I were friends in elementary school, third grade maybe, and so
it's been really great to continue to see them, and they've really made it all the way here through graduate school, as well as Mike. And actually, what I noticed this morning was that all these pictures were at concerts, so clearly we like concerts a lot, or at least I do. I'd also like to thank the class of 2014. You guys have been pivotal to my survival here in graduate school. We've done Friendsgiving, weddings, boat trips, Halloween, so many things with all you guys. In particular I'd like to thank Tina, who's been probably my best friend through graduate school, through various concerts, Breaking Bad, some fun dating spills. So I'd like to thank all of you guys so much. And most importantly again, there I go again, I want to thank Kasha. Kasha's been my roommate for over five years, and we actually knew each other from before graduate school. We did an internship together. She's been the person I've come home to to commiserate about experiments. She's helped me pick out outfits for conferences, for dates, for whatever. Her boyfriend's also been around a lot, and Brian's been great. I'd also like to thank Zed. I can't point. He's been my partner for the last eight months, and he's been very loving and supportive and has made this very difficult time in grad school a whole lot easier, because the end is always the hardest part, right?
And I'd also like to thank my family. There are, I think, like 20 of you in the room today, and this is probably the most at any family occasion we've ever had, or holiday meal at least. So I'd like to thank all of you guys for coming out and supporting me today. And lastly, just my immediate family, promise this is the last one. I'd really like to thank my brother and sister, who have become really close friends as we've become adults over the last few years, and my parents, who have supported me unconditionally throughout this whole time that I've been here. And with that I will take any questions.
Yes, Mark. Okay, so the first question is
actually my rotation project. What we found was that we couldn't find anything. We couldn't find any concrete things. There are clearly individual mutations that are important for dimerization, but we haven't been able to find any new ones that weren't already studied. For the second part of your question, absolutely. I think the strength, for lack of a better word, of the GASright motif is that it's a weak dimer, meaning that there are weak intermolecular forces, meaning that they can come apart and come together transiently as needed. The example that I showed you, the epidermal growth factor receptor, is very bad when it becomes a constitutive dimer, because proliferation continues. So yes, it's true.
Kasha. I haven't checked that yet, but I did look at a different dimer and looked at the mutational profile, and that is actually true. It was just a less clear example than how obvious this one is.
Eddie. Yeah, so as a rule it changes all prolines to alanines. For my study of the human genome I removed any of the transmembrane domains that had proline, because those weren't likely to be biologically real. What a proline does is kink the helix a lot of the time. For chapter four of my thesis I modeled one of these things with CATM, and there was a proline in it biologically, so I had to manually manipulate the model that we already had to add prolines to it. So they usually are kinks, but that doesn't mean that they don't dimerize.
Amanda. So the version of the TOXCAT assay that I performed only looks at single-pass proteins. There are other versions of the assay that have been developed to look at multi-pass proteins. I didn't personally look at them, but I know that the GASright motif is often seen inside multi-pass proteins.
Oh, Tina. That's a great question. There are a lot of ways I can think about that, but let's see. GASright association occurs in soluble proteins. It's just more often studied in membrane proteins. The association of helices is important in many different types of
proteins. I actually thought your question was going to be... and I already prepared an answer in my head. So when you look at transmembrane domains, they can dimerize, but the other parts of the protein can also dimerize, the outside, the inside, and so it's important to understand which part of the protein is actually biologically important rather than what's just sitting there because of the other stuff.
Yeah, Jean Yang. So what it says is that there's a whole range of association strengths. It says that there is anywhere from a very weak dimer to a constitutive, more or less covalently bonded one. I mean, it's not covalent, but it's very, very strong. And so I think the fact that there's a range of association is very important for biology. It's important to have things that are weak and transient, because some interactions have to come apart and come together in a variety of circumstances, and so just because something is transient or weak does not mean it's not biologically relevant.
Dr. Anderson, we have the closed session now for our next talk
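Editorial note on the periodicity analysis mentioned in the talk, where the per-position average disruption is fit to a sine wave and the period comes out at 3.6 residues. The following is only a minimal sketch of how such a fit could be done, not the speaker's actual analysis. The function name `fit_sine_period`, the grid of candidate periods, and the synthetic profile are all assumptions made for illustration; no data from the talk is used.

```python
import numpy as np

def fit_sine_period(y, periods=np.arange(2.0, 6.0, 0.01)):
    """Return the period P (in residues) that best fits
    y[i] ~ a*sin(2*pi*i/P) + b*cos(2*pi*i/P) + c in the least-squares sense.

    For each candidate period the model is linear in (a, b, c), so it is
    solved with ordinary least squares and the period with the smallest
    residual is kept. The 2-6 residue search range is an assumption.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y))
    best_period, best_sse = None, np.inf
    for p in periods:
        w = 2.0 * np.pi * x / p
        design = np.column_stack([np.sin(w), np.cos(w), np.ones_like(w)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if sse < best_sse:
            best_period, best_sse = float(p), sse
    return best_period

# Synthetic stand-in for a 21-residue average-disruption profile
# (hypothetical values, generated with a known 3.6-residue period plus noise).
rng = np.random.default_rng(0)
positions = np.arange(21)
synthetic = 0.5 + 0.4 * np.sin(2 * np.pi * positions / 3.6 + 1.0)
synthetic = synthetic + rng.normal(0.0, 0.05, size=positions.size)

period = fit_sine_period(synthetic)
degrees_per_residue = 360.0 / period  # angular step used on a helical wheel
print(f"fitted period: {period:.2f} residues, {degrees_per_residue:.0f} degrees per residue")
```

A period near 3.6 residues corresponds to about 100 degrees per residue on a helical wheel, which is why positions that disrupt dimerization line up on one face of the helix.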