Welcome to the MOOC course on Introduction to Proteogenomics. In the first module you learnt about genomics. In the second module we started discussing proteomic tools, and in the last three lectures I tried to give you the basics and foundations of proteomics. Today's lecture is going to be conducted by Dr. Karl Clauser, who is a principal scientist at the Proteomics Platform of the Broad Institute of MIT and Harvard. Dr. Clauser will focus on the basics of mass spectrometry-based proteomics, with emphasis on electrospray ionization, the factors that influence ionization efficiency, and the architecture of a mass spectrometer. So let us welcome Dr. Karl Clauser for his lecture. If you have seen the description of what this session was supposed to cover, it looks like this. As I put all the slides together I decided not to present them in quite that order, so the general flow of sections that I am going to go through is illustrated here. So let us get started. In the first section I am going to teach you some of the basics of mass spectrometry. In our lab in Boston we do a range of proteomics experiments, including discovery-based experiments and targeted experiments. These are examples of the different kinds of instruments available from the manufacturer Thermo. We have at least one of each in our lab, and all of them have to perform some of the same basic functions: they have to create ions from peptides and separate them based on mass and charge. Although we call it mass spectrometry, we do not actually measure mass; we always measure the mass-to-charge ratio, m/z. The instruments generate two types of spectra: an MS spectrum or an MS/MS spectrum. The MS spectrum measures the intact mass of a peptide, while the MS/MS spectrum is acquired after fragmentation, and you measure the masses of the fragments produced from that peptide.
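The point that the instrument reports m/z rather than mass can be made concrete with a short sketch. It assumes positive-mode protonation, so an ion of charge z carries z extra protons ([M+zH]z+); the function names and the 1718.9 Da example mass are hypothetical, and 1.007276 Da is the mass of a proton.

```python
# Minimal sketch: neutral peptide mass <-> observed m/z, assuming the ion
# carries its charge as z added protons ([M + zH]^z+), as in positive-ion ESI.
PROTON_MASS = 1.007276  # Da


def mass_to_mz(neutral_mass: float, charge: int) -> float:
    """Return the m/z of [M + zH]^z+ for a neutral monoisotopic mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mz_to_mass(mz: float, charge: int) -> float:
    """Invert mass_to_mz: recover the neutral mass from an observed m/z and charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS


if __name__ == "__main__":
    # The same hypothetical peptide appears at very different m/z values
    for z in (1, 2, 3):
        print(f"z={z}: m/z = {mass_to_mz(1718.9, z):.4f}")
```

The same molecule therefore shows up at a different place on the m/z axis for every charge state it adopts, which is why charge has to be determined before a mass can be read off a spectrum.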
All right. As of today there is essentially one major ionization technique that dominates proteomics, and that is electrospray ionization. Ten or 15 years ago there would also have been some MALDI, or matrix-assisted laser desorption ionization, but today most of what we do is driven by electrospray. This is a depiction of what happens outside the mass spectrometer, at atmospheric pressure. A liquid flows out of the end of an LC column, a voltage is applied that causes charged droplets to form, and those ions are transmitted into the mass spectrometer, becoming desolvated along the way. That was a cartoon representation; this is an actual picture of what the electrospray looks like. In proteomics the liquid most often comes through a liquid chromatography interface, and the most common flow rate is about 200 nanoliters a minute. You cannot be spraying salt or detergent: the peptides have to be in water, acetonitrile and dilute acid, most often formic acid at about pH 3. If you still have salt or detergent in your sample, you end up gunking up the front of the mass spectrometer and it does not work as well. We measure mass, so what is mass? The elements that make up proteins are dominated by hydrogen, carbon, nitrogen and oxygen, and their masses are shown here. It is important to understand that each of those elements does not exist as a single species; each has isotopic forms. Those heavier isotopes are not very abundant, and you can see that the one with the greatest abundance is carbon: about 1 percent of carbon exists as C13. We are going to be able to take advantage of that. These days, the existence of carbon 13 and nitrogen 15 is crucial for doing quantitation.
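To make the element table concrete, the sketch below computes a monoisotopic mass from an elemental formula using the lightest isotope of each element, and records the approximate natural abundance of each element's +1 heavy isotope (2H, 13C, 15N, 17O). These are standard approximate reference values, not values read from the slide; the example formula C9H9NO2 is the tyrosine residue, whose mass of 163 comes up later in the lecture.

```python
from typing import Dict

# Monoisotopic masses (Da) of the lightest, most abundant isotope of each element
MONOISOTOPIC = {"H": 1.00782503, "C": 12.0, "N": 14.00307401, "O": 15.99491462}

# Approximate natural abundance of the +1 heavy isotope: 2H, 13C, 15N, 17O
PLUS1_ABUNDANCE = {"H": 0.000115, "C": 0.0107, "N": 0.00364, "O": 0.00038}


def monoisotopic_mass(formula: Dict[str, int]) -> float:
    """Sum of lightest-isotope masses for a formula such as {"C": 9, "H": 9}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


if __name__ == "__main__":
    tyrosine_residue = {"C": 9, "H": 9, "N": 1, "O": 2}
    print(f"Tyr residue: {monoisotopic_mass(tyrosine_residue):.5f} Da")
    most_common = max(PLUS1_ABUNDANCE, key=PLUS1_ABUNDANCE.get)
    print(f"Most abundant +1 heavy isotope belongs to: {most_common}")
```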
One other aspect I want you to keep in mind is that the mass difference between carbon 12 and carbon 13 is not the same as the difference between nitrogen 14 and nitrogen 15. That is going to be very valuable to us when we talk about multiplexing later. I have been doing this for about 25 years, and it was only a few years ago that I was looking at something like this and said, wait a minute, does a neutron have a different mass depending upon whether it is added to a nitrogen or to a carbon? And the answer is yes, of course, because the mass of a nucleus is not simply the sum of its parts: the binding energy of the protons and neutrons contributes to the mass, and that binding energy differs depending on which element you are talking about. That is the source of the difference between adding a neutron to a carbon and adding a neutron to a nitrogen. Once you get some data out of a mass spectrometer, these are some of its basic characteristics. I told you there were isotopes, so instead of getting a single peak we get several. The leftmost, lowest-mass peak is referred to as the monoisotopic peak. That means it was measured from molecules containing only the lightest isotope of each of their elements. Because 13C is the most abundant heavy isotope, the major contributor to the other isotope peaks is carbon 13. So here you have one carbon 13, two carbon 13s, three, four. But I told you it is only 1 percent, and this peak is huge; it is almost as big as the monoisotopic peak. Why? Once you get to something with an intact mass of about 2000, there are on the order of 90 carbons, so the chance of at least one of them being carbon 13 goes way up. These isotope patterns also encode some extra valuable information. The spacing between isotope peaks is approximately 1, the mass of a neutron, but we do not measure mass, we measure mass-to-charge ratio, m divided by z.
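Two of the points above can be checked numerically. The sketch first compares the mass added by one extra neutron in carbon (13C minus 12C) with that in nitrogen (15N minus 14N), using standard atomic masses. It then models the isotope envelope from 13C alone as a binomial distribution; the carbon count comes from the "averagine" approximation (about 4.94 carbons per 111.1 Da of peptide), which is an assumption of this sketch rather than something stated in the lecture, and isotopes of the other elements are ignored.

```python
from math import comb

# Mass added by one extra neutron, from standard atomic masses (Da)
C13_SHIFT = 13.0033548 - 12.0        # 13C - 12C
N15_SHIFT = 15.0001089 - 14.0030740  # 15N - 14N
P13C = 0.0107                        # approximate natural abundance of 13C


def estimated_carbons(peptide_mass: float) -> int:
    """Rough carbon count using the 'averagine' composition (assumption)."""
    return round(peptide_mass / 111.1254 * 4.9384)


def carbon_isotope_envelope(n_carbons: int, max_heavy: int = 4) -> list:
    """Binomial probabilities of 0..max_heavy 13C atoms, scaled to the tallest peak."""
    probs = [
        comb(n_carbons, k) * P13C**k * (1 - P13C) ** (n_carbons - k)
        for k in range(max_heavy + 1)
    ]
    tallest = max(probs)
    return [p / tallest for p in probs]


if __name__ == "__main__":
    print(f"13C shift {C13_SHIFT:.6f}, 15N shift {N15_SHIFT:.6f}, "
          f"difference {C13_SHIFT - N15_SHIFT:.5f} Da")
    n = estimated_carbons(2000.0)
    print(n, [round(x, 3) for x in carbon_isotope_envelope(n)])
```

For a roughly 2000 Da peptide this gives about 89 carbons and a first 13C peak at roughly 96% of the monoisotopic peak, which matches the "almost as big" observation; the 13C/15N shifts differ by only about 0.0063 Da.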
In this case you can see that the isotope spacing is 0.25, which means the charge is 4; we can determine the charge from that isotope spacing. On a high-resolution instrument this is what an isotope cluster looks like. The numeric value assigned to resolution here is about 60,000, which is the typical resolution at which one runs an MS1 scan on an Orbitrap instrument. Where does the 60,000 come from? The typical measure of resolution is m over delta m: the numerator is the m/z, here 431, and delta m comes from the width of one of the isotope peaks, which is very narrow. On a low-resolution instrument the resolving power for this would be about 2000. It also gets complicated when you compare an instrument from one manufacturer with an instrument from another, because although they all use this measure of resolution, resolution is not uniform across the mass range of an instrument, and manufacturers do not necessarily quote their resolution at the same mass. To make it even more complicated, Thermo uses a different reference mass depending on which instrument they are talking about: they will quote a resolution number at mass 400 for their quadrupole-type instruments, and for the hybrid instruments they will quote it at mass 200. So when you say "I want this instrument because it has higher resolution", it may not; they have just changed where they quote it. What you really need to know is how narrow those peaks are going to be and whether they are narrow enough to do useful things. The resolution needed for TMT 10-plex is something we are going to talk about later, and there the ability to resolve the 15N and 13C isotopic variants of the TMT reporter ions is important. So, as I said, the gap there was 0.25. We measure amino acids. As Kelly showed you earlier, when you are working with DNA or RNA there are only four things.
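The arithmetic in this paragraph is sketched below: charge from isotope spacing (z is about 1.00335 divided by the m/z gap, using the 13C-12C spacing), resolution as m/Δm with Δm taken as the peak width at half maximum, and the m/Δm at which the 13C and 15N variants of a TMT reporter ion (about 0.0063 Da apart near m/z 127) would have peak widths equal to their separation. The last function assumes an Orbitrap analyzer, whose resolution scales roughly with 1/√(m/z); that is how a number quoted at m/z 400 translates to m/z 200. All function names are illustrative.

```python
from math import sqrt

ISOTOPE_SPACING = 1.00335  # Da between 12C and 13C isotope peaks


def charge_from_spacing(delta_mz: float) -> int:
    """Charge state implied by the m/z gap between adjacent isotope peaks."""
    return round(ISOTOPE_SPACING / delta_mz)


def resolving_power(mz: float, fwhm: float) -> float:
    """Resolution as m / delta-m, with delta-m taken as the full width at half maximum."""
    return mz / fwhm


def resolution_to_match_gap(mz: float, gap: float) -> float:
    """m / delta-m at which the peak width equals the gap between two peaks.
    Cleanly separating the peaks in practice needs a higher setting than this."""
    return mz / gap


def orbitrap_resolution_at(r_ref: float, mz_ref: float, mz_new: float) -> float:
    """Translate an Orbitrap resolution quoted at mz_ref to mz_new (R ~ 1/sqrt(m/z))."""
    return r_ref * sqrt(mz_ref / mz_new)


if __name__ == "__main__":
    print(charge_from_spacing(0.25), charge_from_spacing(0.33))
    print(f"peak width at m/z 431, R = 60,000: {431 / 60_000:.4f}")
    print(f"TMT 13C/15N split at m/z 127: R ~ {resolution_to_match_gap(127.0, 0.00632):,.0f}")
    print(f"60,000 @ m/z 400 ~ {orbitrap_resolution_at(60_000, 400, 200):,.0f} @ m/z 200")
```

The last line illustrates the speaker's caution: the same analyzer performance yields a larger headline number when it is quoted at m/z 200 instead of 400.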
And they are not glycine, threonine, alanine and cysteine, which is what G, T, A and C would mean if we were talking about amino acids. There are 20 amino acids, depicted here, and this arrangement groups them by their properties. I was told there might be a quiz during the course of the workshop, so it might be useful to know that my personal favorite amino acid is probably proline. The mass of tyrosine is 163; a couple of slides from now you are going to need to know that. Look at these particular amino acids over here: you cannot tell the difference between leucine and isoleucine with a mass spectrometer, because they have exactly the same elemental composition. But from the standpoint of measuring ions, these ones are kind of boring; they do not do much. The reason proline is my favorite is that it causes all kinds of havoc. Its side chain is bonded back to the backbone nitrogen, and it likes to retain charge there, so you get a nice huge peak in an MS/MS spectrum when there is a proline and the ion cleaves on its N-terminal side. We could not do proteomics if it were not for these three amino acids, in particular lysine and arginine. They are basic, they bear charge, and they are the basis for giving the ions their charge. We do positive-ion mass spectrometry. It is possible to do negative-ion mass spectrometry, but I have been doing this for 25 years and I never do negative ion; to me it would be totally boring, because peptides do not work very well that way. I give you some property information here: the pK of the basic group on the side chain differs considerably, and arginine is much more basic. It holds on to its charge much more strongly than lysine or histidine do. Do you have a question or a thought? Please feel free to interrupt me. Later, when we interpret spectra by hand, if you knew all the masses you would be able to keep up with me.
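Because hand interpretation depends on knowing residue masses, here is a table of standard monoisotopic residue masses (each amino acid minus water, as it sits in a chain) with a function that adds one water for the intact neutral peptide. The values are standard reference masses, not read from the slide, and cysteine is unmodified; note that L and I are identical, which is exactly why they cannot be distinguished. "PEPTIDE" is just a conventional test string.

```python
# Standard monoisotopic residue masses (Da): amino acid minus H2O.
# Cysteine is unmodified here; an alkylated cysteine would have a different mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


if __name__ == "__main__":
    print(f"PEPTIDE: {peptide_mass('PEPTIDE'):.4f} Da")
    print("L == I:", RESIDUE_MASS["L"] == RESIDUE_MASS["I"])
```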
I have a big advantage over you in that I just know all the mammalian amino acid masses, so after we do some math I can tell you right away what the amino acid is. You are at least going to know that tyrosine is 163. Here is an old slide that illustrates an MS1 spectrum, and there are a few things I want to point out. It is measuring multiple peptides at one time. You can see that this one is a singly charged peptide: its isotope spacing is 1.0. This one over here is doubly charged, and I think this one is triply charged, so the spacing is 0.33. You can tell from the peak widths that this is an old instrument; those peaks are wide. If we label it a little differently, you can see the masses of those peptides after accounting for the charge, converted back to singly charged. This peptide has no basic residue, which is why it is only singly charged. This peptide can hold two charges: you typically will have charge on arginine and lysine, and on the N-terminus as well. It is also possible to hold charge on asparagines and glutamines. But here we have two basic residues, and when they are right next to each other it is hard for them both to be charged. This one has enough basic residues, and peptide 3 is triply charged. If the mass range were expanded, some of these peptides would show multiple charge states, so you would see 2 and 3, or 3 and 4, something like that. So how do we do MS and MS/MS? We use one instrument, but it is going to do two things. This is the simplest instrument, where there is one mass analyzer and a second mass analyzer with a collision cell in between.
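Two pieces of bookkeeping from this MS1 discussion can be sketched: converting an observed multiply charged m/z back to its singly charged equivalent [M+H]+, and a crude upper bound on charge from counting likely protonation sites (the N-terminus plus K, R and H). The charge heuristic is a hypothetical simplification of this sketch; it ignores effects the lecture mentions, such as adjacent basic residues suppressing each other's charge, and any other protonation sites.

```python
PROTON_MASS = 1.007276  # Da
BASIC_RESIDUES = set("KRH")


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert an observed [M+zH]^z+ m/z to the equivalent [M+H]+ value."""
    return mz * charge - (charge - 1) * PROTON_MASS


def max_likely_charge(sequence: str) -> int:
    """Crude upper bound: one charge on the N-terminal amine plus one per K/R/H.
    Hypothetical heuristic: ignores neighbouring basic residues competing for charge."""
    return 1 + sum(aa in BASIC_RESIDUES for aa in sequence)


if __name__ == "__main__":
    # A doubly charged ion observed at m/z 501.007276 corresponds to [M+H]+ 1001.007276
    print(f"{to_singly_charged(501.007276, 2):.6f}")
    for peptide in ("PEPTIDE", "AEFVEVTK", "HKR"):
        print(peptide, max_likely_charge(peptide))
```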
Graphically this is equivalent to a triple quadrupole instrument. To do an MS measurement, we let the first quadrupole pass everything through, turn off the collision cell, and measure every ion that went into the instrument; that gives you intact precursor ion information. You may hear me switch between the terms precursor ion and parent ion. When I say parent ion I usually mean the singly charged version of the ion, and when I say precursor I mean the multiply charged one. That is not universal; it is just my personal way of distinguishing what I am talking about. To do an MS/MS spectrum, we set the first analyzer to pass only ions of a particular m/z, and the width we allow might be only about two daltons. This is an old figure, and there that width was 4, so we are going to allow anything with an m/z of 834 to 838 to pass through. The collision cell is turned on, the ions fragment into pieces, and you get an MS/MS spectrum. You could put three mass analyzers together and you would get MS to the third, MS3. That works a little bit differently, but the principle is the same. The mass spectrum right now is labeled b and y, and we will skip ahead to talk about that. After this slide we are almost never going to see elements again; we are always going to talk in terms of amino acids, using their letter codes. But I want you to keep in mind that a peptide is actually a molecule, with bonds between carbons, nitrogens and oxygens. This is a peptide backbone, and if you look closely you will see that there are essentially three different types of backbone bonds here. The one that most commonly fragments during MS/MS is this one, the amide bond. It leads to a b ion if the charge is retained on the N-terminal side of the peptide and to a y ion if the charge is on the C-terminal side.
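The b/y nomenclature can be turned into a small calculator. Under the usual conventions for singly charged fragments, b_i is the sum of the first i residue masses plus a proton, and y_i is the sum of the last i residue masses plus water plus a proton. The sketch uses standard monoisotopic residue masses (unmodified cysteine) and ignores a, c, x and z ions and neutral losses; the function name and the "AEK" example are illustrative.

```python
PROTON_MASS = 1.007276
WATER = 18.010565
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_and_y_ions(sequence: str) -> tuple:
    """Return singly charged (b1..b(n-1), y1..y(n-1)) m/z lists for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    # b ions keep the N-terminal residues; y ions keep the C-terminal residues plus water
    b_ions = [sum(masses[:i]) + PROTON_MASS for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON_MASS for i in range(1, n)]
    return b_ions, y_ions


if __name__ == "__main__":
    b_series, y_series = b_and_y_ions("AEK")
    print("b:", [round(m, 4) for m in b_series])
    print("y:", [round(m, 4) for m in y_series])
```

The y1 values this produces for a C-terminal K (about 147.11) or R (about 175.12) are the same numbers the lecture uses a few slides later to decide which series is which.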
When we have a mass spectrum, it is not the measurement of one ion; it is the measurement of many ions. Some of them are going to fragment here, others there, others in different places, and the spectrum is the sum of all of those events measured together. Why would you call these b and y? Because there are three types of backbone bonds, the N-terminal fragments are called a, b and c and the C-terminal fragments x, y and z. The most common, dominating ion types, though, are b and y. But it is not quite that simple. You can also break bonds in the side chains of the amino acids; in particular you can lose water and ammonia, depending on having particular amino acids in the sequence. If you have a phosphorylated serine or threonine, you can lose 98 from its side chain. And if you start out with a doubly or triply charged peptide, the fragments can carry multiple charges. This is the way a spectrum comes out of the instrument: there is no red, there is no blue; it is just a black bar chart with numbers on it, and you have to interpret it. This one, by the way, is a beautiful spectrum; life would be grand if everything were like that. The peaks are well defined and nicely spaced, and if you had the choice, every spectrum would look like this. Now, what information content is present in it? Whenever I show you these things, there are several things you should know. This is the file name, which means nothing to you but tells me what project the spectrum came from. This little red caret is the position of the precursor m/z in the MS/MS spectrum. Over here is the parent mass, which is singly charged; here is the precursor m/z, and the precursor charge state is listed as z.
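The extra peak types described here can be enumerated for any singly charged fragment m/z. The sketch applies losses of water (18.0106 Da), ammonia (17.0265 Da) and phosphoric acid (the "98", 97.9769 Da) and computes higher charge states. It does not check whether the fragment actually contains residues able to lose them (for example, the 98 loss needs a phosphorylated serine or threonine), so its output should be treated as candidate peaks only; the names are illustrative.

```python
PROTON_MASS = 1.007276
NEUTRAL_LOSSES = {"-H2O": 18.010565, "-NH3": 17.026549, "-H3PO4": 97.976896}


def fragment_variants(singly_charged_mz: float, max_charge: int = 2) -> dict:
    """Candidate m/z values for a fragment at charges 1..max_charge, with and
    without each neutral loss (residue requirements are NOT checked)."""
    neutral = singly_charged_mz - PROTON_MASS
    variants = {}
    for z in range(1, max_charge + 1):
        variants[f"{z}+"] = (neutral + z * PROTON_MASS) / z
        for label, loss in NEUTRAL_LOSSES.items():
            variants[f"{z}+ {label}"] = (neutral - loss + z * PROTON_MASS) / z
    return variants


if __name__ == "__main__":
    for label, mz in fragment_variants(201.086976).items():
        print(f"{label:>10}: {mz:.4f}")
```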
If you look at this spectrum you can see a pleasant spacing of peaks, and there is also some apparent symmetry to the left and right of the precursor. It is helpful to look at that symmetry. I have colored the peaks blue and red because they are going to turn out to be b ions and y ions. We do not yet know which is which, but a b ion plus the y ion from fragmentation at the same bond adds up to the precursor mass (more precisely, to the singly charged parent mass plus one proton), and that is the basis of the symmetry. So once you have two ions symmetric about the precursor mass, they cannot be the same ion type; they must be complementary. When you then do subtraction to figure out the mass gaps between peaks, you do not need to do the math on ones that are symmetric to each other. So then we look at some of the mass differences between peaks, and if you had memorized them all, you would immediately recognize that those are masses corresponding to amino acids. Mass 163 corresponds to... come on, say it: tyrosine. So we can tell from this set of mass gaps that part of this peptide's sequence is going to be tyrosine, tyrosine, leucine or isoleucine (we cannot tell the difference), alanine, threonine, and 115 is aspartic acid. Because this is a symmetric spectrum, you can find those same masses going the other way. I have already colored them blue and red, but there is no obvious way to say that one is the b series and the other the y series. So how do you decide? If you can get to the end, that helps, but in this case we cannot readily get to the end in either direction. For this 201, there are probably two amino acids that you have to add together to get to it, so that could be a b2 ion. This comes from an older instrument, where we do not have the low mass range available to us.
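The manual steps described here, looking up a mass gap as one residue (or as a pair, as with the 201 peak) and testing whether two peaks are complementary, are sketched below. For singly charged fragments, b_i + y_(n-i) equals the neutral peptide mass plus two protons. The tolerances, the merged "L/I" key and the function names are illustrative assumptions; residue masses are standard monoisotopic values with unmodified cysteine.

```python
from itertools import combinations_with_replacement

PROTON_MASS = 1.007276
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_gap(gap: float, tol: float = 0.01) -> list:
    """Residues whose mass matches the gap between two peaks."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)


def residue_pairs_for_mass(total: float, tol: float = 0.005) -> list:
    """Unordered residue pairs whose masses sum to total (their order is unknown)."""
    return [
        (a, b)
        for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2)
        if abs(RESIDUE_MASS[a] + RESIDUE_MASS[b] - total) <= tol
    ]


def are_complementary(mz1: float, mz2: float, neutral_mass: float, tol: float = 0.02) -> bool:
    """True if two singly charged fragments could be a b/y pair from one cleavage."""
    return abs(mz1 + mz2 - (neutral_mass + 2 * PROTON_MASS)) <= tol


if __name__ == "__main__":
    print(residues_for_gap(163.063), residues_for_gap(115.027))
    # A b2 ion at 201.087 minus one proton leaves the sum of two residue masses
    print(residue_pairs_for_mass(201.087 - PROTON_MASS))
```

Returning unordered pairs mirrors the AE-versus-EA ambiguity: the mass fixes the composition of the pair but not its order.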
This is a tryptic peptide, so it is very likely to have a C-terminal arginine or lysine. The y1 mass for arginine is 175 and the y1 mass for lysine is 147. We cannot readily get from 201 to either 147 or 175 with a step consistent with an amino acid mass, which suggests that 201 is more likely a b ion than a y ion. We could also work out whether we can get from here to the top, which is 1364. Is that right? What I am trying to show you is that there is a lot of information here. A complete interpretation of the sequence would be shown here. The slashes between the amino acids indicate fragmentation: a red slash indicates that there was only a y ion at that position, this one says there was only a b ion, and pink means there are both b and y. So this peptide fragmented nearly completely, but it has not told us the order of the first two residues. Together they add up to 201, but we do not know whether it is AE or EA. Similarly, over here we do not know the order of the cysteine and alanine. But as with driving down the street in India, where not all two-wheelers are Royal Enfields, not all mass spectra are this good. Sadly. Some peptides fall apart in ways that do not give anywhere near as complete sequence information, and these are actually much easier to interpret manually because you cannot get very far. The best we can do is tell that there is some symmetry, find at least one mass gap consistent with an amino acid, and note a couple of combinations of amino acids that add up to 202. When you search a database, this is the only peptide consistent with all of that information.
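The "tryptic peptide" reasoning rests on trypsin's specificity, which search programs also rely on when they digest a sequence database into peptides. The sketch applies the common simplified rule (cleave after K or R, but not when the next residue is P) with no missed cleavages; real search engines offer more options, and the example sequence is made up.

```python
def tryptic_digest(protein: str, min_length: int = 1) -> list:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide of the protein
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    # Made-up sequence: note that the 'RP' site is not cleaved
    print(tryptic_digest("MKWVTFISLLRPAKAGR"))
```

Every peptide except possibly the protein's own C-terminal one ends in K or R, which is why a y1 of 147 or 175 is the expected anchor for the y series.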
So what I have tried to illustrate is that, unlike DNA sequencing, where you typically get a clear, complete answer at every step of the process, with an MS/MS spectrum you have incomplete information, and I have gone through some of the factors that contribute to that incompleteness. Here is a list of them. The fragment ion types, and their tendency to occur, depend somewhat on which fragmentation mode you are using; there are different ones out there, and I will come to that in a later slide. You have complementary information in both directions, but that can lead to some uncertainty about which direction you are reading. I told you there were 20 amino acids, and when looking for mass gaps you would hope to find one of those 20, but other modifications can occur that alter the mass, and you have to be aware of them. Then there are amino acids like leucine and isoleucine that you cannot tell apart because they have the same mass. There also happen to be some combinations of residues with the same or very similar masses. Lysine and glutamine, for example, both have a nominal mass of 128, and with low resolution and low accuracy you cannot tell the difference, but today with an Orbitrap-type instrument you do have the mass accuracy to distinguish them. Two glycines together are identical in mass to one asparagine, and a glycine plus an alanine is identical to a glutamine. These things combine to make your life a little more difficult. This is a figure I made in graduate school, because at that time I was faced with many spectra that were more like the poor-quality one. This was before genomes had been sequenced.
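How close these look-alikes are can be quantified from standard monoisotopic residue masses. The sketch prints each difference in Da and in parts per million (ppm) of the second mass: K versus Q differs by about 0.036 Da, roughly 280 ppm, which an instrument with low-ppm mass accuracy can resolve, whereas G+G versus N and G+A versus Q have identical elemental compositions, so no mass accuracy can separate them. The helper name is illustrative.

```python
RESIDUE_MASS = {"G": 57.021464, "A": 71.037114, "N": 114.042927,
                "Q": 128.058578, "K": 128.094963}


def ppm_difference(mass: float, reference: float) -> float:
    """Absolute mass difference expressed in parts per million of the reference."""
    return abs(mass - reference) / reference * 1e6


LOOK_ALIKES = {
    "K vs Q": (RESIDUE_MASS["K"], RESIDUE_MASS["Q"]),
    "G+G vs N": (2 * RESIDUE_MASS["G"], RESIDUE_MASS["N"]),
    "G+A vs Q": (RESIDUE_MASS["G"] + RESIDUE_MASS["A"], RESIDUE_MASS["Q"]),
}

if __name__ == "__main__":
    for label, (m1, m2) in LOOK_ALIKES.items():
        print(f"{label}: {abs(m1 - m2):.6f} Da, {ppm_difference(m1, m2):.1f} ppm")
```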
I wanted to know whether, once the genome was done, those poor-quality spectra would still be good enough to determine a sequence by searching a database of a complete proteome. The good news is that I am still standing here today doing mass spectrometry, so yes, the answer is that it is good enough. This is the nature of the figure I made. At that time it was still unknown how many genes were in the human genome, and one of the highest estimates was 100,000 genes, so I used that as an upper limit. Then I took the mean length of a protein in the Swiss-Prot database at the time, which was 350, and made these calculations. For each peptide length shown here, I wrote a program, one of my first, to count how many peptides there would be at that length given these assumptions. The red line is the simplest case: if you made all possible sequences and counted them, that is how many there would be. The blue line is how many peptides would be encoded by the human genome. Once I found that the blue line was nowhere near the red line, I asked how little information you could actually get away with, and that is where I calculated the green line. The green line means we do not know the order of any of the amino acids; all we know is which amino acids are present in the peptide, its amino acid composition. To my surprise at the time, once a peptide is long enough, its amino acid composition starts to become unique. So you can still do a lot of good with partial sequences. Life is of course a lot easier when you get complete sequences, but life does not always give us that. Programs have since been written to do this kind of thing, and I have written one of them.
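The counting behind the three curves can be re-created roughly. Using the lecture's assumptions (100,000 proteins, mean length 350, 20 amino acids), the sketch counts all possible sequences of length L (20^L), the peptides in the proteome under our own simplifying assumption that every length-L window of every protein counts (no enzyme rule, no removal of duplicates), and the number of distinct amino acid compositions, C(L+19, 19). It reports the first length at which each count exceeds the proteome's peptide count; exceeding that count is a necessary, not a sufficient, condition for uniqueness, and this is not the original program.

```python
from math import comb

N_PROTEINS = 100_000   # upper-limit gene count used in the lecture
MEAN_LENGTH = 350      # mean Swiss-Prot protein length quoted in the lecture
ALPHABET = 20          # amino acids


def all_sequences(length: int) -> int:
    """Red curve: every possible sequence of this length."""
    return ALPHABET ** length


def proteome_peptides(length: int) -> int:
    """Blue curve (assumption): every length-L window of every protein, no enzyme rule."""
    return N_PROTEINS * max(MEAN_LENGTH - length + 1, 0)


def compositions(length: int) -> int:
    """Green curve: distinct amino acid compositions (multisets) of this length."""
    return comb(length + ALPHABET - 1, ALPHABET - 1)


def first_length_exceeding_proteome(count_fn, max_length: int = 40):
    """Shortest length at which count_fn exceeds the proteome's peptide count."""
    for length in range(1, max_length + 1):
        if count_fn(length) > proteome_peptides(length):
            return length
    return None


if __name__ == "__main__":
    print("all sequences:", first_length_exceeding_proteome(all_sequences))
    print("compositions:", first_length_exceeding_proteome(compositions))
```

Under these assumptions, the space of all sequences overtakes the proteome at length 6 and the space of compositions at length 11, which is consistent with the observation that composition alone can start to become informative for longer peptides.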
The software package I am responsible for developing is called Spectrum Mill, but all of the software packages out there do this same basic type of thing. You start with an experimental spectrum and a database of sequences. You digest those sequences into peptides theoretically, make a model spectrum from each peptide sequence, and match it against the experimental spectrum. However you score a match, what you are looking for is the peptide in the database that gives you the best match. When you do this, if the database is incomplete, you cannot get the right answer. That actually causes some trouble, because the programs are always going to give you the best answer, and what you would really like is for them to give you only the right answer, and, if it is not there, to just tell you so. But they do not do that, which can be frustrating. As a consequence, since we cannot be sure we have the right answer, we at least estimate what our error rate is, and so I am going to talk to you about calculating a false discovery rate from this type of search. Building further on where I started, with the basics of proteomics workflows and the focus on mass spectrometry, Dr. Clauser has given you the detailed concepts behind electrospray ionization and why it is important in proteomics. Then he talked about the importance of resolving isotopes at high resolution for proper identification of peptide sequences. How collision-induced dissociation results in the formation of different types of fragment ions, which are useful for identification, was also covered. Then he showed you examples of different spectra and how they can be interpreted manually, similar to what I talked about in the previous lecture.
But now you have seen in much more detail how to interpret spectra manually. You have also learned about the factors that can affect peptide fragmentation. Lastly, an effort was made to help you understand the architecture of two very advanced mass spectrometers used in proteomics. I hope that from my previous three lectures and Dr. Karl Clauser's lecture today, you now have a very good understanding of how mass spectrometry is used and of the important considerations for doing proteomics with these kinds of instruments. In the next lecture Dr. Clauser will help you appreciate the importance of sample preparation, and then move on to quantitative proteomics using iTRAQ and TMT labels. Further, he will talk to you about the use of various search engines for protein identification. Thank you.