Welcome to this short introduction to the core concepts of mass spectrometry-based proteomics. A cell consists of many proteins, which can vary in abundance, be modified in numerous different ways, and make up many different protein complexes. The systematic measurement of all of this is what is known as proteomics, and the most popular technique for doing it is mass spectrometry. However, given how many different things can be measured in proteomics, you can imagine that mass spectrometry-based proteomics experiments come in many different flavors, and for this reason it's a pretty steep learning curve to get into analyzing such data as a computational biologist.

Let's start with getting an overview of the experiment types. At a very high level you have two types of proteomics experiments. You have top-down proteomics, in which the goal is to measure intact proteins, and there are many challenges to doing this. I won't go through all of them, but fundamentally it boils down to the heterogeneity of proteins. Proteins have many different physical properties, and this makes it difficult to make a single assay that works well for all of them. The alternative, and the most popular approach, is bottom-up proteomics. Here the first step is enzymatic cleavage: taking the proteins in the sample and cutting them up into smaller parts using, for example, trypsin. These short peptides have the advantage of being much more homogeneous, making it easier to make an assay that works for all of them. However, it comes at a price. Having cut the proteins into small pieces, we need to put the puzzle back together later.

You can also divide mass spectrometry experiments based on the experimental focus. Is the goal to measure which proteins are there and how much you have of them, in a way very similar to transcriptomics? Or is the goal to look specifically at phosphorylation sites, so-called phosphoproteomics, or to measure other post-translational modifications?
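As a brief aside, to make the enzymatic cleavage step concrete, here's a minimal sketch of an in-silico tryptic digest in Python. It uses the common rule of thumb that trypsin cuts C-terminal to lysine (K) or arginine (R) but not when the next residue is proline; the example sequence is a made-up fragment for illustration.

```python
def trypsin_digest(protein: str) -> list[str]:
    """In-silico tryptic digest: cleave after K or R, but not before P.

    This is the textbook cleavage rule; real digests also produce
    missed cleavages, which are ignored here for simplicity.
    """
    sites = [0]
    for i, residue in enumerate(protein):
        if residue in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            sites.append(i + 1)
    if sites[-1] != len(protein):
        sites.append(len(protein))
    # consecutive cleavage sites delimit the peptides
    return [protein[a:b] for a, b in zip(sites, sites[1:])]

# a made-up example sequence
print(trypsin_digest("MKWVTFISLLFLFSSAYSR"))  # ['MK', 'WVTFISLLFLFSSAYSR']
```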
In these cases you will typically want to use some enrichment strategy that enriches for the type of peptides you want to see before running the mass spectrometry. Finally, you might want to look at protein interactions. Affinity purification is a very popular approach here, in which you tag a bait protein and use it to pull down all the interaction partners that it forms complexes with. You can then use mass spectrometry to identify what's in the pull-down.

I've mentioned mass spectrometry many times, but what is it? Fundamentally, a mass spectrometer is nothing but a fancy scale: it allows you to weigh molecules. The way this is done in practice is that you ionize the molecules, then accelerate them, deflect them with a magnet, and detect the intensity of the ions. In a schematic it looks like this: you have an ion source, it emits a beam of ions, these are deflected in a magnetic field, and for that reason they separate out and you can measure the intensity of the individual ions. Here it matters a lot how heavy the ions are, since a heavier ion is going to be deflected less in the magnetic field. The charge also matters, since the more charge an ion carries, the more it's going to be deflected. For this reason, what you measure is actually not the mass but the mass-to-charge ratio of each ion. And on top of that, of course, you measure the peak intensity: how much did you see of the ion? The problem is that for identifying an ion you have only one mass-to-charge ratio, and that is simply insufficient for identification, at least when you're working with a complex mixture like an entire proteome.

This brings us to the next topic, LC-MS/MS, and here you get the first taste of how much people in mass spectrometry like their acronyms. LC means liquid chromatography, and it allows you to separate the peptides before even sending them into the mass spectrometer.
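Since what a mass spectrometer reports is the mass-to-charge ratio rather than the mass itself, it's worth seeing how the two relate. A peptide ion typically carries extra protons, so for a neutral monoisotopic mass M and charge z, m/z = (M + z · 1.007276) / z. A minimal sketch:

```python
PROTON_MASS = 1.007276  # Da, monoisotopic mass of a proton

def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a peptide ion carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge

# a peptide of neutral monoisotopic mass 1000 Da observed at charge 1+ and 2+
print(mz(1000.0, 1))  # ~1001.007
print(mz(1000.0, 2))  # ~501.007 -- same molecule, half the m/z
```

Note that the same peptide shows up at different m/z values depending on its charge state, which is one of the things identification software has to sort out.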
The first mass analyzer then isolates precursor ions; these are fragmented, typically by collision with an inert gas, and then sent into a second mass analyzer for a second round of mass spectrometry. What you get here is thus that you first separate things in time, then separate them by the mass-to-charge ratio of the precursors, select certain precursor ions from the precursor spectrum, and produce fragment ion spectra for these. This means that for each selected ion you have the retention time from the liquid chromatography column, you have the precursor mass-to-charge ratio, and you have the fragment ion spectrum. That's a lot more information to allow you to identify the ions.

There are two main types of such mass spec experiments. There's non-targeted proteomics, which aims to measure everything. The downside is that these methods are generally speaking only semi-quantitative, they have abundance biases, meaning that you're more likely to observe abundant proteins than less abundant proteins, and they are stochastic in nature, meaning that especially for the less abundant proteins it is random whether you see a protein or not, even if it's there. That means when you do separate runs, it's going to be a problem to identify the same things in your different samples. The alternative is targeted proteomics. In this case you have decided, already before running the experiment, on specific peptides that you want to measure. This has the advantage that the experiments can be quantitative, but the downside is that you can really only target a few hundred peptides in a single experiment, for which reason you cannot actually measure an entire proteome with targeted proteomics.

This brings us to identification of spectra: how do we find out what each spectrum actually corresponds to? The first step is peptide identification, and one way of doing that is so-called de novo peptide sequencing.
The idea is that if you have a complete fragment spectrum, you can deduce the order of the amino acids purely from this. However, that requires a complete spectrum in which you've observed all possible fragmentations, which is hardly ever the case. For this reason, the more popular approach is database search. You try to find matching peptides for the spectrum in a database consisting either of computed fragment spectra deduced from the protein sequences, or of a spectral library of previously observed fragment spectra of known peptides. In either case you use a similarity metric to say which spectra are sufficiently similar to be considered a peptide-spectrum match, or as it's called, a PSM. The problem is of course that you can get false matches, and we therefore need to have a handle on the error rate. The most common way of dealing with that is to introduce so-called decoy spectra into the database. Whenever a spectrum matches a decoy, you know it's a false match, and for this reason you can estimate the false discovery rate based on how often you're matching the decoy spectra.

After having identified the peptides, we still need to map them to proteins. Remember, we need to put the puzzle back together. The problem here is that multiple different proteins can give rise to the same peptide; you therefore cannot know which of these proteins it came from. Consequently, when you're analyzing mass spectrometry data you will tend to find protein groups rather than individual proteins. That is, you have sets of one or more proteins that could have given rise to a peptide, and you cannot know which. And these are the groups for which you have all your data, such as the abundances.

And that leads us to the topic of quantification. We want to know not only what's in the sample, but how much there is of each thing. One way of doing this is label-free quantification, or LFQ (remember, mass spec people really like their acronyms).
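Coming back to the decoy approach for a moment: the estimate is simply that decoy matches above a score cutoff approximate the number of false target matches above that same cutoff. A minimal sketch, with a made-up list of PSM scores:

```python
def estimate_fdr(psms: list[tuple[float, bool]], cutoff: float) -> float:
    """Estimate the FDR at a score cutoff.

    psms: (search-engine score, matched-a-decoy?) pairs.
    Decoy hits above the cutoff approximate the false target hits there.
    """
    targets = sum(1 for score, is_decoy in psms if score >= cutoff and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= cutoff and is_decoy)
    return decoys / targets if targets else 0.0

# toy PSM list: (score, matched a decoy?)
psms = [(95, False), (90, False), (88, True), (85, False), (80, False),
        (78, True), (75, False), (70, True), (65, False), (60, True)]
print(estimate_fdr(psms, 80))  # 1 decoy / 4 targets = 0.25
```

In practice you would sweep the cutoff to find the threshold that gives, say, a 1% FDR; real tools also apply refinements (e.g. target-decoy competition), which this sketch ignores.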
The simple idea is that if we know how many peptides we've seen for a protein, and how many spectra we've seen for each peptide, we should be able to estimate protein abundance. And indeed you can. However, these numbers depend a lot on many other factors as well, and for this reason it's hard to normalize them between different runs. And since you cannot multiplex the samples, having not labeled them in any way, you end up comparing not-really-comparable numbers between runs, ending up with rather inaccurate ratios in most cases.

The alternative is therefore to use labeling methods. You will run into acronyms like SILAC, iTRAQ, and TMT. All of them are ways of introducing stable isotope labels into the peptides. That allows you to multiplex the samples, since you can see which sample a certain peptide came from due to the labeling, and therefore only have to compare values within one run. That's a big advantage, since it deals with the stochastic nature of mass spectrometry (you're guaranteed to observe the same peptides in all samples when you do it within one run), and it gives you accurate ratios, because you're comparing measurements within one run. The downside is of course the cost of labeling; it's not free, neither in terms of time nor in terms of money.

So in practice, how do you do the statistical analysis of all the spectral data coming from a mass spectrometry study? The first step is to use dedicated mass spectrometry analysis software such as MaxQuant or Proteome Discoverer. Regardless of which software you use, the end result is going to be tabular data consisting of the identifications in your sample and the peak intensities, the quantifications. If the software didn't already, you will want to log-transform these data to get log intensities, and subsequently normalize the values across samples using the same kinds of techniques that I already covered in an earlier presentation on transcriptomics.
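As an illustration of that last step, here's a minimal sketch of log-transforming and normalizing intensities per run. It assumes a simple median-centering normalization, which is one of several reasonable choices, not the only one:

```python
import math
from statistics import median

def normalize_runs(runs: dict[str, list[float]]) -> dict[str, list[float]]:
    """Log2-transform raw intensities, then center each run on its median
    so that values become comparable across runs."""
    out = {}
    for run, intensities in runs.items():
        logged = [math.log2(x) for x in intensities]
        shift = median(logged)
        out[run] = [v - shift for v in logged]
    return out

# two made-up runs with a systematic intensity offset between them
print(normalize_runs({"run1": [2.0, 4.0, 8.0], "run2": [16.0, 32.0, 64.0]}))
```

After centering, both runs end up on the same scale even though run2's raw intensities were eight times higher, which is exactly the kind of between-run offset normalization is meant to remove.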
The normalized, log-transformed intensities generally follow a normal distribution, and for this reason you can use t-tests, ANOVA, and similar methods for identifying significant regulation in the sample, regardless of whether it is changes in protein abundance, changes in post-translational modification status, or changes in interaction partners. However, you have to remember that even at this point you still have to deal with the problem of protein groups downstream. That's all I have to say about mass spectrometry-based proteomics this time. If you want to learn more about what you can do in terms of downstream analysis, I suggest you look at this presentation next. Thanks for your attention.
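As a small addendum to the testing step described above: on normalized log intensities, a two-group comparison per protein group boils down to something like Welch's t statistic. A minimal sketch (in a real analysis you'd use a statistics library such as scipy.stats.ttest_ind for the p-value, and correct for multiple testing):

```python
from statistics import mean, variance

def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch's two-sample t statistic, e.g. for one protein group's
    normalized log intensities in two sample groups."""
    se2 = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / se2 ** 0.5

# made-up normalized log intensities for one protein group, control vs treatment
print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # ~-3.67
```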