Hi everyone, we're kicking off this week with our special series of bite-sized talks that are focused on nf-core pipelines. Thank you all for joining us, and I'd also like to thank the Chan Zuckerberg Initiative for supporting our events so far, and more that are coming in the near future. Leon Bichmann is the first speaker in the series, and today he will be presenting the nf-core MHCquant pipeline, which is targeted at data-driven immunopeptidomics. While Leon is currently working at BioNTech, he will be presenting work that he carried out at the University of Tübingen in Germany. We really appreciate that you've taken the time to join us today, Leon. Over to you. Thanks, I hope the sound is working fine. So, thank you for inviting me today to present the nf-core MHCquant pipeline that we built during my PhD in Tübingen. It's an automated pipeline to analyze mass spectrometry data for the discovery of epitopes that can be used for the design of vaccines. I'll structure this talk in three parts: first, I'll recapitulate some background on cancer immunotherapy and mass spectrometry; then I'll go in depth into the most important steps of the MHCquant pipeline; and at the end I'll give a little outlook on what could be done in the future, now that I'm done with my PhD. So let's start with the background. This is a very basic picture of T-cell-based adaptive immunity. It's only one branch of the immune system, but it's the part we were interested in. What's happening here is that T cells of the immune system are controlling basically all the cells in our body, and they do this by checking one cell-surface protein complex called MHC, the major histocompatibility complex. On top of this complex, little peptide fragments are presented, and they represent breakdown pieces of the entire protein content of a given cell.
So if there's something wrong inside that cell, maybe a viral infection or a cancerous protein that normally isn't there, then the T cells can recognize this from the peptides, the epitopes, engage in cytotoxic activity and kill the respective malignant cell. And this is not only a cartoon; you can also visualize this in real life, for example using electron microscopy. This is being exploited for cancer immunotherapy by comparatively analyzing tumor and normal tissue from patient biopsies, using whatever methods one has available, from sequencing and transcriptomics to MHC mass spectrometry, and then really trying to find out which MHC epitopes are presented only by the tumor tissue. Those can then serve as candidate epitopes for a vaccine cocktail that would stimulate the cancer patient's immune system against their own tumor. The MHCquant pipeline focuses only on the bottom part here, the identification of MHC epitopes from mass spectrometry data. To give you a bit more of a feeling for what kind of data we're dealing with: what one usually does in the lab is to take tissue samples, homogenize them into a solution, and then purify the MHC complexes by immunoaffinity chromatography. There are two different classes, MHC class I and class II. After elution of the peptides from the complexes, one can inject these solutions directly into the MS instrument and measure them. Nowadays mass spectrometers are really high-throughput instruments, so you can automatically sample from a box of vials and measure hundreds to thousands of MS runs in rather short time frames; we're talking about weeks or months, but it's possible. Therefore we need new, highly parallelized processing methods to handle all the data we're aggregating from these instruments, and on top of that, mass spectrometry data is really complex to analyze. This is where the MHCquant workflow came into place.
And I'm going to tell you now what's going on inside the pipeline. First of all, here is a bit of an abstract overview of the architecture. Like all nf-core pipelines, you have processes at the center that are carried out by software libraries, and in this case we primarily used the OpenMS software library for computational mass spectrometry. You can imagine it like a toolbox of Lego blocks that you stack together into applications for very specific use cases in mass spectrometry. We combined this with third-party tools outside of the library, and things that were not available we scripted in Python and included as processes as well. All of this is then nicely containerized using Docker, Singularity or other methods that are provided by the nf-core template, and it can then be run with the Nextflow workflow system on pretty much any system, and it's highly reproducible. Here you see a rough sketch of the pipeline, and you can see there are really quite some steps going on, 35 different steps, all interlinked with each other in different ways. If we just focus on the five most important ones, I'm going to guide you through them step by step now. Most importantly, the first stage is a protein database search, carried out using the search engine Comet. We chose this well-established search engine because, first of all, it has a very simple and fast scoring method, and it has no tryptic bias against unspecifically cleaved peptides. MHC peptides are unspecifically cleaved, so you need a search engine that can deal with these kinds of peptides, and Comet does not have that bias. We then did a benchmark comparing a variety of different tools, and in our results it seemed like Comet was finding the most peptides: on the bottom you see the average number of peptides, on the top the percentage of identified MHC binders.
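To make the point about the missing tryptic bias concrete: because MHC peptides are not cut at defined residues, the search engine has to consider every substring of a protein within the typical MHC class I length range, rather than only tryptic fragments. Here is a minimal Python sketch of that unspecific search space (an illustration only, not code from the pipeline; the toy sequence and length bounds are my own assumptions):

```python
# Illustrative sketch (not from MHCquant): the unspecific search space an
# engine like Comet must consider for MHC class I peptides (lengths 8-12),
# in contrast to a tryptic digest that only cuts after K/R.
def unspecific_peptides(protein, min_len=8, max_len=12):
    """Yield every substring of the protein within the MHC length range."""
    n = len(protein)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            if start + length <= n:
                yield protein[start:start + length]

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy 33-residue protein
candidates = list(unspecific_peptides(seq))
print(len(candidates))  # → 120 candidate peptides from a single short protein
```

Even for this short toy sequence the candidate list is large, which is why a simple and fast scoring method matters for this application.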
While the ratio was always about the same, the number of peptides was higher for Comet. The only other similarly performing tool was PEAKS; however, that is licensed software, so we couldn't use it here. We also verified these additionally identified peptides by comparing their retention time properties: on the X axis you have the measured retention time, on the Y axis the retention time predicted from the sequence. And here we're seeing that the peptides uniquely identified by the Comet-based MHCquant workflow were correlating nicely, whereas if you look at random decoy peptides, you would see them scattering all over the place. So this also raised the confidence in these uniquely identified ones. As a next step, a classical thing in proteomics is carried out: control of the false discovery rate. We're using a slightly more advanced method here called Percolator, and if you want to know all the details about it, I really recommend reading the paper by Lukas Käll et al. in Nature Methods 2007. In contrast to the classical approach of simply computing the false discovery rate from a univariate target-decoy distribution, a multivariate separation is achieved by an iterative machine learning approach: the peptide-spectrum matches are compared by a variety of different scores, and in an iterative process the discrimination between the target and decoy distributions is achieved in a better way. In the next step, the retention time of all peptides is corrected. This is another problem one encounters in mass spectrometry: across different measurements, the retention time deviates slightly. This is corrected in the pipeline using an OpenMS tool called MapAlignerIdentification, which basically aligns all the peptides to one central reference.
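For reference, the classical univariate target-decoy estimate that Percolator improves upon can be sketched in a few lines: at a score threshold, the FDR is approximated by the number of decoy matches passing the threshold divided by the number of target matches passing it. The scores and labels below are made-up toy values:

```python
# Classical univariate target-decoy FDR estimate (the baseline that
# Percolator's multivariate, semi-supervised approach improves upon).
def fdr_at_threshold(scores, labels, t):
    """labels: True for target PSMs, False for decoy PSMs.

    FDR(t) ≈ (#decoys with score >= t) / (#targets with score >= t).
    """
    targets = sum(1 for s, is_target in zip(scores, labels) if is_target and s >= t)
    decoys = sum(1 for s, is_target in zip(scores, labels) if not is_target and s >= t)
    return decoys / targets if targets else 0.0

# Toy PSM scores, sorted high to low, with target/decoy labels.
scores = [9.1, 8.7, 8.2, 7.9, 7.5, 7.1, 6.8, 6.2, 5.9, 5.1]
labels = [True, True, True, True, False, True, True, False, True, False]
print(fdr_at_threshold(scores, labels, 7.0))  # → 0.2 (1 decoy / 5 targets)
```

Percolator replaces the single score `s` with a learned combination of many features per spectrum match, which is why it separates targets from decoys better than any one score alone.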
And also this one we verified, by looking at two MS runs that were very different in terms of their retention properties, and you can see the deviation was corrected to nearly zero across the entire range. So finally we get to the part where MHCquant gets the second half of its name from, quant: every peptide is also associated with a quantity. This is carried out using a targeted chromatogram extraction approach, where each peptide identification is located not only on the MS2 level but also on the MS1 level, and then the corresponding chromatograms are integrated. The sum of the signal intensity, the area under this curve, represents the quantity that can be compared. Again, we went into the lab and verified this: looking at the signal intensities of 57 spiked-in peptides that were diluted in a linear series, we also observed this linear decay in signal intensity. So again, we made sure that this quantification works quite reliably. Finally, an MHC affinity prediction is carried out. Here we applied two open-source, modern neural network architectures called MHCflurry and MHCnuggets. We chose these predictors among all the variety of predictors out there because they were completely open source and not licensed. We were very happy with how these tools worked, they cover a variety of MHC alleles, and in published benchmarks they actually performed quite similarly to the well-established licensed tools. With this, I'm already at the end of the talk, and I'll give you a bit of an outlook on what could be done with this pipeline next. One thing has been done already by Marissa Dubbelaar, who recently joined the University of Tübingen and did a great job of porting MHCquant to DSL2. This was really an essential step, I would say, to continue the development of this pipeline.
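The targeted quantification idea described above, integrating the area under an extracted MS1 chromatogram peak, can be sketched with a simple trapezoidal integration. This is an illustration of the principle only, not the OpenMS implementation, and the retention times and intensities are invented toy values:

```python
# Illustrative sketch of targeted quantification: integrate the area under an
# extracted MS1 chromatogram peak (trapezoidal rule) to obtain a peptide
# quantity that can be compared across runs.
def chromatogram_area(rt, intensity):
    """Trapezoidal integration of signal intensity over retention time."""
    area = 0.0
    for i in range(1, len(rt)):
        area += 0.5 * (intensity[i] + intensity[i - 1]) * (rt[i] - rt[i - 1])
    return area

rt = [100.0, 101.0, 102.0, 103.0, 104.0]     # retention time in seconds (toy)
intensity = [0.0, 2.0e5, 5.0e5, 2.0e5, 0.0]  # MS1 signal intensity (toy)
print(chromatogram_area(rt, intensity))      # → 900000.0
```

In a dilution series like the 57-peptide experiment mentioned above, one would expect these areas to scale roughly linearly with the spiked-in amount, which is exactly the behavior that was verified.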
But other things have been happening too. It has been shown, for example, that fragment intensity predictions massively improve identification rates for these kinds of peptides, so this is definitely something that should be included here in the future. Also, the mass spec instruments are currently developing at high speed, so I think it would also be a great addition to this pipeline to make it possible to include ion mobility data, for example, or data-independent acquisition based methods. But we will have to see how this will be continued. So thank you very much for your attention, that was already it. I'm also including here an acknowledgement of all the people who helped me in creating the pipeline; I really had a lot of support, not only for the pipeline but also throughout my PhD, and for this I'm very grateful.