Hello everyone, thanks for listening. My name is Marvin, and I'm doing my PhD in the Schilling lab at the Institute for Surgical Pathology in Freiburg, Germany. Today, I'd like to present a project about peptide mass spectrometry imaging of pancreatic cancers. They can be divided into the prevalent group of exocrine tumors, which arise from the cells that produce the digestive enzymes. The most common subtype is the pancreatic ductal adenocarcinoma, PDAC for short. On the other hand, we have the rather rare endocrine tumors, which arise from the cells that produce the hormones. We use so-called tissue microarrays (TMAs), with small biopsy cores assembled on one glass slide, so that 25 to 50 samples can be processed at the same time. Using a workflow for peptide imaging, we measured 13 TMAs on a MALDI-TOF imaging device and got 100,000 spectra. Each spectrum represents the peptide distribution of one individual pixel on one such tissue core. And from these spectra, we can then generate the heat maps that show the intensity distribution for a specific mass. So in general, we have quite a complex data set. The aim of the study is to generate a classifier to identify PDAC and to distinguish PDAC from other pancreatic cancers. The latter group is very diverse and consists of all non-PDAC pancreatic cancers. Since the number of those cases is quite low, it's difficult to assemble a balanced cohort, which is a limitation of most pancreatic cancer studies in general. We'd also like to identify the most interesting m/z features, which are, of course, those that will be part of the classifier. And it's important to us to provide a complete and transparent data analysis workflow, because that is still missing in the MSI community. The first step is the data preprocessing, which is a kind of reduction of the data that leads from the raw spectra to spectra containing only the peaks.
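As a side note on the data structure described above, the relationship between per-pixel spectra and a heat map for one specific mass can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ion_image`, the dictionary layout of the spectra, and the 20 ppm tolerance are hypothetical and are not taken from the workflow presented in the talk.

```python
def ion_image(spectra, target_mz, tol_ppm, width, height):
    """Build a 2D intensity grid (heat map) for one m/z value.

    spectra: dict mapping (x, y) pixel coordinates to a list of
             (mz, intensity) peaks. This layout is an assumption
             made for the sketch, not a real imaging file format.
    """
    # Convert the ppm tolerance into absolute m/z units.
    tol = target_mz * tol_ppm / 1e6
    image = [[0.0] * width for _ in range(height)]
    for (x, y), peaks in spectra.items():
        # Sum all peak intensities that fall inside the tolerance window.
        image[y][x] = sum(
            (inten for mz, inten in peaks if abs(mz - target_mz) <= tol), 0.0
        )
    return image


# Toy data: a 2 x 2 pixel "tissue core" with a few peaks per pixel.
spectra = {
    (0, 0): [(1000.50, 5.0), (1200.70, 1.0)],
    (1, 0): [(1000.51, 2.0)],
    (0, 1): [(1200.70, 4.0)],
    (1, 1): [],
}

image = ion_image(spectra, target_mz=1000.5, tol_ppm=20, width=2, height=2)
for row in image:
    print(row)
```

Each row of the printed grid corresponds to one row of pixels, so plotting the grid with a color scale gives the kind of heat map mentioned above.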
The complete preprocessing workflow consists of several steps, for example baseline removal and smoothing, peak detection steps, the removal of contaminants, and intensity normalization. Of course, we recommend checking the data after each step with a quality control tool. After the preprocessing, we get spectra that are reduced to a couple of hundred m/z features, here 743. The final QC also shows us that the mass shift is quite low in this data set, around 2 ppm, and that we have a typical peptide distribution, with most m/z features lying in the m/z range between 500 and 1,500. With this preprocessed data, we can then go on to the statistical analysis to build the classifier. We split the data set into a training data set to build the classifier and a test data set to evaluate its predictive ability. We split the training data set again for cross-validation, to find the best parameters for the classifier algorithm. So in total, we use the classification tool three times: first to test different parameters and identify the best ones, then again with the best parameters to build the classifier, and then again to test the classifier's performance on the test data set. And as you can see here, we also had the trouble of having an unbalanced cohort, with a much higher number of PDAC cases. Nevertheless, on the training data set it was built on, the classifier shows an accuracy of around 90% and a sensitivity between 83 and 91%. It consists of 127 m/z features: 40 features defining PDAC and 87 features defining the diverse group of all other pancreatic cancers. When evaluating the classifier's performance on the test data set, it achieved a lower accuracy of around 72% and a sensitivity between 70 and 77%. These results are still very good, and there are a couple of samples for which we need to go back to the pathologists and ask them to recheck those cases. It's probably difficult for the algorithm to distinguish these two groups.
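The three uses of the classification tool described above (cross-validation to pick a parameter, building the classifier, and testing it on held-out data) can be sketched as follows. This is a minimal sketch under loud assumptions: the talk does not name the classifier algorithm, so a simple k-nearest-neighbor vote stands in for it, the data are random toy values with two features instead of real m/z features, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) with similar class proportions in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def knn_predict(train_X, train_y, x, k):
    """Stand-in classifier: majority vote among the k nearest training samples."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return Counter(y for _, y in dists[:k]).most_common(1)[0][0]


def evaluate(true, pred):
    """Return overall accuracy and the per-class sensitivity (recall)."""
    accuracy = sum(t == p for t, p in zip(true, pred)) / len(true)
    sensitivity = {}
    for cls in set(true):
        members = [p for t, p in zip(true, pred) if t == cls]
        sensitivity[cls] = sum(p == cls for p in members) / len(members)
    return accuracy, sensitivity


def cross_validate(X, y, k, folds=3):
    """Mean accuracy over interleaved folds of the training data."""
    scores = []
    for f in range(folds):
        val = [i for i in range(len(X)) if i % folds == f]
        tr = [i for i in range(len(X)) if i % folds != f]
        pred = [
            knn_predict([X[i] for i in tr], [y[i] for i in tr], X[j], k)
            for j in val
        ]
        scores.append(evaluate([y[j] for j in val], pred)[0])
    return sum(scores) / folds


# Toy, unbalanced cohort: 30 "PDAC" and 10 "other" samples (assumed numbers).
rng = random.Random(1)
labels = ["PDAC"] * 30 + ["other"] * 10
X = [[rng.gauss(1.0 if y == "PDAC" else 3.0, 0.5) for _ in range(2)] for y in labels]

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_X = [X[i] for i in train_idx]
train_y = [labels[i] for i in train_idx]

# Use 1: cross-validation on the training data to pick the parameter.
best_k = max([1, 3, 5], key=lambda k: cross_validate(train_X, train_y, k))
# Uses 2 and 3: build with the best parameter, then predict the test data.
test_pred = [knn_predict(train_X, train_y, X[i], best_k) for i in test_idx]
accuracy, sensitivity = evaluate([labels[i] for i in test_idx], test_pred)
print(best_k, accuracy, sensitivity)
```

Reporting sensitivity per class next to accuracy matters for an unbalanced cohort like this one, because a classifier that mostly predicts the larger group can still reach a high overall accuracy.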
That is because we have PDAC on the one hand and the very diverse group on the other hand. Nevertheless, these are very nice results that are worth sharing with the community, especially since the most common group, PDAC, shows such a high lethality and is difficult to subtype correctly. So it might be worthwhile to study the molecular histology of those tumors with MSI. I'd also like to highlight the applicability of the MSI tools already available in Galaxy to such a huge data set. It previously consisted of around 400,000 spectra and 54,000 m/z features. So it's a huge benefit to us to have the opportunity to use Galaxy for this, and we want to share our experience by sharing the analysis workflows and also providing some training material on how to perform a complete analysis on a large clinical data set. And with this, I'd also like to thank my whole lab, so the whole Schilling group, the surgical pathology and also the pathology industry.