Hello, my name is Andrew Rajczewski, and I'm a PhD candidate in the Griffin and Tretyakova labs. I wanted to talk to you all about some of the COVID-19 research we've been doing in the Griffin lab using our Galaxy workflows, in a talk I call "A Galaxy-based evaluation of optimal peptide targets for MS-based detection of SARS-CoV-2 in clinical samples." The COVID-19 pandemic has been one of the most significant and destructive global events in recent memory, with presently hundreds of millions of cases and several million patient deaths. A key component of controlling the pandemic is rapid screening for and diagnosis of the disease in patients. The two primary diagnostic tests are PCR amplification of viral RNA using patient samples and serological tests to determine the presence of antibodies. However, the multiple steps in qRT-PCR amplification come with the potential for introducing false positives. In addition, detecting antibodies for SARS-CoV-2 only verifies cases ex post facto. We believe that targeted mass spectrometry assays to detect specific SARS-CoV-2 peptides would aid in the global fight against COVID-19 due to the relatively robust nature of peptides and the speed at which such an assay could be performed. A targeted mass spectrometry assay for viral peptides would be non-invasive, fast, and highly accurate, and would therefore be a valuable addition to the current testing paradigm. However, not all peptides are suitable targets for analysis. We hypothesized that we could use bioinformatics workflows in Galaxy to mitigate the need for testing peptide targets and determine the optimal SARS-CoV-2 target peptides a priori based on publicly available datasets, which constitutes the work I'll be discussing here today. This was performed using two key workflows in Galaxy.
In this first workflow, we use proteomics search engines to interrogate open-source mass spectrometry datasets for the presence of SARS-CoV-2 peptides in order to create a panel that we could then use to search clinical datasets for specific high-quality viral peptides. Through the use of in vitro cell culture datasets, clinical datasets, and other bioinformatics experiments, we were able to construct a panel of 639 viral peptides for use in downstream analysis. The panel of 639 peptides corresponds to proteins throughout the SARS-CoV-2 proteome. Matching peptides to the viral proteome yields matches to almost every protein in SARS-CoV-2, with several proteins having high sequence coverage. These tend to correspond to structural proteins such as the viral spike protein, the membrane protein, and the nucleocapsid protein. In the second workflow of our analysis, we leveraged our 639-peptide panel against clinical datasets using the PepQuery search engine to determine which of these peptides are reliably found in actual patient samples. For this, we used seven sets of clinical data from COVID-19 patients, harvested from the upper respiratory tract and deep within the lungs. The PepQuery search engine matches spectra to a list of peptides of interest, in our case our SARS-CoV-2 panel, and then determines whether the spectra might be a better match to peptides in a designated reference proteome. For our reference proteome, we used a combined FASTA file containing the human proteome, common mass spectrometry contaminants, and eight other coronaviruses to ensure specificity of these peptides for SARS-CoV-2. The results of the PepQuery search were then filtered to retain the best matches to SARS-CoV-2 and fed into the open-source analysis programs Unipept and NCBI BLASTP to verify their specificity for SARS-CoV-2.
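To make the specificity idea concrete, here is a minimal Python sketch. It is not how PepQuery works (PepQuery compares spectra, not sequences) and it is not the Galaxy workflow itself. It only illustrates the simpler question of whether a candidate peptide also occurs verbatim in a reference proteome, treating isoleucine and leucine as interchangeable because they have the same mass. The function names and the toy sequences are hypothetical placeholders, not real proteins or peptides from the panel.

```python
from typing import Dict, List


def normalize(seq: str) -> str:
    """Upper-case a sequence and fold I into L, since the two are isobaric."""
    return seq.upper().replace("I", "L")


def is_specific(peptide: str, reference_proteins: Dict[str, str]) -> bool:
    """Return True if the peptide is absent from every reference protein."""
    pep = normalize(peptide)
    return not any(pep in normalize(prot) for prot in reference_proteins.values())


def filter_panel(panel: List[str], reference_proteins: Dict[str, str]) -> List[str]:
    """Keep only the peptides that do not occur in the reference proteome."""
    return [p for p in panel if is_specific(p, reference_proteins)]


# Toy reference standing in for human + contaminants + other coronaviruses.
toy_reference = {
    "human_toy": "MKTAYIAKQRQISFVK",
    "other_coronavirus_toy": "MSDNGPQNQR",
}
toy_panel = ["AYLAK", "GGGWK", "NGPQ"]

specific_peptides = filter_panel(toy_panel, toy_reference)
print(specific_peptides)
```

In this toy case only the peptide absent from both placeholder reference sequences survives; a real check would read the combined FASTA file and would still need the spectrum-level validation described in the talk.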
After running the second workflow, we see that our initial set of 639 SARS-CoV-2 peptides is winnowed down to 87 peptides with a high specificity for the virus. Of these, most of the peptides belong to the structural proteins, again including the spike protein, the nucleocapsid protein, and membrane proteins, which make up the viral particle itself, while replication proteins translated from the large ORF1 are largely lost. The second workflow also generates MS2 spectra that can be examined using the Lorikeet tool in the Multi-omics Visualization Platform, so that we could examine the quality of the spectra of peptides that pass through PepQuery. Our rules for selecting good spectra are based on getting the highest quality matches possible. The product ions must be at least three times as intense as the noise signal, and there must be at least three consecutive ions in the b or y ion series, that is, ions corresponding to the fragmentation of the peptide backbone. By way of example, here are two MS2 spectra belonging to PSMs observed in a COVID-19 patient dataset which passed PepQuery validation. When we look at the spectrum on the left, not only are the product ions in and amongst the noise, but each series contains only a single product ion. We can therefore discard this spectrum and the PSM as not being real. By contrast, the spectrum on the right has a fairly complete consecutive series where the product ions are well above the noise. This PSM was therefore retained for consideration as a target. Ultimately, we determined that four peptides scored highly in PepQuery, passed manual evaluation, and were present in the majority of clinical datasets. These peptides are presented here, where you can see the quality of their MS2 spectra. These peptides all mapped to the nucleocapsid protein, which serves to package the viral RNA within the capsid.
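The two manual-evaluation rules mentioned above (product ions at least three times the noise, and at least three consecutive b or y ions) can be written down as a small check. The following Python sketch is only an illustration under stated assumptions: the evaluation in the talk was done by eye in a spectrum viewer, the input layout (a dictionary of matched ion positions and intensities per series) is hypothetical, and it assumes the consecutive ions must themselves clear the noise threshold.

```python
from typing import Dict, Iterable


def longest_consecutive_run(indices: Iterable[int]) -> int:
    """Length of the longest run of consecutive integers in `indices`."""
    best = 0
    run = 0
    prev = None
    for i in sorted(set(indices)):
        run = run + 1 if prev is not None and i == prev + 1 else 1
        best = max(best, run)
        prev = i
    return best


def passes_manual_rules(
    matched_ions: Dict[str, Dict[int, float]],
    noise: float,
    min_snr: float = 3.0,
    min_run: int = 3,
) -> bool:
    """Apply both rules to one PSM.

    matched_ions maps a series name ("b" or "y") to {ion position: intensity}.
    An ion counts only if its intensity is at least min_snr times the noise;
    the PSM passes if either series has min_run consecutive counted ions.
    """
    for series in ("b", "y"):
        strong = [
            position
            for position, intensity in matched_ions.get(series, {}).items()
            if intensity >= min_snr * noise
        ]
        if longest_consecutive_run(strong) >= min_run:
            return True
    return False


# Hypothetical spectra: intensities are arbitrary units, noise level is 100.
good_psm = {"b": {2: 900.0, 3: 1200.0, 4: 800.0}, "y": {1: 500.0}}
poor_psm = {"b": {3: 150.0}, "y": {5: 120.0}}
gapped_psm = {"y": {1: 1000.0, 2: 1000.0, 4: 1000.0}}

print(passes_manual_rules(good_psm, noise=100.0))
print(passes_manual_rules(poor_psm, noise=100.0))
print(passes_manual_rules(gapped_psm, noise=100.0))
```

The first placeholder spectrum has three consecutive intense b ions and passes; the second has only single ions near the noise, and the third has intense ions with a gap in the series, so both fail.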
The nucleocapsid protein is highly abundant, and its gene is also the one most commonly amplified during qRT-PCR diagnosis of COVID-19, making MS detection of SARS-CoV-2 using these peptides as targets a complementary methodology to pre-existing assays. With our targets selected, we wanted to ensure that these peptides would make suitably specific targets for SARS-CoV-2. In running them through Unipept, a metaproteomics application designed for taxonomic and functional analyses, we see that they all have some alignment to various strains of SARS-CoV, with each showing alignment to SARS-CoV-2, illustrated here by the increased font size. To see how specific these alignments were, we also used BLASTP to align the peptides against the viruses. As you can see, these peptides show perfect alignment to the nucleocapsid of SARS-CoV-2, more so than any other coronavirus, with the first peptide in the series being wholly unique to SARS-CoV-2. In addition, none of these peptides showed any alignment with the human host proteome. One issue we noted in our analysis was the difference between those samples collected from the upper respiratory tract and those collected from within the deeper lung. In looking at the unique peptides that pass PepQuery validation in these groups, we see that relatively few SARS-CoV-2 peptides are shared between these two sample types, with few of our designated target peptides detected within the deep lung samples. Unlike the peptides detected in the upper respiratory tract, the PepQuery-validated viral peptides in the deep lung samples correspond to proteins throughout the viral proteome. Interestingly, during our manual annotation, we found that those peptides that pass manual validation have notably lower PepQuery p-values.
In looking at those peptides which pass PepQuery validation with a much lower p-value threshold, we see that in upper respiratory tract samples, most of those peptides are retained, while the lion's share of those peptides which pass PepQuery validation in the deep lung samples are then lost. In keeping with this observation, only a single peptide that passes PepQuery validation in the deep lung samples also passes manual evaluation, suggesting that these samples, which are inherently trying on the patient to harvest, make poor samples for mass spectrometry-based detection of SARS-CoV-2. To conclude, four peptides from the nucleocapsid were determined to be particularly suited to targeted proteomics. They had high-confidence matches in the PepQuery workflow, were observed to have high-quality spectra, and were observed in most of the clinical datasets that we analyzed. A manuscript detailing these discoveries has been written up and published in the journal Clinical Proteomics. For future directions in this work, we are working to determine the optimal sample preparation conditions for LC-MS analysis of SARS-CoV-2 patient data. In addition, we are working to investigate whether different strains of the virus can be differentiated effectively using LC-MS assays. We are working on this in collaboration with Valdemir Carvalho and the Fleury Group and the Srivastava Lab at IIT Bombay. I'd also like to talk a bit about a metaproteomics analysis we performed on COVID-19 positive samples to determine the degree of co-infection with SARS-CoV-2. Over the course of the pandemic, many patient samples have been collected and submitted to shotgun proteomics analysis as a part of the research efforts to combat COVID-19.
While most of those approaches detect the infection by targeting SARS-CoV-2 proteins or proteins associated with the host response, we believe that the MS-based proteomic datasets can also be further utilized to identify an understudied but potentially important co-infection status of the infected individual. Bacterial co-infections during respiratory-related viral outbreaks have been shown to significantly contribute to fatalities, as demonstrated in both the 1918 and 2009 pandemics. Moreover, recent research reports have also shown that many fatalities attributed to COVID-19 viral infections could be due instead to patients' predisposition to co-infections. Diagnosing and managing co-infections can be complex, as it is possible that opportunistic co-infecting organisms are present in patients prior to viral infection or are acquired during hospitalization. For example, chronic bacterial infections associated with chronic obstructive pulmonary disease can be a risk factor for patients with severe COVID-19 infection. Additionally, some COVID-19 patients with severe presentation are subject to prolonged mechanical ventilation, augmenting their chances of developing nosocomial infections. Furthermore, the use of antibiotics to treat bacterial infections is especially high among COVID-19 patients under intensive care. The global overuse of antibiotics can lengthen the growing roster of antibiotic-resistant pathogens and present problems in pairing infective organisms with appropriate antibiotics in an effective and timely fashion when using culture-based testing for co-infecting microbes. Early diagnosis of multi-organism co-infection is also necessary to avoid complications during hospitalization. The workflow that we used in our metaproteomics analysis is based on two steps. Together, they're able to identify co-infecting microorganisms present during COVID-19 infection.
In the first step, publicly available mass spectrometry datasets from COVID-19 positive and COVID-19 negative patients were analyzed in Galaxy using ComPIL 2.0 against a comprehensive database. The resulting peptides identified were then analyzed using the taxonomic search engine Unipept to identify species present in the samples. Those species with two or more unique peptides were used to create a clinically significant multiple-organism FASTA database, which the data were then searched against in Galaxy. Peptides matching microorganisms were validated computationally via PepQuery and manually annotated and validated using the Lorikeet platform. Using these workflows, we were able to detect characteristic peptides from several microorganisms in COVID-19 patients. In COVID-19 positive patients, Streptococcus pneumoniae was detected, a bacterium known to cause pneumonia and other respiratory tract infections. Intriguingly, we were also able to detect Lactobacillus rhamnosus, a probiotic, in addition to an unclassified Pseudomonas strain. By contrast, COVID-19 negative patients also showed some characteristic microorganisms, namely Pseudomonas monteilii, which is responsible for meningoencephalitis, and Acinetobacter ursingii, known for causing bacteremia. To conclude this study, metaproteomic analyses of COVID-19 patients demonstrate co-infection with other pathogens. A letter detailing this workflow and our results was published in the Journal of Proteome Research. We're continuing to develop and modify our list of co-infecting organisms by continued analysis of more samples from patients in India and Brazil. This information can be found at the URL at the bottom of this slide. I'd like to conclude by acknowledging all of those who worked on these studies in the Griffin lab, namely Dr. Pratik Jagtap, Subina Mehta, and Dinh Duy An Nguyen. I'd also like to thank our collaborators at Galaxy EU and in the Wolan lab at the Scripps Research Institute.
Thank you for your attention and I hope you enjoy the rest of the conference.