Hello, everyone. I'm Caroline Weiss. I'm an AI and ML engineer at GSK.ai, and I will present a project that I worked on during my PhD at ETH Zurich in Karsten Borgwardt's group. That project is clinical antimicrobial resistance prediction. More specifically, I will introduce a data set that we created for this purpose, which we call DRIAMS, the database for antimicrobial resistance prediction based on MALDI-TOF MS. This talk is aimed at researchers working on the application of phenotype prediction algorithms, especially if you're working with vector-represented data and data collected across different domains. But this talk should also be for you if you're interested in early antimicrobial resistance prediction in general, or if you study clinical MALDI-TOF MS and want to know about its applications. In this talk, I will first introduce the application of our prediction, the threat and treatment of antimicrobial resistance. Secondly, I will introduce the data set. And thirdly, I will present some results that we've been able to achieve on antimicrobial resistance prediction.
Antimicrobial resistance has been recognized as a major epidemiological threat for decades, and it was named one of the top 10 public health threats to humanity by the WHO in 2019. In a report on antimicrobial resistance issued by the UK government, it was estimated that antimicrobial resistance was responsible for 700,000 deaths per year worldwide in 2016. It was estimated that the number could rise to 10 million a year by 2050. So research to optimize the treatment of infections and the usage of antimicrobial drugs is highly important.
I will quickly introduce the main data type investigated throughout this project, and that is the MALDI-TOF mass spectrum. MALDI-TOF MS stands for matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The basic experimental workflow of MALDI-TOF MS is illustrated here in this image.
A single bacterial colony is picked from a Petri dish culture, that colony is then applied onto a target plate, and the matrix solution is added. A laser is then applied to the sample, which produces charged fragments, and these fragments are then accelerated in the electromagnetic field of the time-of-flight analyzer. Small molecules will arrive earlier at the detector than larger molecules, and the result is a mass spectrum that maps the intensities of particles measured at each mass-to-charge ratio. This is what a typical MALDI-TOF mass spectrum looks like, with measured intensity drawn against the mass-to-charge ratio. What we see in the spectrum is an overview of the proteins in the cell, which are mainly ribosomal proteins, and the peak profile is highly characteristic of the species of the microbe measured.
Our research into MALDI-TOF mass spectra is motivated by one single question: by means of machine learning, can we extract more information out of the spectra than just the species? The phenotype that we are specifically interested in is antimicrobial resistance. To see how this could be useful in the clinic, we quickly look at this illustration of how the mass spectra are integrated in the clinical treatment workflow. We have a collection of the specimen from the patient, which is then analyzed by MALDI-TOF MS to determine the species causing the infection. In the subsequent culturing period, the phenotypic resistance is determined, and based on that information, the antimicrobial therapy is set. So if we were able to predict the resistance level at the time of MALDI-TOF MS, a machine learning guided treatment decision could be made at an earlier time point. This reduced time until the resistance-informed treatment decision is made can have a large impact on the course of the infection and prevent the use of ineffective antimicrobial drugs.
So with that foundation, we move into the second part of the talk, which introduces a new data set collected specifically for this purpose.
We curated a data set tailored for MALDI-TOF MS based antimicrobial resistance prediction, which is several orders of magnitude larger than any previous data set. We named this data set DRIAMS, the Database of Resistance Information on Antimicrobials and MALDI-TOF Mass Spectra. It combines data from four different medical institutions in Switzerland, which are labeled DRIAMS-A to DRIAMS-D. The main and largest data set is DRIAMS-A, which was collected at the University Hospital of Basel. All of DRIAMS contains clinical routine data, which is important as we are aiming for a clinically applied predictor in the end. Therefore it was important to train and predict with real-world routine spectra.
Let's look at MALDI-TOF data from a machine learning point of view and at how the spectra are represented. The spectra are made up of pairs of intensity and mass-to-charge values, and the number of pairs making up one specific spectrum is variable. The spectra are pre-processed with an established pre-processing protocol. In order to obtain fixed-length vectors for machine learning algorithms, the measurement pairs are binned by adding up all points within a certain bin range, and we use a bin size of three daltons. These graphs illustrate the effect of the binning from pre-processed spectrum to binned feature vector. Further, we define a binary classification scenario for resistance: both the resistant and the intermediate resistance categories are in the positive class, and the susceptible category is in the negative class.
Next we look at some results that are based on DRIAMS. In order to establish a machine learning baseline on DRIAMS, we applied three machine learning models for antimicrobial resistance prediction. For this analysis, we focus on data from the largest site, DRIAMS-A.
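To make the binning and labeling steps concrete, here is a minimal Python sketch. Only the bin size of three daltons and the grouping of resistant and intermediate into the positive class come from the talk. The function names, the mass range of 2,000 to 20,000 Da, and the label codes "R", "I" and "S" are assumptions made for illustration, and this is not the project's actual pre-processing code.

```python
def bin_spectrum(mz_values, intensities, mz_min=2000.0, mz_max=20000.0, bin_size=3.0):
    """Turn a variable-length spectrum into a fixed-length feature vector.

    mz_min and mz_max are assumed values; the talk only states the bin size.
    """
    n_bins = int((mz_max - mz_min) // bin_size)
    features = [0.0] * n_bins
    for mz, intensity in zip(mz_values, intensities):
        if mz_min <= mz < mz_max:
            idx = int((mz - mz_min) // bin_size)
            if idx < n_bins:
                # Add up all intensities that fall into the same bin.
                features[idx] += intensity
    return features


def binarize_label(category):
    """Map a resistance category to the binary class used for prediction.

    The codes "R", "I" and "S" are hypothetical label names.
    """
    mapping = {"R": 1, "I": 1, "S": 0}  # resistant and intermediate are positive
    return mapping[category]


# Example: four measured peaks, the last one outside the assumed mass range.
features = bin_spectrum([2000.0, 2001.5, 2003.0, 25000.0], [1.0, 2.0, 4.0, 8.0])
labels = [binarize_label(c) for c in ["R", "I", "S"]]
```

With these assumed defaults every spectrum maps to a vector of the same length, regardless of how many peaks were measured, which is what standard classifiers require.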
Based on results that are included in the paper this talk is based on, but omitted here, we decided to use species-stratified data sets and predict a single antibiotic phenotype at a time. The species-antibiotic pairs are ceftriaxone resistance in E. coli and in Klebsiella pneumoniae, and oxacillin resistance in S. aureus. The species were selected for clinical relevance, and the antibiotics by how important they are to combat infections involving the species. We applied three models: logistic regression, LightGBM and MLP. And you can see that the best performing curves are two times LightGBM and once MLP. We reached decent performances, for example an AUROC of 0.8 for oxacillin resistance prediction in S. aureus.
In the next part, we move on quite a bit from the previous part and look into developing new methods that leverage certain properties of MALDI-TOF mass spectra. For this we developed a new kernel specifically designed to fit the properties of MALDI-TOF mass spectra, which we named PIKE, the Peak Information Kernel. Given two spectra S and S' with mass-to-charge values x_i and x'_i and intensities λ_i and λ'_i, the closed-form approximation for the inner product takes this form. Every peak in S is compared to every peak in S', and the peak intensities are multiplied, weighted by this exponential, and added up. The further away the two peaks are from one another on the x axis, the lower the weight is.
I want to point out some properties of this kernel that are specifically important when working with MALDI-TOF mass spectra for prediction. First, the similarity is calculated directly on the peaks, which are defined through their mass-to-charge and intensity values. So the binning step described earlier is not necessary with this kernel. Moreover, it uses a more accurate representation, as the x values are not altered through the binning step.
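The pairwise peak comparison described above can be sketched in a few lines of Python. The talk shows the exact formula only on the slide, so the Gaussian weight exp(-(x - x')^2 / (8t)) and the normalizing constant 1 / (2 sqrt(2 pi t)) used here are assumptions, as are the function name and the representation of a spectrum as a list of (mass-to-charge, intensity) pairs. Treat it as an illustration of the idea, not as the released implementation.

```python
import math


def pike_kernel(spectrum_a, spectrum_b, t=1.0):
    """Similarity of two peak lists; t is the smoothing parameter.

    Each spectrum is a list of (mz, intensity) pairs. The constants
    in the weight and the normalization are assumed, not taken from the talk.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    total = 0.0
    # Compare every peak in the first spectrum to every peak in the second.
    for mz_a, intensity_a in spectrum_a:
        for mz_b, intensity_b in spectrum_b:
            # Weight is 1 for coinciding peaks and decays with m/z distance.
            weight = math.exp(-((mz_a - mz_b) ** 2) / (8.0 * t))
            total += intensity_a * intensity_b * weight
    # Assumed normalizing constant; for a fixed t it only rescales the kernel.
    return total / (2.0 * math.sqrt(2.0 * math.pi * t))


# Example: one reference peak compared to a nearby and a distant peak.
reference = [(5000.0, 2.0)]
near = [(5001.0, 3.0)]
far = [(5010.0, 3.0)]
k_near = pike_kernel(reference, near, t=1.0)
k_far = pike_kernel(reference, far, t=1.0)
```

A larger t makes the weight decay more slowly, so peaks that are further apart still contribute to the similarity. The double loop is quadratic in the number of peaks, which is acceptable for a sketch but would be vectorized in practice.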
Second, the kernel is capable of assessing interactions between the peaks, which is important because, as I mentioned, the same biological fragments can be detected at different locations along the x axis. Additionally, for our kernel only a single parameter needs to be optimized, t, which controls the degree of smoothing in the kernel. And finally, we combine our PIKE kernel with a Gaussian process to obtain our model, which is GP-PIKE.
I hope this gives you some idea of how MALDI-TOF MS can be used for machine learning prediction. Other areas of application for which DRIAMS could be perfectly suited are any other work continuing with antimicrobial resistance prediction, and also work in the field of domain adaptation and batch effect normalization, or learning unified representations of spectra from different sites. This, of course, is not exhaustive. All code for all experiments is contained in Python packages that are made publicly available, and DRIAMS itself is publicly available and ready for download at the Dryad data repository.
If there are any questions or you want to reach out, here are my contacts. I want to conclude my talk by acknowledging all the people that were involved in this project, which were above all Karsten and Adrian, but also Bastian and Aline, Max and Catherine. And thank you for taking the time to watch this video.