Hello, my name is Lonneke Scheffer. I am a PhD student at the University of Oslo. Together with another PhD student, Milena Pavlovic, I created immuneML, a platform for performing machine learning analysis on adaptive immune receptor repertoires. immuneML is available on our public Galaxy instance through several Galaxy tools, which is what I will talk about in this presentation.

Adaptive immune receptor repertoires, or AIRRs, are collections of immune receptors in the body. These immune receptors recognize antigens, for example from viruses, bacteria, or cancer cells, which makes AIRR data highly useful for disease diagnosis and drug design. But the biology that underlies AIRR data is very complex and unique, due to the high diversity and low overlap of repertoires. We therefore need specific AIRR encodings and machine learning methods, and cannot rely solely on generic frameworks like scikit-learn or PyTorch. Several studies have successfully applied machine learning to AIRR data, but there is a great lack of standardization. This makes it difficult to compare and benchmark the different solutions, which in turn hinders our ability to unravel the underlying biology. We therefore developed immuneML to provide a platform for developing, evaluating, and choosing the most suitable machine learning approaches for AIRR studies.

immuneML can be applied to two types of classification problems. In receptor classification, we want to predict whether an immune receptor binds to a given antigen. In immune repertoire classification, we try to predict a disease state from a whole repertoire.

Here you can see the overall immuneML workflow. immuneML takes in an AIRR dataset containing receptors or repertoires with the labels we want to classify, and a YAML specification file. The specification describes the settings of the analysis components, as well as the instructions that will be performed with these components. Finally, the results produced by the different instructions can be navigated through an HTML summary file.
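
To make the split between analysis components and instructions more concrete, here is a minimal sketch of how such a specification could be organized, written as a Python dictionary rather than YAML. All key names and values below (`definitions`, `instructions`, `my_dataset`, `TrainMLModel`, and so on) are assumptions for illustration and not a verified immuneML schema; the helper `undefined_references` is hypothetical and only checks that every instruction refers to a component that was actually defined.

```python
import json

# Hypothetical layout: "definitions" names the analysis components,
# "instructions" says what should be done with them. Every key and value
# here is an illustrative assumption, not a verified immuneML schema.
spec = {
    "definitions": {
        "datasets": {
            "my_dataset": {"format": "AIRR", "params": {"path": "repertoires/"}}
        },
        "encodings": {"my_encoding": "KmerFrequency"},
        "ml_methods": {"my_classifier": "LogisticRegression"},
    },
    "instructions": {
        "my_training": {
            "type": "TrainMLModel",
            "dataset": "my_dataset",
            "labels": ["disease"],
            "settings": [
                {"encoding": "my_encoding", "ml_method": "my_classifier"}
            ],
        }
    },
}


def undefined_references(spec):
    """Return (instruction, name) pairs for components that are used but not defined."""
    defs = spec["definitions"]
    missing = []
    for instr_name, instr in spec["instructions"].items():
        if instr["dataset"] not in defs["datasets"]:
            missing.append((instr_name, instr["dataset"]))
        for setting in instr["settings"]:
            if setting["encoding"] not in defs["encodings"]:
                missing.append((instr_name, setting["encoding"]))
            if setting["ml_method"] not in defs["ml_methods"]:
                missing.append((instr_name, setting["ml_method"]))
    return missing


if __name__ == "__main__":
    print(json.dumps(spec, indent=2))
    print("undefined references:", undefined_references(spec))
```

The point of the sketch is only the structure: components are declared once under a name, and instructions refer to them by that name, so the same dataset or encoding can be reused across several instructions.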
We created eight different Galaxy tools, which provide interfaces to these instructions.

The first set of tools is used to create a dataset. We can import an experimental dataset from files in the Galaxy history. We support importing AIRR data from all common file formats, such as AIRR, VDJdb, MiXCR, and others, as well as custom tabular formats, which can be specified in the advanced interface. The produced HTML summary shows the dataset type and size, and which labels can be used for training machine learning classifiers. Alternatively, a synthetic dataset can be created consisting of random amino acid sequences. This tool takes a YAML specification as input. The resulting dataset can be used in the same way as any other immuneML dataset, but it does not contain any meaningful patterns that can be learned through machine learning.

To synthetically implant such patterns into a repertoire dataset, we can use the next Galaxy tool. We can simulate immune events, such as disease states, by implanting motifs in some of the sequences of the repertoires. This tool takes in an immuneML dataset, which may be either experimental or simulated, and a YAML specification, and the result is a modified version of the given dataset. The advantage of simulated immune events is that the ground-truth disease signals are known, which can be used for benchmarking purposes, as was shown in this recent preprint.

The following tools can be used to train and apply machine learning models. When training machine learning models, immuneML uses fully configurable nested cross-validation to find the best hyperparameters for a given classification problem. These hyperparameters cover data preprocessing, encoding, and the machine learning method. We tailored the Galaxy tools for training machine learning models to two different user groups. For immunologists, we provide simplified interfaces, with questions related to the assumptions that they have about their data.
Based on their answers, the most appropriate encoding is chosen, and technical settings related to hyperparameter optimization are largely predefined. This tool also exports the internally generated YAML specification to the Galaxy history, which can be used as a stepping stone to get familiar with the more advanced interface. For bioinformaticians, we provide a YAML-based interface, which gives full control over the analysis parameters, gives access to additional components for preprocessing, data encoding, and machine learning methods, and allows for seamless switching between the Galaxy and command line interfaces.

In the resulting HTML summary file of these tools, we find information about the performance of the different models in the outer and inner cross-validation loops, and other relevant statistics and visualizations. Additionally, the trained model with optimal hyperparameter settings is exported to the Galaxy history as a zip file. This zip file can be used as input to the next Galaxy tool, which uses the trained model to make predictions on a new dataset where the true disease state or antigen binding labels are not known.

Lastly, we provide a generic Galaxy tool that can take in any YAML specification, which can be used for other immuneML applications, such as visualizing encoded data without performing machine learning.

If you use immuneML, please cite our preprint, which is out on bioRxiv. The immuneML tools are available on the Galaxy ToolShed, so you can install them on your own local Galaxy instance, or use our Galaxy instance at galaxy.immuneml.uio.no. The immuneML command line tool can be found on GitHub, and for the latest updates, you can follow the Twitter account @immuneml. And finally, I would like to thank all colleagues, collaborators, and funders that made this project possible.
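
As a supplement to the talk, the nested cross-validation mentioned for model training can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not immuneML's implementation: the classifier is a one-dimensional k-nearest-neighbour stand-in, the only hyperparameter is the number of neighbours, and all function names (`k_fold`, `knn_accuracy`, `nested_cv`) are hypothetical. In immuneML the hyperparameters would instead cover preprocessing, encoding, and the machine learning method.

```python
from statistics import mean


def k_fold(indices, k):
    """Split a list of indices into k folds and yield (train, test) index lists."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def knn_accuracy(xs, ys, train, test, k):
    """Accuracy of a toy 1-D k-nearest-neighbour classifier with binary labels."""
    correct = 0
    for t in test:
        nearest = sorted(train, key=lambda j: abs(xs[j] - xs[t]))[:k]
        votes = sum(ys[j] for j in nearest)
        prediction = 1 if 2 * votes > len(nearest) else 0
        correct += prediction == ys[t]
    return correct / len(test)


def nested_cv(xs, ys, candidates, outer_k=3, inner_k=2):
    """Return one (chosen hyperparameter, outer test accuracy) pair per outer fold."""
    results = []
    for outer_train, outer_test in k_fold(list(range(len(xs))), outer_k):
        # Inner loop: choose the hyperparameter using only the outer training data.
        def inner_score(k):
            return mean(
                knn_accuracy(xs, ys, tr, va, k)
                for tr, va in k_fold(outer_train, inner_k)
            )

        best = max(candidates, key=inner_score)
        # Outer loop: estimate performance on data never used for the selection.
        results.append((best, knn_accuracy(xs, ys, outer_train, outer_test, best)))
    return results


# Toy data: one numeric feature per example and a binary label.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

if __name__ == "__main__":
    for best_k, accuracy in nested_cv(xs, ys, candidates=[1, 3]):
        print(f"chosen k={best_k}, outer accuracy={accuracy:.2f}")
```

The design point is that the outer test fold plays no part in choosing the hyperparameter, so the outer accuracy is an estimate of how the whole selection procedure performs on unseen data.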