Hi, this is Santiago Carmona, and I will present our new bioinformatics tool, called ProjecTILs. The target audience for this talk is researchers using single-cell transcriptomics to study immunology-related problems. By the end of this talk, you will be able to incorporate a powerful tool to interpret immune cell states.
In the past 10 years, single-cell transcriptomics has allowed us to explore cell diversity at high resolution and throughput. These technologies are transforming the way in which we study human disease. For example, here we are looking at a low-dimensional representation of the tumor microenvironment. It is relatively easy to discriminate broad cell types, such as T cells, B cells and dendritic cells, by looking at their gene expression profiles. But when we start looking into the diversity within each cell type, for example here T cells, it becomes much more difficult to characterize discrete cell states. For example, these recent studies show that T cells in human tumors can be found in different states of differentiation, such as cytotoxic, transitional, activated, exhausted, etc. Cell state definitions are inconsistent across studies, so we don't know if states from one study exist in a second one, which limits our ability to discover general patterns.
Multiple sources contribute to these problems, but in particular, cell state definition is very sensitive to data analysis parameters and to batch effects. And this is the bioinformatics problem we are addressing here. In the single-cell analysis workflow, we have multiple steps towards the definition of cell states, such as feature selection, dimensionality reduction and clustering, which depend on subjective choices of parameters. This is usually an iterative process driven by biological interpretation, and therefore it is highly time-consuming and requires expertise both in bioinformatics and in the biological system. As a result, we get ambiguous definitions of cell states.
To address this issue, we propose an alternative approach to data analysis that consists of generating, once, a reference atlas, which summarizes the current knowledge in a defined biological system, and then projecting query data onto this reference. This allows us to interpret any data in the context of a stable and curated system of coordinates. As opposed to the typical workflow, this is a fast and automated process that provides consistent cell state definitions and enables systematic comparisons across studies.
But how do we generate such a reference atlas? Well, we can either pick one data set of our choice and use it as the reference, or we can take data from multiple sources and integrate them into a new, unified reference atlas. And we have found that usually the second one is the better choice. The first atlas we built is one for CD8 T cells in viral infection. In this case, we took 12 single-cell data sets from different studies of acute and chronic viral infection models at different time points, and generated a reference by data integration and manual annotation. We can verify that this reference atlas accurately describes prior knowledge in terms of the phenotype of the different CD8 T cell states, and also in terms of the temporal patterns along the course of infection and the type of infection.
Now that we have a reliable atlas, we can start using it to interpret new data. And then we can run our algorithm, ProjecTILs. In the first step, the method filters the relevant cell type, in this case T cells; then it corrects for batch effects and embeds the query transcriptomes into the high-dimensional space of the reference atlas. For visualization, ProjecTILs uses contour lines indicating the areas of the map with a high density of projected cells.
This is a real-life example. In their paper from 2020, Sandu and colleagues studied the diversity of CD8 T cells in chronic viral infection across six different organs.
This is a particularly challenging data set to annotate and interpret using a typical unsupervised approach, because batch effects are mixed together with cell-subtype-specific variation and with organ-specific variation. By projecting these data onto our reference atlas, ProjecTILs can easily identify the subtypes across organs and batches, and these automated classifications accurately matched those generated by the experts in their original study. ProjecTILs is able to robustly identify T cell subtypes in diverse conditions. For instance, here, this T cell state in yellow, known as SLEC, in spleen and blood, or this other T cell state, called Tex, both in spleen and liver. But also, importantly, it allows us to identify condition-specific transcriptional differences that deviate from the reference states. In this case, we can see tissue-specific signals, including upregulation of genes such as Nr4a1 and Cd69, which are likely associated with increased antigenic stimulation in spleen compared to other tissues.
Going back to cancer, we generated a second reference atlas, in this case of tumor-infiltrating CD8 and CD4 T cells, by multi-study data integration and annotation. With these tools, we can now address the problem of defining consistent T cell states across studies. We can now project cells from any study and interpret their states in a reference system of coordinates. For instance, we can look at this T cell data set by [name unclear] and colleagues. Here are the cell states the authors defined in their own transcriptional space. And we can see how each of the states localizes in our reference space, and compare the annotations provided by the authors with the automated annotations of ProjecTILs.
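
To make the idea of projecting query cells into a fixed reference space more concrete, here is a minimal Python sketch. It is not the ProjecTILs implementation: it skips cell-type filtering and batch correction entirely, and it assumes the reference provides per-gene means, linear loadings for its embedding, and labeled reference cells. The function names, the toy numbers, and the use of a simple k-nearest-neighbor vote are all illustrative assumptions; only the state names SLEC and Tex are taken from the talk.

```python
from collections import Counter
import math

def project(cells, gene_means, loadings):
    """Center each query profile on the reference gene means, then apply the
    reference's linear loadings (one list of gene weights per component)."""
    projected = []
    for profile in cells:
        centered = [x - m for x, m in zip(profile, gene_means)]
        projected.append([sum(w * c for w, c in zip(component, centered))
                          for component in loadings])
    return projected

def knn_label(point, ref_points, ref_labels, k=3):
    """Assign the majority label among the k closest reference cells."""
    order = sorted(range(len(ref_points)),
                   key=lambda i: math.dist(point, ref_points[i]))
    votes = Counter(ref_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical reference: 3 genes, 2 embedding components, 6 annotated cells.
gene_means = [1.0, 1.0, 1.0]
loadings = [[1.0, 0.0, 0.0],   # component 1 reads gene 1
            [0.0, 1.0, 0.0]]   # component 2 reads gene 2
ref_points = [[2.0, 0.1], [1.8, -0.2], [2.2, 0.0],
              [0.1, 2.0], [-0.2, 1.9], [0.0, 2.3]]
ref_labels = ["Tex", "Tex", "Tex", "SLEC", "SLEC", "SLEC"]

# Hypothetical query cells, with the labels their authors assigned.
query = [[3.1, 1.0, 1.0], [1.0, 2.9, 1.2], [1.2, 3.2, 0.9]]
author_labels = ["Tex", "SLEC", "Tex"]

# The reference coordinates never change; only the query is transformed.
embedded = project(query, gene_means, loadings)
predicted = [knn_label(p, ref_points, ref_labels) for p in embedded]

# Fraction of query cells where the automated and author labels agree.
agreement = sum(a == b for a, b in zip(predicted, author_labels)) / len(query)
print(predicted, agreement)
```

The point of the design is that the reference coordinates stay fixed, so labels obtained this way are directly comparable across any number of query data sets.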
Finally, we can start mining public data and asking important questions, such as: what are the immunological differences in tumors from patients that respond, compared to those that don't respond, to cancer immunotherapy?
In summary, we provide a recipe to build reference atlases by multi-study single-cell data integration. We contributed two specialized reference atlases for T cells in cancer and viral infection, summarizing prior knowledge in these systems. And we provide a new method, ProjecTILs, for the automated interpretation of new single-cell data using reference atlases.
For this, I would like to acknowledge the excellent work of Massimo Andreatta, the main architect behind ProjecTILs, and to thank our collaborators George Coukos and Jesus Corria-Osorio from the University of Lausanne, and Rafael Cubas and Sören Müller from Genentech, for their critical contributions. I would also like to thank the Swiss National Science Foundation for funding this research through their Ambizione program.