Good afternoon, everybody. Welcome to this very special virtual seminar of the SIB. Today is going to be a celebration of the SIB Fellowship Program and of the students who graduated last year, in 2017. The presentations will be from Ron Appel, Executive Director of the SIB, and from the four students whom Ron is going to introduce during his presentation. This is a very special seminar with many people presenting, so we are going to keep all the questions for the end. Those who are following us via the stream can ask their questions through the Adobe chat box, and we will repeat the questions for the presenters afterwards. Thank you very much to all, and enjoy the presentations. Ron Appel, please.
Ladies and gentlemen, it's a great pleasure to welcome you to the celebration of the first promotion of the SIB Fellowship Program. Welcome to all of you in the room, and welcome to the many thousand people online. Something like that. So as you can see from the title of my short introduction, it's about research and it's about people. At SIB, our mission is mainly threefold. First, we have the goal of providing infrastructure and resources to the life science community worldwide. Second, we provide training. And third, we try to promote research. Today, as you have already understood, it's about promoting research. And to do good research in bioinformatics, or in the bioinformatics part of life science, what do you need? You need well-trained, outstanding bioinformaticians. This is why we decided a few years ago to create the SIB Fellowship Program. The goal of the program, in order to promote even better research in Switzerland, was to attract outstanding students to Switzerland and provide them with a grant so that they can do a PhD in three years, exceptionally with an additional year.
The outstanding students who were selected could, in this way, carry out a research project in three or four years under the supervision of a SIB group leader at one of the universities. At the same time, the PhD would take place at one of the Swiss universities, one of the SIB partner universities. Each of these universities is an institutional member of the SIB. In addition, during the whole time of their PhD these selected students were, or are, SIB members, and they are also part of our Swiss Bioinformatics PhD Student Training Network.
So the first step was to select the students. We actually published two calls: one in 2012, for a selection taking place in early 2013 and PhD work starting in 2013, and a second call in 2014, to start in 2015. We published the first call quite widely. We had more than 100 applications from more than 30 different countries all over the world. Based on the application files, we selected 18 applicants, whom we invited to Switzerland for a two-day selection process. Out of these, we selected six students for the fellowship. Two years later we did the same again, with similar figures: we had more than 100 applications from as many countries, we invited 18, and then selected only four, because we had only four grants available.
SIB is funded in part by the federal government, but not for research. We are funded for the other two missions: providing infrastructure and resources, developing tools and databases, and part of training. Since we are not funded for research, we needed sponsors to be able to carry out this fellowship program, and we were quite lucky to find a number of sponsors.
These were SystemsX.ch, which funded two PhD students; the Fondation Leenaards in the Arc lémanique; the Universities of Geneva, Zurich and Lausanne; the Geigy Stiftung; and the foundation with a very long name, the Foundation for Excellence and Talent in Biomedical Research. We are indebted to these sponsors, who allowed us to launch this fellowship program. Today we are celebrating the first promotion of our fellowship program and, more precisely, we will celebrate all together the first four PhD students who finished their PhDs.
So, very briefly, I will finish by introducing each of them, in the order in which they are going to present their talks. The first one is Jannik Vollmer, who did his PhD within this program under the supervision of Dagmar Iber at the ETH, funded by the SystemsX.ch grant. Jannik joined the SIB coming from the University of Stuttgart. The second one is Franziska Gruhl, who came from the University of Heidelberg and did her PhD in Lausanne under the joint supervision of Henrik Kaessmann and Ioannis Xenarios. Then we have Malgorzata Nowicka. When I tested the... Sorry about that. When I tested the presentation, there was a very nice picture of you, but we'll see you in... No, not at all. So you did your PhD under the supervision of Mark Robinson at the University of Zurich, with the grant from the University of Zurich, and you came from Poland, where you did your Master's at the Polytechnic of Wrocław. And finally Gabriel Studer, who did his PhD with Torsten Schwede in Basel and had already done his Master's in Basel.
So much for the short introduction. Again, I would like to thank first our sponsors for their generous support, then our fellows, first for being here today and giving your presentations but also for the nice work that you have done, and all of you who are here to celebrate with us. So enjoy the talks.
Alright, thanks a lot for the nice introduction, and thanks to the SIB for organizing this nice event here today.
As said, I did my PhD under the supervision of Dagmar Iber at the D-BSSE of ETH Zurich, and during my PhD I focused on the question of growth control in organ development, with a particular focus on its analysis in Drosophila and also a bit in vertebrates, as we will see later on. So what is growth control? Well, in growth control we try to answer the question of what determines the final size of an organ. Just think about your two arms. They're perfectly the same length and in proportion to the rest of your body, so clearly there need to be robust mechanisms in place that ensure growth termination at the right size.
What do we know about growth termination, or growth control? On the one hand we have organ-extrinsic control: as I said, your organs scale with your body size, and clearly body and organ size are influenced by external factors such as nutrition. On the other hand, however, we also have something called organ-intrinsic control. To explain this idea, take this experiment done over 80 years ago by Twitty and Schwind. They transplanted the developing limbs from a smaller newt species to a bigger one and vice versa, and what they observed, as you can see in the picture, is that these grafted limbs grow to their donor size, clearly indicating an organ-intrinsic mechanism of growth control. Now, due to the shortness of the talk I cannot go into the details of previously published models, but let me just say that there are of course other models, and a lot of them are controversial or have failed.
So my PhD focused on the question of organ-intrinsic growth control, and when I started my PhD the question was: wouldn't cell differentiation, so going from a highly to a less proliferative cell state in a deterministic manner, be sufficient to terminate growth robustly?
We studied this question in the development of the eye in the Drosophila fruit fly. The eye, like any other organ in Drosophila, develops from an imaginal disc. Here you can see the eye-antennal imaginal disc, with the antenna part and the eye part indicated. Now, growth of the eye part is quite particular, because the cells anterior to the morphogenetic furrow, so to its left on the screen, are proliferating and contributing to growth, while the cells to the right are already differentiated, so there's a clear spatial separation between them. This boundary, the morphogenetic furrow, is not stationary: as development progresses it sweeps over the tissue in a posterior-to-anterior direction, so from right to left. The final size of this organ is thereby determined by how fast the tissue grows in the anterior part and by how fast this boundary moves over the tissue.
To look into this, of course, we first needed to acquire data. We used 3D imaging and then image reconstruction to give us the total area, shown in red, as well as the anterior and posterior areas. We also measured what is shown in orange, the posterior length, which is the distance that the furrow has moved up to this time point and is linearly related to time. We can then start to explore the data, and there are two interesting properties. First, the total area, shown on the left, expands about linearly with developmental time. Second, on the right, when you plot the anterior proliferative part versus the posterior differentiated part, you see this bell-shaped curve with a fast initial increase, which then levels off and eventually goes down again.
So the question now is: are these growth dynamics really just the result of this moving furrow, or do we need additional mechanisms that allow the furrow to catch up and eventually terminate growth?
To answer this question we can write this simple model, where the change in the total area over developmental time is of course equal to the sum of the changes in the anterior and posterior parts, and that is in turn equal to what we call the area growth rate k times the anterior area. Now we are interested in how this growth rate k behaves over time. We cannot measure k directly, but we can infer it from our data: we approximate the slope of the total area in this plot and then simply divide that slope by the anterior area. When we did this, we immediately saw that there is a continuous decline in this growth rate k, so there is an additional mechanism that down-regulates the growth rate in the anterior part of the tissue, allowing the morphogenetic furrow to catch up and eventually terminate growth.
This was already interesting for us, but of course we were wondering: can we also find functions that describe this decline, and eventually link these functions to biological mechanisms? We found three different functions: a power-law relationship, an area-dependent growth law where the growth rate k is inversely proportional to the total area T, and an exponential growth law. Unfortunately, for the first case we couldn't find any corresponding biological mechanism. The second case would correspond to a growth-controlling factor being diluted, and the exponential case would correspond to a growth-controlling factor being degraded.
We then went on to simulate the growth of this tissue using these different growth models, and this is what we got. The different colored lines are the different models with a declining growth rate. In gray you can also see what would happen if we had a constant growth rate, and we immediately see that we are not able to reproduce the growth data with that model.
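A minimal sketch of the inference step described above, assuming the model dT/dt = k·A with T the total area and A the anterior area. The function name, the use of an explicit time axis, and the synthetic numbers are illustrative assumptions, not the measurements or code from the talk.

```python
import numpy as np


def infer_growth_rate(time, total_area, anterior_area):
    """Estimate k(t) from dT/dt = k * A using finite differences.

    The slope of the total area over time is approximated numerically
    and then divided by the anterior (proliferating) area.
    """
    time = np.asarray(time, dtype=float)
    total_area = np.asarray(total_area, dtype=float)
    anterior_area = np.asarray(anterior_area, dtype=float)
    slope = np.gradient(total_area, time)  # approximates dT/dt
    return slope / anterior_area


# Synthetic, made-up data: total area grows linearly in time while the
# anterior area keeps increasing, so the inferred k must decline.
time = np.linspace(0.0, 10.0, 11)
total_area = 1.0 + 2.0 * time
anterior_area = 1.0 + time

k = infer_growth_rate(time, total_area, anterior_area)
print(np.round(k, 3))
```

With real data the finite-difference slope is noisy, so some smoothing or a fitted curve for the total area would probably be needed before dividing.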
Now, coming back to the colored lines, what you see is that all of them fit the data nicely and the differences between them are very subtle. So clearly it's not possible to say that one model here is better than any of the others. However, there's one interesting property of these imaginal discs: you can dissect them out of the developing larvae and transplant them into the abdomen of adult flies, and what happens is that they still grow to approximately the same size, but development takes 5 to 7 fold longer. Here is such data: on the left our control larvae with the wild-type data, and on the right the data for such grafted discs. Interestingly, the area-dependent growth rate, which would correspond to a dilution-based mechanism, naturally preserves the final size and also the growth kinetics at these slower developmental speeds.
So we thought this would be a very elegant model, and in a follow-up study to this first part we looked at whether we can actually identify a biological molecule that could regulate growth in such a manner. And indeed we found Unpaired, or Upd. Its production is restricted to the initial stages of eye development, so it is being diluted during the main phase of outgrowth, and as you can see here it has a massive influence on the final eye size when it's down- or up-regulated. We did a genetic screen of different mutants in this pathway and checked the final eye sizes in the adult flies. Here are a few of these genotypes. The black genotype is a control strain, and all the previous analysis was based on this strain. The blue genotype is another control strain with very similar eye size. In the yellow genotype we down-regulate Upd signaling and get much smaller eyes, and in the red case we up-regulate Upd signaling and get much bigger eyes.
We then went on to do the same, or a similar, analysis as before. We acquired the growth data, nothing too surprising here, but then, more interestingly, went on to again approximate the growth rate k in these different genotypes. So here is again this dilution-based, or area-dependent, model, where we have an inverse relationship between the growth rate k and the total area T, which shows up as a slope of minus one. In the case of the blue genotype, which as I said is another control, we again expect a slope of minus one and similar growth rates, and that's indeed what we observe. In the case of the yellow genotype we reduce the initial levels of Upd and get smaller eyes, so we would still expect a slope of minus one but substantially lower growth rates, and again that's what we observe. Finally, in the red case we continuously express Upd, now also during the phase in which it was previously only being diluted, and we would therefore expect a slope bigger than minus one, and that's indeed what we observe.
Finally, we wondered whether such declining growth rates could be something that's evolutionarily conserved in other organs as well, and we therefore checked other organs. I just want to show one example here, the developing limb bud of mice. We again measured different properties of this development and again found that we need declining growth rates to fit the data and, very interestingly, that we can use similar, even the same, growth models to describe this decline. In the case of the limb bud we also have the possibility to check the proliferation rates directly, and again found that there's a decline.
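The "slope of minus one" mentioned above follows from the area-dependent law k = k0 / T, since log k = log k0 − log T. A minimal sketch of checking this on a log-log scale, using made-up numbers and a hypothetical helper name (neither comes from the talk):

```python
import numpy as np


def loglog_slope(total_area, growth_rate):
    """Fit log(k) = intercept + slope * log(T) by least squares."""
    slope, intercept = np.polyfit(np.log(total_area), np.log(growth_rate), 1)
    return slope, intercept


# Synthetic example of a pure dilution law k = k0 / T with k0 = 3.
total_area = np.linspace(1.0, 10.0, 20)
growth_rate = 3.0 / total_area

slope, intercept = loglog_slope(total_area, growth_rate)
print(round(slope, 3), round(float(np.exp(intercept)), 3))
```

Under this reading, a genotype with lower initial levels of the diluted factor would keep the slope near minus one but shift the intercept down, while continuous production would make the slope less negative.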
With this I would like to conclude. I've shown you very briefly that during my PhD I observed declining growth rates in organ development, independent of the organism studied. More specifically, I've shown you that in the Drosophila eye the area growth rate declines inversely proportionally to the total area, and that dilution of Upd quantitatively explains eye development in several genotypes. There have of course been a lot of people involved in these projects along the way. I want to thank especially my supervisor, Professor Dagmar Iber, as well as Professor Fernando Casares and his whole team for the nice collaboration on the Drosophila project, and then of course I want to thank SIB and SystemsX.ch for the generous funding.
I would like to thank the organizers for the invitation to present my PhD project here, which is entitled "Recently active transposable elements provide insights into the evolution of mammalian circular RNAs". The transcriptional landscape of a cell is diverse. There are many different RNAs, such as protein-coding RNAs, long non-coding RNAs and different small RNAs, and recently the landscape was expanded by the discovery of circular RNAs. Circular RNAs distinguish themselves from normal linear transcripts by their unusual splicing behavior. In a normal splicing reaction, the 3-prime end of an exon is joined to the 5-prime end of an exon that is located downstream in the gene. In contrast, circular RNAs are formed by a so-called back-splicing reaction, in which the 3-prime end is spliced to an exon that is located upstream in the gene, leading to the observed circular structure.
As of today, very little is known about circular RNAs. It is evident that they are present in a wide range of species and developmental time points. They often overlap with protein-coding genes, and they are very stable due to their circular structure but possess very low expression levels.
There is some evidence that the back-splicing reaction is supported by repetitive sequences in the flanking introns. These repetitive sequences can basically pair with each other, forming a hairpin-like structure in which the two exons involved in the splicing reaction are brought into closer proximity, which facilitates the whole reaction. Interestingly, circular RNAs are often found in neuronal tissues, and the exons involved in circular RNAs are associated with higher conservation scores. Currently it is hypothesized that they might be involved in a variety of different processes, such as microRNA sponging or transcriptional control, but we only have a handful of functional examples, which makes it very difficult to extrapolate to their functional importance. Nevertheless, the high number of circular RNAs, their presence across different species and their sequence conservation are often used as a claim for their functional importance. However, none of the current studies was specifically designed to address the question of functional importance based on evolutionary conservation, and we therefore decided to re-address the hypothesis in my PhD project.
I've been working with a dataset that consists of five different species: opossum, mouse, rat, rhesus macaque and human. In addition, I looked at three different organs: liver, cerebellum and testis. In collaboration with the lab of David Gatfield from the University of Lausanne, we generated paired-end sequencing data, and I developed a detection pipeline to predict and identify circular RNAs across all the different species and tissues. When I apply this detection pipeline to my samples, I'm able to identify about 2,000 to 3,000 circular RNAs per species. In the next step I tried to assess the overlap of circular RNAs between different species based on the exact splice sites. I was able to identify a small number, about 100 circular RNAs, shared between all the species. This number is higher than expected.
Nevertheless, we can also see that the number of species-specific circular RNAs is at least an order of magnitude higher. In addition, I was also able to show that the expression levels show a kind of intermediate conservation and that, similar to what was reported in the literature, circular RNA exons possess higher phastCons scores.
As I just showed you, some circular RNAs are shared between species, but does this necessarily mean that they are conserved? To address this question, we need to understand that there are two alternative hypotheses that can explain the observed overlap. In the case of divergent evolution, we have a phenotypic trait in species A and B, and this trait developed from a common ancestor, so the trait is conserved. For circular RNAs that means that if we have a circular RNA in species A and B, a circular RNA was already present in the common ancestor of these two species. But sometimes similar environmental conditions can also lead to the development of similar phenotypes, by a process known as parallel evolution. In that case we have a phenotypic trait in species A and B, but it originated independently and not from a common ancestor. For circular RNAs that means that even though we have circular RNAs in species A and B, they did not originate from a common ancestor but maybe arose because of similar genomic constraints.
I was therefore wondering whether there are indeed such genomic properties that can explain the presence of circular RNAs. To address this question I used different linear regression models. The idea here is that we have a response variable, which can be the presence or absence of a circular RNA in a parental gene, how many circular RNAs are present in a gene, or whether circular RNAs are shared or species-specific. We then use a set of different predictors to understand how they contribute to the probability of observing this response variable.
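A minimal sketch of this kind of model, assuming a logistic regression for the binary presence/absence response (the talk only says "linear regression models", so the logistic link is an assumption). The two predictors, standing in for standardized gene length and GC content, and all numbers are simulated for illustration and are not the study's data.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit a logistic regression by plain gradient ascent on the log-likelihood.

    Returns the coefficient vector with the intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))     # predicted probabilities
        w += lr * (Xb.T @ (y - p)) / len(y)     # averaged gradient step
    return w


# Simulated genes: presence of a circular RNA is made more likely for
# long genes and less likely for GC-rich genes (hypothetical effect sizes).
rng = np.random.default_rng(0)
n_genes = 2000
gene_length = rng.normal(size=n_genes)  # standardized, hypothetical
gc_content = rng.normal(size=n_genes)   # standardized, hypothetical
true_logit = -1.0 + 1.5 * gene_length - 1.0 * gc_content
has_circrna = rng.random(n_genes) < 1.0 / (1.0 + np.exp(-true_logit))

coef = fit_logistic(np.column_stack([gene_length, gc_content]), has_circrna)
print(np.round(coef, 2))
```

The sign and size of each fitted coefficient indicate how the corresponding predictor shifts the log-odds of observing a circular RNA in a gene. A count response (number of circular RNAs per gene) would call for a different model family.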
In my case the set of predictors consisted of, for example, the genomic length of the gene in which the circular RNA is found, the number of exons and transcripts, the GC content, the complementarity within the gene, the number of observed repeats, and a couple of other predictors. When I apply these regression models to my circular RNA dataset, it becomes very evident that there is indeed a set of genomic properties that predict circular RNA presence. These properties are a strongly decreased GC content, an increased genomic length, higher phastCons scores, and self-complementarity in the gene. These properties hold for all of the species analyzed. In addition, these genomic properties do not only predict the presence of circular RNAs; they can also indicate the number of circular RNAs per gene and their presence in other species.
Furthermore, I've seen that the complementarity of the introns and the number of repetitive sequences also seem to play an important role in whether a circular RNA is present or not, and I therefore decided to analyze these repetitive sequences in more detail. Circular RNA flanking introns are repeat-rich. What we can see here is the mean repeat frequency of flanking and background introns in all the different species, and as you can see, the purple bars, which represent flanking introns, possess at least two times more repetitive sequences than the background introns. In all species these repetitive sequences also overlap with small transposable elements, and when we analyze these transposable elements in more detail, we can see that these TEs are very often lineage- or species-specific, as in the example of human that I'm showing here. In purple we have those TEs that are enriched in the flanking introns, and interestingly they are primate- or human-specific. They all belong to the class of Alu elements.
In contrast, we also have a couple of TEs that are decreased in frequency, and what we can see here is that, for example, all the MIR elements, which are mammalian-specific TEs present across all mammals, are depleted in these introns. There are additional associations: the enriched TEs were recently active, so they are young; because they were recently active they have lower degradation levels; and they have the potential to form stable secondary structures, not only with the exact same TE class but also with other family members. Interestingly, shared circular RNAs are also enriched in lineage- and species-specific TEs, contrary to what we would expect if they were conserved. From the literature we also know that TEs can interfere with the splicing reaction of a gene and cause splicing errors.
I am therefore hypothesizing the following model to explain why circular RNAs are present across different species. The idea is that we have orthologous parental genes that are found in a similar genomic context. Because the genomic context is similar, it leads to the independent, species-specific integration of lineage-specific transposable elements into the circular RNA parental genes, and because these integrations are recent and the TEs are young, they can still base-pair very strongly to form the hairpin-like structures that support circular RNA formation. That means that the presence of circular RNAs in different species and tissues is not explained by functional significance and high conservation levels, but instead by the similar genomic context in which the orthologous parental genes are found. The presence of recently active and lineage-specific transposable elements therefore suggests that circular RNAs originated independently from each other and that many circular RNAs would rather be a splicing error than a functional product.
And with this conclusion I would like to thank you for your attention, and I would like to thank my supervisor, Henrik Kaessmann, as well as all the other people who have been involved in the project, and the different funding sources. Thank you.
Thank you for organizing this very nice event and for the opportunity to present my PhD thesis, which has a long title, but in short it was mainly about differential analyses that answered various biological questions and were dedicated to analyzing various types of high-throughput data. I will talk briefly about my two main projects. One of them was to develop a package called DRIMSeq, which you can use for differential transcript usage analysis from RNA-seq data, and the other is the CyTOF workflow, which describes a differential analysis approach for CyTOF data. Both of them are available on Bioconductor, so you can go there and see them.
The key feature of the DRIMSeq package is that it can be applied to differential transcript usage analysis and to transcript usage QTL analysis. It is based on the Dirichlet-multinomial model which, for the statisticians among you, is an overdispersed model for proportions compared with the multinomial distribution. It was designed to perform inference in data with small sample sizes, and with it you are able to model complex experimental designs, which can account for batch effects that you have in your design. Here in the corner is the sticker that comes along with the package; Mark has some of them, so if you are interested in putting one on your laptop or somewhere, you can get it.
Now I will move to the CyTOF workflow. The CyTOF technology is used to measure protein expression on single cells. In this experiment, for each cell, proteins of interest are tagged with metal isotopes, which are attached to protein-specific antibodies. Such tagged cells go into the mass cytometer, which does its magic and measures the abundance of the metal isotopes on each cell.
As a consequence we get a table of these metal abundances for each single cell, and the CyTOF workflow describes how to perform differential analysis using this data. We refer to this approach as the classical approach. It consists of two steps. In the first step we identify the cell populations present in the data, and for that we use unsupervised clustering approaches. In the second step we use statistical modeling and testing to find out which cell populations are differentially abundant between the conditions of interest.
In our workflow we use a demonstration dataset that comes from the Bodenmiller study, where peripheral blood mononuclear cells were investigated. Here we want to compare samples from 8 patients in two conditions, before and after stimulation. In this experiment more markers were measured, but only the 10 cell surface markers are used to identify the cell types. So the first step is to cluster cells into groups of similar cells. Here I show you the results of clustering into 20 groups. On the left is a heat map that shows the median expression of these 10 markers in each of the 20 clusters, which are marked by the color code on the left. On the right you can see the result of a dimension reduction technique, where each cell is shown in two dimensions. We have used a method called t-SNE, which is very popular for this sort of analysis. In this t-SNE map the cells are colored according to the cluster they belong to, and we can see that cells that belong to similar clusters are really closer to each other.
In the next step an expert, my colleague, annotated these identified clusters into meaningful cell populations. When some of the clusters had similar protein expression, they could be annotated as the same cell population. As a consequence we obtained eight main cell types for the PBMCs.
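The heat map described above summarizes each cluster by the median expression of every marker. A minimal sketch of that summary step, assuming cluster labels are already available from some clustering method; the function name and the toy matrix are illustrative and not taken from the workflow itself.

```python
import numpy as np


def median_expression_per_cluster(expression, labels):
    """Return cluster ids and a (clusters x markers) matrix of median expression.

    expression: array of shape (n_cells, n_markers)
    labels:     array of shape (n_cells,) with one cluster id per cell
    """
    expression = np.asarray(expression, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    medians = np.vstack(
        [np.median(expression[labels == c], axis=0) for c in clusters]
    )
    return clusters, medians


# Toy data: 5 cells, 2 markers, 2 clusters (made-up values).
expression = np.array(
    [[1.0, 10.0], [3.0, 12.0], [5.0, 14.0], [20.0, 0.0], [22.0, 2.0]]
)
labels = np.array([0, 0, 0, 1, 1])

clusters, medians = median_expression_per_cluster(expression, labels)
print(clusters)
print(medians)
```

Each row of the resulting matrix corresponds to one row of such a heat map, which is what an expert would inspect when assigning clusters to cell types.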
Now we are interested in the differences in the abundance of these cell types between the two conditions. Here, for example, we can already see that the dark blue cells, which correspond to the CD4 cells, are less abundant in the stimulated condition on the right than in the reference. But in particular we are interested in comparing the proportions at which cell types are present in the samples. Here each bar represents the composition of a sample in terms of the eight cell types. In our method we concentrate on one cell type at a time. Here I show you an example of the analysis for the CD4 cells, and on the right you can see the same proportions visualized using box plots. Hopefully you can also see that this visualization already shows much better what the potential differences between the conditions are.
We use generalized linear mixed models. We assume that the cell counts follow a binomial distribution and that the logit of the proportions of these cells can be explained by a linear combination of an intercept and a component that explains the differences between the two conditions, plus an observation-level random effect that accounts for the extra overdispersion in the data and a random intercept for each patient, which accounts for the blocking by patient, highlighted here by the circles.
Actually, this workflow was originally developed during an analysis that I performed in collaboration with my colleague, Carsten Krieg. Here the goal was to identify biomarkers that are associated with the response to anti-PD-1 immunotherapy. We were studying the PBMCs of healthy donors and of melanoma patients who were undergoing anti-PD-1 immunotherapy. After the therapy, those patients were classified into responders and non-responders, and the goal was to find the differences between responders and non-responders, the green and the red patients.
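The mixed model described above for the cell counts of one cell type can be written out as follows. This is a sketch of the verbal description only; the index and symbol names are chosen here for illustration, and the normal distributions of the random effects are the usual default assumption rather than something stated in the talk.

```latex
\begin{aligned}
Y_{ij} &\sim \mathrm{Binomial}(n_{ij}, \pi_{ij}), \\
\operatorname{logit}(\pi_{ij}) &= \beta_0 + \beta_1 x_{ij} + \gamma_i + \xi_{ij}, \\
\gamma_i &\sim \mathcal{N}(0, \sigma_\gamma^2), \qquad
\xi_{ij} \sim \mathcal{N}(0, \sigma_\xi^2).
\end{aligned}
```

Here $Y_{ij}$ is the number of cells of the given type in the sample of patient $i$ under condition $j$, $n_{ij}$ is the total number of cells in that sample, $x_{ij}$ indicates the condition, $\beta_1$ is the condition effect that is tested, $\gamma_i$ is the random intercept for the patient, and $\xi_{ij}$ is the observation-level random effect that absorbs extra overdispersion.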
An additional challenge here was that the data was acquired in a few measurement batches, but luckily, thanks to the GLMM approach, we could account for that in all models. The main results from this analysis are coming in the next slides. First we were interested in the general characterization of the lymphocytes. Here is a t-SNE map showing the identified cell types. We identified 7 main cell types, and among them CD4 T cells and CD8 T cells were less abundant in responders versus non-responders. For the myeloid cells we could observe an inverse association, with a higher proportion of those cells in responders than in non-responders. In the next step we were also interested in a more in-depth characterization of the myeloid cells, and for that a different panel of proteins was measured. Here we were able to stratify the myeloid cells into CD14-negative and CD14-positive cells. From the differential analysis we could confirm again that the T cells are less abundant in responders than in non-responders, and among the myeloid cells it is actually only the CD14-positive cells that are in higher abundance in responders. Those CD14-positive cells are also referred to as classical monocytes. We could validate these results using an independent cohort of patients in a FACS experiment, where the same trend is shown, and we could also find a significant association between better clinical outcome and a higher frequency of monocytes.
With that I would like to thank my supervisor Mark for great supervision through these four years, the other committee members, my collaborators, of course SIB and [unclear], and again Mark.
Thanks again, for the last time now, for organizing this event. I want to present to you the work I did during my PhD
with the title "Efficient Algorithms and Protein Modeling", and I want to start by throwing a number into the room: it's 20 years of the Swiss Institute of Bioinformatics. The project that I had the opportunity to work on is tightly connected to the SIB, but it has been around for even longer. It all started with a paper from 1993 by Manuel Peitsch, and what emerged out of it is the SWISS-MODEL web server, which is still around today. When you open the website you see a few sentences that describe what it is, and here in bold I highlighted the actual philosophy behind the service: to make protein modeling accessible to all biochemists and molecular biologists worldwide. Well, to answer the question of why protein modeling is necessary, we have to appreciate a little bit the functional complexity of proteins. This goes beyond just complex metabolic networks, as highlighted here on the slide; it also involves signaling events and cellular transport, and proteins are even building blocks for the cytoskeleton. But what ultimately determines function is structure. This means that when we want to understand the mechanistic effect of a disease mutation, we need to know the structure, or in our case maybe a model can inform about the question at hand. This is shown here with the example of a gamma-hydroxybutyrate dehydrogenase, GHBD for short, which we built for an ongoing project in our group. But when we come back to actual structural information, we have a problem, and this is the problem.
The problem is this blue line here, which shows the number of entries in the TrEMBL database; the growth reflects the explosion of sequencing technologies we have observed in recent years. In the red line you would see the number of entries in Swiss-Prot, which is also a project here at SIB and is a manually curated subset of TrEMBL, and in green you would see the number of entries that we have in the PDB, so the number of macromolecular complexes for which we have actual structural data. To make red and green visible we have to look at the same thing on a logarithmic scale, and what becomes obvious is this huge structural gap: the number of sequences for which we have actually no idea what the protein looks like. That is exactly the goal of SWISS-MODEL, to reduce the size of this gap and inform scientists of what could be going on. This is all based on a statement made in the 80s by Arthur Lesk and Cyrus Chothia, who claimed that if two proteins are homologous, their structures can be expected to be very similar. This still holds today, and I just illustrate it here with three homologs of GHBD. At the bottom right you see a pairwise sequence identity matrix; these numbers represent the percentage of identical amino acids when you do a pairwise sequence alignment. Despite the moderate homology, the three-dimensional similarity is absolutely stunning, and that is exactly the starting point for homology modeling.
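The pairwise sequence identity matrix mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the slide: the function names and the toy sequences are made up, the sequences are assumed to be already aligned (equal length, `-` for gaps), and identity is computed over columns where neither sequence has a gap, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def identity_matrix(aligned_seqs):
    """Nested dict of pairwise percent identities for named aligned sequences."""
    return {
        name_a: {
            name_b: percent_identity(seq_a, seq_b)
            for name_b, seq_b in aligned_seqs.items()
        }
        for name_a, seq_a in aligned_seqs.items()
    }


if __name__ == "__main__":
    # Toy aligned sequences, purely illustrative.
    toy = {"target": "ACDE", "template1": "ACDF", "template2": "AC-E"}
    for name, row in identity_matrix(toy).items():
        print(name, {other: round(value, 1) for other, value in row.items()})
```

The choice of denominator matters when reading such a matrix: dividing by the full alignment length or by the length of the shorter sequence would give lower values for the same alignment.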
It all starts with a pairwise sequence alignment: you have a template and you have a target, and this sequence alignment also reflects evolutionary events that happened, so you have insertions, here marked in red, and deletions, marked in green. The first step of homology modeling typically is to resolve these evolutionary events to construct a valid backbone. You do that by sampling the conformational space that is available for such a stretch, so you need algorithms to do that, and you need algorithms to finally decide on one particular conformation. For performance reasons you typically don't do that in a full atomistic representation but rather in reduced representations of the protein structure. This is why we then need algorithms that subsequently explore the conformational space available to all the side-chain atoms and finally decide which is the optimal configuration. By doing all these steps you introduce stereochemical irregularities; there is nothing you can do about it, but you can typically resolve them nicely by applying energy minimization using molecular mechanics force fields.
Now, this slide pretty much summarizes the main project of my thesis. It was all about exploring and investigating current state-of-the-art algorithms in homology modeling, implementing novelties and pushing the current state of the art, pushing what is possible. What came out of it is the ProMod3 modeling engine, a full software package that does the full homology modeling from A to Z. It has a modular design and is written in highly efficient C++ code, and the cool thing about it is that all the functionality is exported to the Python scripting language, so it is also prepared for the future: you can prototype new algorithms on top of it. But when I talk of pushing the current state of the art, I also need to provide some numbers so that you believe me. The obvious thing to do is to compare with what most people use, and in this case this is MODELLER,
even though it's a relatively old software package, that's what people are using. What I did to generate this illustration is that I took a few hundred modeling cases, built the models in parallel with exactly the same input in MODELLER and in ProMod3, and plotted a histogram of the differences in score. In red you see the so-called lDDT score, which is a metric that describes the similarity of a protein structure to a known target; higher is better. When you take the difference, you see that the red distribution is clearly shifted to the right, so models generated with ProMod3 are more accurate. On the other hand, in blue we have the MolProbity score, which is agnostic of the target structure and rather evaluates the stereochemistry; in this case lower is better, and again ProMod3 seems to produce better models. To summarize this plot, we can say that even though the runtime of ProMod3 to generate one model is lower than with MODELLER, the models are more accurate.
Well, to close the loop a little bit to the introduction, we have a look at how SWISS-MODEL looks today, 25 years after the kickoff, and I can proudly say that for about one and a half years it has been powered by the ProMod3 modeling engine. Since we have about two requests per minute, this already makes a few thousand, or many thousand, protein models that have been built for the scientific community worldwide. But that's not all. You cannot just throw three-dimensional coordinates at a user. A typical question that comes up is: how accurate is my model, how similar to the actual target structure can I expect it to be? This is the second question I followed a little bit in my PhD thesis, and there I was involved in two projects. It's all about the so-called quality estimation problem, and I asked a somewhat more specific question: where is my model good, and where is my model bad? It was all about per-residue, or local, quality estimates. One project that I was involved in was
estimating local qualities with a scoring function that appreciates the physicochemical properties that occur in transmembrane protein models. Another project was in collaboration with a master's student, Christine Rempfer, where we assessed the consistency of interatomic distances in protein models with ensembles of constraints that we extracted from all homologues that we find for a particular target. By the way, this is all available from the SWISS-MODEL website as well. That was the summary of what I did, so it's time to wrap up. I can say that this thesis had an impact on quality estimation in protein models, and it further pushed the philosophy of SWISS-MODEL of making protein modeling accessible to all biochemists and molecular biologists worldwide.
And it's time to say thank you. So thank you, SIB, for organizing this fellowship program. Even a PhD student needs something to eat, so funding is also very important, and in my case this was the Swiss foundation for excellence in biomedical research. I also want to thank Torsten for a great working environment at the Biozentrum in Basel. The other two people you see there are Timm Maier from the Biozentrum and Matteo Dal Peraro from EPFL, from my PhD committee. This also wouldn't have been possible without sciCORE, the center for scientific computing. And there are so many names I need to say thank you to, which this should represent; to just drop a few, all the people from the group, like Gerardo Tauriello, Stefan Bienert and so on. And also thank you.