So, hello everybody, and welcome to those who are connected remotely to the SIB virtual computational biology seminar series. We have the pleasure today to have Laurent Falquet, so I will briefly go through his bio. He started by doing a PhD in biochemistry at the University of Geneva in 1997, then he joined the EMBnet group at ISREC in 1998, then he did a postdoc with Philipp Bucher on the PROSITE database from 2000 to 2002, and then he became EMBnet node manager for seven years and secretary of the executive board. Then he moved to Vital-IT as a project manager, where he worked for years on the NGS genome assembly team, and since 2013 he has been maître-assistant and maître d'enseignement et de recherche at the University of Fribourg, where he is responsible for the bioinformatics core facility, and he is also a group leader at SIB. So today he's going to talk about the bioinformatics core facility he is responsible for. So, thanks for coming, and it's a pleasure.

Thank you Diana, and thank you to the organizers for inviting me. It's my first video broadcast lecture, so I hope it will be okay and no problem will arise.
Since today is a special day, I tried to find an interesting title that reflects what the bioinformatics core facility in Fribourg is, and I came up with this title: a small fish in a big pond, or vice versa. In fact, if you look at the core facilities in Switzerland, for example just by looking at this recently published SIB profile, you will see that there is a huge number of core facilities, but some are hidden. As you see in the right-hand column, not all have the black dot at the end: there are a few groups that have the black dot, meaning that core facility is the principal activity of the group, but most of the groups within SIB, as you can notice, do some kind of core facility business. Here I just listed a few of the most important ones, starting of course with Vital-IT, and then nearly every university, every institute or polytechnic school in Switzerland has its own core facility. The University of Fribourg is a small one among all these big fishes, let's say it's a small fish among the big fishes. But if you look at bioinformatics at the University of Fribourg only, then I'm one of the two or three big fishes there, because we are only three groups doing bioinformatics. So that's why I gave this strange title.

If we go a little bit deeper into my small fish here, you see that it's organized very simply. There is one head, myself, plus some scientific collaborators who are part-time collaborators, because unfortunately the funding, and also the need within the University of Fribourg, is not such that we can have several fully paid people. So we currently have two stagiaires, let's say two persons who are working with me on different projects, and we have past members of the group who are listed here, like Patrick Fah or Julien de la Fontaine and Lissandra.

What do we do as a core facility? Very similar to other core facilities, we provide support for biomedical researchers at the University of
Fribourg, and thanks of course to Vital-IT and other resources we are able to analyze the data that is produced. Most of the data is next-generation sequencing, although you can have different sorts of data. For example, proteomics data is becoming popular nowadays: analyzing the structure of proteins in membranes using cross-linkers is becoming really popular in proteomics. Unfortunately, in that field I'm very weak, so I need to learn a bit more, but it's too much for my capacities at the moment, so I hope that someone else in the SIB will become an expert in this field so I can rely on them.

Secondly, we have a lot of teaching and training activities. As you can see, I really separate the two things, because they are completely different. Teaching bachelor or master students is completely different from doing a training for PhD students who are really interested in coming and learning something. You know, bachelor students are often forced to come to some courses and they are not really enthusiastic about the courses; they are much more interested in what will be in the exam and whether they will manage to get a good mark. So these are two different things, and I will give you an example of teaching. And of course we also have some research projects within the group, mostly regarding bacterial genome assembly and analysis.

These are some examples of support that we gave to different research groups. The first one, the identification of proteins involved in mycorrhization, I will go through in more detail because it's already published, and I will give some short examples of other types of analysis. RNA-seq and ChIP-seq, I would say, are very popular among researchers at the moment, here in C. elegans but also in mouse, and in addition pathway analysis is also becoming something that many people ask for, but it's not always easy to do
these kinds of analysis; ChIP-seq and pathway analysis are quite challenging at the moment. Recently I got involved in a metagenomic project on plant leaves, and I will also give you a short introduction to that project. It's still ongoing work, so I cannot go into too much detail. And finally, of course, my genome assembly and annotation: there were a few publications last year about these genome assemblies, and you will also see a new tool that has been developed in my group.

So let's go into a little more detail with these plant genes that are involved in the mycorrhization process. Mycorrhization is a symbiotic process between plants and some fungi in the soil. As you can see here, the red part is the fungus, and the fungus invades the roots of the plant, but it invades them in a controlled manner, building so-called arbuscular structures. This arbuscular structure allows the plant and the fungus to exchange compounds symbiotically, and it is very good because it dramatically increases the surface of exchange between the plant and the fungus. This is an important process because plants that do have these arbuscular mycorrhizas grow much faster and much better than plants that do not form these arbuscular structures. It's interesting because it's found in some dicotyledons and also some monocotyledons, but there are branches within the plants that do not form mycorrhizas at all.

The idea of the project was to mine the public databases to see if it would be feasible to identify proteins that are typical of this arbuscular mycorrhizal symbiosis, that could be the signature of this symbiosis. We designed a complicated pipeline where we extracted whole proteomes from Ensembl Plants, and we chose a certain number of full genomes of plants that are known to form arbuscular mycorrhizas, like tomato, potato, Vitis, the grapevine, and other plants like soybean and Medicago truncatula, which is one of the most interesting in this case, the one that the people
were interested in. And of course we selected a few plants that are known for not forming mycorrhizas, and these are Brassicaceae, Arabidopsis thaliana for example. So we have nine different proteomes: three from plants not forming mycorrhizas and six from plants forming mycorrhizas. We used the so-called Hieranoid method, which is a hierarchical clustering method, to cluster all proteins of these proteomes. Why do we do this? It's a kind of ClustalW for clustering proteins, if you like: you need to give a pseudo tree to the program so that it can compare the genomes pairwise and group them together, forming clusters that correspond to potential orthologs of the proteins.

Out of these clusters, we separated the clusters into three different tasks, as we call them. Task 9 contains the clusters with one copy from each of the nine genomes, that is, clusters containing nine proteins, one from each of the nine genomes. We call these housekeeping genes; they are found in all of the genomes. We had two other tasks. Task 3 contains only the clusters with proteins from at least three of the mycorrhizal genomes but zero of the non-mycorrhizal genomes, so these are clusters where we know that the proteins are conserved in the plants that form mycorrhizas and absent in the plants that do not form mycorrhizas. Same thing for task 4, but here we asked for at least four of the six genomes. Why did we separate four and three? Because with three we get too many clusters to be analyzed in the gene expression analysis, so that's why we restricted a little bit more with task 4. Then, using the clusters, we built multiple sequence alignments of the proteins for each cluster, and then we used PSI-BLAST against the total Ensembl Plants proteins to see what scores we find, and we separated the E-values of
the housekeeping genes and the E-values of the arbuscular mycorrhizal genes, which we consider to be potential signatures. By doing this comparison we hope to extract the genes that are really statistically significantly different. On task 4, we had a look at gene expression from a huge number of experiments in roots on induction of arbuscular mycorrhizas. There is a huge number of experiments documented in this database, so it's really interesting to extract this information for the genes that we identified here in a subset of these interesting clusters. By looking at gene expression in different tissues and in different conditions, and using Venn diagrams to see which genes overlap between the different processes, we extracted a certain number of genes that were then analyzed, for example, for their upstream 2 kb sequences, to try to find conserved sequences in the promoter that could be the signature of a transcription factor, for example. So this has all been done with data that was already in the databases.

The good thing is that we were able to re-identify proteins that were known to be involved in mycorrhization, like RAM1, PT4 and others, and proteins that are not part of mycorrhization, like cyclin D6 or RPL5, are found in the red part. So here in the green part of the graph are the proteins that are found in the dicot arbuscular mycorrhizal plants, versus in red the dicot non-mycorrhizal plants. And here we tried to see if in the monocot arbuscular mycorrhizal plants we could also distinguish them from the dicot non-mycorrhizal proteins, and it's a little bit more difficult, since the two peaks overlap slightly more than here. So here we have a series of proteins that are not known to be part of the arbuscular mycorrhiza, and it would be interesting to look at those. And so we continued,
with the expression analysis I mentioned before, and with it we were able to identify a certain number of proteins in the different conditions that are signatures of arbuscular mycorrhiza, or at least could be; they are going to test this hypothesis. From this list of proteins we identified signatures in the upstream part, in the promoter region, and it looks really nice when you look at these signatures here: there are very good examples of DNA sequences that are signatures of already known genes, and we identified these signatures again. This was obtained in the lab and also confirmed by our analysis.

Now another project, a recent project on the metagenomics of plant leaves. The idea here is that there are some Arabidopsis mutants that have increased cuticle permeability. The cuticle is the surface of the plant leaves, and when this cuticle is modified so that it is more permeable, the plant becomes more resistant to Botrytis, which is a fungus. So it's kind of counterintuitive: the cuticle is more permeable, but the plant becomes resistant to the fungus. So the question is: is there any difference in the bacterial community that lives on the plants, on the wild-type leaves versus the mutant leaves? In other words, does the fact that the cuticle is more permeable modify the community of bacteria that lives on the plant? What we did is wash the plant leaves and characterize the bacterial community by analyzing the 16S rRNA. It's done by amplifying hypervariable regions of the 16S rRNA with specific primers and using MiSeq sequencing, two times 300 base pairs. If the two primers are well designed, the two 300 base pair reads overlap slightly, and it should give you nice sequences. We obtained sequences of around 420 base pairs of good quality, and we could use them with the mothur package to extract interesting information about the community. So the design of the
experiment is the following: we have three plant replicates of each of the wild type and the two mutants, lacs2 and bdg, and for each of the three replicates we had five leaves, so it means 15 samples per mutant or per wild-type plant. We also had control samples from the soil, to see if there is a difference in the community of bacteria you find in the soil.

I don't know if these are the first five leaves or if the plants are already grown. It's probably very early, because the time frame of the experiment is quite short, so I don't believe these are really old plants. It's probably the first leaves, but I need to check.

So this is, sorry, we don't see it very well, the green is really bad. Okay, so this is the principal component analysis of all these samples after assigning operational taxonomic units to the sequences. So we identified which bacteria are present, we counted how many of each we found, and using these counts we did a PCA. You can see in green here, unfortunately you don't see it, there was a circle around the green, and the green is the wild-type plant, and you have the red and the blue, which are the two mutants. You can see that the circles of the red and the blue overlap completely, so it's impossible to distinguish the red and the blue mutants, but the wild type in green separates relatively well from the mutants. There are still a few samples that overlap with the mutants, but most of the wild-type samples are separated. Also interesting are the three soil samples in the center of the graph: you see that these three soil samples overlap with the wild type. So there seems to be a difference between the wild type and the mutants, but the wild type and the soil are very similar.

If we now look at the distribution of the different genera of bacteria, we see that on this heat map you can actually
separate the mutants, blue and red here, easily from the green one and the soil. The soil samples are very different from the rest, especially in diversity: the soil is much more diverse in the types of bacteria you find compared to what you find on the plant's leaf surface. But still the wild type is quite different from the two mutants. The two mutants themselves are not distinguishable, as in the PCA. The only thing you see here in addition is that the first replicate, plant A, in both mutants is slightly different from plants B and C. They still all cluster together, but you can see that there is a branch here, so the first replicate was not treated the same. I discussed this with the lab people and they told me yes, it's true, plant A was not treated the same, but I don't know what they did. Okay, so we see a slight difference, but still the two mutants cluster well together, and we can see that the diversity is pretty different from what you see in the soil or in the wild-type plant.

We of course tried to isolate the bacteria that are on the leaves of the mutants, and we isolated two interesting bacteria that were sequenced here in Lausanne on the PacBio machine, and we got very recently, last week by the way, two full genomes completely assembled after one SMRT cell of sequencing. So it's pretty nice, thank you PacBio for that, and no need to struggle to finish the assembly, it's already finished. So now we are working on these bacteria, trying to see what they contain and what the differences are, and trying to identify the species, because it is not so easy to identify the species in a species complex.

This is another example, also a very interesting one, it's a ChIP-seq analysis.

Yes, yes, of course they did a lot of analyses beforehand, but they wanted, in addition to these analyses, to add something about the bacteria. I don't know in detail. Right, yes. Well, in fact, of the two Pseudomonas that we identified, one enhances the
resistance and not the other, and so we are interested to see what the difference is, and that's why we selected these. So it's clearly linked to the bacteria: not just any bacterium can enhance the resistance, that's the current hypothesis.

Okay, so this other project is a ChIP-seq and RNA-seq analysis of the gene let-418. This gene is part of a chromatin remodeling complex called NuRD, and this chromatin remodeling complex modifies the methylation and acetylation of some histones in some places, and it will switch on or switch off some programs during the development of C. elegans. The idea was to see where this gene product binds the DNA; it seems that it binds the DNA either itself or in the complex. Using an antibody against LET-418, two tissues were analyzed, the intestinal and the hypodermal tissues, and they displayed a shift at the TSS. Maybe you cannot see it very well, but still you see that the green peak here is slightly shifted compared to the yellow and purple peak, or pink peak, here. This is the difference: in the intestinal tissue the peak of LET-418 is slightly after the TSS, whereas in other tissues it's before the TSS. This is interesting for the people in the lab; they think that this is the sign that LET-418 has some activity in shifting the histone modifications. We compared with known histone modifications and, it's not shown here on the slide, but we can see exclusive presence of some histone modifications where we have the peak of LET-418 compared to regions where we don't. So it seems that some methylated histones and LET-418 are mutually exclusive.

Okay, so now some problems that appear in the core facility. Sometimes you end up with problems and you need to find the solution. This is another C.
elegans story. It's an analysis where we tried to identify differentially expressed exons using the DEXSeq package. Using this package on a mock-to knockout, we saw that the mock-to gene is in fact expressed in the knockout. You see that one exon was destroyed in the knockout compared to the wild type, in blue here, but the other exon is expressed. So what happens? If you map the reads onto the region, with the knockout here at the top and the wild type at the bottom, you see that the knockout region, the region that is removed from the genome, contains the first exon, so that clearly explains why the first exon is absent in this analysis. But the second exon should be absent too: since you removed this part here, this exon should not be transcribed. In reality it is, because it forms a fusion splice variant with the preceding gene here. So by removing the DNA in this region, we force this gene to be transcribed together with this exon. Yes, it's probably not functional, but is it completely non-functional? We don't know. So that's an example of a case where you look at something that should be absent and, well, it's still expressed at a high level, and so we discovered this fusion of the two genes.

Another troubleshooting case is a recent one, on a ChIP-seq experiment in mouse for the gene RBPJ. It's a transcription factor, or not a transcription factor, a transcription modifier, something like that. No, that's really a transcription factor. What you would expect with this gene is to see a peak before the gene Notch, in the promoter of Notch, and in fact, compared to the input DNA here, you see a small peak. So we were happy with the small peak, but we saw that there is a big bunch of peaks in the middle of the gene. These orange peaks are completely unexpected because they are in the coding region. By zooming into this region we discovered that these peaks in fact cover exactly the exons of this gene, and this was really
puzzling, because this part of Notch1 is called the NICD domain, and this NICD domain binds to RBPJ. So we thought, this is a Nature paper, yes, we have it. But in fact, after discussing and analyzing a bit more, well, it could be a plasmid contamination. It seems that in the lab they have a plasmid that contains exactly this NICD domain, to express it in cells, and, from where we don't know, they have a contamination in their liquids somewhere. When they extracted the brain of the mouse to do the ChIP-seq analysis, they contaminated the DNA and the ChIP-seq with some plasmid containing this part of the gene. So this is a bad luck experiment, or a faulty experiment, let's say, but troubleshooting this kind of thing is quite difficult; you don't know exactly what could have happened.

Okay, let's speak about teaching and training, which is also part of my duty. The core facility organizes workshops, and in 2014 we organized four workshops. Two of them were organized thanks to Vital-IT, for the high-performance computing workshop, and thanks to the VCF facility, for the microarray and RNA-seq workshop. We also organized the StarOmics course in June last year, on genome and transcriptome assembly, which was also a collaboration with other SIB groups, because there were people from Basel giving this course, even from Bosat. It was really interesting. We had the Fribourg Genomics Day, a bit less successful because of a bad choice of date, and that's my fault: I chose a date that is just between the Thursday holiday and the weekend, and of course many people take the bridge, so unfortunately this was a bad choice. Then we also organized a course, Sequence a Genome, that is somewhat modeled on the course that is given here in the master in genomics, and instead of sequencing a new bacterial genome, we sequenced yeast mutants, and I will detail that more. In 2015 we already had a course in February on methylation and
methylome analysis, again with people from Lausanne and also from Basel, Michael Sander and Christophe Schmidt, who came to give this course in Fribourg. It was very successful. We are planning a Pathway Studio workshop to help people who are struggling to identify pathways and analyze their data, and since we now have a license for Pathway Studio this year in Fribourg, we want to have more people using it, and we already have workshops planned for the end of the year. In 2016 we will organize the Fribourg Genomics Day again, because in 2015 it will happen in Bern, on the 5th of June. So if you plan to attend a good workshop, you can go to Bern on the 5th of June; there will be the BeFri Genomics Day. We will then organize it every second year in Bern and every second year in Fribourg.

So let's go into a bit more detail with this teaching activity. The idea here, in collaboration with Professor Kressler, is to look for suppressor mutations of a mutant of the gene NSA1, which is involved in 60S ribosome biogenesis. He isolated in his lab 44 candidate clones; well, he has 200 of those, and he selected 44 of his clones that we sequenced using Illumina HiSeq, paired-end, two times 100 base pairs. Five students worked on the analysis, and each of the students analyzed nine candidates. That's quite a lot of work, because you have nine different samples to analyze: map them on the reference genome, identify the mutations, SNPs and small indels, and then annotate these SNPs with factor. Then what we do is of course filter out all the mutations that are common among the clones, because these are mutations that are of course not found in the reference genome. We don't have the reference genome of the original mutant, so we had to subtract the normal mutations you find between the reference genome and our genomes. Then we kept only the homozygous mutations, because these are haploid yeast strains, and the remaining mutations were analyzed manually, looking at IGV
and also extracting information on the genes that were touched by unique mutations. By the way, we were very happy because we found one gene in yeast that is a guanine nucleotide-binding protein subunit beta-like protein, and from Swiss-Prot we noticed that it's located in the 40S. So this was a good candidate, and we had a look at this gene, but we were very disappointed, because all the mutations are in fact not in the coding region but in an intron between the two exons of the gene. See, these are the mutations here in light blue. So, well, what could be inside this intron? In fact, there is a snoRNA encoded exactly in this intron, and all the mutations affect this snoRNA. You can see them in red here, all the mutations, and some are even deletions, like here. So we found five mutants with a deletion and some mutants with point mutations here, and you see that they affect regions important for the snoRNA, the C box and the D box, and also this D box here. This is a C/D box snoRNA, and this C/D box snoRNA is very important because it helps the maturation of the 25S ribosomal RNA, a component of the 60S particle. This is the complex that modifies the 25S ribosomal RNA, with the snoRNA you see here and other proteins: Nop1, Nop58, Nop56 and Snu13. And incidentally, we also found mutations in these four. So we are quite happy, because we found mutations in proteins and also in the snoRNA involved in the maturation of the 25S ribosomal RNA. So this closes the loop, and we now have an explanation for why we found these mutants. So when you look for mutants, it's always good not to look only at the coding region of the protein but sometimes also at the non-coding part of the gene.

Okay, final story, an example of research. This is a tool we developed together with Alexi Dutcher, currently in my group, and it's a tool to help in the analysis of DNA methylation in bacteria that you get from
Pacific Biosciences sequencing. Basically, when you sequence a bacterium with PacBio, you not only get the sequences but you also get information about the potential modifications of the DNA. We developed a web interface, a tool that generates nice images using Circos, and the web tool is hosted and currently works fine. So you can submit the FASTA file of your genome, you submit the motifs GFF file you get from Pacific Biosciences, and the tool will identify the four most important motifs and display them in a graphic output like this. We have three different outputs: one where both strands are viewed separately, one with both strands viewed together, and one where we see the motifs that are methylated versus the motifs that are not methylated.

This is a bigger view. What you see here are the contigs you get from the assembly; you don't always get a single contig. Here we got 12 contigs for the full assembly, and PacBio identified four motifs that are methylated. We have a sliding window that goes around and counts how many motifs are found methylated, and if we pass a certain threshold we get a brighter color. The threshold is this lighter red or lighter green area in the circle here. You see that some motifs are very heavily methylated, like GATC: you see it's methylated all around, with some regions, like here for example, which are a little more methylated than normal. So it could be, yes, the origin of replication, for example, or it could be other things. The other motifs here are probably motifs that are part of a restriction-modification system, and the last motif here, CATG, is another interesting motif, but it's much less methylated in the genome. However, some regions are much more methylated than others, and if you look at this region here, it seems to be a toxin operon. So that's an interesting region: why is this toxin
operon more methylated than others? Good question. Here I also show some regions that show slightly less methylation. These regions, in orange here, it's not really easily visible, but maybe you can see here a little bit better: this region hides a phage. So you have a phage inserted in the genome at four different places, and these are regions where you have lower methylation. It could mean that there is a biological explanation for why the phages are less methylated than the rest. So that's our tool, and we hope that if you have PacBio data you will use it. Remember the link, at UniFR.ch, bug-free, Pac-Man, and that's the name of this Pacific Biosciences methylation analyzer, Pac-Man.

Okay, that was all for today. As a summary: we have very diverse activities in a small core facility, so we must be extremely flexible, and as a consequence it's very difficult to build pipelines, because we don't have 10 different groups that do exactly the same thing. It's too much time to invest to design a pipeline just for one experiment that will change the next week. I understand that some people in the core facilities are building pipelines, I know it's the case with Jacques, but they have a throughput that is much bigger than mine, and they have very similar experiments, so they can really build upon these pipelines and leverage them. Investing the time to build that is quite a lot. And of course training users to be independent is the most successful option for me, so I try to give as much training as possible, so that people can do most of the analysis themselves, because then they don't have to pay and they don't bother me too much.
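As a side note on the Pac-Man description above: the sliding window that counts methylated motifs and marks windows above a threshold can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the tool's actual code. The function names, the window size, the step and the threshold are all hypothetical, and the contig is treated as linear rather than circular to keep the sketch short.

```python
from bisect import bisect_left

def window_counts(methylated_positions, contig_length, window=1000, step=500):
    """Count methylated motif sites in sliding windows along one contig.

    Returns a list of (window_start, count) tuples. The contig is treated
    as linear here; the last windows are truncated at the contig end.
    All parameter names and defaults are illustrative assumptions.
    """
    positions = sorted(methylated_positions)
    counts = []
    for start in range(0, contig_length, step):
        end = min(start + window, contig_length)
        # number of positions p with start <= p < end
        n = bisect_left(positions, end) - bisect_left(positions, start)
        counts.append((start, n))
    return counts

def bright_windows(counts, threshold):
    """Return the starts of windows whose count exceeds the threshold."""
    return [start for start, n in counts if n > threshold]

# Toy example: 0-based positions of methylated GATC sites on a 50 bp contig
sites = [5, 12, 14, 18, 40]
counts = window_counts(sites, contig_length=50, window=10, step=10)
hot = bright_windows(counts, threshold=2)
```

In this toy example only the window starting at position 10 holds more than two methylated sites, so it is the only one that would be drawn in the brighter color.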
Okay, oh, one more thing. Yeah, we identified a new zebrafish mutant here. See, it has a big thing at the mouth here, a strange thing. What is it? Well, the first of April is a good day. Thanks for your attention, and if you want to see what this robot is, it's a robot fish that you can put in your bag. Okay, ha ha.
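Coming back to the yeast suppressor screen from the teaching example: the filtering step, where mutations common among the clones are subtracted and only homozygous ones are kept, can be sketched as below. This is a hedged illustration and not the pipeline the students actually ran. The data layout (tuples of chromosome, position, reference allele, alternative allele, genotype), the function name and the `max_clones` cutoff are assumptions made for the example.

```python
from collections import Counter

def private_homozygous_variants(calls_by_clone, max_clones=1):
    """Keep homozygous variants seen in at most `max_clones` clones.

    calls_by_clone maps a clone name to a list of tuples
    (chrom, pos, ref, alt, genotype), with genotype "hom" or "het".
    Variants shared by many clones are assumed to be background
    differences between the parental strain and the reference genome.
    """
    seen = Counter()
    for calls in calls_by_clone.values():
        # count each distinct variant once per clone
        for key in {c[:4] for c in calls}:
            seen[key] += 1
    kept = {}
    for clone, calls in calls_by_clone.items():
        kept[clone] = [
            c for c in calls
            if c[4] == "hom" and seen[c[:4]] <= max_clones
        ]
    return kept

# Toy input: one background variant shared by all clones, plus private ones
calls = {
    "clone1": [("chrI", 100, "A", "G", "hom"), ("chrII", 50, "C", "T", "hom")],
    "clone2": [("chrI", 100, "A", "G", "hom"), ("chrIII", 7, "G", "A", "het")],
    "clone3": [("chrI", 100, "A", "G", "hom"), ("chrIV", 9, "T", "C", "hom")],
}
result = private_homozygous_variants(calls)
```

Here the shared chrI variant is dropped from every clone and the heterozygous call in clone2 is discarded, which leaves one candidate each for clone1 and clone3. The remaining candidates would then still need manual inspection, for example in IGV, as described in the talk.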