So you will probably have heard something about the 1000 Genomes Project and its publications. You get the sense that it is one of the big things happening in human genetics and human genomics at the moment, and the idea of this tutorial session is to give you a much closer sense of what the project data look like, how you can get at them, how you can learn about particular regions that you as researchers might be interested in, and how you can actually use these data in your own studies, hopefully to find out new and interesting things about diseases or whatever process you're interested in. So it's worth just reiterating this sense of excitement. It's ten years since the first draft genome was generated. The publication in Nature last week is of the pilot study, that's 180 or so genomes, but as a project we're already essentially at the thousand-genome mark in terms of data generation, and over the next few months you'll see that coming through. So these data really are coming through, and what we hope this project will do, and I think we're already seeing that it is doing, is be a transformative project. It's going to change the way people do human genetics and human genomics. So, just a couple of things about what the project is. It is a big thing: you can see there are lots of us here, and in fact the whole conference is peppered with people involved in the project; there's probably one of them sitting next to you now.
So it's a big international project, and the idea is to build the baseline reference set of human genomic variation that will provide the foundation for human genetics over the next few years, maybe five, maybe ten years, that kind of scale. It's a consortium with many platforms, that is, many of these massively parallel, high-throughput sequencing technologies, lots of different centres, lots of different people and lots of different research groups, everything from informatics groups to disease association groups, and of course lots and lots of funders all cheerfully putting money in to support this fantastic operation. What we hope is that it will provide a resource to support genome-wide association studies, GWAS for short, in many different populations: not just one but, hopefully, pretty much across the world. We set ourselves some quite specific quantitative goals, because we decided to be rigorous about this. We set out to find at least 95% of all the SNPs in the accessible human genome (probably more on that later) at a frequency of 1% in the populations studied, and then, in gene regions, to push further down into the rare variants, towards 0.1%. Importantly, the project doesn't just aim to find SNPs but all types of genetic variants, because we believe they're all potentially important in disease. So we want to find both the shorter insertion and deletion polymorphisms and the larger, more complicated structural variants that can have quite dramatic effects on genes.
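A frequency target like "1% in the population" is only reachable if enough chromosomes are sampled. As a back-of-the-envelope sketch (our own illustration, not the project's power calculation), the chance that a variant with allele frequency f is present at least once among n diploid individuals is 1 - (1 - f)^(2n). This assumes random sampling and perfect genotype calls; real detection from low-coverage sequencing is lower, because a variant must also be seen in enough reads to be called. The sample sizes below are the ones quoted in the talk.

```python
def prob_sampled(freq: float, n_individuals: int) -> float:
    """Probability that a biallelic variant with population allele frequency
    `freq` is carried at least once among `n_individuals` diploid samples.

    Idealised: assumes independently sampled chromosomes and perfect genotype
    calls, so it ignores sequencing depth and error entirely.
    """
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** n_chromosomes


if __name__ == "__main__":
    # Sample sizes from the talk: ~180 pilot genomes, ~2,500 in the full project.
    for freq, n in [(0.01, 180), (0.01, 2500), (0.001, 2500)]:
        print(f"AF={freq:<6} n={n:<5} P(seen at least once)={prob_sampled(freq, n):.3f}")
```

Under these idealised assumptions, a 1% variant turns up in 180 individuals with probability of about 0.97, while reaching 0.1% variants needs sample sizes closer to the full project's.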
So at one level it's a catalogue of variation, but it is also trying to generate a thousand, or probably ultimately about two and a half thousand, individual genomes. So we want to provide genotypes, the set of variants carried by each of these people, and we want to put those on particular haplotype backgrounds: we want to say that this variant goes with that variant in this particular place. As Jeff will describe later, that's going to be a very important resource for one particular use of these data. Providing the data is one thing, but, very importantly to us, we decided to generate the data from individuals for whom cell lines are available. That means other people can take the same samples and either do genetic studies or look at molecular phenotypes such as gene expression, chromatin changes and so on, and this notion of a central reference data set that lots of different groups can converge on will, I think, be very powerful. Finally, we decided to be a totally open project: basically, everyone else sees the data at the same time that we do. Obviously we're also processing the data and providing much more digestible summaries of them, and our goal is to do that publicly and quickly, with frequent releases.
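To make "putting variants on haplotype backgrounds" concrete: two people can carry exactly the same genotypes at two SNPs yet differ in which alleles travel together on each chromosome. The sketch below assumes VCF-style phased genotype strings such as `"0|1"` (a representation chosen for illustration; the talk does not specify a format), and the two individuals are hypothetical.

```python
from typing import List, Tuple


def split_phased(genotypes: List[str]) -> Tuple[List[int], List[int]]:
    """Split VCF-style phased genotypes (e.g. ["0|1", "1|0"]) at consecutive
    sites into the two haplotypes one individual carries.

    Only phased calls (separated by '|') are handled here.
    """
    hap_a: List[int] = []
    hap_b: List[int] = []
    for gt in genotypes:
        left, right = gt.split("|")
        hap_a.append(int(left))
        hap_b.append(int(right))
    return hap_a, hap_b


def allele_dosages(genotypes: List[str]) -> List[int]:
    """Unphased view: number of non-reference alleles at each site."""
    hap_a, hap_b = split_phased(genotypes)
    return [a + b for a, b in zip(hap_a, hap_b)]


# Hypothetical individuals, both heterozygous at the same two neighbouring SNPs.
cis = ["0|1", "0|1"]    # both alternate alleles on the same chromosome
trans = ["0|1", "1|0"]  # alternate alleles on different chromosomes

print(allele_dosages(cis), allele_dosages(trans))  # identical genotypes
print(split_phased(cis))    # haplotypes [0, 0] and [1, 1]
print(split_phased(trans))  # haplotypes [0, 1] and [1, 0]
```

The unphased dosages are identical for both people; only the phased view reveals which variants go together, which is the extra information a haplotype-resolved reference set carries.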
Just to give you a little heads-up on the ways in which I think people will use the data in medical genetics. First, to use the data, via a technique called imputation, to essentially do genotyping for free in existing genome-wide association studies. People are already doing this, and it is providing finer resolution and, in some cases, new regions of association in disease genetics. But there are other uses as well. You might simply be interested in finding all the polymorphic variants in a particular region of the genome, and the project is good for that: it will give you a catalogue of the variants we found and some notion of their allele frequencies, so you might choose some to prioritise for later work. A third and very useful technique is to take the variants we found and use them as a screen for variants in subjects you are interested in. Perhaps you study a particular Mendelian disease, or a particular cluster of individuals with an extreme phenotype for a particular disease, and you want to find the variants that are really novel or unique to this set of individuals; the 1000 Genomes data give you an overview of normal variation against which to make that screen. Just one very quick word about imputation, the technique I mentioned there: the idea that you can essentially do genotyping for free. You take the 1000 Genomes data and some data that you've already collected, say from a genome-wide association study, and then use statistical techniques that Jeff will talk about to do the filling-in. It can be very powerful, and certainly when we get to the full thousand-genome sample set it will massively reduce the cost of genome-wide surveys, because you won't necessarily have to do the sequencing yourself.
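The "filling-in" idea can be illustrated with a deliberately naive sketch. For a study haplotype typed at only some sites (for example, the SNPs on a genotyping array), find the reference-panel haplotype that agrees best at the typed sites and copy its alleles into the untyped ones. This nearest-haplotype rule is our simplification for illustration only. Production imputation tools such as IMPUTE, MaCH or Beagle instead use hidden Markov models that piece together many reference haplotypes and report probabilities rather than hard calls. The panel and study data below are hypothetical.

```python
from typing import List, Optional

Haplotype = List[int]                 # 0 = reference allele, 1 = alternate allele
StudyHaplotype = List[Optional[int]]  # None marks a site that was not genotyped


def impute_nearest(study: StudyHaplotype, panel: List[Haplotype]) -> Haplotype:
    """Fill untyped sites by copying from the best-matching panel haplotype."""
    typed = [i for i, allele in enumerate(study) if allele is not None]

    def mismatches(ref: Haplotype) -> int:
        # Count disagreements only at sites the study actually genotyped.
        return sum(ref[i] != study[i] for i in typed)

    best = min(panel, key=mismatches)  # ties go to the earliest panel haplotype
    # Keep observed alleles; borrow the rest from the closest reference haplotype.
    return [best[i] if allele is None else allele for i, allele in enumerate(study)]


# Hypothetical reference panel: 4 haplotypes x 6 sites.
panel = [
    [0, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
]
# A study haplotype typed only at sites 0, 2 and 4.
study = [1, None, 0, None, 0, None]
print(impute_nearest(study, panel))  # copies sites 1, 3, 5 from panel[1]
```

Because the reference panel is sequenced densely, any site present in it can be filled in this way without sequencing the study samples themselves; the statistical methods discussed later make that guess probabilistic and far more robust than a single nearest match.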
So that's kind of the future. Where we are at the moment is that we have very mature releases of data on three pilot projects: one looking at trios, one looking at low-coverage samples and one focusing on exons. There are 180 of these low-coverage genomes and about 700 samples in the exon pilot, at different coverage, and we've learnt different things from each of those. Going forward, there will be about two and a half thousand samples in the full project, spread across the world but not randomly. The idea is to collect them in regions of major medical genetics interest, in little satellite groups of populations: essentially, to get five lots of a hundred samples from each of the major geographic regions, such as Europe or East Asia. As I said, we have data on over a thousand samples available to us as a project at the moment. That's not in a digested form for you yet, but Gonzalo announced this morning that we have a release on 600 samples as of about today, and that slide gives you a sense of where we expect to be: another 600 samples sequenced by mid next year, and then a further 800 towards the end of that year and the beginning of the next. In addition to the sequence data across the whole genome, we're going to collect exome sequence for those samples as well and, importantly, other types of genetic information: SNP genotyping on dense 2.5-, 5- and 10-million-SNP arrays, and additional array CGH to get better resolution and genotyping for copy number variants and structural variants. The idea is to integrate all these different sorts of information to provide this very high-quality set of two to two and a half thousand genomes.
Okay, that's enough from me; let's hear from the people who are going to give you much more detailed insight into some of these things. First up, we're going to have Gabor talking in a lot more detail about what the 1000 Genomes data look like. Stephen is then going to tell you how you get your hands on the data, which is kind of important. Paul is going to tell you about a browser that he and his team have been developing, which allows you to look at a lot of the data in the context of genome annotations, pull out bits of it and really explore the data without having to download all of it onto your computer. Jan is going to tell you about structural variants, a particular class of variant that a lot of people are interested in, and then Jeff is going to finish up by telling you how the data can actually be used in medical genetics, and if there are people still standing at the end, there'll be some questions and answers. On that note, there will be some time for questions at the end of each of these sessions, and if you have questions or things you want cleared up during a talk, then please stand up and shout, or use one of the microphones. If there's something you don't understand, don't just sit there thinking "I don't understand this"; just shout, because there'll be other people in the room who also want to know the answer. The idea is to have a tutorial session, not just a series of lectures. Finally, I must ask you to please put your phones on silent, and that includes everyone up here. Okay, thank you very much, and over to Gabor.