Excellent. Ladies and gentlemen, it's a great pleasure to introduce the last speaker of what has been a fabulous day at Genetic Genealogy Ireland 2015. And once again, we have John Cleary. John is going to talk to us in much greater detail today about SNP markers, and the title of this talk is Surfing the SNP Tsunami: Next Generation Sequencing Testing for the Genealogist. Now John is a lecturer who teaches at the University of Edinburgh. He's a member of ISOGG, the International Society of Genetic Genealogy, which we should all be members of. And he is involved in the project researching the fate of the Scottish prisoners, as we heard earlier on. He's also done some incredible work using SNP markers to define the branching pattern within several of his family trees and within the DNA project. And he's also going to give us some very interesting data on how we can use the SNP markers to actually date when that branching pattern occurred. So please give a warm welcome again to John Cleary.
So thank you very much for coming for the second time to hear me today, those who are sitting through me for the second time. It's a pleasure indeed to be back here to talk to you about this. I think this talk will be slightly more advanced, maybe a bit more intermediate to advanced, than the one I gave this morning. Can I just ask you how many of you have taken a SNP test of some kind? So quite a few of you. And in general, you'd be quite happy if we just talk about SNPs and STRs and haplogroups; I don't need to do definitions of these things. I have a few slides with definitions on, but I think, to save time, I'll just rattle through. So, very quickly, as we're aware, the SNP is a point mutation. Here we have the chain of bases in the DNA, and when one of these bases changes to another base, we then have a SNP, a SNP being a single nucleotide polymorphism, a polymorphism being a change in the structure of the DNA.
And what's generally believed about SNPs is that they are permanent. That's not necessarily true, certainly not in all cases, but they are long-term markers. Once a SNP occurs, it is very unlikely to change back again, which means that you can track that SNP through the descendants of the first man who had that mutation. Of course, SNPs occur all the way across the genome, on every chromosome, but we are focusing today on the Y chromosome, and therefore on the inheritance of the Y chromosome from father to son, which also, we believe, tracks the path of inheritance of the surname from father to son down through the inheritance lines. SNPs have tended to be associated with what's often known as deep ancestry, because they have been occurring right back through human history to the very beginning, which means you can identify descent groups which are thousands, tens of thousands, even hundreds of thousands of years old. And of course what's interesting today is that as SNP testing becomes less expensive, it's coming within the reach of the genealogist who wants to do more recreational investigation of their genome, and therefore we're finding ways to use these for assisting genealogical research.
So, a few quick definitions here, which I won't go through, because I think you're all familiar with haplogroups and subclades. Essentially, haplogroups are the higher-level groupings marked by letters, whereas a subclade is any group marked by a common marker below that, and all of these of course are marked by SNPs. Anybody who has a SNP will have inherited it at some stage from an ancestor up through the male line, at any time from your father right back to what's known as Y-chromosomal Adam. There are a number of ways of testing SNPs, and this summer we've seen a huge increase in one of those, which is the second one here, the growth in panels of SNPs.
So previous to that there were individual SNP tests, which can still be done at companies like Family Tree DNA and YSEQ, and they're very inexpensive. They can be as little as $20, as much as $40 per SNP. But the panels of SNPs are now putting together huge collections of these, which are usually related to a particular subclade. So if you wish to investigate what your subclade is, or whether you belong in a particular subclade, or which subclade of a subclade you belong in, then you can order a panel of SNPs for anything from $90 to $120, again from the same two companies, Family Tree DNA and YSEQ. I'm not going to talk about that today. It's a new development; there have been some new tests released by both companies in the past few months, and they're proving very popular, because they are quite cheap.
But they're very different from the next test here, which I am concerned with, which is next generation sequencing, this being a method to try and read whole sequences of the genome. At the moment, those parts of the genome which have been read by these tests, as taken by genealogists, are still rather small. People keep talking in these meetings about a future coming when we'll have our entire genome read for a relatively small cost. But at the moment, we're still talking about reading a stretch of the Y chromosome, and not the whole Y chromosome, just parts of it. But the parts that are read are read in their entirety. So every mutation, every difference from the reference sequence that you have on those stretches, will be read. And this is very different from SNP panels. The SNP panels investigate SNPs which are known already, which have been found, tested previously and put into the collection, whereas next generation sequencing will find whatever is on your genome, or on the part of the genome that's being read. And therefore it can discover SNPs which are unknown.
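The contrast between a SNP panel and next generation sequencing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the positions, bases and function names below are invented toy data, not real Y-chromosome coordinates or any company's reporting format, and "differs from reference" is used as a simplified stand-in for a positive result.

```python
# Hypothetical toy data: positions and bases are invented for illustration.
reference = {101: "G", 102: "C", 103: "T", 104: "A", 105: "G", 106: "T"}
# Bases actually read for one tester; position 106 was not covered at all.
tester = {101: "G", 102: "T", 103: "T", 104: "A", 105: "A"}

# A panel only looks at SNP positions that are already known.
panel_positions = {102, 104}


def panel_test(tester, reference, panel_positions):
    """Report a result only for the known positions on the panel."""
    results = {}
    for pos in sorted(panel_positions):
        base = tester.get(pos)
        if base is None:
            results[pos] = "no call"
        elif base != reference[pos]:
            results[pos] = "differs from reference"
        else:
            results[pos] = "matches reference"
    return results


def sequencing_test(tester, reference):
    """Report every covered position that differs from the reference."""
    return {
        pos: (reference[pos], base)
        for pos, base in sorted(tester.items())
        if base != reference[pos]
    }


panel_result = panel_test(tester, reference, panel_positions)
ngs_result = sequencing_test(tester, reference)
print(panel_result)
print(ngs_result)
```

In this toy case the panel reports only positions 102 and 104, while the sequencing-style comparison also reports the previously unknown difference at position 105, which is the kind of result next generation sequencing is able to deliver.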
It can discover SNPs that you may share with lots of people. It may discover one that arose in your generation, that even your father wouldn't have. You won't always know, of course, how old or how recent these SNPs are, but it will find all of them on the stretch that's been read. Why do this? Why test Y-SNPs? Well, for a number of reasons. They can help to identify people who share a recent common ancestor. Again, as I said, there's a lot of usage of SNP testing for deep ancestry. Increasingly, we can apply this to hypotheses we may have about historical ancestors, as SNP testing is now beginning to reach the historical period. And there are already some SNPs which can be said to be markers of particular family names and particular branches of families. That probably is the goal of most people involved in this kind of SNP testing: to bring it into the historical era, where SNPs can be correlated against known people with known surnames, and can then be used as predictors of others who may be part of that family but may have different surnames because of NPEs and other reasons like that. It's important to say that these tests are still not very cheap relative to other DNA tests, but it can be argued they bring great value for money, because they bring all the SNPs on the stretch being read. They'll discover new SNPs that are not known and cannot be found by any other method. And they also bring along a package of STRs for those who haven't tested up to the maximum level of STRs.
So, a very common quote is being passed around the field at the moment. I think it was first said by Mike Walsh, though I have a feeling he would deny that he actually said it. It's the idea of SNPs being the trunk and the branches of the tree, while the STRs are the leaves of the tree. So, in the future, STRs will remain important. SNPs, at the moment, at least from the Big Y, don't quite have the discrimination we need to tease apart fine branches of a family within the last couple of hundred years. So the power that STRs can add to a SNP interpretation will be to work out those fine branches, once a SNP shared by the branch has been identified. Both of these will remain vitally important.
Here are a couple of examples of SNPs which can be shown to be family-identifying SNPs, in the sense that all the people so far found with them share a family name. These graphics are taken from Alex Williamson's Big Tree, which I'll say a few words about later on. Here we can see a group of MacFarlanes, and that's a very well researched family with a very active surname project. Each of these little blocks here is indicating SNPs which identify particular branches of MacFarlanes. But above here we see these two SNPs, BY674 and BY675, which are shared by all of these MacFarlanes and so far only by these MacFarlanes. Of course, other people may come along eventually with a different surname having one of these two SNPs. The chances are there might be a non-paternity event if that's the case. But so far we can call this a MacFarlane SNP. Interestingly, at a higher level, an older level, there is this block of SNPs here, and these all seem to be identifying members of the Black family. So here we have another family-identifying SNP which is clearly related to this MacFarlane SNP. I suppose people who are from the MacFarlane clan, sept or family will be able to tell us what relationship may exist between Blacks and MacFarlanes. There seems to be something here. But what interests me is that what we have here are SNPs identifying families.
The Maxwell family. This is, again, taken from the FGC5494 page of Alex Williamson's Big Tree. And I'm actually on this page, a bit further along, so I'm related to these Maxwells, very, very distantly.
And what's interesting here is we do have one person who looks like a possible NPE. But clearly these, we can say, are Maxwell SNPs. But here I think we see something which is illustrative of one of the problems with this kind of research. You want to be very careful about identifying certain SNPs as family identifiers. With a great long chain of them here, that actually goes quite a long way. There are about 20 or more SNPs in this block which have not been divided, shared so far only by the Maxwells and Le Mans. And this will go back a long way. This will go back thousands of years, because it takes that long for all these SNPs to appear, which means we're going back before surnames began. So you have to be careful, I think, about too readily identifying certain SNPs with a particular surname. It's very likely, I would think, that other people would eventually test, with other names, who would share some of these. I'm sure the Maxwells and Le Mans would want that, because that would split the block and bring more structure here. But so far, again, it's reasonable to say that, as things stand now, these look like SNPs identifying the Maxwell family. And there's a chain of six here which will probably take us back about 600 years or so, at a rough guess.
So next generation sequencing, then, is a method for compiling the sequence of a stretch of the genome. Essentially the genome is smashed up into bits which are 100 base pairs long. These are then mapped by clever software against the reference sequence and assembled into a read of the genome sequence for the person being tested. The advantage is it's fast and cheap. Just to expand on that a little bit more, I have a still here from my own raw data, from my own Big Y test, in which you can see the reads. This is actually showing a SNP, because it's telling us there is a mutation from G in the reference sequence to A in my data here.
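The idea of chopping the sequence into short overlapping reads, stacking them against a reference and spotting a G to A change can be sketched as follows. This is a deliberately simplified illustration, not the software any testing company uses: the sequences are invented, the reads are 8 bases long rather than 100, and the mapping step is skipped by assuming each read already carries its correct start position.

```python
from collections import Counter, defaultdict

# Invented toy sequences; the sample differs from the reference at index 9 (G -> A).
reference = "ACGTTGCAGGCTAAGCTTGA"
sample = "ACGTTGCAGACTAAGCTTGA"


def make_reads(seq, read_len, step):
    """Cut seq into overlapping reads, returned as (start, bases) pairs."""
    return [
        (start, seq[start:start + read_len])
        for start in range(0, len(seq) - read_len + 1, step)
    ]


def pileup(reads):
    """Count the bases that the stacked reads show at each position."""
    columns = defaultdict(Counter)
    for start, bases in reads:
        for offset, base in enumerate(bases):
            columns[start + offset][base] += 1
    return columns


def call_variants(reference, columns, min_depth=2):
    """Report positions where the most common base differs from the reference."""
    variants = {}
    for pos, counts in sorted(columns.items()):
        base, support = counts.most_common(1)[0]
        depth = sum(counts.values())
        if depth >= min_depth and base != reference[pos]:
            variants[pos] = (reference[pos], base, support, depth)
    return variants


reads = make_reads(sample, read_len=8, step=3)
variants = call_variants(reference, pileup(reads))
print(variants)
```

Each reported entry holds the reference base, the observed base, the number of reads supporting it and the total depth at that position, so a call supported by every overlapping read can be treated with more confidence than one seen in a single read.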
But what I want to do here is just pull out slightly to show you the reads. Each of these is a chunk of my Y chromosome, 100 base pairs in length, which is read and then reassembled. You see how there are lots of overlapping segments here; that's how the assembly is done, so that eventually my own genome can be reconstructed. The advantage here is that every read at this position is showing the mutation, so we can be quite confident that it really is here. And I'll pull out again. You can now see all these 100 base pair reads stacking up. You can see certain positions are read a lot, some positions hardly at all, with one read here, maybe three or four reads at this position here, and of course some stretches are not read at all. That's the thing about the Big Y test: it's quite hit or miss in terms of what's being read. You don't get your whole Y chromosome. You don't even get the whole of the so-called readable Y, and some people feel that is a disadvantage. There's a discussion going on online at the moment about how another test, offered by another company, which covers the whole of the readable Y, is of superior quality, and that is probably true, but the advantage of this one is that the cost is less. So you can make a decision about how much of the Y chromosome you need to have read to get the kind of data you may need to begin to build structure into your tree, and you don't necessarily need to have the whole thing read, particularly if cost is an issue.
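The hit-or-miss nature of coverage can be made concrete with a small depth calculation. The sketch below is an assumption-laden toy: the read start positions, the read length of 5 and the 30-base region are invented, and the function names are hypothetical. It simply counts how many reads cover each position and what fraction of the region reaches a chosen minimum depth.

```python
def depth_profile(read_starts, read_len, region_len):
    """Return a list giving the number of reads covering each position."""
    depth = [0] * region_len
    for start in read_starts:
        for pos in range(start, min(start + read_len, region_len)):
            depth[pos] += 1
    return depth


def covered_fraction(depth, min_depth=1):
    """Fraction of positions read at least min_depth times."""
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / len(depth)


# Invented toy reads: some positions stack up, some are read once, some never.
read_starts = [0, 2, 3, 10, 24]
depth = depth_profile(read_starts, read_len=5, region_len=30)

print(depth)
print(covered_fraction(depth))               # read at least once
print(covered_fraction(depth, min_depth=2))  # read at least twice
```

Raising `min_depth` shows how quickly the usable fraction shrinks when a result is only trusted where several reads overlap, which is one way to think about the trade-off between how much of the Y chromosome is read and what the test costs.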
So here is a comparison of the two big tests, the Big Y from Family Tree DNA and the Y Elite test from the Full Genomes Corporation. This is drawing upon data published in the ISOGG Wiki, a very useful source of information. Very recently the Full Genomes Corporation have introduced a new variant of the test with longer read lengths, so they're reading longer chunks of the DNA. One would think that would lead to more reliability in the results. I don't know enough about the technology here to know exactly what difference it makes, but I suppose it is a question that can be asked now, since the Big Y test itself is about two years old: I wonder if there are similar moves on this side to begin to match the read lengths of the Full Genomes Corporation.
But a key figure down here is the amount of the readable Y chromosome that is read. Y Elite aims to get about 90% of it and the Big Y aims to get about 55%. You might think that's a huge drop, and without doubt there probably are key SNPs of interest to some researchers that are being missed by the Big Y. In fact, the one I mentioned earlier, called FGC5494, which is a very important branch marker, is not read by the Big Y at all. I have that marker, but it's not read in my Big Y test, so I had to test elsewhere to find out whether or not I was positive for that marker. Now, I don't regard this as a problem, because I think that when genetic genealogists want to conduct the kind of research project I was talking about this morning, cost matters, and how you distribute the money you may have towards tests also matters. My own view is that the Big Y is giving us enough to go on to make it worthwhile thinking about doing two Big Y tests on two people who may be related, rather than a single full genome test on just one person, for similar amounts of money. But other people may have different views about that.
The Y chromosome is divided into a number of different regions, some of which are more likely to furnish useful SNPs than others, and the Big Y
test, Family Tree DNA say, is designed to target those regions which are more useful in providing SNPs for family research. It doesn't mean they get them all, but the test is designed to target regions that have a lot of known SNPs that have been discovered beforehand, and where it can be assumed, because of the greater density of known SNPs, that there may be a greater density of new SNPs, and possibly a greater density of reliable new SNPs as well.
So I have a little diagram here of the regions of the Y chromosome, just to give you some idea of what we're talking about. This is drawn from a classic article by those who first mapped the regions of the Y chromosome, and straight away you see that about half of the Y chromosome can't be read at all, because it is too repetitive. When we talk about the readable Y chromosome we're not talking about this or this part here, the centromere, or the two tips which recombine with the X chromosome. So these white areas here would be your readable Y chromosome, and these also divide into a number of regions. What's known as the pseudo-autosomal section, actually, is the tip at the end here that recombines; that's the repetitive area; and here then are the regions of the readable Y, again colour-coded here: the X-degenerate section, the yellow, the ampliconic region, and the X-transposed regions.
And let's see if you do better than the people in Birmingham, where I put the same question in the spring: which of these regions do you think might be the one most targeted, or the better target, for Y-SNPs? Go on, tell us, Maurice. I'm sure many know the answer. It is in fact the X-degenerate region. Again, to simplify things greatly, this would once have been similar to the X chromosome. It doesn't recombine with the X chromosome, and it accumulates mutations over the millennia, and therefore it has degenerated away from the state it shared with the X chromosome. That is where you'll find those mutations, the SNPs that we're interested in. The X-transposed region is quite similar to
the X chromosome. It's therefore very difficult to be certain, if you find a SNP on this stretch, that it's actually on the Y chromosome and not on the X. And the ampliconic regions contain some of the repetitive areas known as palindromes, which actually have a lot of good SNPs in terms of the genetic information on the Y chromosome. But because the palindromes have this habit of overwriting each other, SNPs you may find on one of the palindromes could disappear if they're overwritten by another stretch that didn't have that SNP. Therefore these SNPs are not reliable, in the sense that they may not track a family line all the way down in all cases, and you may get false negatives because the SNP has been lost through this process. So the tendency among the organisations who analyse raw data for testers is to focus on SNPs in this area, on the grounds that they tend to be more reliable and more likely to persist down the generations, and therefore to be reliable markers of those family branches you're trying to find.
So how do we go about dealing with this data? How many have done the Big Y test? Obviously quite a few of you. Perhaps you were daunted initially by the way in which the results were reported to you. There's a lot of data to deal with, and when it comes from Family Tree DNA it's not necessarily presented in a way that's very easy to digest, and we do need help. I think it's one of these things where we need to go to people who have an expert view. Such people include haplogroup project administrators, who probably are the leading experts on their particular haplogroup and the SNPs within it, but also many skilled enthusiasts who have developed software and processes for analysing the raw data files and pulling out the information that is of interest to the tester. It's not something that is easy to do yourself, but actually you can also do it yourself, and I'll look at how that can be done later. So I think that if STR testing lent
itself to the growth of spreadsheets and big cross-tables comparing lots of numbers with each other, I think SNP testing is leading us back towards something very familiar to the genealogist, which is the tree diagram. Again, there are many different varieties of trees being designed by people involved in projects, and I think Maurice is going to tell us tomorrow about trees and their uses in genealogy. But this is one of my favourite trees, designed by a man called Alex Williamson, who's put together what he calls the Big Tree. Gene was talking earlier about the P312 marker, one of the key markers in western Europe. Alex is working on the P312 marker and its descendants, and this is the top of his tree, in which we can see key next-level markers, including L21, which will of course be one of the most common markers in Ireland, and other markers along here. He's building a descent tree of every person who takes a next generation sequencing test and who submits his raw data to Alex. So what we're seeing is essentially something like an embryonic great tree of the Y chromosome, potentially.
Just to go on to the next page, this is the FGC5494 page. Now, one of the things you're going to grapple with when you come into the world of SNP testing is these awful names. Every single one is named with some utterly unmemorable combination of letters and numbers, and in the end what most people do is remember their own, the ones of immediate relevance to them, and the top-level ones, but forget the rest. I think the best thing to say is: treat them as catalogue numbers, a means of looking up in a list or an index to find out what a particular SNP is, where it is located and so on. So here we have the one that marks the haplogroup that I'm in. It's a haplogroup underneath L21. It didn't feature in the Munster Irish talk yesterday, even though my family is from Munster, though they may not have come from Munster before then; we don't know. As you see, along the
bottom here are all the people who tested positive for some line of this, and you'll see the flags will tell us where they're from. There are quite a few Irish flags here, actually, and quite a few English ones. Over here, somewhere around here, should be the Maxwells again, some English flags, and right over here is some fellow called Cleary, marked in red. Marked in red because a very nice function that Alex Williamson has designed recently is to overlay STR data on top of this tree by colour-coding particular markers. So I've picked out one here that is a very rare marker that I have. I'm the only person around here who has this. I share this with my Gullman genetic relatives, who have not tested Big Y yet, and over here we've got someone called Mr Brunei who also has the same marker. We haven't got a common origin for this; these are two separate occurrences of this STR value appearing, because all the people in between do not have it. Down here then we have the colour-coding, and we can see what the normal reading is, which is a value of 13 for this particular marker; a few people here have got 14, I think, and then two have 11.
So again this is a very nice way to see where the tree being developed is going, to capture the SNP data, and I think this is in some ways intuitively easier to read than the big cross-tables you find with STR data, in the sense that once you find out what line you're tracking, you can track it right the way down. And Alex Williamson is also adding, once you get to the person at the bottom (whoops, pushing too hard), personal data about their private SNPs. So on the tree here you see all these SNPs that I share with one other test. That's not a Big Y test, unfortunately. This is someone who is probably no longer alive and was part of the 1000 Genomes Project, and I do not know anything about him apart from the fact that he is from Barbados, but we clearly share a very deep ancestry from thousands of years ago, in the SNPs we share with each other. And then when
you click on my name you get all the SNPs Alex Williamson has found in my raw data which are private only to me. Now, they're only private to me because nobody else closely related to me has tested yet. If someone who was closely related to me were to test, then many of these would be found to be shared, and that would introduce a bit more complexity into the Big Tree. But what I think is tremendous about the Big Tree is the possibility of this tree eventually being extended all the way back. This could ultimately be the great tree of Y chromosome descent, and a similar tree could of course be constructed for mtDNA, so this ultimately could capture everybody. I don't know if modern computing has the processing power to do that, but certainly within P312, the huge P312 subclade, it is capturing anybody who will do an NGS test, and at least anybody who does that will see where they are located in relation to all other testers. And you will see there is a big difference between here, where again, like me, there is probably no one closely related who has tested, and here, where several people have tested and they are getting SNPs into the historical period.
So how do you go about analysing your raw data? I think it is definitely well worth sending your raw data, which is a BAM file if you do the Big Y test, to one of the analysis companies, who will do a third-party analysis of your data for you. They will pull out the SNPs they find, they will give you a full report on those SNPs and on the quality of those SNPs, and they will also give you a large pile of STRs, which will include something like maybe 6 to 100 or so of the 111 panel offered by Family Tree DNA. Now, I had about 101 of those 111 STRs in my test, and I had done the 111 test already, so I was able to compare the two side by side, and they are all absolutely accurate. Every single one agreed with the Family Tree DNA test that I had previously done. I think we are not yet at the stage where we will have a single test for both SNPs and STRs, but to me the
potential is there, and I think this must be coming at some stage. Of course, it is still very useful for project administrators to see the STR test done by the potential Big Y tester before they do it, because those STRs will give guidance as to where people may be related.
So there are two companies doing the third-party analysis, including the Full Genomes Corp, who of course do NGS tests and will supply a full report on those they do, but will also analyse the BAM files from Family Tree DNA tests. And here is a good example of how you can use the YFull graphical interface as a BAM file reader. I think it is important that genealogists should have access to their raw data and a means to read the raw data file they have, and YFull make that possible by allowing you to browse your own test position by position, seeing what result you have there. Here it is showing a SNP which is called YP355. This was discovered about 18 months ago, and we now know it is a major SNP in the so-called Young Scandinavian subclade; I'll say a few words about that later on. And it also tells us that we have a good quality SNP here as well.
So let's get through this quickly. If you are a tester you can actually do a lot of the analysis yourself, and with a colleague in the Cummings surname project, Tim Cummings, we have developed our own approach to taking the Family Tree DNA list of results, filtering out the chaff and finding the wheat that is useful to the people who are taking tests. If you get your results from Family Tree DNA you get something like this, in which you will see a long list of position numbers, the particular mutation you have, and Family Tree DNA's own assessment of the quality, a very rough and ready assessment of quality, however. And there are other ways of finding the raw data of people who match you. So, for example, if someone matches your test (these are matches to someone by the name of Kemp, who is in the Kemp project I run), we can find out what their results are by looking at, first of all, which
ones they share with the person who matches (so these are matching to the Kemp SNPs; they both share these) and then which ones are unique to the tester, which are not shared with the Kemp tester. This way you can put together a list of FTDNA's unique positions from the Big Y test, and then you go through a filtering process. This is a spreadsheet which filters out the SNPs, giving us, first of all, the SNPs that have already been identified as being shared with other people. These here are all subclade-defining; they're all actually in the YP355 subclade which we're investigating. And these ones then are the SNPs which in this particular test are all positive up to here, so this person has all of these upwards but then doesn't have these, and in this case it's rather curious that they don't have these ones. So these ones here are SNPs we want to investigate further, to find out why this person doesn't appear to be positive for them.
You can also filter out the SNPs which are known to be not reliable, for various reasons: whether they're in palindromic or repetitive areas, or are simply found to repeat many times in many haplogroups, we can get rid of these. These are all listed as Family Tree DNA hits, and most people will find they've got around 130 to 140 novel variants, as they're called, but what you want to do is get that list down to about 10 to 20, which will be the generally good quality ones. So the vast majority of these you will throw out, because they've been found and are known to be not reliable.
Once you've done that, you can then do a consistency check on the ones you've found. This is where, to further our research, we're using a very handy function called the group browser. Anybody who shares a particular haplogroup, here R1a, can join this, and you can then search any position and find out whether the testers are positive or negative for it. Here's YP355 again: a very nice clear read, a very clear grouping here who are all positive, that's a subclade, and all
the ones who don't have it, who are not in that subclade. And this is one in which you'll find it's a bit inconsistent, with lots of unclear reads; therefore this particular position is not going to be a high quality SNP, for whatever particular reason. We can see here that even the read for individual testers is uncertain, so we would just reject this one and wouldn't take it any further. So what we're doing is winnowing down the list of variants to those which seem to be reliable and consistent. It's not to say that some of the others are not going to be useful at some stage in the future, but at this stage they're inconsistent and they're not going to help develop clear branching trees like the one we saw earlier.
So, very quickly then, a case study here of how I came to do this. I work on the Kemp project, and I have relatives whose name is Kemp. We found that a lot of other surnames closely match them, and we began to investigate these surnames as well. We labelled this group the JACKS, which stands for Jacobs, Anderson, Cummings, Kemp and Small, and all of these, with the exception of Jacobs, seem to have some association with the northern part of Ireland. So this seems to be a grouping with a connection with Ireland; there's no obvious connection with Scotland that we have found, despite the fact that these may look like plantation names. The Jacobs don't know where they come from. They're descended from one migrant to the USA in the 17th century; they don't know where he came from, and they'd like to. So we found that this group has some distinctive STR markers, and we also ran them through a few utilities like the McGee utility here, and got some evidence of what looks like a connection, and through clustering network diagrams like PHYLIP here. Again, look at this one: we've got other R1a down here, we've got the Scottish McDonald family up here, who are not too far away from us (they're also within the L448 subclade), and here then
are the Kemps, Jacobs and Cummings, rather jumbled together. There's a Jacobs branch, and here is a Kemp branch; they look like fairly well defined family branches, and the Cummings seem to be a bit more all over the place. But to cut a long story short, we then went on to expand this into a much wider analysis of the whole subclade of YP355. A year ago I presented not this but this diagram to GGI 2014, which was an embryo of what's now turned into this. So we have a lot more structure in the tree. We have about 31 Big Y tests, plus a number of single SNP tests and some SNP panel tests, which we've been using to identify these branches. All the labels showing here are shared SNPs that are held by at least two people. They've all got names, because YFull very kindly named them for us, but I don't think having names is particularly important. What's important is finding whether something is shared by two or more people and therefore must represent a pre-branching stage. Underneath each of these blocks we find branches, and here are the Kemps and Cummings over here.
And we have an estimate of dates. I'm not actually going to go too much into dating, because it's very complicated and I think it's unclear as well. We have very, very rough estimates of what the dating of these splits may be. We imagined, from those diagrams I showed you earlier, that Kemp, Cummings etc. had an origin round about the Dark Ages, so it's slightly pre-surname, but only just pre-surname. In fact, the number of SNPs we find since the branching (the Kemps here, here are Cummings and one or two other surnames) suggests the branch may be more like the 8th century. I'm not drawing any significance from that date; I think it simply means that's when the common ancestor lived, and since then the two families have diverged while still being found in similar parts of Northern Ireland: Tyrone, Fermanagh and Cavan. And here we have what we might call genealogical time, again
very, very roughly, just a little bit after 1300 here. Of course, as I said earlier, that varies a lot depending on whether you're in England, Scotland, Ireland or Wales. Some of these SNPs, again, are just peeping, scraping into genealogical time, but I'm not ready yet to do as the Maxwells have done and declare any of these to be family-predictive SNPs. Because, you see, these are blocks, and of course we don't know which of these is the oldest and which is the youngest, which is nearest to the branch and which is much older. So any of these could be pre-surname, and one or two of them could be post-surname; we just don't know which. So we're dependent really on a lot more testing being done, which will probably take a few years, to get even more structure into this. But I think the extent to which this tree has grown is quite earth-shattering, actually, and there are still more Big Y tests in the pipeline which will add even more to this. So I'm going to finish by talking about the pitfalls. Maurice announced earlier today that I was going to talk about convergence, so I've made a few more slides. I think it's an interesting question, again, to what extent STRs and SNPs can relate to each other, since many of you are probably very familiar with STR testing and may be moving into SNP testing. So, taking a little chunk here of the tree I showed you, with the Kemps and Cummings here: we identified this STR marker, a rare marker within R1a, as being the marker for this Jacks group. In other words, it's found in this line and this line, so it must have originated somewhere in here, which suggests it's actually a very old and very stable marker, going back possibly to the 8th century, if we're drawing correct conclusions about how old this branching split is. A little mistake here: this should be YP618, by the way. There it is; this is the overall subclade here, which is below YP355. And then just after GGI last year another result appeared, in a Norwegian person, also containing DYS447=21, so it must be
even older than we thought. Now, many of the branches in this tree have a mixture of Isles and Norwegian names, so clearly most of these high-level SNPs seem to have appeared in Scandinavia and then dispersed by whatever routes across Europe. But we thought the Kemps and Cummings one hadn't any Norwegian names and therefore post-dated the move of this block here into the Isles, and suddenly we found we had a Norwegian in this group as well. So we thought maybe the Jacks need to be renamed the Jack-Os, or maybe the O-Jacks, since the Norwegian is called Olsen. However, this summer the Norwegian tested a single SNP and found himself to be YP618 minus; in other words, he's not in this subclade. So once again it's the trap I mentioned this morning, and we fell into it once again here, in putting too much credibility in this single STR as being more predictive than it actually is. I think it is very predictive, actually. It does seem to be very stable: we haven't found anybody in this group who has had a mutation away from it. But clearly this must be another occurrence of it. What's interesting to me is, first of all, that there seems to be a bit of convergence here between these two markers, but also that the SNPs are now exploding this. We made a hypothesis about the Jacks group, and SNP testing has blown that open and shown, actually, not where the Norwegian person fits in the tree but where he doesn't fit in the tree. A second example of this, and I'll finish after this, is in another branch of the same tree. Here is a grouping put together by the R1a project, who are predicting that these four people, three of whom have now done the Big Y SNP testing, form one single subgrouping under this newly discovered SNP in this tree, YB1426. They put this together on the basis that there's a shared STR; there are a couple of shared STRs here which make a signature, in which most of them have 16. One here, who's actually a
McPherson, or descended from McPhersons of Sutherland, seems to be 17. So the R1a project concluded, reasonably, that you have a branch marker here moving in one direction, and they would all therefore be close to each other and McPherson would be further away in his own sub-branch. However, that's not quite how it worked out. This is how we constructed the tree originally, and here we have a block which is two SNPs together. One of the goals of this kind of research is to split the blocks so that eventually we get at each branching point. It's going to be impossible to do for everything, because many of the other people who had some of these SNPs but not all will have died out; those lines won't exist any more, so we'll never split every single SNP block. But this one has been split, and very interestingly it didn't split along the lines predicted by the R1a project. Instead, what we found was, first of all, that a man by the name of Talchard, who is a member of the Devon project, was negative for this particular SNP, which we thought he would have. It was a no-call in his Big Y test; there was no result there. And looking at his BAM file, we found he was actually positive for this particular SNP. Therefore we had to move him out of this STR 16 group, and he's now further away from them than McPherson with the other STR result. But one more revision: yesterday I had an email from someone who's taken a single SNP test for YP1461, believing he was here but suspecting he might be like Talchard, and indeed we found he actually is. So in other words, we now have these two here in a pair under this group (we don't know if they have more in common yet), we have these two here remaining under this level, and the McPherson now sits in between them at this level, closer to these two than to these two. This may all seem a little bit like trainspotting to an extent, but I think what's very interesting for people in this group is that it has exploded, again, a hypothesis based on STR testing, and it's
shown how this tree must be structured. It means that this particular STR is not going to be a very good predictor of how people may fit within these groupings; instead, the SNPs are showing us how the groupings are actually structured. Of course, there will still be a role for those STR results that they have, in working on the finer, more recent branching lower down the tree. So I'm going to stop at this stage and throw the floor open for some questions. It's getting late in the day and I'm sure many of you are looking forward to your dinner, but if you have any questions I can answer, please do ask away.

Thank you very much, John, for another very interesting talk. I know it's been mentioned several times during a number of presentations today: the different technologies of STR, NGS testing and SNP testing. Certainly for people who have multi-genetic names, or people whose history has sort of resulted in a whole series of people with different surnames having almost historical SNPs, that makes it very complicated in terms of trying to work out a testing strategy, because the cost of these tests and what you get back from them are quite considerably different. I mean, with the single SNP, if you think that somebody is on your branch, you can get a single SNP result for $20, as you say. But clearly it would be advantageous to the wider community and other branches if you could persuade either yourself or your other half or the tester to pay for Big Y or a YFull test, but that's very, very difficult because that's quite a lot more.

I think there's no question there's a cost issue here. I think the ideal situation really is that you don't need to test everybody in a branch by Big Y; two or three would be enough, depending on the degree of variation within that cluster from the STRs. So I think you could club together, and if everybody in a group of 10 chips in $10 (sorry, not $10, $50), then you'd about cover a Big Y. Maybe that's optimistic, but I think people don't tend to do that, so I
think most Big Ys are funded by people who are paying for themselves. The SNP panel tests, which are now a much lower-cost option for people to test their SNPs, are based on the research from the Big Y; without the Big Y these SNP panels would not be possible. But of course it has to be borne in mind that what you don't get from the SNP panel test is discovery of anything new: you're only testing whether or not you have particular SNPs which are already known. I think once a branch has investigated its structure, I'm not convinced that any more Big Y testing would be advantageous. We've done two, and they're so close to each other that I find it hard to imagine there'd be a lot of variation from the others, so I think we probably have enough there. I'd like to see more elsewhere, but I think with those two, in that family cluster, STRs would have been much more useful. But of course if others wanted to know whether they were part of that subclade, they can take the panel test to do that.

In my Spearin project, what we did was pool money together, and we actually tested two members of the project. We've only got 12 members in the project. We tested the two members that seemed the most genetically distant from each other, and the common ancestor was probably sometime in the 1600s, but it turned up a difference of two SNPs between the two project members. So it actually proved to be a very useful exercise. I'd encourage anybody that's running a surname project to set up a general fund, advertise it on your Facebook page or on your activity feed, and just get that money together for the two most distant members of your project. A question from James Irvine.

John, I was going to have to move away from the microphone. That was just terrific. I heard you talk a few months ago, and the progress you've made is having more. Unfortunately, tomorrow I'm going to start talking to you that's fun now but I'm affored to it; this will be the other sort of, just the
difficulties we all face. One specific one that I'll take up this evening, and that I'm going to take up tomorrow: you said that you looked at your own Big Y test for STRs, and that across 111 markers you find 100% agreement between the STRs from the Big Y and the original STR test. I find a very different answer. In 5 comparisons I've made, some at 67, some at 111, I find only about 90% correlation and some significant deviations. I've got the impression that I would much rather trust, for STRs, the original STR tests and not the Big Y versions of the STRs at all. I think it's quite dangerous, and I've heard others saying the same thing. I think it's far too early to say that the Big Y can replace STR tests.

It just happened in my case, 101 of the 111 panel, and they were all the same as the FTDNA reads. But maybe we need more investigation of this to see how consistent that is.

Just a comment: Family Tree DNA no longer give you the raw data for the STRs in the Big Y test.

Are you sure about that? They're saying they're taking the mtDNA out. Are you sure they're taking it out?
Yeah, I think there was a lot of debate about this earlier, when Family Tree DNA stopped providing the mtDNA data that was included incidentally in the Big Y. Now, I actually have also done the full mitochondrial sequence, and again I found that my mtDNA results in the Big Y were 100%: they're all exactly the same as my FMS. But many people only get 80% or 70% or less, so there was a lot of variability in those mtDNA results that were being reported before the spring. The Big Y isn't about mtDNA, so I don't think it matters really that it's been taken out. But I think the STRs can't go, because they're part of the sequencing. There are the SNPs inside STRs, which we need to know about, and of course the STRs themselves form part of the sequence of your DNA, so to strip them out would be removing part of the raw data the customer is actually entitled to, and it would worry me if they tried to do that.

No, no, it's okay. One thing, actually, is the read lengths: if we do improve the read lengths, it will be the STRs that will improve. So if we can get to 50-80 as we can

Yes, I hope so, soon.

Yes, when is the cost going to fall, John?

There are sales every year. I think most people who have ordered the Big Y have probably done it in a sale. There's one happening right now, and I'm sure there'll be one at Christmas time as well.

The Big Y at the show ranges down from $575 to $475, so I'm waiting for the day when it actually comes down to $1,000.

You mentioned that you can analyse your own results. Overall, how accessible is that to the users?
Yes, I think Jim's going to answer as well, but actually I think it's not especially accessible. What you have, if you've done the FTDNA Big Y test, is a list of novel variants, and you can work from that if you have the right kind of filtering and the right reference materials to see which of these variants are known to be unstable, which have been discovered already and labelled, identified as being not reliable. You can filter those out. It's not foolproof; it's not a foolproof approach. But what I do recommend to anyone I know who does the Big Y is to pay the extra $40-50 to get the third-party analysis, so you can send your BAM file to the two companies I mentioned earlier, who will do the full analysis for you. But what I'm suggesting also is that it's not an insuperable barrier for the ordinary genealogist to take on seeing what they can work out themselves from their own results. You can also view your BAM file if you wish, and there are some quite good graphical readers; with a little bit of training you can actually begin to see the patterns for yourself in your own data. Jim will speak.

There are BAM file readers available, so you can send your Big Y BAM file, or you can put the BAM file into and you can view it that way. And James is going to comment on it as well.

I'm going to attempt to show you how to do it tomorrow.

Can I say, we're not doing the kind of analysis which you'll be doing. What we're doing is filtering a list that's already been prepared for us and working out what's useful and what's not useful in that list.

Yes. That was fantastic again, I learned a lot. I wanted to ask about one thing that you said was difficult, just because I think it's important, and that's estimating the dates of origin. In order to collaborate with historians and geologists, it sounds like we really need to get accurate dates for those, now we're in this beautiful time frame. So you mentioned 100 years per SNP. Are there any advances in getting that more
accurate?

Yes, there have been two important papers published in the past year attempting to date SNP branches. One was published by YFull, based on their own method, which was built around the age of the Mal'ta boy remains in Siberia, which have been carbon-14 dated; by counting the SNPs which they've derived from his DNA, they come up with their own calibration for what the mutation rate may be. The other paper was produced by the Icelandic team at deCODE, who are basing their calibrations on pedigrees. As you may know, Iceland has extensive historical records, and there's a huge genealogical database containing a very large portion of the Icelandic people, and there's been a very wide collection. So deCODE have been using the Icelandic database, again, as a means of correlating the age of the SNPs they've identified in their research. The two papers have come up with slightly different mutation rates, but not greatly different, and so the rough rule of thumb that people are now talking about is roughly 150 or 160 years per SNP from the Big Y. Obviously it depends which test you take and how much of your Y chromosome is read, but YFull have defined a region of the Y chromosome they call the combBED area, and their approximately 155-year mutation rate is based on that. So you can count the SNPs in each branch if you wish: you can average how many SNPs per branch there may be in, say, this branch or this branch, and then multiply that by the number of years per SNP. But obviously we've got very few samples here, so we're averaging over a very low number of samples, and therefore any numbers we derive for more recent SNPs are not going to be greatly reliable. There has to be a huge margin of error around them. It's probably more reliable for much older SNPs. We're told, for example, that this is probably about 3,000 years old, and these two child SNPs here are probably not much younger; they may be around 2,000 years old most then the
rest. So be careful, because you can try to estimate what the ages may be, but with very few samples we're not going to get very reliable figures. That's where the numbers down the side come from, essentially. It's easy but it's reliable.

I have a quick question from Derek.

I've been for actually about 130 years lower in order to practice my practice. First, for analyzing this stuff, Alex Williamson's Big Tree is the easiest, the clearest. Second is YFull, to give the dates. We also integrate all the ages; where we have many samples it's possible very easily and through the ages.

I have a quick question. You mentioned your closest match on the Big Tree was from Barbados. Did you consider that it's one of the leftovers from those who were shipped to Barbados? Have you considered that?

I'm sure there's some connection there. As I'm sure many of you know, there's a lot of Irish Y-DNA in Barbados for historical reasons, from the actions of Cromwell, and I think the Restoration regime as well was guilty. I don't know where my ancestors come from. I'm assuming that they are Irish; they could be Scottish or Welsh when you go back to these kinds of time depths. But yes, I think the Barbadian is probably descended from people who were sent to Barbados from either Scotland or Ireland. But just on the other point, I agree with your analysis of Alex Williamson and YFull, but I'd say too, if you've done the Big Y, send your raw data to as many places as possible and get as many analyses as you can. You shake your head, but I'd say do it, because the differences can be instructive and illuminating. So I sent mine to both Alex Williamson and YFull, and there were slightly different analyses from both, which again I can use in my research.

So yeah, we agree on that. I've done the same with the Gleeson DNA project, which I'll be talking about tomorrow, again looking at it from a very different perspective. I want to try and reconstruct the branching pattern within my Gleeson family tree. Of the 13 members in the project, 6 of them have done the
Big Y test, and that's almost 50% of people in the Gleeson DNA project who have done the Big Y test. 2 of them are actually brothers, and they have different SNPs compared to each other. So this is cutting-edge science. John, between yourself and James, it is so wonderful to have you both here at the conference, because you're bringing to us the latest discoveries that will change the way that we think about reconstructing our family trees. So I just want to give you a very big thank you for such a wonderful presentation.
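
As a footnote to the dating discussion in the Q&A, the rule of thumb described there (count the SNPs below a branch point in each tested man, average them, and multiply by a years-per-SNP figure) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 155 years per SNP is the rough figure quoted in the talk, the function names are hypothetical, and the SNP counts are made up rather than taken from any real project. The spread between the lowest and highest count is shown as a crude stand-in for the large margin of error the speaker warns about when only a few men have tested.

```python
from statistics import mean

# Rough figure quoted in the talk; an assumption, not a fixed constant.
YEARS_PER_SNP = 155


def estimate_branch_age(snp_counts, years_per_snp=YEARS_PER_SNP):
    """Average the SNP counts found below a branch point and convert to years."""
    if not snp_counts:
        raise ValueError("need at least one tested sample")
    return mean(snp_counts) * years_per_snp


def age_range(snp_counts, years_per_snp=YEARS_PER_SNP):
    """Crude spread: ages implied by the lowest and highest individual counts."""
    return (min(snp_counts) * years_per_snp, max(snp_counts) * years_per_snp)


if __name__ == "__main__":
    # Hypothetical counts of SNPs below the branch point for three tested men.
    counts = [7, 9, 8]
    print("estimated age (years):", estimate_branch_age(counts))
    print("spread (years):", age_range(counts))
```

With only two or three samples the spread can easily cover several centuries, which is why such figures are better treated as rough guides for older branches than as dates for recent ones.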