Good evening and good afternoon, ladies and gentlemen, and welcome to the second-last lecture of today. Can everybody hear me okay in the back? Yes, okay, fabulous. Just to introduce myself, I'm Maurice Gleeson. I am a genetic genealogist. I'm also a psychiatrist, a pharmaceutical physician, and a part-time actor, and I do the occasional voiceover work. I'm a jack of all trades, and I'm trying to master just maybe one or two. But I've been involved in genealogy, and genetic genealogy specifically, since about 2008. I run the Gleeson, the Farrell, and the Spearin DNA projects and a couple of other smaller interest projects. And I put together the DNA lecture program for Genetic Genealogy Ireland, as well as helping out in the UK with the DNA lecture program for Who Do You Think You Are. So it's a pleasure to come here and talk to you today about my own family. I'm going to talk to you about the Gleesons, and I'm also going to use the example of the Gleesons to help you perhaps understand what is possible in surname projects in terms of reconstructing family trees. And the title is Building a Family Tree with SNPs, STRs, and Named People.
Now, what we're used to doing is building our family trees with named people. But what I'm exploring is the possibility of actually substituting DNA markers for the ancestors when the named people run out, so that ultimately we have this kind of a tree. The named people are in blue, and they're starting here in, say, around about 1960, going all the way back into the 1400s. Some of us will have relatively extensive pedigrees that maybe go back to 1690, and this is just the direct male line: father, father, father, father. Some of us will go back to about 1810; others will go back and hit a roadblock about 1840 or 1870. It very, very much depends, but this is kind of a typical Irish family tree.
And in our Irish surname studies what we find is that we end up with a lot of trees going back to about 1800, then the records run out and we're left with perhaps DNA instead as the opportunity to connect these people together. So it's all about using DNA markers when the named people run out, and the big question is, is it possible? There's the brick wall. Can we actually get beyond the brick wall and discern the branching pattern of our overall family tree, going back to when the Gleesons first arrived, or the Dohertys, or the Morgans, or the Spearins, or any other surname, when it first arose in Ireland around about 900 to 1000 AD? Is it possible to at least have the branching pattern of that tree, even if we can't put a name in every single box? Can we put DNA markers there instead? So that's the question that I asked myself with the surname projects that I'm running.
And we're just going to look at Y-DNA in this presentation. The father, father, father line. We're not looking at any other ancestral lines. Here, to remind you, is the Y chromosome. When we unravel it, it forms that double helix. We've got these bases along the bottom. They're G, C, A and T. G always binds with C. A always binds with T. That forms the basis of the genetic code. And the main take-home message from this slide is that there are two types of DNA marker. The first one is the STR marker, and the second one is the SNP marker. And we'll be talking about both types during the presentation today. The STR marker is a repeat of several bases, TAC, TAC, TAC, which you see here. So this particular marker has a value of 3 because there are 3 repeats. The SNP marker, on the other hand, is just a single substitution. This perhaps should have been a G, and you can see that an A has been substituted in there instead. It's a very important distinction between the two types of marker. You have about 500 STR markers on the Y chromosome. You have about 50,000 SNP markers on the Y chromosome.
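To make the distinction between the two marker types concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the short sequences are invented for the example, and the function names `complement`, `count_str_repeats` and `find_snps` are hypothetical helpers, not part of any testing company's software.

```python
# Toy illustration of the two marker types described in the talk.
# All sequences are invented; real Y-chromosome data is far longer.

BASE_PAIRS = {"G": "C", "C": "G", "A": "T", "T": "A"}


def complement(strand: str) -> str:
    """Return the pairing strand: G binds with C, A binds with T."""
    return "".join(BASE_PAIRS[base] for base in strand)


def count_str_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of motif.

    This run length plays the role of the STR marker value.
    """
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


def find_snps(reference: str, sample: str) -> list:
    """Return (position, reference_base, sample_base) for each substitution."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


# STR: TAC repeated three times gives a marker value of 3.
str_value = count_str_repeats("GGTACTACTACGA", "TAC")

# SNP: a single G has been replaced by an A at position 4 (0-based).
snps = find_snps("GATTGCA", "GATTACA")

print(str_value)
print(snps)
```

The STR value is a count that can go up or down by a repeat, whereas the SNP is a one-off change at a single position, which is why the two marker types behave so differently later in the talk.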
So there's a thousand times, no, there's 100 times more SNP markers than there are STR markers. So there's the potential there that they give us a lot more granularity, a finer detail. Now, SNP markers have been used very extensively to study human migration out of Africa 50,000 years ago and spreading across the globe. And this was when we had relatively few SNP markers identified. But of course in the last couple of years there's been a tsunami of SNP markers discovered, and it's allowed us to get a lot more granular and discover a lot more of the finer downstream branches of the human evolutionary tree. So that's the top-down approach. That human evolutionary tree is getting further and further away from Africa 50,000 years ago and it's getting very, very close to Ireland 2,000 years ago. So it's coming up into a genealogical timeframe, into a historical timeframe. So this top-down approach of human evolution is beginning to meet what we are doing, which is the bottom-up approach of genealogy. And that's why it's particularly exciting to try and explore this interface between the two movements.
There's a variety of SNPs. Here's just an example of the tree as it has progressed: finer and finer branches. A couple of years ago I was L21, and last year I was Z255. Now I'm even further downstream from that marker today. And I'll show you some of those results for the Gleeson DNA project. So new markers help to define the finer branches of the human evolutionary tree.
When you do a Y-DNA 37-marker test, like you might buy outside at Family Tree DNA, and if you are thinking of buying a DNA test, incidentally, I'd advise you to do it after this lecture, because they're going to be closing down the main hall at around about 5.30. So there will be a limited amount of time before they come in here and take everything down here as well. So I will be trying to gain a little bit of time, so excuse my speed in talking to you.
Because Brad will be giving the last lecture, and he's going to tell us about the future of genetic genealogy, and I'd hate not to find out what the future is and not give him a chance to finish. I want to know what my future holds. So I will be running through this very quickly.
This is what a surname project looks like. So for example you have all the markers along the top, and you have the various members here. I like to think of it as a stack of Y-DNAs, a stack of Y chromosomes, and all along the Y chromosome you have the markers here. These are the STR values for each individual stacked on top of each other, and you can see that they're all identical. It's only when you get out here that you have the occasional mutation. This indicates that these individuals are very closely related to each other. Now, there's a couple of features. The first 12 markers are called panel 1, and they're markers 1 to 12. The second panel is markers 13 to 25. The third panel is markers 26 to 37. These are the mutations here. The fast-mutating markers are indicated by maroon or dark red coloration up here. So this is a very good way of identifying what is a fast-mutating marker and what is not.
The other thing that's worth noting is the minimum, the maximum, and the mode. Now, the mode is the most frequent value. You may not be too familiar with that. The mean is the average value. We're not really interested in that. The median is the middle value, but the mode is the most frequent value. And you can see here that everything here is 15, so the modal value will be 15. Over here we've got a few mutations, but the modal value is still 33, with the occasional 34 thrown in. So the most frequent value is 33 on that side. That's what the modal value is. The modal haplotype refers to all of these modal values across here. And the importance of the modal haplotype is that it probably represents the genetic signature of the common ancestor to that entire group. Probably.
I'd say 99.9% of the time it would do so. Maybe a bit less than that. I see James Irvine shaking his head, and then when I say no, he says yes, that's correct, no. So it may not be that frequent. I guess we just haven't done as much research on how frequently it occurs that way. And it will be biased and it will be skewed, because of course we are the survivors of the ancestors. So there may have been population bottlenecks, or a variety of other things, that made some branches of the family die out that were perhaps more representative genetically of the common ancestor. So we may very well be seeing a skewed version of the ancestral haplotype. But the modal haplotype is a very important consideration. Always compare new members of the project to the member that is closest to, or exactly matching, what you have as the modal haplotype. That is a general rule of thumb, because that helps to decide whether or not they should be included in that particular surname project. At least that's the way that I operate. I know some project managers are slightly different.
Don't worry about reading this. I'm just showing this to you for the nice colors. And the nice colors are very important, because you can see here that there's a completely different color pattern in Lineage 1 of the Gleeson surname project compared to Lineage 2. Here's a long line down here. It's a different color down here. Here are two long lines here in orange on Lineage 1. They're missing in Lineage 2. Lineage 2 has these two blue lines here. There's a little bit of one there but it's large. This one here is missing in Lineage 1. The point is that the types of mutations you get help to define this distinctive marker value pattern for each of the lineages, and that tells you that Lineage 1 is very different from Lineage 2. Now, I can tell you that in the Gleeson project everybody in Lineage 1 is descended from a single individual. His name is Thomas Gleason, and he was born in 1609 in Cockfield in Suffolk. It is an English line.
It is not an Irish line. Lineage 2 are the North Tipperary Gleesons. And that's typically where in Ireland the Gleesons come from. Are there any Gleesons in the audience, or anybody with Gleeson ancestry? One, two, yes, well, three now. There's a saying in North Tipperary: if you throw a stone you'll hit a Gleeson. It's an observation, not a recommendation. So that is where Gleesons are very, very heavily concentrated.
Now, why do we group people together? The answer is because different pieces of evidence point to the probability that they are all closely related. And I've coined this term, markers of possible relatedness. DNA is just one of those. So I put together these criteria for the Farrell project that I inherited earlier this year. And I said, what indicators could possibly suggest that the people were possibly related to each other? The first one, of course, is that two members of your project share the same surname. You're a Gleeson, I'm a Gleeson, hey, maybe we're related. We share the same surname. It's a marker of possible relatedness. Then, turning to the genetic markers, which are in orange or red, genetic distance is one, and that's the distance between two members. If it's zero or one or two at 37 markers (I don't know what that was), then it indicates that there is possibly a very close relationship between these two people. Similarly, as James has talked about in his lecture, a TiP score of, let's say, greater than 80% also suggests that the two people are very closely related to each other. If two people share the same rare marker value, that can also be a very strong indication that these people are possibly related to each other. Also if the results of SNP testing are consistent with each other. If you have two people who are on separate branches of the human evolutionary tree that were connected 10,000 years ago, then they can't be related to each other in a genealogical time frame.
You need to check to see whether the SNPs of two people, if they have been tested, are consistent. It doesn't matter if one is upstream and one is downstream, as long as they consistently appear to be on the same branch. If you've got two people on branches that are far flung, then they are not related to each other. Then there's the same surname variant. So you can get a Gleeson with two E's, or you can get a Gleason with an E and an A. You can get a Farrell, a Farley, a Frawley. If the same surname variant is present among two people, that's again suggestive. The same location, the place that they came from. You saw that for Gleeson Lineage 2, they all came from Tipperary. That tells you that they're potentially closely related to each other. And ultimately, if they have the same common ancestor, two members of your project with the same common ancestor, well then they probably are very related to each other indeed.
So these are the markers of possible relatedness, or MPRs. I just had to invent my own TLA, three-letter acronym. You can apply all of these MPRs to the Farrell project, but we're just going to use one: we're going to use genetic distance. And this is the threshold for genetic distance that is set by Family Tree DNA. It's relatively arbitrary, but it's as good as any. My suggested approach would be to use genetic distance as your first guide as to whether or not two members should be lumped together in the same project. And for those members who are probably or only possibly related, use the TiP tool. Greater than 80%, greater than 90%, greater than 60%: you choose your own cut-off then to decide whether these outliers should be included in the main group or not. I developed this method of dividing those on the periphery into a kind of a B group, and my A group is the core.
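The grouping steps described so far (find the modal haplotype, measure each member's genetic distance from it, then split members into a core A group and an outlying B group) can be sketched in a few lines of Python. This is a rough sketch under several assumptions: the marker values are invented, only four markers are used instead of 37, genetic distance is simplified to the sum of repeat differences on every marker (real calculators treat some markers differently), and the cut-offs `core_max` and `outlier_max` are illustrative choices that each administrator would set for themselves. The TiP tool is not modelled here.

```python
from collections import Counter

# Invented STR values for five hypothetical project members.
haplotypes = {
    "member_1": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    "member_2": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    "member_3": {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11},
    "member_4": {"DYS393": 13, "DYS390": 27, "DYS19": 16, "DYS391": 11},
    "member_5": {"DYS393": 12, "DYS390": 21, "DYS19": 17, "DYS391": 8},
}


def modal_haplotype(members: dict) -> dict:
    """Return the most frequent value (the mode) for every marker."""
    markers = list(next(iter(members.values())))
    modal = {}
    for marker in markers:
        counts = Counter(h[marker] for h in members.values())
        # On a tie, the value seen first wins; a real project would review ties.
        modal[marker] = counts.most_common(1)[0][0]
    return modal


def genetic_distance(a: dict, b: dict) -> int:
    """Simplified genetic distance: total repeat difference across markers."""
    return sum(abs(a[marker] - b[marker]) for marker in a)


def assign_group(distance: int, core_max: int = 4, outlier_max: int = 6) -> str:
    """Place a member in core group A, outlier group B, or leave ungrouped.

    The cut-offs are assumptions chosen for illustration only.
    """
    if distance <= core_max:
        return "A"
    if distance <= outlier_max:
        return "B"
    return "ungrouped"


modal = modal_haplotype(haplotypes)
distances = {name: genetic_distance(h, modal) for name, h in haplotypes.items()}
groups = {name: assign_group(d) for name, d in distances.items()}

for name in haplotypes:
    print(name, distances[name], groups[name])
```

Comparing everyone to the modal haplotype, rather than to each other, mirrors the rule of thumb given earlier: new members are judged against the signature that probably represents the common ancestor.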
So you'll see in my Farrell project that I have Genetic Family 1A, which is the core group, and Genetic Family 1B, which accounts for the outliers, who would perhaps skew the modal haplotype if I included them in the core group. So in the Farrell project, I grouped people on the basis of genetic distance alone, and they fell into these separate genetic clusters here. The interesting thing is that the same surname variant occurred within Genetic Family 1 and within Genetic Family 2. This is a bunch of Farleys; this is a bunch of Ferrells. So we're getting independent verification: a second marker of potential relatedness is turning up in these groups when you have just grouped them on the basis of a totally independent variable, namely genetic distance. The second thing to notice is that in one of the groups, they all have the same most distant known ancestor. So again, another independent variable. You group them on one variable, and then you find that they match on another variable. And thirdly, also in Genetic Family 3, there was a rare marker value among all of these people. This DYS449 value of 26 only occurs in 1.7% of the general population. It occurs in approximately 0% of those in haplogroup R1b. So again, the grouping on the basis of genetic distance gets independent verification from a variety of other markers of potential relatedness. And this is why we group people together. We group people together so that we optimize the chances that we have a group that are consistently related to each other.
One of the questions I get from project members is, why do I match some members of the project but I don't match others? So for example, here's my dad, up in yellow here, and he matches 5 out of the 12 people in Lineage 2. The match threshold there is 4 out of 37, so he'll have a 0, 1, 2, 3, or 4 out of 37 match with these people. But there are also 5 non-matches in Lineage 2 that don't turn up in his list of matches.
These are separated from him by a distance of either 5 out of 37 or 6 out of 37 markers. And this, to me, is one of the most important reasons for joining a surname project as an individual: you will actually be put in touch with people who are relevant to your family's surname but who don't show up in your list of Y-DNA matches on Family Tree DNA, because the threshold is set at a particular level that does miss some people who should be included in your family tree or in your family group. So this is why surname projects will tell you more than just your list of matches alone. So do join your relevant surname project.
But once you have a genetically related group... and this is important, because some project administrators do not group people purely on the basis of genetic distance. They will do it in terms of maybe geographic location, which is perfectly valid. It's not right, it's not wrong, it's just a different method. You know, those who match each other but who came from Georgia, those who match each other but came from Virginia: that separates the group in a totally different way from just looking at genetic relatedness. And this system of trying to reconstruct the branching pattern of the family tree going back to the origin of the surname is only really going to be possible if you use a group of people who are tightly genetically related to each other.
Identifying branches within a genetic family: we're using mutation history trees, also known as phylograms or cladograms, and building a mutation history tree based on STR mutations. We're going to be talking about known genealogies, we're going to talk about a hand-drawn tree, we're going to talk about Fluxus diagrams, and ask, are they all consistent with each other? So if I just take the first 12 markers of Lineage 2, these are all the Gleesons in Lineage 2, and these are the markers here, and you can see that there are different mutations around the place.
Here's one mutation. Well, first, there's the modal haplotype up there, and there are four people who match the modal haplotype exactly: that's members number 1, 3, 4, and 5. They're all an exact match, so they form their own specific group. Now, this mutation here, we can imagine that that must represent a branching off from the main branch at some point in the past. And then if we come down to these mutations here, they could represent another branching point at some point in the past. You don't know if it occurs before this branching point here, or this branch here, but we're beginning to actually split our group of Gleesons into different branches based on their DNA. Then if we come down to this group of mutations here, we can imagine that that came off at some point in time, and there are two people that have these two here, but then if you look over here, a further set of them have an additional mutation, so there's a sub-branch that comes off that branch. And that's how you can build a hand-drawn mutation history tree.
That's fine at 12 markers. You can do it for 37 markers and develop this more complex tree, and now there are additional branches, shown in pink, coming off what was the original family tree based on just the 12 markers, shown in blue. And those additional pink values are there. But there are several problems with this. First of all, there are parallel mutations. Here's a mutation: 464b going from 16 to 17 appears in this branch, and also in this branch. Here's another mutation, and that's occurring in branch number 1 and branch number 10. So these are parallel mutations. And then we also have another mutation that is occurring in five of the branches. So there are a lot of parallel mutations in this particular model, which raises the question, is this the best fit for the data that I have available?
And also, is the resolution of 37 markers enough to actually give you a reasonable representation of what the tree actually looks like? These are questions that I don't know the answers to. So I turn to Fluxus, which is a software program that can give you the maximum parsimony version of the tree. And by maximum parsimony, we mean the least number of steps to account for the data. So it's giving you kind of the best-fit tree. Now, the problem with all of these models is that you can actually have a variety of different models that the data would still fit, and you don't know which is the right one. The maximum parsimony approach, which is akin to Occam's razor, the least number of steps possible to explain the data, gives you perhaps the highest probability that what you're producing is the most reflective of what actually happened in terms of your ancestry. But there's no guarantee of that. The other thing about Fluxus is that, while it can help, and it's useful to check your hand-drawn tree against it, it is cumbersome, it's fiddly, it's easy to make mistakes, and it's difficult to visualize a family tree from this. You know, I can convert this into a family tree with an ancestor at the top and descendants at the bottom, but it's not easy, it's difficult to interpret, and it's time-consuming. So, thank you to Ralph Taylor, who did this work for me, and he is one of our colleagues that we talk to quite a lot. Ralph knows more about Fluxus diagrams than I ever hope I will. So, it's not for the faint-hearted, but it can be informative.
There is a major problem, though. All of this is rubbish. Potentially rubbish, because of convergence. And I'm going to talk a little bit about convergence. It could be a spanner in the wheel, it could be a complete red herring, it could be a bit of both, or it could be a major problem when you're doing these STR-based mutation history trees. And the reason is this.
Markers mutate, STR markers mutate, and their values change over time. You get forward mutations, which are, if you like, away from that modal value, and you get backward mutations, which come back to the modal value. So, for example, if we started off 10,000 years ago with a single marker, which had a value of, say, 8, over the course of the millennia we find that for some of the descendants of this individual the marker value might change and go up; that's the red line above. Some of them might not change very much at all; they just change by one repeat value around about 7,000 years ago, and never change again. Some of them actually go down in value, and then they go back up in value, and that creates a major problem. This line here, the red line, is diverging. There's a lack of divergence in the blue line, but you can see that the blue line and the green line have met up here, and they have the same value. Now, what do you usually think about someone who has the same value as you? Well, they're probably closely related to me, right? Because we have the same value. No, you are related by a common ancestor 10,000 years ago, so the convergence in this situation throws a massive fly in the ointment. There was also convergence back here, so if they were doing genealogy 6,000 years ago, these descendants of the purple line and the green line would have had the same value and would have thought they were closely genetically related, but in actual fact they were related by a common ancestor who lived 3,000 years ago.
So that's just an example with one marker. With 12 markers there's less of a chance that you actually get that; with 25, 37, 67, 111 there's an even lesser chance. But the important thing to take away is that both back mutations, mutations that cause people to come back and converge on each other, and parallel mutations will disguise the true branching pattern of the tree.
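The single-marker example can be replayed in a small Python sketch. It is purely illustrative: the dates and mutation events below are invented to mimic the red, blue and green lines on the slide, each time step is assumed to stand for roughly 1,000 years, and the names `marker_history`, `observed_difference` and `hidden_mutation_count` are hypothetical helpers.

```python
ANCESTRAL_VALUE = 8   # marker value of the common ancestor 10,000 years ago
N_STEPS = 10          # ten steps, each assumed to be about 1,000 years


def marker_history(start: int, mutations: dict, n_steps: int) -> list:
    """Return the marker value after each step.

    mutations maps a step number (1-based) to a change of +1 or -1 repeats.
    """
    values = [start]
    for step in range(1, n_steps + 1):
        values.append(values[-1] + mutations.get(step, 0))
    return values


# Invented mutation events for three lines of descent.
red_events = {1: +1, 3: +1, 5: +1, 8: +1}            # keeps going up
blue_events = {3: +1}                                # one change, about 7,000 years ago
green_events = {2: -1, 4: -1, 6: +1, 7: +1, 9: +1}   # goes down, then back up

red = marker_history(ANCESTRAL_VALUE, red_events, N_STEPS)
blue = marker_history(ANCESTRAL_VALUE, blue_events, N_STEPS)
green = marker_history(ANCESTRAL_VALUE, green_events, N_STEPS)


def observed_difference(a: list, b: list) -> int:
    """Difference in present-day values: what a match list would show."""
    return abs(a[-1] - b[-1])


def hidden_mutation_count(events_a: dict, events_b: dict) -> int:
    """Mutations that actually occurred on the two lines since they split."""
    return len(events_a) + len(events_b)


print("blue today:", blue[-1], "green today:", green[-1])
print("observed difference:", observed_difference(blue, green))
print("mutations actually separating them:",
      hidden_mutation_count(blue_events, green_events))
```

In this toy run the blue and green lines end on the same value, so they look like an exact match on this marker, even though six mutations separate them and their common ancestor is the original individual at the start.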
So this is an example of convergence from the L226 dataset, kindly given to me by Dennis Wright. He, or one of his project members, developed a cladogram, a phylogram, a mutation history tree, based upon the STR values in the L226 haplogroup project, and it had this high-level branching pattern. And then one of the new SNPs came along for the L226 group, and it was called, let me see, FGC 5628. So how many branches does it occur on? Does it occur on just one branch? That's what you're hoping for: that your STR-based tree is correct, and then this new SNP will actually maybe define one of these branches up here, for example. Or maybe a branch down here, maybe it's going to occur there, and everybody down here will be FGC 5628. This is where FGC 5628 occurs in this tree. There's one, two, three, four, five branches, and this is just the top half of the tree; it continues down for another four times this length. This particular SNP occurs on multiple branches of the L226 tree based on STRs. So you're going to see examples of convergence in these haplogroup projects very, very frequently. It's going to be less of a problem in surname projects, because the surname anchors everybody into a particular group. The L226 haplogroup project has a variety of different surnames within the group. So whereas it will be less of a problem in surname projects, it will still be problematic.
And an example of that was given to me by Ali McDonald, who is administrator of the Stuart project. Here we have Alexander Stuart, the fourth High Steward of Scotland, living about 1200. Then we have the High Steward line here and the Bonkyl Stuarts in this line here. The High Steward line is S781 negative, that's their SNP there, S781 negative; this one is S781 positive. So this particular SNP differentiates between the two branches of the Stuart clan. But let's have a look then at three of the participants.
One was Fred Stuart, one was Earl Castle Stuart, one was Paul Thompson. This was the Y chromosome position, the reference. This chap here was a Bonkyl. He was S781 positive. These two here were from the High Steward line and S781 negative. So descendants of John of Bonkyl were positive for the marker, and the descendants of Walter Stuart, who were the High Steward line, were negative for the marker. But here are the 67-marker values and here is the genetic distance. Earl Castle Stuart had a genetic distance of three out of 67 to Paul Thompson and four out of 67 to Fred Stuart, indicating that they were possibly very closely related. But in actual fact, we know that they're not closely related. Their common ancestor is not in the last couple of hundred years. Their common ancestor was in the 1200s, and we know that from the genealogy. So if I was a project admin and I saw somebody with a genetic distance of four out of 67 or three out of 67, I would say, oh yeah, definitely put them in our lineage, because they are related to us. Well, this is an example of how somebody that looks closely related actually is much more distantly related than you would expect. And it's only the SNP markers that allow you to differentiate between convergence and what is actually a close relationship.
So it's probably only really a problem with dissimilar surname matches. So for example, NPEs. It could be a possible NPE within the era of, say, zero to a thousand years before the present, so in the last thousand years. It could alternatively be a pre-surname match. So you have somebody, a Little, who matches the Gleesons. It could be from before the time of surnames, maybe a neighbor from that area, and you have a common ancestor going back before 1000 AD. Or it could be an example of convergence, where the common ancestor is not a thousand years ago, it's not even two thousand years ago, it's ten thousand years ago. So you have to be really careful about this.
It's potentially misleading if you use it to pinpoint your ancestral origins. So that is a very, very important consideration, and you do need to take it into account. Can it be distinguished just by upgrading everybody to 111 markers? Probably not, because we saw from the Stuart example that even at a high level of marker testing you can still get convergence. Probably downstream SNP testing of everyone is the best way of determining whether or not convergence is present. Is there a higher risk of convergence at lower levels of testing? Absolutely. At 12 markers, there's lots of convergence. At 25 markers, a little bit less. At 37, it's there. At 67, even less. At 111, the lowest chance of convergence, but you haven't eliminated the risk entirely. Is the risk higher at higher levels of genetic distance? Yes. It's unusual to have convergence if it's an exact match, in my estimation. It's much more likely if you've got one of those kind of close-to-outlying matches, like 4 out of 37; I think the risk of convergence is higher there. If you have many different surnames among your Y-DNA 37 matches, then a lot of them are probably convergent. I'm talking about 25 matches, 50 matches at 37 markers. A lot of them are not real matches. They're probably an example of convergence at play, because of your own particular genetic signature being very close to people's genetic signatures on other branches of the human evolutionary tree. And certain haplogroups are probably more prone to this, because certain haplogroups might have diverged and then converged, and the finer branches might be overlapping. But we haven't done enough research to actually define which of those haplogroups it's likely to be. So in this confusing picture of trying to draw a mutation history tree and then being confounded by convergence, we have SNPs to the rescue. Or do we?
We had a tsunami of SNPs over the last couple of years from a variety of these testing companies. Previously, what we would do is we would join a haplogroup project, where the administrator would look at our STR profile and advise us on what SNP we should test. And we were doing single SNP testing. But then we had the Big Y coming in, we had Y Elite from FGC, and a whole variety of other companies producing these SNP tests. And like I say, we only had 111 STRs to deal with. We now have 50,000 SNPs to deal with. So there is a huge avalanche of these SNPs. And this is what the tree used to look like in 2013. We've seen from Brian's presentation, and from James' presentation, that it's a lot more complicated than that now. There are so many finer branches to the tree that it's practically impossible to put them on a single sheet of paper. So fine-scale SNP testing is probably the best method of determining branching patterns within a genetic family. But how do we do it as cheaply and as efficiently as possible? That's still a challenge. Because the cost of the Big Y, which is the cheapest of these tests, well, it's cheaper than the Y Elite one, which is up to about $800, $900, $1,000. Big Y is still $575. It's on offer here at $475, which is a great discount. And thank you, Family Tree DNA, who are listening, for giving us that discount. But we'd like it to be $100. So, yes, let's have a round of applause for that. No pressure, no pressure. So, we live for that day, so we can get everybody tested.
There are other challenges, apart from the cost. Here's a challenge. How can I properly grade my SNP candidates? Okay, well, you just do that and you do that and you do that and do that. Okay, it's highly technical. So I'm not even going to spend any time on that slide. There's a huge amount of technicalities involved in this. The other thing that is a challenge is that you can divide SNPs, these 50,000 SNPs, into those that we know about already and those that have just been discovered.
And I'm going to get my glasses so I can keep an eye on the time. Okay. The known SNPs are those that have already been discovered; the new SNPs are those that have never been discovered before. And those that have never been discovered before, these new SNPs, can be divided into two subgroups: those that you share with someone else, so you're sharing a newly discovered SNP with somebody else, and those that you don't share, those that are unique, or private, to you as an individual. You have unique mutations that nobody else in the world shares with you, because they haven't tested yet. And what's going to happen as more people test is that, first of all, these newly discovered SNPs will move over into the known SNP group. And it's a little bit confusing when that happens, because they don't tell us when they are moving them from one group to the other. So that's a challenge. It's confusing. The other thing that's going to happen as more people test is this: there you are, proudly holding onto your unique SNPs, your private SNPs that nobody else in the world shares with you. You feel like a special person, and then somebody comes along, they share the SNPs with you, your private SNPs are ripped away from you, and you are moved up into Alex Williamson's Big Tree and a whole new block of undifferentiated SNPs is added.
And here's another big problem. Where do I find the most reliable SNPs? Well, not in the palindromic arms, not in the centromere, not in Yq12, not in the pseudo-autosomal region, not in the highly X-Y homologous region, not in short repetitive elements, not flanking short repetitive elements, not in the post-palindromic region. Well, thank you very much. It raises the question, when is a SNP not a SNP? It also raises the question, when is not-a-SNP a SNP? So there's still a lot of confusion about calling and identifying these SNPs. And a big question is, is a SNP really present? So for example, has it been detected? No. Is it present? In actuality? No.
Okay, great. You have correctly identified that a SNP is not there. That's a great result. At the bottom: is a SNP detected? Yes. Is it present? Yes. So you have correctly identified a SNP that is present. I would love to have just those two rows in that table, but unfortunately the other rows are where you have conflict. Just because a SNP isn't detected doesn't mean it's not there. And also, just because a SNP has been detected doesn't mean that it is there. So there is that confusion, because of poor coverage of a particular area of the Y chromosome. Sometimes it's very difficult to position a SNP, especially in palindromic regions. It's either here or it's here. We just don't know. It's either in Blackrock or it's in Ballsbridge. You can't actually tell, because Blackrock and Ballsbridge look really, really close to each other. They look exactly the same, so it's impossible to tell. Then there are ones like low-confidence SNPs, and unstable SNPs that change a lot. So there's still a lot of confusion about actually saying, yes, I'm certain this is a SNP, and I'm certain that you have it, and I'm also certain that the person we are comparing you to does not. These are my Big Y results; my dad's Big Y results. We have his matches here. This is the number of shared novel variants, so these are the newly discovered SNPs, and he shares 75 of the new discoveries with this person, 71 with this person, et cetera. The ones I'm interested in are the ones that are an exact match, with no SNP differences. He has no difference at all with these first eight people.
I am particularly interested in looking at them, and if I click on any of these numbers up here, what I get is this wonderful pop-up box that shows me all 59 common SNPs shared with that particular person. What I do then, and what Lisa Little, one of my project members, has done, is copy that into an Excel spreadsheet, and do that for each of the eight exact matches. This one just shows the ones that are not shared with my dad's results. The important thing here is to realize that your Big Y results are first of all looked at by the surname administrator of the project, then by the haplogroup administrators, and we have the Z255 group down here, Neil Downing and John Murphy, James Keynes, the CTS446, and we also have the Z255 Yahoo group, who discuss it online. So there's a huge amount of collaboration going on. We're also lucky to have Alex Williamson, who does the Big Tree, and Nigel McCarthy, who is an administrator of the Munster Irish project, and we've also sent the Family Tree DNA results to YFull for a reinterpretation. This is our spreadsheet; we've generated this ourselves, and again I'm just showing you the colors, but here is a clue. H. Gleason and P. Gleason are identified by these yellow SNPs here. Nobody else in the group has these SNPs. H. Gleason and Mr. Little have this particular pink SNP up here, and that's that one up there. M. Gleason, that's my dad, H. Gleason and Little also have these green SNPs, which are these ones here. Then there are the Gleason-specific SNPs, and these are probably the SNPs that define the Gleason surname, until some interloper comes along and tests and we find that, actually, no, half of them belong to the Carols. They are the orange ones, so these SNPs here are shared just by Gleasons and nobody else in the world at the moment. As more people test, that will change. And then we've got SNPs that are further upstream in the human evolutionary tree.
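The colour-coding in that spreadsheet amounts to grouping SNPs by exactly which testers carry them. A minimal Python sketch of that idea follows; the tester labels and SNP names are made-up placeholders, not project data, and the function name is hypothetical rather than anything the project actually uses.

```python
from collections import defaultdict

def group_snps_by_carriers(calls):
    """Group SNPs into blocks keyed by the exact set of testers who carry them.

    calls: dict mapping tester label -> set of SNP names called positive.
    Returns: dict mapping frozenset of testers -> sorted list of SNP names.
    """
    carriers = defaultdict(set)
    for tester, snps in calls.items():
        for snp in snps:
            carriers[snp].add(tester)

    blocks = defaultdict(list)
    for snp, testers in carriers.items():
        blocks[frozenset(testers)].append(snp)
    return {group: sorted(snps) for group, snps in blocks.items()}

# Placeholder data: four testers, four SNPs (all names are illustrative).
example_calls = {
    "A": {"snp1", "snp2", "snp4"},
    "B": {"snp1", "snp2"},
    "C": {"snp1", "snp3"},
    "D": {"snp1", "snp3"},
}
example_blocks = group_snps_by_carriers(example_calls)
for group, snps in sorted(example_blocks.items(), key=lambda kv: -len(kv[0])):
    print(sorted(group), snps)
```

A block carried by every tester corresponds to the surname-wide SNPs, a block carried by one tester is private for now, and the blocks in between suggest branches. Real data will also contain no-calls and conflicting calls, which this sketch ignores.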
From this we can generate a pedigree. There's the most distant SNP there, and as we come along we can break it down, and you can see that a branching pattern is beginning to emerge. The Z255 group also have their own version of the branching pattern. The Gleasons are up here. These are the Gleason-specific SNPs here. Little, Gleason, Gleason, Gleason: you can see that even within Gleason we're splitting into two separate groups. This is Alex Williamson's tree, and the Gleasons are over here. We've got a nice little group of five in this diagram. Since this was taken we have an extra Gleason, and the exciting thing is that the Gleason has been proven to be genetically related to the Gleason. So that was a major find, because the surname dictionaries will tell you that Gleason was a surname from the 1500s, 1600s, 1700s. We've shown now that it is actually genetically related to the main Gleason branch. Here's a close-up of the Gleason branch on Alex's tree. If you click on any of these names here you get some wonderful readouts of the people. And here's Alex's summary of the SNP markers for this particular area, for one, two, three, four, five, six people. There's a Carol who is quite close to us, but he branched off before the Gleasons. And that just gives you an idea: these are the SNPs here with their positions, and this is where they fit in the actual tree. So we're beginning to see a branching pattern. Another challenge, for me anyway, is these long numbers. These indicate the position of the SNP marker on the Y chromosome. They will eventually be christened with a name like I5631, which I find less of a mouthful to say than those long numbers. But there is confusion over the naming pattern. Different people call the same thing by different names. Thank you, I don't need that confusion. Please, let's sort ourselves out. So that's another challenge for the novice. These are the currently unique SNPs among Gleason members, and here's two of them here.
So this particular member here, 338070, has three unique SNPs. That's that person down there. The Little line, which is probably an NPE (she reckons that there was a Gleason involved with some ancestor back in the mists of time), also has three private SNPs, and this one is 60393. This one over here, and then there's another one. There's my dad, actually. He's got all of these unique private SNPs. Or at least private for now, unique for now, until somebody else joins the project and we find that, actually, no, you're going to have to hand back some of those SNPs because they're shared with somebody else. This is what came back with the YFull analysis, and again these are my dad's results. He's got five best-quality SNPs, so, you know, they're confident that these are real and not imaginary SNPs, and then he's got 11 ambiguous SNPs. These are based on only two calls, whereas these would be based on somewhere between 50 and 70 calls. In other words, the machine has scanned that region of the chromosome about 70 times, and it's reasonably certain, because it found them 70 times in that region. These other SNPs are in an area of the chromosome that's only been scanned twice, but they came up both times. They are perhaps slightly less reliable than the high-confidence ones. But have a look at this. If we compare Alex's analysis with the YFull analysis: OK, that one is the same. So the first one's OK. That one's the same as well. And then the third one: OK, that's down there. That's fine. And the fourth one, that's down there. OK, that's on the list. Fine, that's on the list. What about this one? Nope. Nope. So there is conflict between the interpretations of different people. Alex has identified some SNPs that YFull has not identified, and vice versa. So it's kind of different strokes for different folks, or, as I like to say, different SNPs from different lips. Who is right? We don't know. So this is one of the challenges that still faces us in this particular arena.
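The two ideas in that comparison, grading a SNP call by how many times the region was read and then checking which calls two analyses agree on, can be sketched in a few lines of Python. This is an illustrative sketch only: the threshold of 10 reads is an assumed parameter, the SNP names and read counts are invented, and neither function reflects how YFull or the Big Tree actually process files.

```python
def grade_by_read_depth(read_depths, min_reads=10):
    """Split SNP calls into confident and ambiguous sets by read depth.

    read_depths: dict mapping SNP name -> number of reads covering it.
    min_reads: assumed cut-off; calls at or above it count as confident.
    """
    confident = {snp for snp, depth in read_depths.items() if depth >= min_reads}
    ambiguous = set(read_depths) - confident
    return confident, ambiguous

def compare_call_sets(first, second):
    """Report which SNPs two analyses agree on and which only one reports."""
    first, second = set(first), set(second)
    return {
        "both": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }

# Invented read depths for one tester (names and numbers are placeholders).
depths = {"snpA": 70, "snpB": 55, "snpC": 2, "snpD": 2}
confident, ambiguous = grade_by_read_depth(depths)

# Invented call lists from two hypothetical analyses of the same test.
agreement = compare_call_sets({"snpA", "snpB", "snpE"}, {"snpA", "snpB", "snpC"})
print(sorted(confident), sorted(ambiguous))
print({k: sorted(v) for k, v in agreement.items()})
```

Keeping the cut-off as a parameter makes it easy to see how the list of "real" SNPs changes when a stricter or looser read-depth rule is applied, which is one plausible source of disagreement between analyses.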
Now I'm going to check the time again. OK, I'm running out of time rapidly, so I will go through this very quickly. This just summarizes the STR markers that you've seen previously in the Lineage 2 family tree. But rather than showing you that diagram again, we've just taken the relevant ones out of that particular tree and put them into an Excel spreadsheet. When we incorporate STR markers into the mix, we can get a much different kind of family tree. This was a process that was pioneered by Nigel McCarthy. He's the first person that I know of who has actually taken SNP markers, and these are the ones in black up here, and then also added STR markers. And he's produced this wonderful tree. He's got our Gleasons in here because we happen to be in an area of interest to his McCarthy project. And his best guess is that the Gleasons and one branch of his McCarthys were actually related somewhere between 1,000 AD and 1,000 AD. So it's amazing what we're beginning to be able to do. Nigel pioneered this method, and our project member Lisa Little took it a step further and applied it to every Gleason in the tree. So what we have here now are the Gleason-specific SNPs, which were the orange ones in that Excel spreadsheet. We then have some specific SNPs down here shared on this particular branch, but there's nobody else in that branch who has taken the Big Y test. And then we have a group of people over here who are defined by this SNP for the entire group, and then this SNP just for these people here. And this now is beginning to look much more like a combined mutation history tree slash family history tree. And to that we can add our ancestors. Here are the ancestors for this person here. Here are the ancestors for this person here, and so on and so on. The important thing here is that now, just like with a Christmas tree, we're able to hang our baubles onto the branches of the tree, and this is where we want to be.
We want to see if we can find out the branching pattern, which branches are more closely related to each other, so that people can go back to documentary research, and it will help focus their research to see if they can define who the common ancestor is at this branching point here. Who is the common ancestor here? How would it appear? The other thing that we can do is use the SNPs to date the tree and date the branching patterns in a very crude way. This one is about 550 years ago, so you can say that these people here are possibly related around about 1400. The common ancestor was about 1400. Now, there will be large margins of error on either side of that estimate. But we are getting to a point where that estimate will get more accurate as time goes on, and one of the big questions that I have is this: if we add in the additional 400 markers that YFull are going to give us, will that give us enough granularity to really define the branching pattern of the tree? To really say to people, the likelihood, based on the evidence, is that your two branches are related by somebody who is three generations higher than the brick walls in your respective trees? That would be a fascinating thing to do. These people here are related maybe around about 1200, assuming that the average date of birth was about 1950. The other thing that I want to do: I have encouraged my members to write MDKA profiles, most distant known ancestor profiles, and the sort of information that I want in those profiles is additional markers of possible relatedness. They include STRs and SNPs; that's fine, we'll get them from the project. The location: where was your ancestor born, where did they die? The religion, because religion gets passed down in families, so two Presbyterian Gleason branches are more likely to be related than a Presbyterian and a Catholic Gleason branch.
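The crude dating mentioned a moment ago is simple arithmetic: subtract the estimated number of years back to the branching point from the testers' average year of birth. A minimal Python sketch follows, using the 1950 reference year and the 550-year figure from the talk; the margin of 150 years is purely an assumed placeholder, since the talk only says the margins of error are large.

```python
def ancestor_year(years_before, reference_birth_year=1950):
    """Calendar year of a common ancestor who lived `years_before` years
    before the testers' average year of birth."""
    return reference_birth_year - years_before

def ancestor_year_range(years_before, margin_years, reference_birth_year=1950):
    """Return (earliest, central, latest) year for an assumed symmetric margin."""
    central = ancestor_year(years_before, reference_birth_year)
    return (central - margin_years, central, central + margin_years)

print(ancestor_year(550))             # branching point about 550 years back
print(ancestor_year(750))             # branching point about 750 years back
print(ancestor_year_range(550, 150))  # margin of 150 years is an assumption
```

With a 1950 reference year, 550 years back gives roughly 1400 and 750 years back gives roughly 1200, matching the two estimates quoted in the talk.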
Also occupation, because occupations were frequently passed through families. Myocarrals were a dynasty of pawnbrokers in the midlands during the 1830s. The nickname is very, very important for the Gleason clan, because if you throw a stone and hit a Gleason, it's better if you actually know which Gleason you've hit. So the Gleasons have nicknames in Tipperary; so do the Ryans. Who else has family nicknames or agnomens in their family trees? A few people? OK. Yeah, I'm not sure whether I'm a Gleason Makavadi, a Gleason Helper or a Gleason Rabbit. The reason why I'm potentially a Gleason Rabbit is because there was one branch of Gleasons that used to kill rabbits, and then they'd cycle around with the dead rabbits on the handlebars of their bicycle, and they'd sell them for a shilling a go, and they'd get a lot of money for a dead rabbit. So that particular branch of the Gleason clan were the Gleason Rabbits. So are we close to this kind of diagram? Well, we're getting there. It's a huge collaborative effort. Thanks have to go out to Lisa, Neil, John, James, Alex, Nigel; there is a host of people working behind the scenes to help us understand how to analyze this data. Are we there yet? No. We need more people to test. But then, that's something we always say. How many people do we need to test? I mean, what is enough? James talked about the penetration of 0.12%, you know, one in a thousand of the living descendants today. Do we need that many? Do we need more? You know, what is the ideal number? We don't know. Will additional markers give us finer resolution? I'm sure they will. Will they give us enough resolution? Well, I don't know. I mean, if there are 80,000 Gleasons in the world today, how much resolution do you need to actually divide them into branches efficiently and effectively and accurately? I don't know. Will additional markers give finer resolution? Yes. How many do we need? Are the 500 STR markers we have going to be enough?
Are the 50,000 SNP markers that we have going to be enough? We simply don't know the answers to these questions at this point in time, but you can see that we're heading in that direction. Today, DNA is a pointer: it points you back to your documentary research. We still need to overlay that documentary data on top of the DNA data, and that is problematic, because some members of our projects do not supply their pedigrees, or their pedigrees are incomplete. Some people are just starting off with their family trees and they just don't know, and that's just the way it is. That's the state of the nation, if you like. We also need to add markers of potential relatedness to the most distant known ancestor profile. We perhaps need to take a one-name study approach, which is going to be a huge job for common surnames, and that means collecting all of the documentary data worldwide, assigning it to different individuals and creating family trees from that. Then you're left with a bunch of orphan records that you can work with and say, OK, the most likely family that this belongs to has to be, by a process of elimination, this family over here. That's a huge amount of work, and it would be great if we could automate it. So there is a need for relational databases such as Access. I haven't a clue how to use Access. I don't have that level of technical ability. Who does? I don't know. Who's going to make it easy for us, so the average jobbing genealogist can come along and go, "Oh, I just push this button here, thanks"? That's what I'd like to have: that button, and then it gives you the answer. But it ain't that easy. The subtitle of this talk is "How I gave up drink and switched to Valium". So that is the state of my particular journey. Thank you very much for listening. If there are any questions, I will be very happy to address them. Thank you.
Brad, could you volunteer to use the microphone? We have a question from John there in the back. If you move beyond the speakers you're likely to be blasted with this.
Thank you, Morris. Fantastic, very inspirational. Although you, James and I have some slight differences, we're all heading in the same direction, towards the same goal, and I think it's been very useful in both those sorts of ways. There's one quick clarification and one quick question. The clarification is on the difference between the YFull results and the tree that you showed here: there is actually a very consistent result between the two. The reason is that Alex Williamson is working from the Family Tree DNA VCF files, which have a cut-off of 10 reads, so nothing below 10 reads shows in those files, whereas YFull are pulling out the very marginal reads, including, as you say, these two-read ones. They are the ambiguous ones, and two reads are not enough to be sure of the result there; that's why they are classed as ambiguous. The interesting part is what YFull do with the 3 to 9 reads: are they good enough to be interesting? I think they're quite right to flag them up, because it's good to know that they're there, but how reliable those are, I don't know. The question I have is this: having gone through the process of putting the SNPs onto your previous STR tree, you showed us earlier how you have multiple STR repeats all over the place. Do you still have those repeated STRs, or even more than that?
That's something I have to look at, and I would have done it if my computer hadn't crashed on Friday. But that is something that I want to do: to look at the historical development of the trees over time as more markers are discovered and are added to the tree. These slides are actually out of date, because they were done about two months ago and things have moved on. What I'm imagining is that this will be a dynamic process, and I will be changing the tree with the advent of every new Big Y result from a new tester to the project, and this GLSEN member is likely to change everything. Alex has been sent his files; so has Nigel, and YFull as well,
and now it's just a question of waiting for their analysis. But yeah, this is going to be very dynamic, and I think one of the follow-on exercises I have to do is to look at the 37-marker tree that Fluxus generated, the maximum parsimony tree, the best-fit tree, and just to see how the best-fit tree based on STR data changes when you actually add SNPs, and which branches were completely wrong and which branches were spot on. So that's an analysis that still needs to be done, but a very worthwhile analysis and very informative. Thank you for that question. I'm trying to get to one more presentation today, if my time allows. Are there any other questions? One question here; we'll take this last question and then that's it.
Is that type of information of any use in comparison with the DNA?
Oh, absolutely, and I think this is a marker of potential relatedness, because not only do you have people with two variants of the same surname, but they also have the same occupation, so they were both poets as well. And we know that in ancient Ireland occupations were passed down through families, so this could be a very, very important piece of information from the ancient genealogies. That's why we need people like Cathy Swift and other people who are experts on these ancient genealogies to inform us about these wonderful nuggets of information that will be very important for our surname research. Thank you for that question. One from James.
Just in case people were worried about what I was saying earlier on: it was music to my ears to hear that you reached many similar conclusions to mine with a quite completely different group.
It was music to mine to hear you had the same conclusions as mine, so I was much relieved. It's difficult to express, to quantify, but also to explain to others how we do it from very different angles.
I just want to say that people are very quickly pulling in different directions, in different ways.
No, that's a very good point, because we are all kind of pioneers, and James used the wonderful
analogy of being Christopher Columbus and finding a new SNP and having the most glorious feeling of elation, because you are there on the crest of the wave of scientific discovery. And everybody in this room is on the crest of the wave of scientific discovery, which is what makes this hobby so exciting, and why we come back here and volunteer our time, and why half of us are on Valium. It's because we are enthusiastic about it. So I'm going to leave it there. My thanks to John, my thanks to James, my thanks to everybody in the audience for actually inspiring us to keep on doing what we're doing. Thank you very much.