Ladies and gentlemen, welcome to the second lecture of the day for Genetic Genealogy Ireland, and it gives me great pleasure to introduce to you James Irvine from ISOGG. Now, James is going to be talking to us about the Scots-Irish case study, the Irwin DNA Project, which is one of the most exciting DNA projects run at Family Tree DNA by virtue of the fact that it is probably the largest surname study in the database. James has been an amateur genealogist since the 1950s. Brought up in Ulster amongst his maternal Scots-Irish cousins, he later found his paternal ancestors were Orcadian, reputedly descended from the Scottish Borders family of Irving. Since retiring from a career in the shipping industry in 2000, he has written, edited, and published several books, including Trace Your Orkney Ancestors, and founded and administers the Clan Irwin Surname DNA Project. So, James is going to take us through it today, and there are a lot of slides, but he will rush through them at the beginning to get to the real juicy bits at the end, which relate to what John Cleary was talking about yesterday and how the whole scenario of genetic genealogy is changing by virtue of the Big Y test and this SNP tsunami that is causing such intense questioning and discussion. So, I would like to give a warm welcome to James Irvine, please.

Thank you. It's a privilege to be here today. It's my first time back in Dublin for several years. Having come from the north, of course, Dublin has a little place in my heart, and it's been lovely to be reminded of some memories of long ago. I've been very lucky as an amateur, like many other administrators, to pick up a surname project that has taught me more about my surname in the 10 years I've been administering it than 50 years of conventional genealogy on the surname did. I'm going to be a bit controversial, so I hope to leave time for questions and get kicked as to why I'm wrong.
I'm not trying to say that what we find in this project is the only answer or that it's typical of other projects, but I hope you will be able to draw on some of the experiences and lessons, and that the methodology might be relevant to other surname and indeed haplogroup projects. Also, I'm going to go through the slides fairly quickly. They'll be online, so for people who can't keep up with it, and I'm sorry, I'm going to rush through a lot, so that'll be a fair number of you, you can read about it online when John's got it online. But I want to get through to the juicy bits, as John said, as Maurice said.

A bit of overview, but as I've said, I want to look at the principles. I'm not trying to get you interested in our particular project, but in the methods and experiences we've had in getting online. The surname is from the Scottish Lowlands. There's a good tradition about what happened and all the rest of it, but in fact it turns out that the diaspora is very much Irish and American, and the conventional genealogy is in fact quite weak. We've got a very active clan association which has been instrumental in the size of the project, so I've been very fortunate to be able to tap into that. I'm very conscious that we've still got only just over 0.1% of all the Irwins in the world. That's not untypical of other projects, but it does illustrate that we're only scratching the surface of what we're looking at. We've had terrific growth. We're now up to 19 Big Y tests, which as an administrator has been a rollercoaster of an experience, trying to understand what's going on, and we're beginning to come up with some answers that are fairly controversial, and that's what I want to get on to.

This is an 18th-century copy of the 17th-century history of the family. We were all brought up to believe that all the Irwins came from one ancestor, a single-origin surname, and this is the story of how it happened.
It was written by the Historiographer Royal of Scotland, who should be a very eminent fellow, with a strong Irish connection because he sent it to his brother, who was the laird of Castle Irvine in County Fermanagh, and this is gospel to all the traditional genealogists. In fact, a Bible, yeah, a real Bible. This is the sort of message that it was giving: a lot of different branches in different bits of Scotland, all going back to, if you extrapolate a bit, the town of Irvine in Ayrshire. You'll see there's one Irish branch, but we were all led to believe that all the others would tap into it like that.

Now, a surname project should answer some of those questions. I'm not going to go into this in detail, but we are an open project, at the bottom. We take anybody. Anybody who's got the surname, and indeed anybody who has DNA that matches the surname, is very welcome. Once they're in, I start kicking their backsides to get the quality up. I don't say you can't come in until you do this. And it's certainly not dependent on there being a genealogy, and we've got an awful lot of baggage that doesn't add anything genealogically at all, but it does add statistically, and it's very relevant, I think.

This is the growth. This has virtually all been reactive. I've done some promotion, not a lot, but it's just by being there and being on the web that people come in and join the project, and it's been a privilege to ride this wave. Now, before we even start analyzing the DNA data, there's a lot of work you can do with these people who join. You see we've got 390 people. 76% of them, or indeed over 80%, come from America, and only 1% come from Ireland. And you might say, what the hell am I doing standing here today talking to you about Irish matters when only 1% of my flock come from Ireland? I'll show you why that's not relevant in a minute, but I'm hopeful I will get one Irish Irwin to sign up this afternoon. I haven't got one yet.
But in fact, there are very few Irwins in Ireland, and to get a feel for how relevant this is, if you take the population and compare the two ratios, you get what I call penetration, and you'll see that America, as one would expect, is quite strongly represented. I've made an effort to get the Scottish representation up to a similar level, so we don't have too much bias. And we're a bit weak on Ireland, but not quite so bad. And again, I'll show that isn't too important. Anybody can do this. It takes a bit of work, but I think it is very important that you have a feel for any bias that is in your project. If you don't understand that, you're going to be misleading yourself a little, and other people.

Now, just from the membership of the project, this is going not from where people live, but from where they thought their earliest ancestor was, and for all the people who can identify a county in the 400 I've got, this is the spread. The green bits are the bits where we were taught traditionally that we would expect to find Irwins. But if you look at the black numbers, you're seeing a very, very strong prominence around the Scottish Borders, south of the border as well as north of the border, and Ulster, indeed old Ulster. Some in the south, I'll come to them in a minute. So we're looking at something that fits what we call the Scots-Irish. I wasn't looking for this. I sort of intuitively knew it, but I wasn't looking for it.

So what are the Scots-Irish? Now, there's a lot on this slide, and I've picked my words very carefully. It's a very complex and controversial subject. But basically, typically, we're talking about people who came from Scotland, south-west Scotland, to Ulster in the 17th century, for a variety of reasons. Others came later, of course, and not just to Ulster, but the majority came then. They came as landowners or tenants. The landowners were called undertakers, because they undertook to have tenants, and they weren't any old tenants.
They were Protestant tenants, because they had to be loyal, because, of course, the Catholics were being kicked down to the south by this. We all know about the politics. We won't go into that, but this is the manifestation of it. So the bulk of the people coming over then were Border reivers who were criminals, who were being caught by the courts once we got a bit more peace with James I and VI, and they were just coming right out of the courts. If they were lucky, they weren't executed. Sometimes whole clans were sent over to Ireland. The Grahams were sent en bloc, banished from Scotland to Ireland. And later on, they became Presbyterians. They didn't have Presbyterians at this time. Very few of them, therefore, had pedigrees to take them back to Scotland.

Now, the Americans, not all of them, and if there are any present, I'm sorry if I upset some, but a lot of them say they are descended from the Scotch-Irish. We all know, of course, that Scotch is a drink, but in America it's also used to describe those coming from Ireland. They, in turn, didn't have many pedigrees to link them back to Ireland. Some did, but not many. So we're dealing with the bulk of the people having no pedigrees at all, or maybe one within America, but not linking them back to Ireland, and very unlikely back to Scotland. I think there's only one out of the 260 I've got who can take a pedigree from America back to Scotland. So only one out of the 260. And that's a special one. So it's the antithesis of what a lot of the more traditional surname projects are looking at. We're looking at low-quality material from a genealogical point of view, but we can make a lot of it, nonetheless.

Now, the spelling of the name. With all surnames, people get excited about it, and it does mean something at times. I mean, the Urwins with a U, I know, come from Northumberland and Durham.
And with other ones in Scotland I can have a bit of a guess, but by the time you get to Ireland, and particularly to America, the spelling has got so corrupted that it's noise. And recognise where you end up: something like Erwin is actually a German surname, but it's got mixed up, and there are people with Scottish ancestry that have got a German surname. And similarly, as I've said, the pedigrees generally don't go back very far. We're lucky we've got some that go way back to 1323, so he would have been born in the 1200s. But most are not very good.

Now, all of that comes before looking at the DNA, but it is what you can get out of a DNA study. When you start going into the DNA you've got to think about quality, and this is very relevant. I'm a big fan of 12-marker tests, and my most interesting result is a 12-marker test, but you've got to be very careful how you handle them, particularly if you're dealing with statistics. And we're up to 93% now at 37 markers.

Now, when you get your results, most people are happy to take what FTDNA gives them and work with that. I find that just totally inhibiting. I've got to put it on my own spreadsheet and start working with it. And this is the sort of data you get. It's very similar at this stage to what FTDNA give you. But you'll see we're getting clusters, not exact matches, but the second one you can see very clearly. It's very different from the first one, but the two match each other quite clearly. And then there are other ones, for example the one at the bottom, a singleton: he doesn't match anybody. So I'll go into how I work further.

I've got to go back to some basics. And this is where I'm going to skim through very quickly, because it's slightly controversial, because I don't do it the way most people do it, but you can read it on the slide. So I've got three lots of things to get at. I look at genetic distance and I just get sick. I'm sorry. It's as simple as that. You're comparing things.
It's like saying, what's the average size of a bit of fruit when you've got a mixture of apples and cherries? You've got to be more sophisticated than saying it's just four apples and two cherries, four and two makes six, so we'll average it. It's much more complicated than that. I use TiP. Now, TiP has got lots of disadvantages, and I don't know how TiP works, because it's a commercial proprietary thing that FTDNA won't tell us, but it gives me a single parameter that takes into account all the things that worry me, and I don't care whether it's accurate or not. Relative to the other ones, it's using the same yardstick to work from, so it suits me. It's the 12-generation one I use, to the nearest whole number. I won't go into the detail. It's a specialist thing.

With genetic distance, I find, people have different yardsticks, and if you look at different projects, they have different rules. FTDNA use the 1, 2, 4, 7, which John Cleary called the 10% rule. Now, I don't think it's as simple as that. I don't think that's how they derived it, but what it boils down to is 10%. Taking our data and applying the 10% rule, I find that 7% of the crystal-clear matches FTDNA have said are not matches. I just say that's too simplistic. They've got a very useful, extremely useful, but very crude compromise for looking at matches between matching surnames and completely different surnames, with one criterion to judge them, and for me it's too simplistic. I think you've got to be more lax when you're happy about the surname but much more stringent when you're not happy about the surname. I then go into matrices and more definitions.

Let's go on to this one. This is a very interesting slide. The blue line is the growth of the project. You see, for 10 years it's grown pretty steadily. Don't ask me why. We've been lucky, and maybe I've been diligent. The singletons, these are the leftovers that have no matches. You see, at the beginning, I couldn't match.
By definition, when you've only got two people, you probably won't match them. And that grew, not proportionately, but for about the first three years we were only getting one genetic family and the singletons were going up. Then in the second phase, for about three years, the number of singletons comes down because the number of genetic families that I've identified is going up. And then the third phase, which I like to think of as maturity, maybe I'm a bit naive: hardly any more singletons, even though the numbers keep coming in. So now when I get a new participant, I'm very comfortable that I'll be able to tell him by return which bit of Scotland he came from. It's as simple as that. Not 100% confident, but very comfortable to say, with a few small caveats, this is probably the way it is, and I haven't yet had to eat my hat, which I don't like doing.

There's a whole variety of synonyms here, very emotive. I like the one at the bottom, a surname discontinuity event, because there are all sorts of moral overtones and so forth. I have a sort of hierarchy of three. The term came from the geneticists. There are NPEs in surrogacy and illegitimacies and this sort of thing. And we've widened it, because when you get two participants who have different surnames but match very well, then prima facie you've quite possibly got an NPE. And a lot of these will be because of quite innocent things, like a young man dying, the mother remarrying, and the son taking the name of the stepfather. Quite innocent. And then we get some at the bottom which are a bit more complicated, because if you haven't had a father with a surname, you can't expect to inherit a surname. So right at the beginning of surnames, when surnames are becoming hereditary, you've got this sort of grey area. So you have to be a bit careful. It's not just black and white.
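Editorial sketch: the bookkeeping described above (genetic families, leftover singletons, and prima facie NPE candidates whose surname differs from the rest of their family) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the speaker's method. He works from TiP scores, whose calculation is proprietary, so the sketch swaps in the crude "10% rule" on a simple stepwise genetic distance that he criticises as too simplistic. The marker names (`m1`, `m2`, ...), the haplotype values, the kit labels and the choice to chain matches into families are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def genetic_distance(a, b):
    """Crude stepwise distance over shared markers (an assumed simplification)."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared), len(shared)


def is_match(a, b, fraction=0.10):
    """Assumed '10% rule': distance no more than 10% of the markers compared."""
    gd, n = genetic_distance(a, b)
    return n > 0 and gd <= fraction * n


def genetic_families(members):
    """Group kits so that any chain of pairwise matches ends up in one family."""
    parent = {kit: kit for kit in members}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for x, y in combinations(members, 2):
        if is_match(members[x][1], members[y][1]):
            parent[find(x)] = find(y)

    groups = defaultdict(set)
    for kit in members:
        groups[find(kit)].add(kit)
    return list(groups.values())


def summarise(members):
    """Return (families, singletons, prima facie NPE candidates)."""
    groups = genetic_families(members)
    families = [g for g in groups if len(g) > 1]
    singletons = sorted(next(iter(g)) for g in groups if len(g) == 1)
    npe_candidates = []
    for family in families:
        # A surname differing from the family's most common one is only a flag.
        modal, _ = Counter(members[k][0] for k in family).most_common(1)[0]
        npe_candidates += [k for k in family if members[k][0] != modal]
    return families, singletons, sorted(npe_candidates)


def hap(*values):
    """Build a haplotype dict with hypothetical marker names m1, m2, ..."""
    return {f"m{i}": v for i, v in enumerate(values, start=1)}


# Entirely made-up kits: (surname, haplotype).
members = {
    "A": ("Irwin", hap(13, 24, 14, 11, 11, 14, 12, 12, 12, 13)),
    "B": ("Irwin", hap(13, 25, 14, 11, 11, 14, 12, 12, 12, 13)),
    "C": ("Elliot", hap(13, 24, 14, 11, 11, 14, 12, 12, 12, 13)),
    "D": ("Irwin", hap(14, 22, 15, 10, 13, 14, 11, 14, 11, 12)),
}

families, singletons, npe_candidates = summarise(members)
print(len(families), singletons, npe_candidates)
```

A fixed threshold like this is exactly the kind of single yardstick the speaker argues should be relaxed when the surnames agree and tightened when they do not, so the `fraction` parameter is the obvious place to experiment.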
Now, I get a lot of examples in my project, and I'm lucky because it clarifies things for me, of people with a surname of Elliot but with our DNA, and I get people with our surname but their DNA. A fair number of these, and I'll show you in a minute. And somebody else called them eNPEs and iNPEs. So when we're talking about NPEs we can actually be talking about two completely different things. So whether you include or don't include NPEs in your project, you've actually got to address two different issues.

This is an example of an eNPE. When you look at your matches, and this is a matches page for those that aren't familiar, you see it's predominantly Elliots, but we're picking up a few Irvings. It's an Elliot up at the top there. That is an Irving, the first one there. This is an Elliot surname and he's picking up Irvings, and then the obverse is an Irving surname picking up Elliots. We're getting some Fairbairns, so we're actually getting a third NPE creeping in, and I've even got four NPEs I can trace in one ancestral line. It happens. So everyone's got to be aware of the possibilities. Now, this is from a website rather than a very high-level authority, but this is another genetic genealogist saying that in the Borders you've just got to have your antennae up: you're going to get NPEs. They exist, and this will explain some of the reasons why they exist.

Handling the two types is quite different. If somebody says to you, I want to join your project because I'm an NPE, there's nothing embarrassing about it. They've already crossed that threshold. Obviously, if you're finding that they've got your surname but a different STR signature, it's a bit more difficult to break it to them, but I've had over 50 of these and I haven't had a single backlash of somebody saying, are you accusing me of having an illegitimate ancestry? If you handle it sensibly you won't get into trouble. So I argue very strongly that they should all be included. Now this is going back to the slide we had before.
Each of these ones is in the order I want to use them. I've shunted them around on the same spreadsheet, I've got this data, and there are all sorts of interesting things coming out. The number on the left is the same as before. The next letter is where they live, so U is they live in the United States, S is they live in Scotland. Then the earliest ancestor, the name, some more details of where they came from, the haplogroup, the number of markers they've tested, the genetic distance at 25, 37, 111, just to show you how noisy and confusing a measure it is, and the TiP score: they're all nicely in sequence, and you can compare different numbers of markers nicely.

Now, starting at the bottom, we've still got a singleton. The next one up is two from Munster, and you'll see I name them with a name that means something. I find this terribly important. Just to number them one to ten doesn't help. These are I for Irish, M for Munster, two of them, and the top one is lovely: he actually knew his Gaelic surname. These are the Munster people that were being talked about yesterday, and these two are members of that project as well.
They're Irwins, they joined us first, and they're obviously nothing to do with the Scots-Irish at all. This fellow could remember his father speaking Gaelic, he's a Catholic, I mean it's just obviously nothing to do with the majority of the Scots-Irish Irwins. Next up, some Orkney. One of them's me, and my 10th cousin was a descendant not of Washington Irving but of one of his brothers, which for me was significant. A couple of Elliots here: they have the Irving surname but they match Elliots perfectly. And then also, in pink, up in the Irvings, we've got an Elliot who is clearly an Irving DNA-wise. So this is an example of the two types of NPE. The block in red is telling us that these ones came from Dumfriesshire, or the Borders generally, covering Northumberland, Durham and Cumberland as well.

Now, along the right we've got some interesting things. We've seen the NPEs. The little blue one, you see, he's at 37 markers, he's got a genetic distance of 5, so he does not appear in FTDNA's matches page, and when you've got 67 and 111 he's coming back in again. But you get them at those thresholds as well. And then up in purple I've got two full brothers with a genetic distance of 2 at 25. Genetic distance is a very crude measure. The mother of those two guys said to me, are you accusing me of sleeping with two different men? And I said no, this proves you didn't, but they do have a genetic distance of 2 as full brothers. Bennett Greenspan has exactly the same problem, so I'm not unique at all. Genetic mutations are random and sometimes they happen all quickly. We have found some fives and sixes, but they're pretty rare. And this block at the top, you see all those noughts and 100%s: we've got this huge block of them. So these are some of the things I'm coaxing out of the data, even though I've got relatively little genealogical data to play with.

Now, this is the most important slide that comes out of the study. We've got a total of 30 genetic families. A lot of them are the iNPEs, the iNPEs
in the Borders and the Scots-Irish. The African one is the interesting one. This was fascinating. 12 markers, that's all he's got, 12 markers. He came up and said, I'm haplogroup E, and I'd never heard of haplogroup E. Give me a bit more detail, I said. He was very proud of the fact that his great-grandfather was an emancipated slave, and I was able to say, well, probably his slave master was an Irwin from the Scottish Borders. Hallelujah, he said, and hallelujah, I said. We're fairly right-wing, middle-American Scots-Irish, but I've actually got some black African Irwins as well, so we really are heterogeneous.

But the most important thing is this top group: 66% of the project is this top group, 260 of them with a common surname, and this is what I think is the biggest of any surname project. I'm terribly privileged to have this, and I'll be digging into this in a bit more detail. But if you add up the NPEs on the right, something like 15%, and the NPEs across the top, and it's double counting actually to add them together, you can get up to 30% in a project that may be NPEs. In other words, in this room, if my figures are right, about a third of us are NPEs of one sort or another. So it's not as rare an animal as you may think. We may be exceptional, you may be squeaky clean, but I suspect none of us are quite as comfortable as we would like to think we are. That may include me as well. It's just like conventional genealogy, you can't help it.

The Irish surnames textbook identifies one of these, the bottom one, the seaman, so that's how I knew it was Munster, and then to hear the lecture the day before yesterday on the Irish Munster Irwins completely confirms it. And I think we may have found the other one, and the third one I think may be a subset, I've got to do some work on it, but maybe a subset of the Munster ones. So the native Irish ones I'm able to work on as well, not in the same detail, but it's nice to have the backup of independent sources.

Now remember that we had this first column of numbers we've had
before: the Americans were dominant. When we get back to the earliest confirmed residence, the earliest confirmed ancestor, Ireland was dominant. But when we get through genetics back to where the ancestors came from, Scotland is dominant. So you can see the way we're moving from what we know now to what DNA is telling us, and it's giving a completely different picture. And whilst the individual percentages may not be quite right, I suspect they're very typical of a lot of other Scots-American families: they started in Scotland, went through Ireland and got to America. An awful lot of our participants say, I can only go back in my genealogy for two generations, where did I come from? I can't say they actually went through Ireland, but I can go back to where they came from in Scotland, and they're just over the moon, and I can say it with considerable confidence.

So the traditional tree of course is torn up, and the X's are where we actually proved, not that it was wrong, but that it was extremely unlikely it was right. And politically this was a big breach, ten years ago, to tell all these people who had spent years doing their genealogies and worshipping this Bible of 1678 that it's probably very different. But I've now got this picture, with all these branches coming down here. B is Borders, and these suffixes, I'll explain, mean something else.

I've been talking about STRs. Now I'm going to move on to SNPs, and if you don't know the difference, I'm sorry, we're going to move on, you can read about it. This is a lovely slide. Anybody can make this for a surname project. The phylogenetic trees, you can get them from FTDNA, you can get them from ISOGG, but of course there's all sorts of noise. I've taken out of them the ones that are relevant to my project, and if there's a colour indication it's either a prediction by FTDNA, in red, or it's an actual test, in green. So each of my 30 genetic families I can fit on this tree. So yes, we are all related, all the Irwins are related, but you've got
to go back nearly to Adam to get that common bit at the top. There's the African right out on the top left. The Munster people are right down close to us, quite close. The C446 is quite close to us. When I say us, I mean the big group. I personally am over here in the O1s, this is where I personally come in, so I don't actually belong to it, but as administrator I say we. So they're all on one tree and this pulls it together. As somebody said in one of the earlier lectures, a tree, we're all familiar with a tree. Anybody can do this. You can see I put some dates in on the left-hand side. I'm not interested in deep ancestry, but this does help to put into perspective what we're talking about. And it keeps needing updating. I thought I'd got it ready for the lecture and I found four or five things to change, so you keep having to update it as the tests are done and it's converting from the predictions in red to green, and if you find there's another family, you've got to fit it on, as for a singleton.

Now, before Big Y came along, I thought, I've got this huge family, quite a responsibility, quite an opportunity, so I invited some gurus to help me with network diagrams, and this is the best we came up with. Now, you can make these prove anything if you want to, depending on how you weight them. You can do it with no weighting at all and you get one picture, but then you say, if some of these markers mutate at different rates you ought to give them different weightings, and you can make that paint any picture you want. But this is the best we came up with, and you can see from the colours that it does represent roughly some groupings. From that I was able to go on and divide my 30 of them into 10 groups or 11 groups and give them each a label. The first one is the mode, so I've got this big bulk all the same. That's the mode, most common for the whole lot, 34 of them. And then the DYS that was the predominant one, and then at the other end I've got the unassigned. There's quite a bit left
over at the end, about a third of them, or maybe it's only a quarter now, that were left over and we couldn't map. A lot of them had Irish ancestors. For some of them I can say which bit of Dumfriesshire they came from. Some of them are very characteristic of an NPE. The Elliots all seem to cluster within our family. They're part of the Borders family, but all the Elliots are in one tight group, so this NPE would have occurred a long time ago. And some of them I just named by the DYS number. And then this network diagram will give you a time to the most recent common ancestor for the cluster. I wouldn't want to be very comfortable with it, and you can look at the earliest genealogy, I'll come back to that.

But this is where we were before Big Y: a huge big tree, and how does it fit together? You would expect there to be some hierarchical element in this, but we can't see which is the son of which of the others. And L555 was our private SNP, which was identified in 2012. We were lucky to have a private SNP, and after three years no other surname is claiming membership of this SNP. They will do one day, but given the rate of growth, after three years I'm fairly comfortable that, like some other surnames, we've effectively got, most probably, a private SNP.

Now the Big Tree comes along, and we're very fortunate in L21 that we've got Alex Williamson, whom you'll have heard about. This was updated this week, I managed to squeeze it out of him, and you can see my 12 Big Ys all here in the first two blocks. All of those are Irwins. There's one NPE, but let's not worry about that. And this is what Alex Williamson has come up with. Now frankly, that's as much as you're going to get out of Big Y, probably, I thought, and the number of hours I put in is just uncountable, but I've enjoyed doing it. But Alex Williamson said he was going to pack up, he was going to stop doing this. Thank God he didn't, because if he had packed up... I'd said to myself, I've got to be able to do it myself, so I've learned how to make an
Alex Williamson tree. I'll come to that in a minute to show you how, but I can now stand on my own feet and tell Alex Williamson. In fact, I gave him the prototype and he came back having completely reworked it. He differed on quite a number of very minor details, but I'd produced it independently of him and he's come back and confirmed my work, though in fact of course it's the other way around and I've confirmed his work in advance. So I'm very comfortable that independently we're coming to a similar conclusion.

Now, when I process what he's done into my format, you'll see we've got the bit I've been talking about, the L555. We've got a lot of surnames over on the right, that were on the right of the one I've just shown you, which come in higher up, above our private L555, and one would expect to be getting surnames coming in lower down, but they haven't come yet. And to the left we've got five Irwin Big Y test results, but they're not from the Borders. The fourth one there is the Irwin Munster one, so that's going to be very interesting for the Munster group, but I frankly haven't got time to work on that. I'm working on this big lot. So we've got the answers coming out. The ones at L555, there's a big block there, then we've got what I call an intermediate, and then we've got the private SNPs of all these 12 that we've now got tests for.

Now, what I was going to do initially was simply put it together: Alex Williamson can analyse it and I can sit back. Then Alex Williamson says, I'm going to pack up. Christ, I've got to do it myself. So I focused on the big family. We hadn't got tests, so I suggested that somebody did some tests, and at $600, $500 a time, six off, that's quite a recommendation to make, with some responsibility. I've worked it all through and I've now found interesting results, but of course, like everything in life, you get more information but you find you need more to get back to where you thought you'd be. So this week I've asked Bennett Greenspan for a private surname pack
so that all the people of this genetic family who haven't taken a Big Y test can spend $100 instead of $500, and that will take us a lot further down the line, and I'll show you why in a minute.

I'm not interested in all this noise about how you name a variant, an SNP, whether it's named by this company or that company, whether it's Y or whatever. I'm not interested in other people's private genetic trees and adding on to them. I want to understand what we're doing, and then I will start telling the world what we're doing, publicly as it were. But I don't want people drawing my trees on their trees only to find out a week later that, now that we've got a bit more information, we've got to change it. It's not that I don't like participating, but I just don't want to do it yet, and this keeps me out of a lot of the noise that goes on, the arguments about why this isn't happening and why somebody's done something and so forth.

If you do it yourself, as I've done, you've got to look at it a bit analytically. You do a Big Y test and FTDNA give you BAM data, which is a huge big file. Somebody will tell me how long it is, but it takes several hours to download if you want the raw data. You can get it saved and all the rest of it. And then there are the different tools they give you. FTDNA give you a VCF file and a CSV file. The VCF file is probably good, the CSV file is pretty long, and then they give you a matches file. It's an even cruder simplification, which actually I also don't understand, but if you don't want to spend a lot of effort, you've got something in one page you can digest. And you can go out and have an analysis done. There you see YFull and the like, and you pay $50 for those. All of those are algorithms: you plug the data in, somebody's given it some rules, and out come some fairly simple answers. Or you can do it yourself, and you find you get different answers, not radically different, but frankly I would not spend a penny on those files for quality. I need
those, the work that's been done to identify which ones to look at, but they're not comprehensive, and when it comes to quality I prefer my own rules. Not that the other rules are wrong, but you've got to adapt them. It's like looking at a Monet picture and saying to a computer, "Describe that Monet picture in 12 digits." I don't know how many purples there are. We're asking the computer to do things that we don't understand. It can do it pretty well, but not as well as the human eye, I would argue. So I've worked out a little process. What we're trying to do is to identify the variants that are high up the tree, because they're all cluttered up with the ones you're interested in: the ones at the L555 block, intermediate ones and private ones. And then I've got to look at the probability, the quality, the number of reads and the consistency. I'll go into this in a bit of detail. And then I update the tree. So I list in black all the variants. A variant is a SNP, or something that isn't a SNP that they call an indel. We won't worry about indels, because we're told not to worry about them. And to use the BAM viewer, which is the tool that you can use as an amateur, you've got to work with numbers. So for the variants that have already been named, CTS or FGC or L21, you need the original position numbers to be able to use the tool. So the fact that it's been called L555 or L21 or whatever it is, you've actually got to get rid of that to use this tool. And then you put in the two letters characterising the two bases, and then you go into this table, and for each one I pull out three digits: the number of reads, the number of indels and the consistency. I'll explain that, the consistency across the line. So you can see this example has got A's and G's, and this is the intermediate one. If it's L555, you'd expect it to be all the same letter, whatever it is; it would be T's all the way across. Or if they're private, then they'd all be different ones, just unique
to that one. I'll show it. So that's what you're trying to do, and then you look in the BAM viewer. It's online. I can set this up in 10 minutes, most of that while the computer's thinking, as it's handling a lot of data. I can almost do it from memory; it's quite simple. When you look at it in that vertical column, immediately it's saying there's something funny about the second, the third and the three from the bottom: they're different. That's what we're looking for. And in the particular one that's clicked on (there's an arrow that doesn't come out in the slide), that particular green box, the second base letter was an A, it had a count of 32 and its quality was 94%. So those are the digits I'm looking for, and I do that for everyone in that line. That's just for one variant, and I've got to do that for all the variants I'm interested in. So I end up with this huge amount of data, but there are some patterns coming out. You'll see the intermediate one, it's about the third block down, it's got a square round it: there's an A, then two G's and then three A's. That's the intermediate one. The ones above it are blocks right the way across. The ones below, we're getting boxes for each of the particular testees; they're the private ones for each of them. Now the interesting bit is where you draw that bottom line. Because if we look at the... I'm going to have to use the pointer. Here we are, down here. I use a capital letter if it's good quality and a small letter if it isn't. Now that's a small letter, and you see it's only 70% consistency, so that's below my 85% threshold. That threshold, 85%, is about what other people use. So it doesn't qualify quality-wise; otherwise, right the way across they're all C's and it would be different, so it would be in that box. But in this box here there's a little "a" that I have included, which is inconsistent, but you'll see it's 84%, only one below my 85%. I've included it because, to me, if the test had just been a bit more rigorous, and this is
how they handle the thing in the lab, it's not a subjective thing, it would easily, probably, have gone up to 85% or more. And you've got to look at each of them as to whether you include them. Up here there's a potential intermediate. It's pretty low grade, and of course it's contradictory to the other intermediate, which would mean you couldn't draw trees; it's all mixed up. There's a lot to digest here. There are no rules, but to me, I was getting comfortable that I can predict something, and when Alex Williamson comes up with more or less the same answer, I'm obviously doing roughly the right thing. That's just 20 variants for 6 participants. With the variants for 12 participants you're getting something like this. That's getting even more noise, but it's the same principle: you've got the big blocks, the top bit, that's the L555 block there, and these ones that are all about the low quality. The green lines are where I used FGC and YFull. I didn't use them myself, but the participants wanted to check my work, or wanted results before I could do it. And they were coming up with interesting and useful and fairly reliable answers, certainly as reliable as mine, but they weren't coming up with what I would call comprehensive answers. So contrary to what John says, I don't recommend that you use them. I would then say John and I agree a bit: if you do use them, you use both of them. If you're going to spend $50, I'd spend $100. But I would still recommend you do it yourself, if you've got the patience and skills to be able to do what I've done. And it's addictive. Once you get the hang of it and you've got data, it's addictive. Once I start, I can't put it down; it's just so exciting. I have found SNPs myself, by chance, just on that screen. No computer's found it; I've found it. I've done that myself, nobody saw me having to do it, and it's in there, and now it's moving. So, well done. Nobody else found it. The thrill is fantastic. Just by chance, not by skill, by
chance. And so that's the bottom half, and here we're picking out the four of them, these blocks. If you look at the colouring, this first one, to rub it in, that's red, or green, green, green all the way across, or yellow, yellow, yellow all the way across, so that's why it's in the box. And that's the same all the way up: green, green, green all the way across, nearly there at the end. So I get something very similar to what Alex Williamson came out with, with a bit of difference. I've got some question-mark ones that I'm not sure about, so I give them a sort of 50% probability. But you'll see I'm getting some similar answers: the L555 block at the top, there's one intermediate, and I'm beginning to get a feel, an indication of what I should expect for different subgroups in this. And you see there's a bit of a slope there in the line; they've got different ages. Now this age business is very interesting. TMRCA is very simple to calculate. As I said last night, the answer in years is the number of SNPs in a vertical line, for whatever you're measuring, times the average number of years for a SNP. But you've got to remember you're dealing with probabilities, so if you're dealing with a number of lines and you average them, you're going to get a much more reliable answer. If you're dealing with one, it's going to be unreliable, and how unreliable would you believe? The difficulty is the average number of years for a SNP. Is it 118 or 120? That difference doesn't matter. But you can get anything from 60 to 130, and there's some work on that. And how I'm identifying what is a SNP, is it 84%, is it 85%, makes a bit of a difference. But it's something that needs investigating. So let's try. Here we have what I've got. The first thing to note, which I could have done earlier: we had what I call a starburst just below L21, a huge spread of new SNPs, and then after a couple of generations I've got a block of 20 SNPs all in a vertical line. We don't know what the sequence within that is, and if we're talking about 120 years or 1000
years of no bifurcation in the survivors, in the people that have tested. Of course some of that is going to be tested by a new number of people. But you'll see, if you look at how these blocks are clear, time after time. And then underneath that I get another starburst, a huge spread. Unlike John, who's getting a lot of hierarchy, I'm getting that across my entire genetic family, which I find strange; I'm not comfortable with it at all. What I do is I add the number of probables and half the number of possibles, and the left one gives me 11 SNPs. 11 times 120 will come to about 1,300, whatever it is; I'll do each one individually in a minute. But I add them all up and divide by 12: 11, 8.5 and so on, that gives me an average of 6.3, and I get about 750 years. Take that away from 1950, which is the average year of birth of my participants, and so we get the age of my L555 block as about 1200. My guess would have been 1300, so it's not too bad. And if we do the L21, we're getting about 2000 BC, as I mentioned last night, which seems about right. It's a gut feel, but I seem to be on the right track, so that's very comforting. When we come to the individual ones, starting at the bottom, we're getting a huge amount of noise, because 11 SNPs of course dates back to AD 600, and one probable SNP dates back to about 1700. So that just doesn't make sense at all, but that's because, I'm sure, we're looking at only one individual. When I get the SNP packs and I get 5 or 6 under each of those, I think it'll smooth, so I might get a better picture. In small print I've got the old ones that were done in 2011 from the network analysis and my earliest genealogies; you see the stream from them all coming out around 1700 or so. But I think that's cool. Right, some further reading if you want to look at it. Some findings about the project; I won't bore you with that. A minute to go. Findings that might be relevant to other surnames: small projects can learn a lot from large projects. I couldn't have done this if I wasn't lucky to be in
a large project, but other people can learn from my experience. Spending, there's some sleeting; penetration ratios, easy to do, gives you a bit of a check. Matches: be careful. I'm not saying don't use them, I'm not saying they make mistakes, but just bear in mind they're not quite as gospel as they might have been. TiP scores I think are a great tool and we certainly do consider them; they're not as embarrassing as you think. Big Y: terribly cumbersome to handle, but by and by they give you some very exciting results. I hope my enthusiasm has infected you, inspired you. Starbursts and bottlenecks: are they population explosions, are they something genetic? We don't know. And we need a SNP pack, so I'm asking for the first surname SNP pack. If I get it, that'll be lovely; if I don't, I've got to go to YSEQ, and there's a little sweat there even though it's cheaper. And at the bottom, tongue in cheek, you can read that bottom block. What I'm saying is you don't have to be an expert in genetics; if you're fairly good at other things you can get an awful long way. And you've got to be lucky; I should have added luck. But that doesn't mean you can't be lucky, or you can learn from my luck. Thank you very much.
Time for a few questions. Does anybody have a question for James? Okay, we've got two. Let's go for yours first.
So maybe you said this last night, on estimating the origin: it's talked about as 120 years per common SNP, or 118. Where is that from? How do you calculate that?
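The age arithmetic described in the talk (count probable SNPs in full and possible SNPs at half weight, average across the tested lines, multiply by years per SNP, and subtract from the participants' mean birth year) can be sketched as below. This is only an illustration under stated assumptions: the function names and the per-line counts are hypothetical, and only the 120 years per SNP and the 1950 mean birth year come from the talk.

```python
from typing import Sequence, Tuple

YEARS_PER_SNP = 120.0    # rate quoted in the talk (118 to 120)
MEAN_BIRTH_YEAR = 1950   # mean birth year of participants, per the talk


def weighted_snp_count(probable: int, possible: int) -> float:
    """Probable SNPs count in full, possible SNPs at half weight."""
    return probable + 0.5 * possible


def block_age(snp_counts: Sequence[float],
              years_per_snp: float = YEARS_PER_SNP,
              mean_birth_year: int = MEAN_BIRTH_YEAR) -> Tuple[float, float, float]:
    """Return (mean SNPs per line, years back, calendar year of the block)."""
    mean_snps = sum(snp_counts) / len(snp_counts)
    years_back = mean_snps * years_per_snp
    return mean_snps, years_back, mean_birth_year - years_back


# Hypothetical weighted counts for four lines below a block (not project data).
lines = [weighted_snp_count(11, 0), weighted_snp_count(8, 1),
         weighted_snp_count(4, 0), weighted_snp_count(2, 0)]
print(block_age(lines))   # averaged over several lines
print(block_age([11]))    # a single line of 11 SNPs, as in the talk
```

A single line of 11 SNPs gives 1,320 years, or about AD 630, which matches the "AD 600" outlier mentioned in the talk; averaging over several lines is what pulls the estimate toward something more plausible.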
I'm not making it up. I've inherited that, as I said, and I think in using other people's work you've got to use some, and I take that as a gift. Whether it's right enough doesn't worry me; I'm much more interested in giving an answer I'm comfortable with. Where I got it from is a couple of emails on chat forums and this sort of thing. Some people will tell you which papers to refer to, to find the origins, but it's not blogging, these are those stuff done, in 120 years then it's right. YFull and FTDNA use those numbers, so they're the authority. They wouldn't say they're right; that's just what works.
And from what I've heard from Nigel McCarthy, the way that he has calculated the time is that we know that L21 is about 4,000 years ago, about 2000 BC. How do we know that? I don't know. But then you count how many SNPs there were between 4,000 years ago and your testee, let's say there were 21; then you divide that distance, 4,000 years, by 21, and that provides us with the average rate, whatever it is. That's the crude thing. John has something to add about that?
No, not yet. We're saving...
You're saving the good bits. Does that more or less answer your question, Mark?
Yeah.
It's not an exact science, but we're getting more exact as time goes on.
Thank you for your presentation. I was wondering what is the exact status of Alex's tree, Alex Williamson's tree, and if he drops it, are you going to pick it up, really?
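The crude calibration described in the answer above (take a marker of assumed age, count the SNPs between it and each testee, and divide) can be sketched as follows. The function name is hypothetical, and the 4,000 years and 21 SNPs are simply the illustrative figures used in the answer, not measured values.

```python
from typing import Sequence


def years_per_snp(marker_age_years: float, snp_counts: Sequence[int]) -> float:
    """Average years per SNP: assumed marker age divided by the mean SNP count
    between the marker and each testee."""
    mean_snps = sum(snp_counts) / len(snp_counts)
    return marker_age_years / mean_snps


# One testee with 21 SNPs below a marker assumed to be 4,000 years old.
print(round(years_per_snp(4000, [21]), 1))

# Several testees: averaging their counts steadies the estimate.
print(round(years_per_snp(4000, [30, 35, 35]), 1))
```

With those figures the rate comes out near 190 years per SNP; a rate of 120 years per SNP would correspond to roughly 33 SNPs over the same 4,000 years, which shows how sensitive the result is to the SNP count and to the assumed age of the marker.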
I'll pick it up for the Irvins, absolutely, because I feel I can, but I'm run off my feet just doing what I'm doing. How he does it, I've got an idea: he works from CSVs and he's got an algorithm that he interprets quite liberally. But as I understand it, he's not thinking of giving up now, thank God. There was a scare that he was, but I think he's going on. But it's a huge work. He says, "I'm away, I can't do it this week." But I do it six at a time and I scribble it in, so I give him six times as much work, so I go to the bottom of the queue. But he's human, he's an unpaid amateur like us all, and he seems to want to go on with it, thank God.
So James, very great presentation. I've got a small project and no Big Y member yet. So do you think there's a certain number of testees that should be in a project to do the Big Y test? We've pretty much figured out the terminal SNP, I think, from just the existing SNPs, but what are the pros and cons of going Big Y for a small project?
I'm not going to stand here and make any recommendations at all. It's big money. I'm very happy to talk to you afterwards about your particular situation, but when you're talking about $500 a time, that's an awful lot of money, and if you don't feel comfortable that you can handle the results, I think you have to be awfully careful about recommending. Of those 19 Big Ys, I only recommended 6, and that was after I'd worked out I could do it. But if they want to do it and you recommend they don't do it, it's a slightly different question. That's their choice, of course. If they want to spend the money, then you've got to say, "Well, if you do do it, I might be able to help," or "I can't help when you do this," and so forth. I feel there's a big responsibility in giving recommendations.
I suppose if the price came down, I think we'd be in a much better position to recommend that everybody gets tested. In your ideal world, would you like to test everybody in your project using the Big Y?
If I dreamt and went to heaven and was aged 40 instead of mid-70s, absolutely. But the thought of doing 230 Big Ys the way I've done these...
Thank you. It is a huge amount of work, and we get that from the slides, and certainly in my experience with the Big Y, it raises lots of questions. It's like grey hair: it will answer one question and then seven questions will sprout in its place. So it's a little bit of a never-ending story, which is not too surprising in many ways. But where do you think we're going to be in five years' time?
I thought you said five weeks, not five years, but I'm not too sure about that. I'm going to Houston and I'm going to give this talk in Houston. Now that is of course the Mecca, and I'm going to be talking to the people behind it, and they're way up there, they've been at it for generations, and I'm going to be equally provocative, and if you don't shoot me down they certainly will. But I do feel that this is citizen science, and I feel this has been proven to be a part of that, of actually discovering two SNPs.
Oh, thank you, James. Thank you for your appreciation. This is every step of the way, thank you. Very happy to take questions out at the back, but let's let the next one get on. James, I'll take the microphone. And I'll give it up for mine; I can't send it up within my pocket. There it is, that's it.
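The manual screening process described in the talk (read off a base letter, a read count and a consistency percentage for each testee at each variant, write a capital letter when the call meets the 85% threshold and a small letter when it does not, then see whether the variant is shared by everyone, by some, or by one testee) can be sketched roughly as below. The names `Call`, `label` and `classify` and all sample values other than the 32 reads at 94% are hypothetical, and this is a simplified illustration rather than the speaker's actual procedure.

```python
from dataclasses import dataclass
from typing import Sequence

CONSISTENCY_THRESHOLD = 85  # percent; the cut-off quoted in the talk


@dataclass
class Call:
    base: str         # base letter read at the position, e.g. "A"
    reads: int        # number of reads covering the position
    consistency: int  # percentage of reads agreeing with `base`


def label(call: Call) -> str:
    """Capital letter for a good-quality call, small letter otherwise."""
    good = call.consistency >= CONSISTENCY_THRESHOLD
    return call.base.upper() if good else call.base.lower()


def classify(calls: Sequence[Call], derived: str) -> str:
    """Classify one variant from one call per testee."""
    carriers = sum(1 for c in calls if label(c) == derived.upper())
    if carriers == len(calls):
        return "block"         # same good-quality letter right across
    if carriers > 1:
        return "intermediate"  # shared by some testees only
    if carriers == 1:
        return "private"       # unique to one testee
    return "unconfirmed"       # no good-quality derived call


# The A, G, G, A, A, A pattern from the slide, with made-up read counts.
row = [Call("A", 32, 94), Call("G", 25, 96), Call("G", 18, 90),
       Call("A", 30, 98), Call("A", 22, 91), Call("A", 27, 95)]
print(classify(row, derived="G"))
print(label(Call("A", 20, 84)))
```

A strict threshold rejects the 84% call that the speaker chose to include on judgement, which is the kind of borderline case he argues still needs a human eye.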