Okay folks, it gives me great pleasure to introduce our next speaker. John Cleary teaches languages at a university in Scotland and has taught in colleges and universities in Germany, Japan, Malaysia and the UK. He's been involved in educational development projects and teaching modern European languages, which have led him to travel widely in Eastern Europe and Central Asia, and in a previous life he also worked in a museum and wrote a history of the people who had built and inhabited medieval almshouses. It is four years since Family Tree DNA introduced their new Y chromosome sequencing test, the Big Y. This talk will review how this popular test has transformed surname projects in this time and how the SNP tsunami has upended and transformed the shape and size of the Y chromosome haplotree. So please give a warm welcome to John Cleary. And everybody, I'd like to thank you all very much indeed for coming out on this foul day. I just walked across from my hotel and it was appalling. If I wasn't coming to speak, I wouldn't have come here. So thank you very much indeed for your diligence in turning out to hear me and our other speakers today. I'm going to talk about Y chromosome research, and I'd like to say at this point that this talk is a follow-up to a talk I gave for beginners on using SNP testing in research. I gave it in Birmingham in April, and Maurice has made these talks available on the YouTube platform he maintains. So today's talk is not a beginner's talk. I just need to warn you at this stage. I'm looking at a slightly higher level at what you can do with Big Y results once you have them. I'm not going to look specifically at surname projects today. We shall look at some tools which are available to make sense of the data you get from the Big Y, and I'll then end by reviewing the recent changes in Family Tree DNA's presentation of Big Y results.
So can I ask you first of all, how many of you have taken a Y test? And only half of the audience can possibly have done so. But ladies, maybe some of you have arranged Y tests for your male relatives and husbands and so on. How many of you have used the Big Y test? Okay, so quite a few of you have moved on to SNP testing. One of the things I would argue, and I think others around would agree with me, is that we've gone as far as we can with STR testing, and that we now need to use SNP testing like the Big Y to get meaningful and sensible results. So just to go through the initial slides. I'll stand close to the machine then. Just to set the context of what I'm going to talk about. The Big Y, for those who don't know yet, is a next-generation sequencing test. That is, it aims to sequence targeted regions of the Y chromosome. Not the whole Y chromosome: you can't read it all. But it aims to target those regions which can be read and can generate good-quality results. Rather than simply counting the up-and-down changes that may occur in very limited regions of the Y chromosome, it tries to read long sequences and identify particular mutations on those sequences. It's a discovery test. The aim of this test is to find new SNPs that were not known before. So I'll not be talking today about other forms of SNP testing such as single SNP tests, which some of you may have taken, or SNP pack tests, which again some of you may have taken. James Irvine will be talking about these tomorrow, if you can come to hear his talk. So just to remind you, there are, as you know, three types of DNA testing used in genealogy, which are the autosomal DNA, the mitochondrial DNA and the Y chromosome. The Y-DNA allows you to follow the paternal line. The mitochondrial DNA allows you to follow the maternal line, whereas the autosomal DNA allows you to follow any line you like, but only as far back as the autosomal DNA can reach.
That is generally thought to be around six or so generations. And the advantage of the Y chromosome is that it does allow you to go back further. So even though it only tracks one line, your father's father's father's father, you can go back a lot further. If you're interested in busting brick walls or working out the genealogy of your 17th-century or medieval ancestors, then autosomal DNA will not do anything for you, but the Y chromosome and mitochondrial DNA can assist. And just to remind you, I'm sure you're all familiar with these, but we throw these abbreviations at people, STR, SNP, and assume you know what they mean. There are two primary forms of mutation which genealogists track on the Y chromosome. STRs are what Y chromosome research used to be about, from the late 1990s until probably a few years ago, and they're still there, but I think they need to be used in tandem with SNP testing. The STR is a short tandem repeat. It's essentially a chunk of DNA that repeats a certain number of times, and what we used to do was count the number of those repeats. They click up and down, up and down, up and down, and as a result using them is quite a specialised and rather problematic thing. On the other hand, they're what you see in the well-known STR surname project charts that some of you are familiar with. Then the SNP is a single nucleotide polymorphism, which is why we say "snip". The emphasis here is on the single. It's a single point of mutation on one particular base or position in the Y chromosome. Here in the diagram you can see the T in the top row has changed to an A in the row below. So one of these is a SNP, whichever one it is, either a T becoming an A or an A becoming a T. That's what SNPs are. We look for these mutations on the chromosome in the belief that once they mutate they stay there. So if one medieval man has a mutation like this, then all of his descendants in theory should have that same mutation.
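The difference between the two kinds of mutation can be sketched in a few lines of Python. This is only an illustration: the sequences are made up, and the helper names `longest_tandem_repeat` and `find_snps` are hypothetical, not part of any testing company's tooling. An STR is read by counting how many times a motif repeats; a SNP is read by finding the single position where two aligned sequences differ.

```python
import re


def longest_tandem_repeat(sequence, motif):
    """Count the repeats in the longest uninterrupted run of `motif`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each single-base difference.

    Positions are 1-based. The two sequences are assumed to be already aligned.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Made-up STR region: the motif GATA repeats four times in one man, five in another.
print(longest_tandem_repeat("CCGATAGATAGATAGATATT", "GATA"))      # 4
print(longest_tandem_repeat("CCGATAGATAGATAGATAGATATT", "GATA"))  # 5

# Made-up SNP: a single T has become an A at position 4.
print(find_snps("GATTACAGT", "GATAACAGT"))  # [(4, 'T', 'A')]
```

The STR value is a count that can move up or down again in later generations, whereas the SNP is a yes-or-no fact about one position, which is why it is treated as a stable marker for a branch.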
Shared mutations of this kind are how you begin to find people who are related to each other: these are rare mutations which they carry in their family branch line. So that's the SNP. And just in case you're confused by all these Gs, Cs, As and Ts, these are the bases, the molecules that form the great DNA molecule itself. We use them as a shorthand, and in this business we talk a lot about As becoming Ts and so on. Again, don't worry about the chemistry underneath. You can read about it if you like, but you don't need to know. You just need to recognise the characteristics of these four symbols, C, G, A and T, and how they mutate one into another. So let me give you a brief history of the Big Y, because it's about four years now since the Big Y was first announced. At Family Tree DNA's conference in Houston they made the announcement that they were going to move into what they called a full Y sequence. It's never really been a full Y, but it doesn't really matter. It's extended Y sequencing. The first orders were placed during the autumn following, and I think it was around April the next year, after a short delay, when the first results began to be released. And this then became known as the SNP tsunami. It soon became clear that these tests were discovering a lot of SNPs, far more, I think, than they were expecting when the first orders were placed. Family Tree DNA now claim to have around 20,000 Big Y results in their database. That's a ballpark figure, of which about 18,500, they say, are customer tests. Another 1,500 are from academic studies which used the Big Y. Some of these have been published, some not, but they exist in the Family Tree DNA database and can be used for SNP identification. I tried to find what data I could on other analysts and companies who make use of Big Y data.
Some of you may be familiar with the third-party analysis company YFull, who will take raw data from the Big Y and analyse it for you, to give you a nice user-friendly, readable interface to look at the results. YFull have about 11,000 or so Big Y results in their database. So a little more than half of the customers who have taken the Big Y have acquired their raw data and sent it on to YFull for further analysis. I was interested in what kind of changes there have been across the four years since the Big Y was first announced. Family Tree DNA weren't able to tell me. I don't know if they have the data, but they weren't able to let me know. But again, some of you may know Alex Williamson's Big Tree page, and Alex has a breakdown of the number of raw results files sent to him across the past four years. In the breakdown here, the ones in blue are L21, the large R-L21 haplogroup, which many of you are probably members of. I am, and it's the most common Y haplogroup in Ireland. As you see, in the early days there was very heavy demand from L21 members, who were very well organised by their project administrators, and that has remained more or less static across the four years. There's been a big increase in the number of other people from related haplogroups, not L21 but other so-called P312 haplogroups, which are, if you like, siblings to that big L21 clade. These are growing rapidly as Alex Williamson extends what he does to other haplogroups. And the 2017 figures are only for the year to date, to September I think, so there's still a bit of 2017 left. There was a sale conducted in the summer, and the results are still coming through from that. And there will be a sale at the end of the year, and some of those results will be out by Christmas. So I'd expect 2017 to show some rapid increase.
So, in general, there's been fairly steady traffic, I think, with the Big Y across the years, but a gradual and accelerating increase in people taking the test. So, as was just said, it's a popular test. I'm assuming this is some kind of proxy for the way in which the Big Y generally has been taken up. Now, the other big R haplogroup, U106, has this year joined Alex Williamson in creating their own version of the Big Tree, and they have just about 1,000 results displayed on that already. I think they have more than have yet been put onto the tree. So clearly there's an appetite to take the test, and there's an appetite for having the data analysed by others who can build it into useful, informative trees. That's one of the themes I'll talk about. So we're going to talk about the SNP tsunami. A lot of the talk before the Big Y started was in terms of terminal SNPs. People wanted to know, what is my terminal SNP? In general, this would mean a SNP that probably occurred in an ancestor who lived in the Neolithic, certainly usually before the AD era begins. It was not something that would help you with genealogy, but something that would give you some sense of how rooted you are in what was called deep ancestry. And when the Big Y first began, I think many people assumed that's what SNP testing would carry on doing: just add a few more of these, branch a bit more finely, and certainly produce trees in which at each branching point there'd be one SNP, sitting by itself as the SNP for that branch. That wasn't how it turned out. It soon became clear from the first results that an enormous tidal wave of SNPs was about to hit the project administrators who'd been collecting this data and manually processing it into something meaningful. And this term, the SNP tsunami, was coined by somebody, I'm not sure who, but it spread in a very short time, and very appropriately. We've got a few examples here of how it works.
These trees are designed by Mike Walsh, whom many of you may have come across, a very active administrator in the R1b haplogroup and in many of the projects under R1b. For years he'd tried to design trees to show how the key SNPs in R1b relate to each other. This is Mike's tree from September 2013, so just before the announcement of the Big Y. This essentially is all of R1b back to M269, the most common branch of R1b. M269 will be something like 5,000 or more years old. So this is covering a huge span of time, and these ones down at the bottom are probably not getting very near to us in history either. Then, within a few months, Mike has changed the design of the tree to try and get more SNPs on. So this one shows the first results coming through from the Big Y in May 2014, and he's squeezed more in. He's widened out the tree like this. This is now just L21, by the way, so he's come down to a part of the tree which is just one of these. He's still coming up with a nice clear design showing what the big branches of L21 are and what the main sub-branches are. But then he's having to pack more and more SNPs on. This is the last one I saved, actually, from sometime in 2015, and he seems to have packed the space full. I think he's stopped updating it. I think he just can't fit any more on. I think he probably puts new top-line SNPs in there if they're discovered, but there's not space for all the ones that descend below them. That, of course, has had its effect, which is a kind of fragmentation, because increasingly people are becoming interested in their own little tree. So the whole deep-ancestry "let's understand the whole of L21" has been replaced by "let's understand the history of A whatever that is, 4, 3, whatever".
And so it's focusing in on the part of the tree where you are located, or where your project members are located. It's nearer in time, but it's a much more fragmentary part of the overall tree. It's the only way you can get things onto the tree diagram. I've been working on a subclade in R1a. My tree used to be very similar across 2015 and 2016. By the time I reached 2017 I'd had to make my window enormous to still squeeze things in. You come to realise it's actually time to find some other way to represent this data. There's just so much. This is an even more recent version. Some of you may also be familiar with another version of the tree which has been enabled by the Big Y test. Essentially, if Mike Walsh is producing the top-level trees, what Alex Williamson is doing is producing the trees that take us down to the individual testers. You can see here some fellow called Cleary. That's me. That's some other Cleary I found somewhere and got to take the test. Now, we see a very long chain there at the top. We've got something called FGC5494. Again, in this business you have to get used to these alphabet-spaghetti strings of letters and numbers, but they all represent particular SNPs that have been discovered. The one on the top there is one of the major branch SNPs under L21, so that takes you back to Mike Walsh's tree. And then these are all the SNPs discovered in the various Big Y tests. If you start from me and go upwards to FGC5494, then just drop down one level, all of those were found in me. I was the first tester to have all of those SNPs found, and they were all private SNPs. When I got my Big Y results back in late 2014, I had 40 private SNPs, and I thought, ooh, that's very nice. But of course you have to wait for the testers: find more testers and wait for others. So again, some structure is developing. At the top, see Y15901, it's really comprehensible.
That was one of my private SNPs. And eventually other people emerged out of the blue who shared my new SNPs, and turned out to be much further away than even we had thought on the basis of STRs. And there begins to emerge some sort of pattern. Most of these are Irish. These are Scots-Irish, probably. They're probably also Scottish. And this one is from northern Norway. So this subclade has obviously got some kind of spread. We don't know why or how that happened. But again, there are questions here to ask about directions of travel. Where did the SNPs originate? Probably in Ireland. But how did they get to Norway? And so on. Essentially this is what the art of SNP testing is all about: finding those SNPs, finding who else shares them with you, and then building structured trees in which the more people share a SNP, the higher it must sit and the older it must be, and the fewer people share the SNP, the more recent it must be. Come down to this block here, which are all shared by just the two Clearys. They may be Cleary SNPs, but probably not, because there are a lot of them here. That probably goes back a long way, back to the early Middle Ages. So I think there may be other people who come along sharing some of these as well. What we see here, then, is not that nice little picture of one SNP per branch, but blocks: big blocks of SNPs that all seem to occupy the same place. Take this one here, it's a big one with Y90 a 9 on the top. Why is that there on top? Well, only because it happens to be the one with the lowest position number on the Y chromosome map. That doesn't mean it's the oldest. We do know now that these are less old than those, and that those are less old than the ones on the top. So by finding new testers we can split the blocks and begin to build the structure into a tree.
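The block-building logic described here can be sketched in Python. This is a minimal illustration under stated assumptions, not how Alex Williamson's Big Tree or any other real tree is generated: the tester names and SNP names (`S1` to `S4`) are invented, the function `snp_blocks` is hypothetical, and a SNP missing from a tester's set is treated as a true negative, which ignores no-calls. SNPs carried by exactly the same set of testers form one block, and blocks shared by more testers sit higher, and are therefore older.

```python
from collections import defaultdict


def snp_blocks(results):
    """Group SNPs into blocks of equivalents and order the blocks from oldest to youngest.

    `results` maps a tester name to the set of SNPs found positive in that tester.
    Returns a list of (testers, snps) pairs, the most widely shared block first.
    """
    # Work out which testers carry each SNP.
    carriers = defaultdict(set)
    for tester, snps in results.items():
        for snp in snps:
            carriers[snp].add(tester)

    # SNPs with identical carrier sets cannot be ordered yet: they form one block.
    blocks = defaultdict(list)
    for snp, who in carriers.items():
        blocks[frozenset(who)].append(snp)

    # More carriers means higher in the tree, and so older.
    ordered = sorted(blocks.items(), key=lambda item: (-len(item[0]), sorted(item[1])))
    return [(sorted(who), sorted(snps)) for who, snps in ordered]


# Two testers: S1, S2 and S3 form a single unsplit block.
results = {
    "tester1": {"S1", "S2", "S3", "S4"},
    "tester2": {"S1", "S2", "S3"},
}
for testers, snps in snp_blocks(results):
    print(testers, snps)

# A third tester who has S1 but not S2 or S3 splits the block:
# S1 must be older than S2 and S3.
results["tester3"] = {"S1"}
for testers, snps in snp_blocks(results):
    print(testers, snps)
```

With two testers the block `S1, S2, S3` cannot be ordered internally. Adding the third tester moves `S1` above `S2` and `S3`, which is the block splitting that new test results make possible.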
Block splitting is largely what the game is about at the moment: finding people who will test who might split the block, maybe people whose STRs suggest they're a bit like you but not so close as to be a close cousin. James Irvine, I think, will tell you tomorrow about SNP packs as a tool for working on this kind of thing, so I'm not going to go into them today. So this gives some challenges and some goals. It creates a goal of taking these blocks and trying to split them into a more structured tree, and ultimately also to hang one's genealogy on it. We're just beginning to do that here with these two Clearys. We know we're related; we still don't know how we're related. The common ancestor lies a bit before the surviving genealogical records in Ireland, but it may still be possible to work out what the relationship is. We're now at the stage where we can start hanging genealogy off the bottom of the tree. That's what it is. Now, one thing that soon became very clear was that just because something was called as a variant or a SNP, or even had a name, it didn't necessarily mean it really was useful for this kind of tree-building exercise. It soon became clear that many so-called named SNPs from earlier research, before the Big Y, were just not that useful when it came to building valid trees. So the goal then becomes to find those SNPs which are useful for phylogenetic work. Now, "phylogenetically" just means in tree building, in building out that ordered tree. Some SNPs come and go, or they seem to spread all over the place in an inconsistent way, whether that's a fault of the test or something in the nature of those SNPs. But of course what we're trying to do is identify those which are useful, and therefore when you get the results you have to start asking questions about the variant calls, the new SNPs the result says that you have. You can ask questions such as: is this new SNP actually
useful for the tree-building process, and can it also be tested reliably on another platform? The Big Y itself is a test with some degree of false negatives or false positives, so once you find a new SNP that could be useful or important for building a branch, you want to know if it can be tested across platforms. So I'm going to play some clips here. I'm going to look at some of the tools you can use in answering these questions. First of all, if you want to find out how reliable a new named SNP you have is, you can use the YFull tree. I'll play the clip to talk you through it. Well, maybe we've defeated the ability of the projector. There's no audio. Is it on the computer? Yes, it's on the computer. Thank you, Marcia. Let's see if this one will work. No, I think... OK. Since we can't get onto the internet, what I was trying to do was go through a process by which you can validate SNPs, by showing screen recordings I made earlier, but everything I recorded isn't coming up. So let's jump over it. What I was showing was two things. First of all, if you are familiar with YFull, you can use the tree to search for any SNP name. In the search box on the YFull tree, type in the SNP name and search, and it will come up with which haplogroups you can find it in. You can see straight away: if there's one hit, it's perfect, you have a unique SNP, and it's probably going to be a good branch
marker. If you have multiple hits, and many SNPs will come up like this, then you have something that's not very useful for building out a tree. In between, you may have something that occurs three or four times in different haplogroups. SNPs will repeat: there are only 57 million positions and there are three billion men in the world, so clearly some SNPs will be shared and will occur multiple times. How many times is a question we're not quite sure about yet, but the YFull tree gives you a means of testing whether the SNP is actually valid or not. We'll skip over this part and move on. So what does it all mean, then? What has the SNP tsunami done for research? Well, we always knew that STRs were probabilistic, and I think it has actually challenged some of the models of trees that were built on the basis of STRs. In particular, with STRs we can't see back mutations. We know that they go up and down, but we can't see how many times an STR may have gone up and down, so we're probably not getting enough data to go back over several hundred years. If you want to go back to the Middle Ages, they're not going to help you build a very reliable tree. SNPs are far more reliable, but what SNPs will also do is lead us towards building trees as a goal of research, rather than building matching tables and working out a probability of who you may be matched to and what that means. Building trees like Alex Williamson's tree is now what Y research is about. Talking of trees, I'm going to talk now about the changes that Family Tree DNA has made to make their own services a bit more useful. First of all, the best thing Family Tree DNA has done in the last year is that they have updated and upgraded their haplotree, and they've appointed somebody to oversee it, a manager who receives submissions of new SNPs from project administrators, vets them to make sure they're good, and then builds them onto the tree. And
here is a little clip of the haplotree, and we see here lots of SNPs, and here's a big block, but all these SNPs are equivalent because no one has split that block yet. There are many others who are using this data to build what they call mutation history trees. I'll mention this only very briefly. Maurice spoke about this yesterday, I think, and Dave Vance will be speaking a bit later today about his approach to building mutation history trees, so do come and hear him, and James Irvine will speak tomorrow about his own approach to Y chromosome research and how to use SNPs and SNP tests in a surname project. I'm going to look at the Family Tree DNA tree now. If you take the Big Y, then you'll get this little button on your FTDNA page, which is your terminal SNP according to their current calculation of it. If you click on that, it will take you to the place in the haplotree where that SNP can be found. Here it is. There's one more underneath it, but all the ones above it you will have. So we now see that structured, ordered tree, and it's a different kind of tree to the others we've been seeing with the vertical lines: this one is actually a horizontal tree, designed to help you find a location. It's a bit tricky to read, so what you can do is jump up to the higher levels, going upwards, and eventually you'll find the ancient SNPs. In this case you'll go back eventually to M459 and R1a. FTDNA have been very, very proactive in building the tree. This is good, because it means that new testers will then get terminal SNPs which are further and further down the tree, but it does need project administrators to be proactive in sending the data. What kind of data can they send? They need to send positive evidence of who has the SNP, hopefully arranged by branch, plus negative evidence of who does not have the SNP in neighbouring branches, so you can see exactly whether or not it belongs in that part of the tree. And here's a graphic to show how this works. Here we have a descendant tree for a
mythical ancestor called John Doe, and all these people have DNA tested, numbers 1 to 11. Some of them have the SNP, call it A, which looks quite nice. So we're going to put a branch here; maybe A appeared about here. We don't have enough data yet, because we need to know the negatives. With these two negatives in the next nearest branches, we now know we can fix the position of the A branch here. On the other hand, if you don't have those negatives, you can't make that firm judgement. So here again we have four people with C, and it looks like we've got a nice consistent branch round about here. Maybe that's where C appears in this tree. But the two on top are no-calls, no result at that position, and therefore we can't be sure C belongs here. It could belong along here, or along here. So what you have to do if you want to declare the SNP and get it on the tree is get those negative tests, either through a Big Y, but maybe the Big Y doesn't read that position, or alternatively through single-SNP Sanger testing of selected people who sit down these branches, so you can zero in on exactly where it sits. So we're going to move on to the final stage now, just checking the time, to talk about the new tools recently released by Family Tree DNA for the Big Y. The haplotree has been enhanced over the past year or so, and I think that is the best thing Family Tree DNA have done for the Big Y. The new tools are an interesting step in the right direction. A year ago there was a lot of talk about the new Big Y. It has taken 12 months to appear, and I speculated about what these changes might be. Would they now generate a tree dynamically from test results? No, they're not doing that. They're moving in that direction, but at the moment there's still manual checking involved. Other people like YFull do have a dynamic tree, but they have a smaller database. Family Tree DNA have all the Big Y tests, so ultimately any
tree can be verified by the data they hold. We actually knew they were going to convert at some stage to the much more accurate reference sequence of the human genome, now known as HG38, and those who've done the Big Y have probably had emails about this. The change is currently in progress: about 30% of existing test results have been converted, with the rest to follow over the next two weeks. We wondered if they would change the regions of the Y they target, to reach more SNPs known to be in other areas. No, they're not doing this. The test is still the same, targeting the same areas. So the analysis tools have changed; the test is the same. We wondered if they might have tests with longer read lengths. This probably will come at some stage, but not just yet, so we're still working with the same roughly 150 base pair read lengths. This means essentially you've got shorter bits of the chromosome to reassemble, like a jigsaw, into the bigger picture, so the longer the bits you read, the easier that jigsaw becomes. Imagine those nice starter jigsaws of four or five pieces which toddlers use, and compare that with the tiny pieces of the jigsaws you might do. And you probably use the picture. I don't know if you do jigsaws, but if you do, you've got a picture to guide you as you put your jigsaw together. That's how the human genome reference sequence works in building up this jigsaw of Y chromosome reads. We were wondering if there'd be better results presentation tools, because Family Tree DNA's presentation tools were quite poor, really, originally. This is partially the case: there has been some improvement. And of course we wanted all of this to be at the same cost or less, and in general this is the case, because Family Tree DNA has had multiple sales this year in which you can buy the Big Y for lower and lower prices. Effectively, if you're ready to wait for sales, the effective cost of the Big Y is coming down. There's usually one more at the
year's end, by the way. So let's look at the tools, then. This is what has been done as of October 2017. They have converted to HG38. They've introduced a new matching system, a sorted matching, essentially arranging your matches according to shared branches, which sounds like common sense and is actually potentially a very powerful and useful thing. They've already removed a lot of unreliable named SNPs from the database, which no one could really make use of. And they've created a browser, a graphical browser for the Y chromosome, so you can see the reads in which SNPs have been discovered; we'll look at it in a moment. There's also something which is actually very useful for administrators: they are increasing the amount of data in their raw data files, known as VCF files. These are the files that Alex Williamson and others base their trees upon, so giving them more data to work with will probably help them to build better and more accurate trees. And they're also changing the threshold for how they'll call a SNP. Essentially, in the old raw data files a position had to be read at least 10 times before they'd enter it into their raw data. In fact there are many positions that are read fewer times than that, and the general consensus, some may dispute this, is that if you find something called as a SNP at least four times, you're probably going to get something there that's worth looking at. So it seems they're lowering their threshold in the new VCF file to four rather than 10. Again, this might be a bit baffling for some of you, but it does matter if you download your raw data and then find someone to help you analyse it: the improvement in the data here will help them produce better analysis. Now, we may not get through all of this, but let's look at some of the most important changes. First of all, the HG38 conversion. This isn't as baffling as it sounds. Imagine you've got one of the old Ordnance Survey maps in imperial measures, so your coordinates are all in
miles, and you want to convert it to a metric measure, in kilometres instead. That essentially is what's happening there. The map of the Y chromosome is basically a long line, starting at one at one end and working up to 57 million at the other end, with each position having a number in sequence, more or less. It is just being replaced by another one that's better designed. In the little map here, the contigs, the contiguous sequences that exist in the old system, mostly just transfer across, and then new ones get slotted in between, because they've found more sequences or mapped them more accurately. So there will be some position numbers changing. Additionally, some sequences that were thought to be on the Y chromosome in the previous build have actually been found to be on the X chromosome or on other chromosomes, so they've been taken out. The net effect is that your 59 million positions on the Y chromosome in the old build have now become about 57 million in the new build. This means that all the position numbers, if you've looked at them, will be different, most of them by about 2 million, especially in the higher ranges. A little example here: this is one of the SNPs we looked at in the previous tree, YP609. On top we have the old build, HG19, 22 million something, and down below you have the new reference sequence number, 19 million something, about 2 million less. What does this mean? Not a lot. It just means that the new reference sequence is more accurate, and therefore more accurate SNP calls can be made. A few SNPs might vanish: if they're just fluff and rubbish, they're gone, or if they're on other chromosomes, they're gone. But the majority have been mapped across, and it's possible now to get more reliable results. There's no change in the particular mutation: that's still a G to an A. There is no change in SNP names. All these wonderful L21, L448, all these numbers, they remain the same. It's just that their underlying position number will change. But if you know a SNP name, there's no difference;
That doesn't change. Administrators would need to update their SNP catalogs to reflect the new numbers. That actually isn't very hard: there are lots of useful online conversion programs that will batch convert a whole range of numbers. You can copy and paste into the converters and then copy and paste back, so it's not actually a very difficult thing; you just need to get used to using them. And some analysts are yet to announce when they will change, no names mentioned.
So, moving on then to the second big change, the new sort of matching. This is potentially very interesting. We've been calling for this for a long time. What it actually means is you get a little chunk of the haplotree on your matches page, and it tells you how many matches you have at your terminal SNP level and how many in the branches above that, and it gives you four branches above. You can see that this person, having run the test quickly, has already got 13 matches at this level, and this looks great. When I first saw this I thought, wow, this looks really impressive.
There are problems, however, and one of the big questions is which levels we choose here. At the moment they are going for just the four above your terminal SNP, and this can actually create some anomalies. This person here, for example, has got no matches. He's been posting on various lists about this. It's partly because his matches' tests haven't been converted yet, but also because his upper level here stops a bit lower than the previous one. The reason is he's gone further down his tree, and however far you go down your tree you're still only going to get four. So this person is very, very similar to the one we just looked at, actually, and the one we just looked at has more matches whereas this person has got fewer. Yet YP984 is underneath YP355, so this person should be matching some of the people who are matching that person. The reason is this. We've got a simplified tree here of the subclade, and you can see that the first
person is this one here in red, and there's his terminal SNP. Now 1, 2, 3, 4, and that's where he stops. Then the second one is this person here in blue. That's his terminal; again 1, 2, 3, 4, and that's where he stops. And then the third one is this person here, who hasn't gone as far down the tree, so he stops here. That's now officially his terminal, and 1, 2, 3, 4 takes him higher up. This means that these two people should be matches, and they'll see each other, but this person won't see some of the people that this person will. That, I think, is actually anomalous. This person here will go way back up above YP355, back into ancient Scandinavian SNPs, and will see a lot more matches there from other members of the big L448 subclade, including many Scots from the Western Isles and many Norwegians and Swedes, and so the other testers are missing this.
Now, maybe the testers aren't too bothered. Most testers will be looking at their genealogy, and therefore what they want to know is: who's close to me, and who do I need to approach because they may share a common genealogy with me in the historical period? It does create a problem... here we go, good... it does create a problem for project administrators. If you want to build the tree up and you have this person in your tree, they've done very well to get this far, but in a sense their reward is not to see as much matching data as others, and if you're working through that person's results, you won't see it either. So I think we need to consider how this sort of matching should be built and how it might be mapped onto the haplotree to make it truly useful. But it's a great step in the right direction; I think it's been a huge step forward.
Now, the Y chromosome browser has been added this time. A bit of a surprise; I don't know that many of us were expecting it to come. It's very pretty, it's very nice and colourful.
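Returning briefly to the matching window described above (your terminal SNP plus the four branches above it), the anomaly can be reproduced on a toy tree. This is only a sketch under stated assumptions: the node names `R0` to `R7` and `S1` are hypothetical placeholders rather than real SNPs, and the rule "a viewer sees anyone who descends from the top of the viewer's window" is an assumed simplification of how the matches page behaves, not a documented algorithm.

```python
# Hypothetical simplified tree: each node maps to its parent (None for the root).
PARENT = {
    "R0": None, "R1": "R0", "R2": "R1", "R3": "R2",
    "R4": "R3", "R5": "R4", "R6": "R5", "R7": "R6",
    "S1": "R1",  # a side branch splitting off high up the tree
}


def lineage(snp, parent=PARENT):
    """Return [snp, its parent, its grandparent, ...] up to the root."""
    path = []
    while snp is not None:
        path.append(snp)
        snp = parent[snp]
    return path


def window(terminal, parent=PARENT, levels_above=4):
    """The terminal SNP plus at most `levels_above` branches above it."""
    return lineage(terminal, parent)[: levels_above + 1]


def sees(viewer_terminal, other_terminal, parent=PARENT, levels_above=4):
    """Assumed rule: the viewer sees anyone descending from the top of the window."""
    top = window(viewer_terminal, parent, levels_above)[-1]
    return top in lineage(other_terminal, parent)
```

Here the tester who has gone furthest down (terminal `R7`) has a window that stops at `R3`, so `sees("R7", "S1")` is false, while the tester who stopped at `R4` reaches the root and `sees("R4", "S1")` is true. The two testers on the same line still see each other, which matches the pattern shown on the slide.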
Unfortunately, I think it's very limited in what it offers. It will allow you to see a SNP, and essentially these lines are reads. These are all the 150-base-pair reads, chunks of your Y chromosome, which are being reassembled into the jigsaw by overlapping them, and you're looking across this full set of overlapping short reads. And of course here we have a nice SNP all the way down. Clearly every single read is finding this one to be a variant, and therefore that's probably a good SNP. We've got a little bit of extra data as well. This is YP355, which is a known SNP. It's of Scandinavian origin, and it's probably about [figure unclear] years old, plus or minus a few centuries. Again it's very consistent. There's one read here that doesn't have it, but that's not a problem: if at least 90% of your reads are showing the variant, that's probably all right.
Family Tree DNA have given us some extra information. If you click on any of these As, it'll give you the position; derived means mutated; there is the actual mutation; and the confidence is 99.99999999. That's quite nice. I don't have quite that confidence myself. The problem is that this confidence rating turns out to be that of each individual read, so as you click up and down you'll get different confidence ratings. What we don't have is some nice summary data about this particular variant.
Now, it's modelled on an online utility called IGV, the Integrative Genomics Viewer, which does this kind of thing but is much more flexible. It lets you zoom in and zoom out: you can pull right out to see the whole pattern of the reads and where they're clustering, or zoom right in and look more closely at them. But even more usefully, if you click at the top on the reference sequence it'll tell you how many reads there are below that and how many of those support the call, so what percentage you've got of the mutation. With this one you have to count. You can scroll down, which you can do; you can count all
the way down, then go back a couple of times and count the ones that haven't mutated, and you can work it out for yourself. But it would be more useful if this kind of data was included.
Now, the problem is that this is the data given in your raw results BAM file, and the BAM file is a very large file which has to be generated for you, and they don't really want to generate more than they need to. But as I said, all Big Y testers, get your BAM file. You still need it, because for the moment this very nice graphical browser lets us look over our SNPs to see what they look like, but it's not really telling us what we need to know about them: how reliable they are as a whole, some kind of overall quality control check for the SNP. Now, this is the kind of thing that third-party analysts like YFull.com or Full Genomes will still do for you. So I still say, as good as this is, as good as the improvements have been, they're not yet going far enough, and they need some enhancement and development before people can really use the tools. At the moment Big Y testers still need to get their raw data, and I still recommend going to third-party analysts to get the full picture.
A few more examples here. This one clearly is probably not a very good SNP, because you have a lot of low-quality reads (that's why they're faded-looking) surrounding the G calls, and if you click on one down here you get a much lower confidence rating. Looking at this, I would say from eyeballing it that it's not going to be very good, but we need to be more rigorous and scientific, really, to judge whether or not this is going to be a useful call. So I think the chromosome browser is going to be of limited use.
A few more examples. This one is the chromosome browser showing this SNP. Basically you click on the call in the person's results and you see this. So this is given as an unnamed private SNP in this person's results. Great. But what's that?
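The per-position summary the talk asks for, counting the reads that show the variant against all reads covering the position, is simple arithmetic once the base calls are in hand. The sketch below assumes the calls at one position have already been pulled out of the BAM file into a plain list of letters. The function name `summarise_position` is hypothetical, and the 90% default is taken from the rule of thumb mentioned in the talk rather than from any formal standard.

```python
from collections import Counter


def summarise_position(calls, derived, threshold=0.90):
    """Summarise read support for one position.

    calls     -- base call from each read covering the position, e.g. ["A", "A", "G"]
    derived   -- the mutated (derived) base being assessed
    threshold -- minimum fraction of reads that must show the derived base
    """
    counts = Counter(calls)
    total = sum(counts.values())
    supporting = counts[derived]          # Counter returns 0 for a missing base
    fraction = supporting / total if total else 0.0
    return {
        "reads": total,
        "derived_reads": supporting,
        "fraction": fraction,
        "passes": total > 0 and fraction >= threshold,
    }
```

With nineteen reads showing `A` and one showing `G`, `summarise_position(["A"] * 19 + ["G"], "A")` reports 19 of 20 reads, a fraction of 0.95, and a pass. This looks down the vertical column of calls rather than at the confidence of any one horizontal read, and it deliberately ignores read quality, which a fuller check would also weigh.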
That's not called; that's not mentioned in the results. What's that? And if you click here, again you get confidence 80%, so maybe that one's just not very good. But actually so is that one, if you click on it. They're all getting the same confidence ratings, because they're the confidence ratings of the horizontal lines, the reads. We want to know more about these vertical lines: what are they adding up to? At least with this particular view we can see that there are more SNPs to investigate. We can go to other tools, put the numbers in, find out more about them, see if they're known already, see if they look like they'll be useful. But at the moment the browser itself is not telling us what we need to know about them.
Also, it's very limited. We can scroll to and fro, and it covers about 300 positions, but you can't go looking at other parts of the chromosome. We'd like to keep sliding through, hunting for more SNPs that haven't been called, but we can't do that. You can only click on the ones that have been called. So again it's very limited: those might tell you something, but ultimately it won't let you go searching for other things not listed unless you're lucky enough to see them in the browser view. So it's a useful idea, it's modelled on a service provided elsewhere, but it doesn't really go far enough in supplying us with useful information. I won't play my own video because it doesn't work. This is just a summary slide of the limitations of the chromosome browser.
And one more thing that hasn't been changed: there's been no change in the way in which Family Tree DNA present the results. We're still getting an alphabetical list of known SNPs. You can browse down them or search, but it's not very user friendly, and it contrasts rather with the one that YFull.com produce, which is actually arranged in the order of the tree. At the top are the ones nearest to your terminal position, and as you go down they get further and further up the tree. There's some meaning
in the way in which they're arranged: the lower down you go, the older those SNPs are. It's actually quite useful, I think, and it could be adopted as a means of arranging SNPs, as YFull.com do.
As I just said, I recommend people still get their BAM files. Anyone who's done a test already will apparently have their BAM file realigned to the new reference sequence, but we don't know whether that will be done automatically or whether we need to request it. We may need to contact the company to say, please realign my BAM. This is actually a huge job with a lot of processing power required, so I think they'll probably just do it on a case-by-case basis, but I would recommend doing that. I would still recommend sending the BAM files to third-party analysts at the moment, because as good as these changes are, they're not yet really filling the gap for me.
So, to summarise: I've looked at the tree-building approach to using SNPs when researching surnames or the paternal line using the Y chromosome. I've talked about how I think that SNP sequencing is the only way we can do this today; STRs play a role, but they're secondary now to SNP sequencing. We need to find ways of assessing the SNPs we find to see whether they're reliable enough to go into a robust tree. Testers should still download their data, and there's still a role for third-party analysts to help us extract more value from that data. I'd like to end by thanking the people here who have given help in the preparation of this talk, either knowingly or unknowingly. So thank you very much. I will stop now, and I think there's just time for a few questions.
And questions for John? Jim... Tony here. In fact one quick one and one with a little more detail. Will we need to resubmit the new BAM files to YFull or other third-party groups?
YFull haven't actually told us what they're doing yet. They're now offering hg38 positions if you click to look at SNP data, but they haven't converted over, and I think they're probably discussing what they can do. They also depend on processing power, which they have to buy, so I don't know if they can reprocess all the BAM files for free.
The other one is: on your tree you had two Clearys and then a 1000 Genomes sample above them. My closest match is 1000 Genomes. Is there a way to access that, to then transfer it to Family Tree DNA and use the data from the 1000 Genomes person?
I was wondering about that. If you click on the 1000 Genomes person you will see the private SNPs that have been extracted from the BAM file. You can't get data on the people, because that was a medical research project, so it's confidential. My first match was also a 1000 Genomes person. But you can get the SNPs at least from the tree.
But can you actually transfer that data into Family Tree DNA as a transfer file?
No, you can't transfer things into Family Tree DNA. The results there are simply your result from your BAM file. You can transfer it to others, and Alex Williamson and probably others will use the 1000 Genomes data.
There's one here from Jerry Corkin. John, great presentation. So the SNP tsunami is morphing into the acronym apocalypse. When is the terminology going to disappear so this becomes easy? I'm thinking Alex Williamson's tree is brilliant, really; it's very easy to understand, but it's very much processed. Is there any way that we can get FTDNA to generate those types of trees in real time, maybe incorporate STR matching, maybe incorporate geographic matching?
I don't know if FTDNA can incorporate those extra features, but I think they are trying to find ways to generate trees automatically. I think there are rumors that they're working on it, but they're keeping quiet. So I don't know how soon it's going to happen, but we know they have a
database, and we know YFull.com can do it. The problem is that if you generate a tree dynamically and automatically, then you're dependent on the input, so you're dependent on how you describe a tree. With manual control, you've got someone checking, saying, that's not good, and at least there's some kind of control there. So I don't know whether we can move to that. You know more than me, as the digital analyst, I think.
Last question over here.
Thank you very much, John. It's a very interesting talk. You said, obviously, that STRs have gone as far as they can go, and clearly the Big Y has taken the lid off the SNP Pandora's box. There's a big amount of information that has been assembled from historic STR testing, and a lot of those people may or may not go forward to Big Y testing. What's your current view on identifying from historic STR tests who might be best approached to go on to Big Y testing?
Well, I certainly wouldn't say STRs are of no use at all. I think they are. What I was saying, really, was that we can't rely upon them for building trees. But I think STRs can be very, very useful for identifying who may be related to you at the right kind of distance to make a Big Y test worth doing. You don't want to Big Y test people who are very close to you, because you'll get the same results. You don't want to do that if they're very, very far away either. So the STRs can help you find those people who are Big Y testing candidates. In terms of surname projects, I usually test the people who are most distant from each other within a particular STR group within the project. That's a relatively good way of trying to triangulate on what you hope is the most distant common ancestor for that particular genetic group.
Well, we have to leave it there, but listen, John, thanks very much for a wonderful explanation of what's happening. John Cleary.
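The selection rule given in the final answer, testing the two people who are most distant from each other within one STR group, can be sketched as follows. The kit names and marker values are hypothetical, and the distance used here is a simple sum of stepwise differences across markers, which is an assumption for illustration. Real genetic-distance calculators may treat some markers differently.

```python
from itertools import combinations

# Hypothetical STR results for one genetic group: kit name -> repeat counts
# for the same markers in the same order. Not real data.
GROUP = {
    "kit_a": [13, 24, 14, 11],
    "kit_b": [13, 24, 14, 12],
    "kit_c": [13, 25, 15, 10],
}


def genetic_distance(a, b):
    """Sum of absolute repeat-count differences across markers (simplified)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))


def most_distant_pair(group):
    """Return the two kit names whose haplotypes differ the most."""
    pairs = combinations(sorted(group), 2)
    return max(pairs, key=lambda p: genetic_distance(group[p[0]], group[p[1]]))
```

On this toy group the pair returned is `kit_b` and `kit_c`, at a distance of 4. In practice the group would first be restricted to people already believed to share a surname line, since the aim is to bracket that line's common ancestor rather than to find unrelated outliers.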