If you want to, I can just introduce myself. Yeah, let me shout in your hearing. So here we are.

Ladies and gentlemen, it gives me great pleasure to introduce you to the next speaker. Dave is an IT services executive, a partner and associate for HCL America, previously 30 years with IBM. Dave has been doing some great work on automating the analysis of some of the Y-STR markers that we use for certain research. Previously we were asking, how can we make it easier for the general public to actually understand the data and manipulate the data? Dave is working hard to do this, and he's going to talk to you about the new SAPP program that he has developed. So please give a warm welcome to Dave Vance.

Good afternoon, everybody. Can everybody hear me okay? The good news, and probably the most important thing you need to know about me, is that we can turn this off; I can usually shout without a microphone. So if you can hear me, that's fine. Thank you all.

What Lawrence didn't have time to say was that I've been a genealogist for over 30 years. I've been in genetic genealogy since the first National Geographic Genographic test in 2005. So a lot of this I've worked at for a lot longer than I should have, and it took me a little while; a lot of people could probably have come up with something a lot better.

We're back to Y-DNA as the subject. I'll be talking about SNPs and STRs, but I am not going to spend any time on the biological side of them, mostly because we don't have time here, but also because you've heard about it, and because my program and my interest are more on the genealogy side anyway. So I'm talking about how we use them to produce useful information for genealogy. We've spent a lot of time talking about genetics, and it's all very interesting.
I'm sorry, I'm just going to have to speak up, because I don't have a lot of control over this. Anyway, I do understand that some folks are fairly new at this, and I understand where they're coming from: I've been talking to my wife about this for a long time, and she still hasn't quite grasped it. I'll let you in on a little secret. This little graphic I made looks like a window on the path of DNA going back into the past, which is very grand. Actually, I made it for my wife, because she has always said, when I talk to her about genealogy, that she wants a window to throw herself out of. So it can be different things to different people.

So anyway, I'm just going to turn this off so I can talk. All right.

When we're talking about Y-DNA, we're talking about doing analysis on a group of people. That is almost by definition what a lot of Y-DNA analysis is. What we're talking about is producing a tree that looks something like this. MRCA is the most recent common ancestor; if you don't know the terminology, you're going to have to go back to it. So we have a number of kits underneath it. It doesn't matter if you have three or six hundred; it still goes back to some most recent common ancestor, who could have lived 45,000 years ago. With Y-DNA it is all connected. So any project administrator, surname administrator, or anyone who's got a group that you've just screen-scraped off your project and are trying to figure out yourself, is usually looking at some group of people and trying to analyze it.

Now, how do most people make sense of their Y-DNA? Here's an actual picture of someone trying to make sense of their Y-DNA. Typically, we have a number of sources that we follow.
There's all the terminology and concepts: convergence, parallel mutations, STRs, SNPs, all sorts of things. All right. Then we've got all of the reference sites. You've got Family Tree DNA. You've got Full Genomes. You've got ISOGG. You've got YFull and, you know, soon we'll have Y-widows, Y-mountain, Y-bother. But there are all sorts of places that you can go to for help on these. Then you've got all the Y-SNP reference trees. You've got Williamson's great site. You've got the ISOGG site. You've got the FTDNA site. You've got all kinds of sites there.

Then you've got all the data that you have at your disposal. So this is my little data symbol. You've got all the various Y-STR marker data: 25 markers, 37, 67, 111. Also, don't forget, more STRs have been announced that you will be able to test. Then you've got all the SNPs. You've got Full Genomes, Big Y, SNP packs, your one-offs. You've got all those databases. But you also have, at your fingertips, the genealogies that your group has done, so you know where people were and which lines fit together at some point.

So we've got all that data, and what I'm really trying to talk about today is how we put it all together: not just look at the STRs, or the SNPs, or the genealogies, but put it all together. Now, normally, if you try to put it all together, that look changes. It generally changes to that. Because it's not easy to figure all this out. Once you put it all together, you can stare at it for hours. If any of you have project administrators who support your surname, group, or haplogroup project, or whatever, thank them repeatedly. Because the work they do is usually a lot more than even on the...
I'm also an administrator for the Vance surname project, and there are people who do a lot more work than I do on this stuff. So please believe me that they work hard.

So let's talk about all this useful data and how we use it. This is an entirely subjective chart. Most of you who are project administrators will differ with me on the dates here, but you'll also have some version of this chart. We've got present day going back in time, and the chart is trying to show the effectiveness of the various sources of data.

So for instance, the SNP data is very useful back 4,000 years ago, 2,000 years ago. As you get closer to the present day, currently, the effectiveness goes down. Why? For two reasons. One, the coverage of the Y chromosome isn't enough yet to give us SNPs for more than about every three or four generations. And the second reason is that we don't have enough next generation sequencing coverage yet to get private lines fleshed out and get the branching for every single branch that we want, to build that tree down from the most recent common ancestor. So the effectiveness of SNPs for building a tree actually goes down somewhat.

The STRs, on the other hand, sort of fill in the genealogical period and maybe go a little bit further back. People differ on how far back they're useful. I say it's usually 1,500 to 2,000 years, but the noisier ones, the most frequently mutating ones, don't go back that far. Typically you can make sense of them within that period. After that, convergence and just general noise tend to cloud things, and they're not quite as useful, because you can't count on mutations actually happening at the rate at which they're predicted: a slower-moving marker can mutate just as recently as a faster-moving one, and you don't always know which one happened when. So there's a bell of effectiveness for STRs. I'm still of the opinion that they are useful.
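The point that a slow marker can mutate just as recently as a fast one can be made concrete with a little arithmetic. This is a minimal sketch, assuming each marker mutates independently at a fixed rate per generation. The two rates below are illustrative placeholders, not values used by the tool or taken from any published rate table.

```python
def p_mutated(rate_per_generation: float, generations: int) -> float:
    """Chance that a marker has mutated at least once in the given span."""
    return 1.0 - (1.0 - rate_per_generation) ** generations


# Hypothetical per-generation rates, chosen only for illustration.
SLOW_RATE = 0.001
FAST_RATE = 0.008

for generations in (10, 60):
    slow = p_mutated(SLOW_RATE, generations)
    fast = p_mutated(FAST_RATE, generations)
    print(f"{generations} generations: slow {slow:.1%}, fast {fast:.1%}")
```

Under these assumptions the fast marker is always the more likely one to have changed, but the slow marker's chance is never zero, which is why a single observed mutation cannot be dated from the marker's rate alone.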
I'll talk about that in this presentation, but I absolutely understand that SNPs produce good branching whereas STRs are probabilistic. You have to deal with them with the idea that you find a likely branching, but you don't always know for sure which one it was. I'll talk about that in a minute.

Then, of course, genealogy data, which is usually very good in the most recent generations, but it drops off. My favorite story: a person connected to my ancestors a few years back on Ancestry, and I was originally very excited because their tree went a lot further back than mine. But I kind of lost interest when I found that their earliest ancestor was put down as Thor from Asgard. Now, I should say, if you're in the audience, I'm not discounting the possibility. I do have an uncle who's pretty good with a hammer. But I'd really like to have some SNPs or a haplotype for Thor. Maybe go back to Odin. We certainly know Loki wasn't doing it. So the effectiveness of that data drops off quickly the further back your lines go. What you want to do here is kind of ride this top wave and use as much good data as you can to get the best possible tree.

Now as I'm going through this, remember that we do not have enough data today, even from all three of these sources, to produce a perfectly accurate tree. We can only do the best we can with the data we have, and the data will be spotty depending on who has done the research and who has done the testing.

So when we put all this together to create a tree, you've got a standard ancestor tree, where you start all the way back here with an ancestor who has a son, who ends up having sons, who end up branching down to the present day, so you get this structure. Of course a lot of these lines have died off, so you end up with a structure that you can test for that looks like this.
It's usually what I call long stalks and bushy tips. If you plot it against a time scale, you have these long stalks going back in time, and then you get a lot more of the bushiness in the more recent generations, as you have more surviving lines. If you turn it upside down it looks like Queen Anne's lace. So that's the typical structure of Y-DNA when you plot it against a time scale.

In trying to recreate this tree with the sources of data, you start with the SNPs, which build down from the prehistoric branches. Some people are lucky enough that on their lines they can get pretty close to the present day just with SNPs. Often your public SNPs, or even the surname SNPs, will stop at about 1500 AD, if you're lucky and can get that close. But for a lot of us, SNPs only take us to maybe 2,000 years ago. It will vary depending on the line, on how much next generation sequencing has been done, and on how big the group is that can analyze it. So you get down as far as your SNPs will take you.

Then you have the genealogies, which fill in from the present day and take you back in time. Clearly you may not get as far back on all branches. If you're lucky enough to have a royal line, you can go back to the 1200s or 1300s. Otherwise, here in Ireland, we can get back to the 1800s or so if we're lucky. I should say the Vance line, my own line, is a line of Ulster Scots immigrants from Ulster. I've been able to go back on mine to the Inishowen Peninsula, to my ancestor there born in 1753, so I was able to cross the pond, which is lucky. But in general we can only go back that far, and I don't think I'll ever get another generation or two back; it's not likely. At that point most of the records will be lost, depending on the line.
So the STRs will fill in the middle, and the three sources of data put together make your best tree. Now the challenge is to take all that data and create the tree. How do we do it?

The first question is, why do we need a tool? We have tools available today. Those of you who have been around for a while know the tool called Fluxus. A lot of people have used it, and I know from the videos we have online from Morris that he has used it as the basis for this. However, it only uses STRs; there's no connection to SNPs or genealogies, and it's not that easy to generate or analyze. You really have to be an expert in Fluxus to use it. It produces output that, to steal a line from Morris, tells me I'm descended from the constellation Ursa Major, but not much else. And it is not that easy to analyze anyway.

I actually stole this straight from Morris: this is one of his mutation history trees. Again, I can't recommend these videos enough, and they're a great way to produce these trees. But they are manual, and not everyone has the time and inclination. It's also not that easy to model a scenario. If you're trying to test a scenario that you think is more likely than another, you don't quite have to start from scratch, but you have to back up, redo the analysis, and start drawing trees again, which is not a good thing.

So the trick was to automate this into something that was readable, useful for analysis, and concentrated on the genealogy side, as opposed to all the genetic kind of outputs. In what the tool produces, you get all the tips in yellow; these are all the endpoints of the tree. It always goes back to the overall group MRCA, the common ancestor. You get the branching nodes in blue, which are the common ancestors for the smaller subgroups within that overall group. There are boxes for genealogies where appropriate, such as a named common ancestor, and for SNPs where they were introduced. You also have the STR mutation history in black above each one, and you don't always know exactly
where they mutated, but you know that they happened along that branch. So the tool basically builds that picture for you. I'm going to go very briefly through how it works, and then we'll talk about some of the analysis.

That's the tool itself; that's where it is. Anybody can get my name off the schedule and send me an email; I'd be happy to send you that if you need it. You basically build an input text file, hit the button to execute, and it runs. The input text file is the most important thing. So if we create one, we will start with just four kits: two with 37 markers and two with 67. It doesn't matter how many markers you put in; it will accept any number. You put them in the text file, and the text file is simply divided into sections. The only required section is the STR section, and I'll talk about that a lot. You start a section with a slash; there's a help file online that explains all these sections for you, so you don't have to remember the format. Then you just put the kits in, with the kit name and all the numbers. Again, it is fairly accepting on the formats; you'll see a lot of the details in there.

If I run that just by itself, I get that output. Now obviously I tailored this example for this talk, but you get four kits, and it has decided that a branch came off the common ancestor with a three-STR signature here, because these two share that in common; then they each have their own mutations. Kits two and four end up on the other branch; this one here has its own signature and then other mutations further down.

Now when we talk about STR mutation history, we talk a lot about convergence and parallel mutations, and here the tool has decided that there's been a parallel mutation, because it had to create one to explain the mutation history. So marker 456 has changed in both places. What it does is pick the most likely paths, and it bases that on the STR mutation rates, so that
the most likely ones to mutate are the faster markers. So 456 is a relatively fast marker; it's certainly more likely that it mutated twice than that these three mutated in parallel. 443 is a slower marker, 442 is a slower marker, and a marker like that is not as likely to have mutated twice. So that's how it builds the tree. Again, it may not be the perfect tree, but it's the most likely.

Now it also produces some other tables as output. It will repeat your input so that you can verify it with all of your kits. It produces the table of genetic distances that it calculates, and it also produces an adjusted genetic distance table, which takes the parallel mutations into account. So here the genetic distance between kits 2 and 3 is bumped up by 2 because of that parallel mutation. I'm not going to spend a lot of time on those details.

Oh, first let's answer the question: why does it require STR data? For two reasons, mainly. First, because STRs are as common as opinions: everybody has them, some people have more than others, and if you turn around somebody wants to share theirs with you. That's because of the Family Tree DNA product choices, for one thing, but as long as we still have people taking STR markers as their entry point into genealogy for Y testing, we'll have a lot of STR markers around for analysis. The second reason is that, as I said before, STRs are still better at figuring out the branching within a genealogical timeframe than SNPs. That will change as whole genome testing gets more popular and more affordable, and as we figure out the branching, but my guess is it will take 7 to 10 years before we can rely exclusively on SNPs. In that time, with STRs, we already have 500 and we'll have more of those to test. So my opinion, not shared by everybody in the community, is that STRs will still be useful for quite a long time.

So now let's add some SNP data and some genealogy data. We just put two other sections in. You give it the actual test results
for the SNPs. Now understand that the only SNPs that are useful to put in are the ones that divide the group. If you put in a SNP that's common to all, it's common to all and gives no branching information. So you really want to break it down into the SNPs that split the group. For instance, this person is positive for Z2356, the other two are actually negative for it, and this one here doesn't know because they haven't tested it. If you don't put a SNP result in, it assumes that it's unknown.

I've also put in that Donald Smith is the ancestor of both kits 2 and 3. Now for genealogy, it assumes that if you don't put it in, it's negative. So what I'm also telling it is that kits 1 and 4 are not descendants. If I run that, it changes the picture significantly, because before it thought 1 and 3 were more closely related, and 2 and 4 were. I've now forced it to consider that 3 and 2 are related, so it has now decided that the three-STR signature has to be a parallel mutation here. It's not going to judge my genealogy; if I told it that, it will accept it, put that branch on there, and show me how that looks. Now once you analyze it, you look at that and say it's probably not right, but it takes that extra step to be able to see it. The tool can't do that analysis for you if you put that genealogy in.

So suppose I did that and said, well, that's not right, and I go back to check with kits 1 and 4. The output here, by the way, includes an additional table that just gives you the SNP and genealogy data: blue is positive, red is negative, gray is unknown. So suppose I go back to my sources, to kits 4 and 1, and they say, well, we might descend from the same ancestor, but we don't know; we haven't been able to go back that far. That's okay, I'll put that into my genealogy instead. I can do that with question marks. If you look at the output table, it will then show those in gray, saying that I'm not sure of it. And when I run the tool again, it says okay, I'm back to where I was before
because now I can make sense out of the STR mutations, and it puts the tree back to where it was before, with John Smith up here in the box as the common ancestor. So even though I said I didn't know if 1 and 4 descended from John Smith, it has decided that they probably do, because of the way the STRs fit.

The last thing: if you take a look at this SNP, I said only kit 1 was positive for Z2356. It decided that that has to be a private SNP for kit 1, because I told it that kit 3 was negative, so it has to be at this level. Suppose I go back and say I don't know for sure if that kit is negative or not; either the test is bad, or it turns out not to have been tested, or whatever. Now I say I don't know if that person is positive or not. Since the tool doesn't have any data to go on about whether the SNP belongs to one person or another, it will put it in both places with brackets and say that's the range within which that SNP could have occurred. It knows it can only be as high as this, because we still have a negative above, but it can't narrow it down further, because it doesn't know at this point where the SNP actually occurred.

So it tries its best to put all this data in the tree and produce a tree that makes sense, but it's not always a perfect solution. This is still more of an art than a science. That's why you have to play with it and see how it works. Further test results can change where a mutation is placed.

There are a lot more options for those of you who want to play with the details. Going back to what I think James Irvine said during the panel, you want a tool that can give you finer control over these trees. If you want to change the STR mutation rates to somebody else's, or you don't like the ones that it uses, you can put your own in. You can ignore certain STRs, CDY for instance, if you don't like how it comes out in your tree,
you can calibrate the TMRCA, you can show more information in the tree. It's all explained on the site, so I'm not going to go through it in the short time I have. And then there are a lot of adjustments: you can change all the colors, you can add labels to the trees, you can annotate your favorite things. Those aren't used in building the tree; they're just things you can use to show the people you're communicating with where certain things are, which lines come from which areas, that sort of thing.

So, the TMRCA calculations. Everybody has talked about time to the most recent common ancestor, and anybody who knows it knows this is a complicated subject. It is very general: we cannot localize these STR and SNP mutations down to exact years yet, but there are statistical models that can generate estimates. This one uses one from Kevin Murtrick that was popularized by the North Houston; a few of you may have seen it in the literature. It calculates these for every node on the tree as well as for the group ancestor. It gives the error range, so this is a 1650 AD estimate with an error range of 1500 to 1800, and it gives the number of generations as well as how far back that is in years. The accuracy of the error range depends on how good your STR data is and how many markers you have.

By the way, I should say it is STR-generated, not SNP-generated. There are two competing ways to do it, but the only way to do SNP TMRCAs for every node is to have next generation sequencing across your entire group, and we don't have that sort of coverage today. So that's why we concentrate on the STRs.

It can be calibrated; this is a fairly recent feature. If you know the ancestor lived in the 1700s, you can put that into the tool and it will adjust this node and the upstream and downstream ones to compensate. So it will tell you, okay, if you know when that ancestor lived, I can make some better estimates as to when the other nodes probably occurred. And I'll always say this: you've got to use them sparingly
and use them really as general estimates. Are we still within the time of surnames? How far back is this really? Is this one 1600 and the next node 300 BC, meaning there's a long time span between them? That's about as useful as they get. These cannot be used as estimates of when the ancestor was actually born.

I won't talk about this much; this is just for project admins and folks who want to play with the signature recognition. It will not recognize very frequently mutating markers like CDY, 712 and 710; it doesn't recognize single mutations on those as signatures. That can also be adjusted.

A lot of this depends on the STR haplotype of the group MRCA. To calculate a mutation history, it backtracks to figure out what that haplotype probably was. That means that what that haplotype was is very important to how the tree is built, because it's producing this chart going back to that group's common ancestor. If you put in another haplotype at the top node, it will assume that that is the haplotype of the group MRCA, and it will try to account for all of the mutations that must have happened. That means there are a lot more mutations it has to deal with, which means the tree will probably have branching that, whether correct or not, is based on signatures that don't make sense, or is simply wrong branching. We've had occasions of that, and the real correction is to make the haplotype that this starts with as close as possible to the ancestral haplotype that the group ancestor had. Usually the best estimate of that is just the modal of the group, which is what the tool calculates on its own. But that's not always true if your tree is not well balanced: sometimes you have one subgroup that has tested a lot, and that will overweight the modal. For any of the project administrators out there, in terms of getting the mutation history tree to look right, you have to make some assumptions about what that group modal was, and the tool will allow you to override the internal calculation, which you have to experiment with.

The tool doesn't care how many people you put into it. You can put in 547 people; if you know Robert Casey, this is his actual tree. If you printed it out, it would be the height of a small child and would run halfway across a soccer field. That's a difficult one to analyze, and at that size the picture is not that useful, so the tool has an option: you can print out the entire tree as text, with all the same information, just indented for your various nodes. If you're a visual person like me, that's harder to look at, but I'm not going to analyze 547 people through a picture either. So it's a judgment call: if you have two or three hundred people, you can decide whether you like the output in text format or in picture format, but either way it's a lot of people to analyze.

I'll quickly run through, if you are a project admin with a larger group to analyze, how you can go about it with this tool, just to close. This is an example from the Vance surname project. If you take this entire tree, the Vance surname group has 247 members. They break into about 11 recognizable subgroups, of which four or five are actually from the Ulster Scots area. I've colored the various groups so you can sort of see how they come out through the tool. It does group them in general areas, so you can pick out these groups. But it has actually split this one: that is group A3, and I haven't given it any SNP or genealogy data there yet. That group actually does descend from a common ancestor, but they split very early on, and without the SNP and genealogy data the tool has to make decisions based on STRs alone, which is not always right. But the whole reason that I do an entire project at this level, where it's a little
difficult to see individual lines, is just to see the outliers, to see if the outliers can be placed. A lot of us have done this by hand before, so we already know the major subgroups, but you always have outliers; there are always people who don't quite fit into any bucket. Depending on what they've tested, you may know where some of them fit, but generally you have some that you don't. So I run it at this level to see where they land, then I break it out into the various subgroups and do a tree on each and every one of those, and then I add a lot more SNP data and genealogy data.

The steps you take are these. You first go through and find any inconsistencies that you know of. The tree the tool produces is based on your input data, and it may not be the one you expect, so you try different scenarios to get to a better one. This is, as I said before, an art and not necessarily a science, but at least the effort of building the tree has been done for you, and you can start to analyze it and make decisions based on how it comes out at first. And then you can be creative with it. You can, for instance, decide that there must be a SNP that marks a certain group, because there's a clear STR pattern. Without knowing what that SNP is, you can make an assumption that there's a SNP there, put that data into the tool, and see what the tree does. As I'll say on a chart coming up, just make sure that every assumption is documented, because you eventually have to prove all of this.

So for instance, in one of the Vance subgroups I have here and here the signatures that define these branches, and they both include marker 460, which has gone from 11 to 12 in both cases. All right, thank you, Dr.
Daryl. So we have the same STR that has mutated in both places. Because it's a pretty fast-moving marker, the tool doesn't really have enough information to know if it happened once or twice. So I can leave it that way and just say we don't have enough information to know whether those four share a common ancestor below the overall one, or I can put into the tool that I know they share an ancestor and see how it works. So now I put in what I call "ancestor 1", telling it there's an ancestor there for all of these. It has now created that as a single mutation, and it produces the same tree underneath. So I've made an assumption about an ancestor. Is it right? I don't know. But having made the assumption, I now have to prove it, so it gives me something to go on.

Once you have your best tree, you can put on the finishing touches. A lot of us in the surname projects have burning questions that our members would like answered. Are they related to a particularly famous ancestor? When did their immigrant ancestor come over? When did their line split off? There are a lot of questions for which you can, if not answer them completely, at least provide data that helps explain them. One of the problems in the Vance surname project is that a lot of the members tested a long time ago, and then it sat for a long time and they didn't really see a lot of value from their original test. Information I can give them, at least about the possibilities, helps. Then what I look for is feedback from them: is there any genealogy information I don't have yet that would be useful? Would you be open to more testing? That's a particularly interesting subject, because what we also want to do with our members is advise them on what further testing would be useful. This is unfortunately a game where every test opens up new questions and possibilities for more testing, so you are never done. One way the tree can suggest additional testing is this: the tool
does, in the lines that it draws between nodes, give you an indication of how sure it is of those connections: the thinner lines are less certain, the thicker lines are more certain. No kit gets left out. It's not like Family Tree DNA, where there's a particular cutoff in determining who matches. If you put a kit into the data, it will put it on the tree somewhere. Even if you only have 12 markers and no SNP or genealogy data, it will find some place to put it on the tree. Obviously the confidence in that placement is much lower than if you have 111 markers with additional SNP data. So there is a confidence factor in terms of how closely every kit fits on the tree.

If you have SNP ranges, it can also tell you where people should test to be able to show where that SNP is going to go. So you can start to allocate SNPs to particular areas of the tree, or suggest where people should test more. I use it to suggest single SNP testing for my members when they don't want to spend another $300 or $500 on the next test that would be good for them; they just want to spend maybe $17 to $50 at YSEQ to test that particular SNP and find out if they're in that range. So that's useful for helping to spend the SNP money more wisely.

I would also recommend STR testing in some cases. There are two or three STR markers that are very clear signatures for subgroups, so for people who know that they're in a general area, or think that they descend from a particular ancestor, those are pretty easy to test. And of course if it's a guess and they want to test, they can find out whether they belong there, but at least they haven't spent a lot of money before getting an initial indication. So if you have a particular node where there's
a dozen different people, that gives you an area where you could advise people to either do some next-generation sequencing or a SNP panel, maybe, if that's useful for your group, or something that gives you a lot more knowledge quickly in that area, because you need to find the right branching and then what other testing is necessary to develop the best tree. So if you spend some time on the tree, as opposed to just building the tree, the analysis can really help you give advice to your group.
The tool does have internal SNP trees, so if you tell it that you're positive for a particular SNP, it will directly know that that means you're negative for other SNPs on other branches. So if I say I'm actually positive for this SNP, it means that I'm negative for all of these other SNPs and that I'm positive for the upstream SNPs. Obviously that's only the public trees. I sourced it off of Alex Williamson's tree, because it's the most up to date from the citizen scientists at this point, and it does have everybody from U106. That's the entire U106 tree; that's obviously not readable. You can print off these SNP trees, by the way. For any given SNP you can just plug it in. There's a SNP tree function that will give you this tree output for any of the SNPs it knows about, such as U106, L2 and L22, so at least that covers the major ones.
And I have run it on the 500 STRs, which I think is kind of the next generation of these tools. It's interesting. There are two major problems with this. First of all, we don't have good STR mutation rates for all these STRs yet, so it determines signatures, but they're really not based on weighting, so it's hard to know. And the other problem is that it's 500 STRs to analyze. Even if it finds an interesting way to do it, that is still a lot of work to analyze. It's not easy to do. I actually had to move the output over to be able to fit all of the mutations onto the tree, to produce the same tree for 500 STRs.
So, hang on a second. I think that's it for me. I'm
going to leave you with an image of what I think the future looks like. First of all, and I've heard some of this, I've been sitting here all morning and I've been hearing some people say the same sorts of things, which was gratifying, but I kept wanting to jump up and go again. I think we need to bring more databases together. Clearly we need more testing and more connectivity; we need to be able to port things back and forth. I'm proposing we need some kind of a crowdsourced facility where admins and people who are responsible for a group can build their trees and share them with each other for commentary or various kinds of analysis. Again, Alex Williamson's is probably the closest to that we have today. He does produce a lot of STR information, and that's about as much as we've got. I think we need more. And of course, for me, because I'm a visual guy, I want the modeling and the capabilities to look at the data and pull out useful analysis data, but be able to see the tree and how it connects.
And in that vein, I will leave you with one picture of where I think we sit today. This is not my invention, but I agree with it. I think this is us. I think we've survived the initial SNP tsunami, but we have a lot more coming, with the continuing affordability of next-generation sequencing, the coverage on the Y, and the amount of STR data that we can get out of these. That will continue to grow. And let's not forget the continuing digitization of internet records, which gives us more genealogy to pull on. I think we have a vast amount of data, and I know I'm repeating what a couple of other people are saying today, but again, to echo them as well, if we don't get in front of this with better analysis tools, we'll just be sitting drenched in data without the ability to use it. That's all I've got. Thank you.
Eric is around. I work very closely with Dave on aspects of the tool, and we're very closely involved in discussing it. I've tried it online with at least one group. What do you think the issue is? What
version of it? We're on version 2.7, I think, now. We just added the colors and the coloration of the MRCAs. It does not have a lot of SNP or phylogenetic block data yet, but that's because it's not particularly useful in producing a branching. But I'd love to add that for people, so that as it does branch, as you do find ways to break up those blocks, it would automatically recommend that. I'd love to have a routine for that. At the moment it has the SNP tree and your input data. So we'll start to get to this kind of tool, but I think we want to make that leap, as opposed to the evolution that we're trying to do.
Questions today? Do we have one here?
Dave, thanks very much. I'm not going to say anything; it was fantastic, it was a dream. Thank you.
Yep. So the question was: the input files were a little bit fiddly, how do we improve that? Yes, the format is in sections, and you put the data into the sections, but of course it's dependent on the input you provide. It does parse the data pretty well in terms of recognizing what it is. But to your point, the input file that you build has to live on, and you adjust it over time, and you're basically producing another version of your data just for the tool. I would love to input directly from an Excel file or do something else, but again, the formatting has to be right. The other option I have is to go directly to the sources and try to read data directly from either BAM files or other sources, but then putting genealogies in becomes difficult. So to your question, I don't have a perfect solution yet. This is a temporary one, I think, and as the data gets more advanced, the text file will not be a permanent solution. It's all right if you're just trying to analyze a group now, but it won't survive for very long.
We have a question here from Kim Schoen; I don't know if
it's right here.
It's great to see that you can have a timeline. How do you handle no-calls?
Yeah, that's a good question. The question is how do I handle no-calls. The tool, as you noticed, doesn't really know SNPs as biological. You put SNPs in as positive or negative, and it's kind of: these kits are in that set, or they're not in that set. So for no-calls, you simply would either put it as a question mark or you just leave them out entirely, and it will make a decision whether the kit is in that set or not just based on the STRs or the genealogies. So to answer your question, it kind of treats it as no input, but it will show you on the tree where it ends up as part of the model. It doesn't ignore them, but it doesn't count them as data. Now, that may be something that you want to try as a scenario. You say, okay, I'm going to check how the tree looks if this is positive and how the tree looks if it's negative. That might give you an indication, without a certainty.
I'll ask this, actually: to what extent do you think it's possible that Family Tree DNA will integrate the SAP program into what they offer their customers?
I've not talked to them at all about that, and I think they would go for something themselves. As much as I like the tool, because I created it to simplify my own life, I'm hoping somebody comes out with a next-generation tool that leapfrogs over this and does something much better. So, not that I'd mind talking to them, but I think rather than integrating my work they should do their own and build something hopefully much, much better, directly from their database, and produce output something like this that would give us as much information, with more analysis, more probability behind it, more certainties or estimates thereof, etc. I mean, they've got a lot more people that can work on that than I do.
Excellent, but we have to close the day here, unfortunately,
because we could talk about this for quite a while longer. This tool is one of the most notable things to come out in DNA research for a long, long time, and I just want to congratulate you and thank you for all the support you've been giving me and the rest of the community with the wonderful SAP program. Ladies and gentlemen, thanks.
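Editor's illustration: the SNP inference rule described in the talk (a kit that is positive for one SNP is also positive for the upstream SNPs and negative for SNPs on other branches) can be sketched roughly as below. This is a minimal Python sketch and not the SAP program's actual code. The toy tree, the SNP names, and the function names `ancestors`, `descendants` and `infer_calls` are all hypothetical. Marking untested downstream SNPs with a question mark is an assumption, borrowed from the way the speaker says no-calls can be entered.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy SNP tree, stored as child -> parent (None marks the root).
# These names are placeholders, not real haplogroup relationships.
PARENT: Dict[str, Optional[str]] = {
    "ROOT": None,
    "A": "ROOT",
    "B": "ROOT",
    "A1": "A",
    "A2": "A",
    "B1": "B",
}


def ancestors(snp: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the upstream SNPs of `snp`, nearest first."""
    chain: List[str] = []
    node = parent[snp]
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def descendants(snp: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return every SNP that sits downstream of `snp`."""
    return {s for s in parent if snp in ancestors(s, parent)}


def infer_calls(positive: str, parent: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Infer '+', '-' or '?' for every SNP in the tree from one positive result.

    '+' : the tested SNP and everything upstream of it
    '?' : SNPs downstream of the tested SNP (assumption: still unknown)
    '-' : SNPs on any other branch
    """
    upstream = set(ancestors(positive, parent))
    downstream = descendants(positive, parent)
    calls: Dict[str, str] = {}
    for snp in parent:
        if snp == positive or snp in upstream:
            calls[snp] = "+"
        elif snp in downstream:
            calls[snp] = "?"
        else:
            calls[snp] = "-"
    return calls


if __name__ == "__main__":
    # A kit that tested positive for A1 only.
    print(infer_calls("A1", PARENT))
    # {'ROOT': '+', 'A': '+', 'B': '-', 'A1': '+', 'A2': '-', 'B1': '-'}
```

Storing the tree as a child-to-parent map keeps the upstream walk trivial, which is all the rule needs; a real tool would presumably load a much larger public tree rather than a hand-written dictionary.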