So, thank you for coming back for the afternoon session. I have a lot of material to go over, so I'm going to go through it quickly, and I hope everybody sticks around for the rest of the afternoon, because, speaking personally, those are the lectures I'm looking forward to the most: the very difficult issues of what to do clinically with the families you use this technique for.

So I am a practicing pediatrician and a medical geneticist, and I went to graduate school in a genome sequencing center, but most of the work I've done with next-gen sequencing has been through the Undiagnosed Diseases Program. We see families and individuals with mystery syndromes, and often multiple families will not have the same thing, so each one is a unique case, and we're using exome sequencing for an individual or a small family. I'm going to present examples and ideas from some UDP projects, and I'm also going to be presenting the work of some of my collaborators. I will be talking about some platforms, with the usual disclaimers: I'm not advocating for any of them.

I'm going to use "next generation" to refer to exome sequencing, and when I say "variant," that's any difference from the defined reference sequence. A pathogenic variant is a variant that is wholly or partially responsible for a phenotype of interest, so it's roughly equivalent to a verified mutation, and a candidate variant is something that's on your list of things that might end up being the pathogenic variant but that you haven't verified yet; that's the list that varies as you go through all the filtering techniques you heard about today.

I'm going to start by mentioning a few things about project design, hopefully not overlapping too much with the great talks this morning. The core of the talk is a discussion of how to integrate some other technologies with exome analysis to improve your chances of success, in particular SNP arrays, and also phenotype and family history data. At the end I'm going to talk a little bit about validation, to add a few comments to what's been said before, and the same for reanalysis of uninformative data sets. Because of my background, the examples I'm going to use are Mendelian inherited diseases with high penetrance and a small number of genes, in humans. Hopefully there will be some overlap if you work with different types of projects. (Somebody's phone is up here, if that's okay; I don't think it's my financial advisor.)

The basic problem, as we've discussed before, is that when you go looking for the needle in the haystack, you usually end up with a big pile of needles, and you need to pick which one of those is your disease-causing variant. So how can you improve your chances of success? There were a lot of examples this morning of how careful project selection can improve the outcome of an exome project, and I'm going to give one more example that we use. You can also use parallel analyses; I'm going to use the SNP chip array as an example, but there are lots of things you can consider, including phenotyping and expression analysis. There are also lots of variables to consider in experimental design. Listening to the questions before the break, it was clear that some people are starting early in the process of sequencing and data collection, other people are analyzing data obtained through a collaborator, and even the analysis can be broken up into a number of different pieces.
So before you even start the project, it's worth thinking about how you're going to carry out all of these steps, which things you're going to do yourself, and which things you're going to do collaboratively.

In the Undiagnosed Diseases Program, we see many families and only have the resources to apply exome sequencing to a few of them, so we have a tool we use to prioritize the families, and we consider things like phenotype. Is the phenotype more multifactorial in appearance, or does it look very genetic, early onset and severe? Is the material available from a single individual or from a family? Does the phenotype overlap with common conditions, or is it severe and compelling? And does the family show multiple affected individuals or only one? We need this kind of rubric because our families don't come with a large pedigree with lots of affected people and a clear pattern of inheritance, so we need to do a little bit of screening.

For all of the types of analysis that you may add to an exome project, and that includes the filtering steps we heard about earlier as well as some of the things I'm going to talk about today, it's worth asking a few questions. I'm going to use the SNP array data to explore some of those, and, as I mentioned, include some other examples, and at the end I'd like to touch on using accumulated data from multiple exome projects, which is again something that came up during the question session. If you think about the criteria you might use to decide whether a filter, or an additional analysis of the data or the DNA, is worth doing, you first want to ask how much the candidate variant list is going to be reduced. Is it worth the trouble of doing that step? The second thing you'll probably want to ask is how error-prone it is, and there was actually a very nice discussion during the cancer presentation about trying to get a sense of whether true variants were thrown out, or whether false variants that your analysis was designed to exclude were in fact included. We heard examples of that. For instance, dbSNP can be a powerful tool, and it's certainly been used in a lot of studies, but it can fall into both of those error categories pretty easily, whereas segregation filtering, if you have high-quality data and a correct genetic model, has very favorable characteristics.

So let's dive into SNPs, since we're going to talk about a SNP array. A SNP, as you probably well know, is a single base at a defined genomic position. The exact nucleotide varies in the population, and the location itself is defined by conserved sequences nearby. You may be familiar with this logo type of display, where the height at each position is proportional to the conservation. This is one end of the mammalian splice site, and the G and T are highly conserved, whereas at other positions you can see there are two nucleotides commonly found at the site. The most common allele is typically called the A allele or major allele, and the less common allele is the minor allele or B allele. A perfect site would have lots of highly conserved flanking sequence to define the position of the SNP, and then a SNP with two possible nucleotides. The way assays for SNPs are carried out is that the technique, which I'm not going to explore in detail, uses a different fluorophore for each of the two possible nucleotides.
So this is a graph from GenomeStudio, which is a tool we use to analyze this type of data. Along the left is the fluorescence intensity for the B allele, and along the bottom is the fluorescence intensity for the A allele, and each one of these spots represents an individual person, so this whole graph is for one SNP across multiple individuals. By quantitating the fluorescence, you can call a genotype: this cluster is all B, this one is all A, and this one has about half the intensity of each, so those are the heterozygotes. The same principles can be used to define hemizygous spots, duplications of various combinations, and places where there's a total deletion.

This is probably the most complicated slide, so bear with me. This is a display from GenomeStudio, and there are two things to point out. One is that before, each dot was an individual and the whole diagram was for one SNP; in this case, the dots are a series of SNPs from left to right across a locus, and there are only two patients here, the one in blue and the one in yellow. There are two important plots to know about when you're analyzing SNP array data. The top one is the B allele frequency plot, a normalized level of the B, or minor, allele signal for each one of these SNPs. In most regions, some of the SNPs will have two copies of the minor allele, some will be heterozygous, and some will have two copies of the major allele. But you can see that there's a gap here where there are no heterozygotes, and that's because this is a spot with a single-copy DNA deletion. There's another way to look at the SNP data, which is the log of the total intensity over a reference amount of intensity. If the reference intensity and the total intensity are the same, that's the log of one over one, or zero, so those spots cluster along the zero line here. If instead it's the log of 0.5 over one, you get a negative value, and you can see along the bottom here that these points are slightly shifted down. On the next slide that's blown up a little bit: the B allele frequency plot shows no heterozygotes, and there's the downward shift for a single-copy deletion. The same technique can be used for double-copy deletions and for duplications.

So why should you include SNP chip analysis along with your exome analysis? Well, as you saw this morning, the short reads from exome analysis tend to pile up over regions of interest, over the exon here, whereas the SNPs, although they are much, much less dense, are spread out across the genome. This is easier to see when you look at whole chromosomes: at every place except some centromeric regions there is dense SNP coverage across most of the genome, so this really provides a survey of the entire genomic structure. We were talking quite a bit about dosage abnormalities, and this is one way to get that information, and it's cheaper than doing genomes. For, call it, $200 to $300 apiece, you can do SNP analysis on families, where it costs maybe $5,000 to do a genome, so you can do this analysis and combine it with your exome analysis for less money than it would cost to sequence everybody's genome. And you can detect things like the dosage changes we just talked about.
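To make those two tracks concrete: they boil down to two per-SNP quantities, the B allele frequency and the log intensity ratio. Here is a minimal, idealized sketch of the arithmetic; the function and variable names are mine, and real platforms such as GenomeStudio normalize against per-SNP cluster positions rather than using raw intensities like this, so treat it only as an illustration.

```python
import math

def baf_and_lrr(a_intensity, b_intensity, expected_total):
    """Toy B allele frequency / log R ratio calculation for one SNP.

    a_intensity, b_intensity : normalized signals for the A and B alleles
    expected_total           : reference total intensity for this SNP
                               (in practice, from clustering many normal samples)
    """
    total = a_intensity + b_intensity
    if total <= 0 or expected_total <= 0:
        return float("nan"), float("nan")
    # Fraction of signal from the B allele: ~0.0 for AA, ~0.5 for AB, ~1.0 for BB.
    baf = b_intensity / total
    # Log ratio of observed to expected intensity: 0 for two copies,
    # negative for deletions, positive for duplications.
    lrr = math.log2(total / expected_total)
    return baf, lrr

# A heterozygous SNP at normal copy number ...
print(baf_and_lrr(1.0, 1.0, 2.0))   # (0.5, 0.0)
# ... and the same SNP after a single-copy deletion of the B-bearing chromosome:
# the heterozygote disappears and the total intensity drops.
print(baf_and_lrr(1.0, 0.0, 2.0))   # (0.0, -1.0)
```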
You can also detect chromosomal mosaicism, consanguinity, uniparental disomy, and regions of anomalous contiguous homozygosity, which are stretches of homozygous SNPs longer than you would expect for that region; there was actually a question before the break about doing an analysis in such a region. And you can do recombination mapping if you've got pedigrees. We'll talk about each one of these.

As far as detecting dosage changes, you've got a couple of options. You can use the manufacturer-provided software and look at something like the B allele frequency. Another option is a piece of software called PennCNV, which will automatically detect regions of dosage change and print out a list, and that can be integrated, for instance using Jamie's VarSifter tool, into a set of include and exclude regions to combine with your exome data.

Here's an example of how this was useful for one of our UDP cases. This is a 10-year-old male with a complex neurologic phenotype. We guessed that this was going to be autosomal recessive, although of course it could have been dominant as well. We applied multiple filters, as we discussed this morning, and we didn't find anything. So we reanalyzed the data with a new filtering tool we're using, VAR-MD, developed by a person in our group, Murat Sincan, which automates some of those filtering steps we talked about and also allows you to relax some of the filtering constraints; we talked about the role of doing that in an iterative analysis process. And we found a candidate. Now, this candidate had originally been thrown out because it didn't follow the rules of Mendelian segregation: the child there, who is little-a little-a, could not have gotten that little-a allele from the mother, because she didn't carry it. The geneticists in the crowd will guess that, in fact, the mother was not homozygous but hemizygous, and the SNP chip data confirmed that there was a small deletion that the mother carried and passed on to the child, and this was the cause of his disease. So single-copy deletions may pair with deleterious sequence variants, duplications can result in subtle but important changes in gene dosage, and you can create a BED file from the PennCNV output and incorporate it into your exome analysis.

This type of analysis can also find chromosomal mosaicism. I'm not going to spend a long time on this figure, but you can see that there are more than the usual three populations we expect on this B allele frequency plot. The separation into these extra populations of SNPs is because you've in fact got ones that are BBB, ABB, AAB, and AAA, and this turns out to be a case of mosaicism. This type of mosaicism can be quantitated fairly accurately, even more so than you can do with karyotypes. So consider the effect of mosaicism on sequencing quality: homozygous and heterozygous base calling, as you learned this morning or may have already known, uses the relative proportions of short sequence reads with different genotypes, so if you have mosaicism it's going to change those proportions and affect the quality of base calling. Mosaic regions may also indicate regions of interest in the genome, and of course they may be important in somatically evolving cells, like the cancer examples we saw earlier.
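Before moving on, here is a minimal sketch of that PennCNV-to-exome hand-off: converting PennCNV calls into a BED file of dosage-variant regions that a tool like VarSifter can take as an include or exclude list. It assumes the usual detect_cnv.pl text output, in which each call starts with a chr:start-end region and carries numsnp= and cn= fields; the parsing, the minimum-SNP threshold, and the file names are my own choices rather than part of the pipeline described in the talk.

```python
import re

def penncnv_to_bed(penncnv_path, bed_path, min_snps=5):
    """Convert PennCNV calls into a BED file of dosage-variant regions.

    Assumes the typical detect_cnv.pl output, where each line begins with
    a region like 'chr3:100000-200000' and carries 'numsnp=' and 'cn='
    fields; adjust the parsing if your PennCNV version differs.
    """
    call_re = re.compile(r"^(chr\w+):(\d+)-(\d+)\s+numsnp=(\d+)\s+.*cn=(\d+)")
    with open(penncnv_path) as fin, open(bed_path, "w") as fout:
        for line in fin:
            m = call_re.match(line.strip())
            if not m:
                continue
            chrom, start, end, numsnp, cn = m.groups()
            if int(numsnp) < min_snps:
                continue  # skip calls supported by too few SNPs
            # BED is 0-based, half-open; PennCNV coordinates are 1-based.
            fout.write(f"{chrom}\t{int(start) - 1}\t{end}\tcn{cn}_snps{numsnp}\n")

# Hypothetical usage:
# penncnv_to_bed("proband.rawcnv", "proband_cnv.bed")
```

The resulting BED file can then be loaded as a region list alongside the exome variants, so, for instance, a single-copy deletion can be paired with a deleterious variant on the remaining allele.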
A third thing that you can use SNP chips for is consanguinity and homozygosity mapping. Here is a normal B allele frequency plot; we visited this before, and you can see there are a few gaps, as I mentioned, at the centromeres. But here's a family that has many large gaps, and these are regions of homozygosity; I can highlight them by putting some arrows on them. There are many more than you would expect to see in a family that was completely outbred. In some work done by Tom Markello in our group, he took a group of families with a known degree of consanguinity and compared that to the total linear length of homozygous regions in the genome, and those two things correlated pretty well, so with this type of calibration you can actually estimate the degree of consanguinity from the chip.

Once again, if you want to incorporate this into your data, you can use the Illumina or Affymetrix or other platform software and look at plots like the ones we just looked at. There's also a tool called PLINK, out of Harvard, and among its many capabilities is the ability to auto-detect regions of homozygosity, which you can incorporate into your analysis. Here's an example of a case, one of the early UDP successes where we found a new disease, and in this case it was the presence of a region of homozygosity that allowed us to find the gene. So you can identify these regions using B allele frequency plots, and you can look at just the variants that fall within a homozygous region; in fact, exome projects done with homozygosity mapping probably make up the bulk of the successful exome papers out there. It may even alter the planning of next-gen experiments, and to be fully forthcoming, we never did exome sequencing in that study on the previous slide, because we found a region of homozygosity and could just go and look at the gene directly. But it can be incorporated into which variants you're looking at. Also, when thinking about the consanguinity level, the optimum is probably second or third cousins or a little further out; if the parents are too closely related, you end up with many, many regions of homozygosity, and then it doesn't make a very effective filter for your variants.

So far we've been talking about SNP chip analyses you can do using intensity measurements, but you can also use Boolean tools. Boolean tools are those such as the one Jamie showed you in VarSifter, and GenomeStudio also has a set of tools for Boolean queries, where you ask whether a certain set of SNPs follows rules you impose based on a genetic model or other criteria. This is usually based on fairly straightforward genetics. If the mother is AB and the father is AA, then a child who is AB had to get the B allele from the mother, because she was the only one who had a B allele to give. If we say the same is true at an adjacent locus, then if one child is AB at both loci while another is AA at the first locus and AB at the second, a recombination is suggested, and you can see that here: the parental genotype combination is AB, AB, and the recombinant combination is AA, AB. Setting up a Boolean filter to check for these is fairly straightforward with a small family; as you get to larger families, you need more rules to be able to incorporate all of the possibilities.
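As an illustration of that kind of Boolean rule, here is a minimal sketch of the two-locus logic just described: with an AB mother and an AA father, the child's genotype tells you which allele came from the mother, and a switch between adjacent informative SNPs suggests a recombination. The function names and genotype encoding are assumptions for illustration; a real implementation has to handle many more parental configurations, no-calls, and genotyping errors.

```python
def maternal_allele(child_gt, mother_gt="AB", father_gt="AA"):
    """Which allele did the child inherit from an AB mother when the father
    is AA?  Returns 'A', 'B', or None if the site is uninformative."""
    if mother_gt != "AB" or father_gt != "AA":
        return None          # only this simple configuration is handled here
    if child_gt == "AB":
        return "B"           # the B allele can only have come from the mother
    if child_gt == "AA":
        return "A"
    return None              # e.g. a no-call or a Mendelian error

def recombination_suggested(child_gts):
    """child_gts: one child's genotypes at consecutive informative SNPs.
    A change in the inferred maternal allele between adjacent SNPs suggests
    a recombination somewhere in between."""
    inherited = [a for a in (maternal_allele(g) for g in child_gts) if a is not None]
    return any(a != b for a, b in zip(inherited, inherited[1:]))

# The parental haplotype carried intact, then a switch at the last SNP:
print(recombination_suggested(["AB", "AB", "AB"]))   # False
print(recombination_suggested(["AB", "AB", "AA"]))   # True
```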
I'd like to contrast this a little bit with formal linkage analysis and positional mapping, such as what Dr. Biesecker talked about earlier. Classic linkage analysis usually uses fairly robust markers, tandem repeats and those sorts of things. They tend to be fewer and more widely spaced; there are about 440 of them in one of the most commonly used ABI sets, for instance, and the analysis must therefore take into account the chance of a double recombination, or another recombination event, between those markers. With SNP-based linkage mapping, the markers are admittedly less robust: they can be uninformative, and the SNP genotype can be wrong. However, you have a much higher density of markers, so you have many assays with which to test for recombinations, and at the end of the day they're dense enough that a double recombination between informative markers is unlikely. So you get out a slightly different kind of data.

This is a graph that shows the log of the odds ratio on the left-hand side. Because this was done on a small family, the black plot, which is the log of the odds ratio, doesn't go up to the significance level of three that you would use to confirm a locus genome-wide. But really what I want to do is contrast this continuous plot with the discrete intervals you get from recombination mapping using a chip. When you look at these close up, you can once again see the discrete intervals defined by the SNP-based recombination map.

Here's an example of how we've used this in the past. Two children out of four in this family are affected with a neurodegenerative disorder, and six members were sent for SNP chip analysis and for exome sequencing. These were somewhat older chips, so we came up with about 112,000 variants for the family, and recombination mapping was applied. This is what the SNPs look like before we applied the recombination filter, and when we filtered for only those regions that segregated in a manner consistent with our genetic model, we ended up with this subset and were able to use that as part of a filter, which again was successful; at the end there are only two genes on this list, and one of them turned out to be the answer. And here is how some of that analysis looks in VarSifter, which we used extensively for this project. So this recombination mapping requires a defensible genetic model and multiple family members, though fewer than for a linkage study; it can be used to define segments of the genome that segregate according to a given genetic model, and it can exclude segregation-inconsistent regions, and their associated variants, to comprise a filter.
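Whatever produces the retained intervals, whether recombination mapping, a PLINK run-of-homozygosity call, or a PennCNV-derived BED file, the last step looks the same: keep only the exome variants that fall inside those regions. Here is a minimal sketch, with the BED layout and the variant representation assumed for illustration rather than taken from any particular tool.

```python
def load_bed_regions(path):
    """Read (chrom, start, end) intervals from a BED file, e.g. regions
    retained by recombination mapping, homozygosity mapping, or CNV calls."""
    regions = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            regions.append((chrom, int(start), int(end)))
    return regions

def variant_in_regions(chrom, pos, regions):
    """True if a variant position falls inside any retained interval.
    BED intervals are 0-based half-open; pos here is taken as 1-based."""
    return any(c == chrom and start < pos <= end for c, start, end in regions)

# Hypothetical usage with variants as (chrom, pos, annotation) records:
# keep = load_bed_regions("segregation_consistent_regions.bed")
# candidates = [v for v in variants if variant_in_regions(v[0], v[1], keep)]
```

For exome-sized variant lists a linear scan like this is fast enough; for larger data sets you would sort the intervals or use an interval tree.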
So what are some other types of data that you can integrate into an exome project? There is phenotype and pedigree information, clearly. Phenotyping may implicate pathways, which you can use for pathway analysis, and we heard about that in the cancer talk earlier; that could be true for any pathway you can reasonably associate with the phenotype in front of you. It may provide clues for candidate validation, and it may provide clues to an appropriate genetic model. Pedigrees and the family history, just as they are with linkage, are a powerful resource for variant filtering, but the phenotype is critical: if you get the phenotype wrong, none of the family-based techniques are going to work, so you need clear affected and unaffected status assignments and penetrance estimation to use these small-family methods.

The types of gene lists, in addition to the cancer lists we saw earlier, that you might consider would be mitochondrial genes, metabolic genes interacting with a given metabolite, pathways, or clinical syndromes with genetic heterogeneity, that is, a large number of genes that can cause the same phenotype. An example of one of those is hereditary spastic paraplegia, which is one we've discussed as a potential diagnosis in several of our patients. VarSifter can nicely incorporate a gene include list, so if you do your phenotyping and think, oh, this is developmental, you can make a list of developmental genes and look at only those in VarSifter.

Here's an example of where phenotyping helped us make a diagnosis. This is the brain of a 19-year-old female with slowly progressive neurologic disease, and her course was suggestive of a lysosomal storage disease. One of those is GM1 gangliosidosis; however, that had been excluded by the gold standard of enzymatic testing. Exome sequencing, however, detected candidate variants in that gene. The combination of those molecular results plus the clinical suspicion of a lysosomal storage disease prompted us to go and repeat the gold-standard testing, and in fact the earlier clinical result was incorrect, and this patient had enzymatic activity consistent with GM1 gangliosidosis. This is an expensive way to come up with a known disease, I will grant you, but in this case it was the combination of careful phenotyping, and the suspicions it generated, plus the molecular exome data that helped us.

Another very healthy and ongoing debate we have in our laboratory is whether to send single exomes or small pedigrees for a new family. Clearly, single exomes are less expensive, the analysis is more straightforward, and fewer tools are required, but you will generate more variants, and I hope to show you that. A small pedigree is more expensive and the analysis requires some additional tools and expertise, but you get fewer candidate variants. Again, the point is that the filtration you can do using that pedigree information has a low error rate if you have a correct genetic model and high-quality data. And in fact you can think of two ways of using that family data. You could ask me, well, why not just do a SNP chip and do the recombination mapping rather than doing exomes on multiple family members? It turns out that the candidate variants you exclude using recombination mapping are not exactly the same set as the variants you exclude by requiring individual variants to segregate in a Mendelian-consistent way, so you actually get a larger set of variants excluded by doing both of these things than you would from doing either one alone.
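To make the per-variant segregation side of that comparison concrete, here is a minimal sketch of a crude autosomal recessive check of the kind that a tool like VAR-MD automates. The alt-allele-count encoding, the function name, and the example family are assumptions for illustration only, and the rule set is far simpler than what you would want in practice (no compound heterozygotes, no penetrance model, no genotype quality checks).

```python
def fits_recessive_model(genotypes, affected, parents, proband):
    """Crude autosomal recessive segregation check for one variant.

    genotypes : dict of person -> alt-allele count (0, 1, 2) or None for a no-call
    affected  : set of affected individuals
    parents   : ids of the two parents
    proband   : id of the affected child
    Assumes full penetrance and high-quality calls.
    """
    if genotypes.get(proband) != 2:
        return False                            # proband must be homozygous alt
    for p in parents:
        if genotypes.get(p) not in (1, None):   # parents should be carriers
            return False
    for person, g in genotypes.items():
        if person not in affected and g == 2:
            return False                        # unaffected relatives can't be hom alt
    return True

family = {"mother": 1, "father": 1, "proband": 2, "unaffected_sib": 1}
print(fits_recessive_model(family, {"proband"}, ("mother", "father"), "proband"))  # True

family["unaffected_sib"] = 2                    # now the variant fails to segregate
print(fits_recessive_model(family, {"proband"}, ("mother", "father"), "proband"))  # False
```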
This graph shows paired results: for every line up here in the red section, there is a corresponding line down in the black section. All of the red traces are the path of filtering, from the beginning of the project to the end, using only the proband, whereas the black traces use the proband plus the family members. On the left-hand side is the log of the cumulative number of post-filtration variants, and along the bottom are all of the filtering steps. Importantly, these last two aren't actually sequential filtering steps: this column is all the heterozygotes and this one is the homozygotes.

And the point I want to make here is that when we only used the proband for these families, we got between a hundred and a thousand candidate variants, whereas when we included the family information we got around 10, sometimes a little more, sometimes a little less. Obviously this is going to vary a lot with the particular project and chemistry you're using, but I just wanted to give you some evidence that you get improved filtration by including extra family members. And here's another way to look at the same thing: this is the log of the number of variants, each of these is a different exome project, and I'd like you to focus just on the red columns. The general pattern I'd like to point out is that the smaller, simpler families, the trios, ended up generating between a hundred and a thousand variants on average, and the ones with more family members included generated fewer variants. So overall this technique can be a very powerful way of filtering your data.

At the end we would say: use a single exome when you've got other clues available. If you've got a pathway, if you've got a homozygous region you've mapped, if you've got a gene list, then by all means consider doing a single exome, because you've got other means of filtering. If you have no clue going into the family, and it's a single family, you may want to use additional family members, assuming good phenotyping is available. And for this type of mapping, especially for recessive conditions, it helps to have both parents and at least one sibling in addition to the proband; trios are less useful for recessive models in particular.

So we've talked about a couple of different types of data integration. Use all the available resources you have to help filter your lists. For exome sequencing, consider using SNP arrays for all the reasons we discussed. The study design should include as much a priori information as you can put in from careful phenotyping and family history. And really, new approaches are coming out on a monthly basis, so you should do a literature review if you're starting a big new project.

Just a few words on sequence validation and reanalyzing projects. There are a couple of ways to think about sequence validation, and I was very interested to hear that some of this was explored in the cancer talk. One type of validation is simply doing Sanger sequencing to make sure that whatever you found in the exome is real; you pick a subset of things you're interested in, and you may have to do some CLIA sequencing if you're returning results to families. The point I'd like to make is that the likelihood of verification is based in part on your filtering techniques, and I think that came out in some earlier talks as well. For us, for an autosomal recessive model where we've done the most stringent filtering, 90 percent or more of the variants detected by exome analysis will verify, whereas for an autosomal dominant model, especially with de novo dominants and less ability to filter the variants, in some cases we verified 30 percent or less. So you may see some variation in how accurate the genotyping is based on what type of filtering you use. And the point I'd like to make about functional validation is that functional validation means determining the biological effect of a variant, and there are no in silico methods that can replace functional analysis in the laboratory for previously uncharacterized variants.
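Going back for a moment to the gene include lists mentioned earlier, one of the other clues that can justify a single exome: here is a minimal sketch of that filter. The one-symbol-per-line file format and the variant dictionaries with a 'gene' key are assumptions for illustration, not the interface of VarSifter or any other tool.

```python
def load_gene_list(path):
    """One gene symbol per line, e.g. a hereditary spastic paraplegia panel."""
    with open(path) as fh:
        return {line.strip().upper() for line in fh if line.strip()}

def filter_by_gene_list(variants, gene_list):
    """Keep only variants whose annotated gene is on the include list.

    'variants' is assumed to be an iterable of dicts with a 'gene' key,
    as produced by whatever annotation step your pipeline uses.
    """
    return [v for v in variants if v.get("gene", "").upper() in gene_list]

# Hypothetical usage:
# hsp_genes = load_gene_list("hsp_gene_panel.txt")
# hsp_candidates = filter_by_gene_list(candidate_variants, hsp_genes)
```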
During the break I talked with a couple of people about pathogenicity prediction software like SIFT and PolyPhen, for instance. This is a study from 2012; you can look up the authors in the handout. The point is that it confirmed data from earlier studies showing that all of these methods have roughly a 10 to 20 percent false negative rate and a 10 to 20 percent false positive rate. What that means is that they're pretty good for ordering a long list of variants to decide which ones to look at first, but they're not so good at determining whether an individual variant is pathogenic or disease-causing, and they shouldn't be relied on for that. Also, in our experience, editors will ask for evidence of functional consequences. There are papers out there without it, but for just about every paper we've put out with exome data, they wanted protein and RNA measurements, or enzyme activity, or rescue experiments, or model organisms, something to really show that you found the right thing. The exceptions are probably previously well-characterized variants, and maybe severe variants in well-characterized genes, but even for the latter you may have to have some experimental evidence.

So what happens when you come up empty-handed? We've already heard about the iterative approach, which I would wholeheartedly endorse. You really need to revisit all of the assumptions you made along the way: who had the disease and who didn't, what the genetic model was, what the frequency of the disease was for some of your filters. And you need to know what your technique measures and doesn't measure. We heard all through the morning that targeting, capture, alignment, and base calling each collect only a portion of the total possible world of true genotypes in the family you're studying, so you need to explore the sources of false negative results and study data quality and actual coverage.

Here is one exon, and plotted above it is the coverage. This is using a fairly old kit, and coverage consistency is getting better, but you can see that the coverage varies a lot even across this one exon. There are some known determinants of this phenomenon: GC content, sequence complexity, near-identical repeats, and changes in representation due to unequal amplification during the prep steps for sequencing. But I'd ask you to consider what an average coverage means in the face of this kind of granularity; it may not give you the total quality story that you need to go forward. Genotyping quality and completeness in exome sequencing are complex, and it can fail differently than Sanger sequencing, which is very interesting to those of us thinking about eventually reporting this out clinically. So you need to think about a number of different aspects of what went into your experiment, and for many of them there are specific things you can look at. For targeting, that is, what the study was actually designed to capture, you can create a BED file showing where all the baits were, so you can see what was supposed to be targeted and what was not. For capture and complexity, this is a somewhat involved topic, but historical data can be used. For sequencing and alignment, you can use coverage and other metrics, and also historical data. And for base calling, you can use the MPG scores we talked about and other metrics, and also historical data.
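As one concrete way to audit actual coverage rather than trusting an average, here is a minimal sketch that summarizes per-base depth over the targeted regions and reports how much of the target sits below a threshold. It assumes you have already produced a chrom/position/depth table, for example with samtools depth restricted to the bait BED file; the threshold and function name are my own choices.

```python
import sys
from collections import defaultdict

def coverage_report(depth_path, min_depth=4):
    """Summarize per-base coverage from a 'chrom<TAB>pos<TAB>depth' file,
    such as the output of `samtools depth -a -b targets.bed sample.bam`.

    Reports mean depth and the fraction of targeted bases below min_depth,
    overall and per chromosome, so a high average can't hide uncovered spots.
    """
    total_bases = low_bases = depth_sum = 0
    per_chrom_low = defaultdict(int)
    per_chrom_total = defaultdict(int)
    with open(depth_path) as fh:
        for line in fh:
            chrom, _pos, depth = line.rstrip("\n").split("\t")
            d = int(depth)
            total_bases += 1
            depth_sum += d
            per_chrom_total[chrom] += 1
            if d < min_depth:
                low_bases += 1
                per_chrom_low[chrom] += 1
    if not total_bases:
        print("no targeted bases found")
        return
    print(f"mean depth {depth_sum / total_bases:.1f}x; "
          f"{100.0 * low_bases / total_bases:.2f}% of targeted bases < {min_depth}x")
    for chrom in sorted(per_chrom_total):
        frac = 100.0 * per_chrom_low[chrom] / per_chrom_total[chrom]
        print(f"  {chrom}: {frac:.2f}% below {min_depth}x")

if __name__ == "__main__":
    coverage_report(sys.argv[1])
```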
So I've clearly mentioned historical data three times now, and the reason is that an accumulated set of data generated with the same techniques is an invaluable resource. Those data sets do expire, as came up in the talk before the break: if you've got something that was captured with much different chemistry than what you're currently using, it may be less useful. But you should certainly accumulate and study data sets until they diverge too far in technique, because they can be very instructive.

An example of that: we used some of our previously collected UDP data plus some data kindly shared with us from ClinSeq, and we looked across several hundred exomes for genotypes that were out of Hardy-Weinberg equilibrium. We used Fisher's exact test with a Bonferroni correction, because looking at all of these sites is essentially a multiple-testing problem. Two error types jumped out at us; there are some more complex, interesting things too. One error type was when all of the genotype calls were homozygous non-reference, which suggests that the reference sequence is either wrong or carries a minor allele at that position. A second type of error was when all of the genotypes were heterozygous, which suggests that two similar regions were aligned together to form a compression, with the few spots where they differ showing up as heterozygotes. We could use this historical data set to make an exclusion list for further filtering.

We did another experiment where we asked: given a set of genes associated with a known disorder, the genetic heterogeneity question, how well are those genes covered? We took 114 exomes from 27 families and used two gene lists, one for a variety of muscle disorders and the second for hereditary spastic paraplegia. And you should know that, even though this isn't always true, clinicians will sometimes assume that if a clinical sequencing test comes back negative, then all of the sequenced regions were covered with sufficient quality to detect all variants in those regions, or, put in simpler terms, "I can take that diagnosis, set it aside, and move on to another diagnostic consideration." For these two gene lists, and mind you, this is using somewhat older technology, the targeted capture kits included only 47 to 73 percent of the nucleotides within those gene lists. That's probably lower than average, but it was true for these lists. And while the average coverage was high, 40x to 100x, 2 to 3 percent of the nucleotides had less than fourfold coverage. My point here is not that these techniques are terribly error-prone and you shouldn't use them; my point is that you need to understand the assay characteristics and know what's missed.
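Returning to that Hardy-Weinberg screen of accumulated exomes: here is a minimal sketch of the test for a single site. The speaker's group used Fisher's exact test with a Bonferroni correction across several hundred exomes; for simplicity this sketch uses a chi-square goodness-of-fit approximation instead, and the number of sites and the genotype counts shown are made up for illustration. An all-heterozygous site, the compression signature described above, fails the test decisively.

```python
from scipy.stats import chisquare

def hwe_pvalue(n_ref_hom, n_het, n_alt_hom):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at one
    site: observed genotype counts versus the expectation from the allele
    frequencies, with one degree of freedom."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)       # reference allele frequency
    q = 1.0 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_ref_hom, n_het, n_alt_hom]
    # ddof=1 leaves 1 degree of freedom (3 classes, minus 1, minus 1 estimated parameter).
    # Note: a site where every call is homozygous non-reference is degenerate here;
    # in practice that error type is caught by the simpler observation that no
    # reference allele was ever seen.
    return chisquare(observed, f_exp=expected, ddof=1).pvalue

n_sites_tested = 50_000                         # hypothetical number of sites screened
alpha = 0.05 / n_sites_tested                   # Bonferroni-corrected threshold

# Every call heterozygous across ~300 exomes: the "compression" signature.
p = hwe_pvalue(0, 300, 0)
print(p, "flag for exclusion list" if p < alpha else "looks fine")
```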
And here's a great example to illustrate that. This is some work done by one of my collaborators, and it was a full-on chariot race: the first one to identify the gene gets the good paper. There was a large region that had been identified by linkage mapping, and many, many genes were sequenced over a year or several years, and nothing was found. Exome sequencing came along, so they took the project out of the fridge, just as Dr. Biesecker talked about, and tried exome sequencing, and didn't find anything. Then another member of the lab went back and looked for specific regions that had been missed by the exome sequencing, and it turned out that the answer, the causative gene, was in one of those sections that had been missed. You may ask who won the race; well, actually this and two other papers were published simultaneously in Nature Genetics, so I guess it was a tie.

So, for validation and reanalysis in summary: functional validation is required to prove that a candidate you have is the pathogenic variant. If there are no good candidates at the end of the analysis, use the iterative approach that's been talked about today; revisit assumptions and analysis parameters, and study the quality and coverage issues of your particular project, using historical data if possible. Data quality is constantly improving, but that doesn't mean each new technique won't have its own failure modes that need to be studied.

So in conclusion: be sure to give plenty of time to experimental design, hopefully before you start your project. Consider using adjunct technologies to complement exome analysis, like the SNP analysis we talked about. Phenotyping is critical, especially when using family data. Consider using additional family members in certain cases. Functional proof of pathogenicity is required. And analyze the data in an integrative manner, altering your assumptions and filtering constraints as you go.

I have many people to thank, and I'm not going to thank them all individually; these are from several different groups, including our own. I would say that Tom Markello, Neal Boerkoel, Murat Sincan, Karin, and Praveen are sort of our core bioinformatics group for the UDP now, but we benefit from collaboration with other members of NHGRI and other intramural communities. So thank you very much.