Now, again, today's webinar is Availability and Quality Assessment of Genome-Wide Genetic Data on 9,900 Participants in the CLSA. Very briefly, let me introduce our speaker, Dr. Vince Forgetta. Dr. Forgetta is a research associate in the lab of Dr. Brent Richards, who is in the Departments of Medicine, Human Genetics, and Epidemiology and Biostatistics at McGill. Working in genomics and bioinformatics, his research interests include the management and analysis of large-scale human genetic data sets, the training of students, and leading research projects that investigate the role of genetics in human disease. Dr. Forgetta is also the system administrator of a local compute cluster at the Lady Davis Institute that services the research programs of three professors and their students and staff. So we'd like to thank Vince for being here, and I'll go ahead and turn over control of the webinar to him and invite him to begin. Thank you, Carol, for that nice intro. And thank you to the CLSA for inviting me to give this talk, and thanks to everyone for your interest in this first genetic data release for the CLSA. I apologize on behalf of Dr. Richards; he wasn't able to make it at the last minute to give this talk with me. However, if there happen to be any questions I am unable to answer during the question period, he has invited you to send him questions via email, if that's permitted. So the title of this brief talk is the availability and quality assessment of genome-wide genetic data on 9,900 participants in the CLSA. I tried to make this talk as accessible as possible, so I ended up not covering many of the technical details of this genetic data release. However, if there are genomics people in the audience, please feel free to ask me as many detailed questions as you want during the question period. So this work is the work of quite a number of people.
As you see on this slide, I would specifically like to acknowledge the work of the Montreal Genome Centre, specifically Rue Lee, Alexandra, Corinne, Giannis, and Mark Lathrop, who performed all the genotyping, so the acquisition of the raw data. And I would also like to acknowledge Parminder, Suzanne, and Christina for leading on the CLSA side. I'll just give a brief overview of what I'll be talking about. First, I'll briefly go over the rationale for why we would like to look at human genetics in the CLSA, then give a brief overview of the human genome and whole-genome genotyping of the CLSA, followed by the imputation of the genotyping data, then some example analyses we can do, such as genome-wide association studies and genomic prediction, followed by a brief conclusion. So here's the best example I can think of for why we want to look at the human genome. The main reason is that there are many traits and diseases for which there is a large genetic component. And here I'll show you a little story of the progress of human genetics in identifying regions of the genome associated with human diseases and traits. Back in 2006, when we were first starting to do these whole-genome scans (and by whole genome I mean what you see in this picture, where the vertical lines are the chromosomes), we found a handful of loci associated with human diseases or traits. If we skip a few years into the future, to 2010, as a field, the number of discoveries balloons to roughly 5,000 associations across almost 1,000 different traits, and has led to many, many research publications.
If we fast forward to the end of last year, there are now 69,000 associations with various human traits and diseases, and these discoveries have directly contributed to our understanding of human biology as well as to the development of drugs and new treatments. The CLSA offers an excellent opportunity to study the human genetics of disease for two main reasons: first, because it is a very large cohort, one of the largest in the world, and second, because of the comprehensive assessment of many different traits and phenotypes, and questionnaire data as well. And I'll present a more specific example of why the CLSA is ideal for this type of work towards the end of the talk, one of the projects I've been working on. So, a brief description of the genome-wide genotyping of the CLSA. First, a brief overview of the human genome and human genetic variation. As some of you may already know, the human genome is diploid, so we inherit one copy of each chromosome per biological parent. There are 22 pairs of autosomes and two sex chromosomes. The human genome consists of three billion nucleotides, the letters A, C, G, and T, and encodes roughly 20,000 protein-coding genes. The human genome varies between individuals, and there are multiple types of variation. Things we call SNPs, or single nucleotide polymorphisms, are a change from one letter to another, which you can see in the example below. And there are other, larger changes, such as insertions and deletions, and CNVs, or copy number variants. Currently, the largest database holding this variation has catalogued over 100 million SNPs. In the example below, we can see three individuals. For example, I might carry a C on one of my parental chromosomes, and at that same position on the other chromosome I might carry a T, whereas Rui inherited a C from both of her parents, and Brent inherited a T from each of his parents.
So how do we assess this variation across one or more individuals? The idea is that if we take the example from the previous slide and would like to assay that particular position in the genome in three other people, say Alex, Corinne, and Janice, we would take a DNA sample from them, develop an assay that measures whether that position in the genome is a C or a T, then run this assay on those three DNA samples and obtain their genotypes. The genotypes are just the pairs of alleles of the SNP. In this case, the alleles are either a C or a T, and the genotypes are the readout per person. So, for example, Alex has a C and a T, and Corinne has a C and a C. And these assays can be multiplexed on arrays containing anywhere from 500,000 to over a million SNPs. This is exactly what we did for the CLSA: we used one of these arrays. Specifically, we used the Affymetrix UK Biobank Axiom array, which was used to genotype one of the largest cohorts in the world, the UK Biobank. It contains roughly 820,000 SNPs. The SNPs on this array target known loci associated with disease, target SNPs that are in genes, and, more importantly, target a panel of variants that is optimal for imputation, which I will explain later in the talk. In this phase of genotyping, to date, we have genotyped 9,900 individuals, in two batches of roughly 5,000 individuals each. The final list of variants produced from genotyping is 794,409 SNPs per individual. I will briefly go over how we assessed the quality of these 794,409 SNPs. We ran four tests. Let me first explain the top left graph very quickly: on the X and Y axes we see the allele frequency per batch.
That is, the frequency of each allele across the entire population, per batch. This graph contains exactly 794,409 SNPs. We would expect the allele frequency not to vary between batches, yet what we observe is that some SNPs do vary and lie off the diagonal. The goal of the quality assessment is to remove as many of those off-diagonal SNPs as possible, and we ran four tests to assess the quality of the SNPs. The first test was simply to look for a batch-level effect in the allele frequency between the two batches, and you can see that the orange SNPs in the middle top graph capture the majority of the SNPs that are off the diagonal; these are considered of potentially poorer quality. The next test was for Hardy-Weinberg equilibrium, which I'm happy to explain at the end of the talk if anybody is interested in knowing what exactly Hardy-Weinberg equilibrium is. The third test, similar to the first, was to look for discordance in allele frequency, but in this case for control samples that we inserted on all of the plates that were genotyped in the CLSA. And the last test was to assess discordance in genotype frequency between the sexes on the autosomes, as we don't expect the frequency of the SNPs to vary between the sexes on the non-sex chromosomes. When we apply all of these tests and filter out the SNPs that exceed our thresholds, we end up with the picture in the bottom right, in which you can see that all the SNPs are now highly concordant between the batches. These are the SNPs we recommend people use for their analyses. The SNPs along the diagonal total 781,000, or 98.4% of all the SNPs. So while from the picture it looks like a lot of SNPs were excluded, only a very small fraction were of low quality.
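To make the batch-level test concrete, here is a small sketch of the idea, not the CLSA pipeline's exact procedure: compute each SNP's allele frequency in each batch and flag SNPs whose frequencies differ more than sampling noise would allow. The genotype coding, z-test, and cutoff are illustrative assumptions.

```python
# Illustrative batch-concordance check: genotypes are coded 0/1/2
# (count of the alternate allele per person), missing values are None.
from math import sqrt

def allele_freq(genotypes):
    """Alternate-allele frequency over the non-missing genotypes."""
    observed = [g for g in genotypes if g is not None]
    return sum(observed) / (2 * len(observed))

def batch_discordant(batch1, batch2, z_cutoff=5.0):
    """Two-proportion z-test on allele frequencies between two batches."""
    p1, p2 = allele_freq(batch1), allele_freq(batch2)
    n1 = 2 * sum(1 for g in batch1 if g is not None)  # allele counts
    n2 = 2 * sum(1 for g in batch2 if g is not None)
    p = (p1 * n1 + p2 * n2) / (n1 + n2)               # pooled frequency
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))        # pooled standard error
    return abs(p1 - p2) / se > z_cutoff if se > 0 else False

# A well-behaved SNP sits on the diagonal: similar frequency in both batches.
snp_ok_b1 = [0, 1, 1, 2, 0, 1] * 100
snp_ok_b2 = [1, 0, 1, 2, 1, 0] * 100
print(batch_discordant(snp_ok_b1, snp_ok_b2))   # False: not flagged
print(batch_discordant([0] * 600, [2] * 600))   # True: a gross batch effect
```

SNPs flagged this way correspond to the orange off-diagonal points in the slide's top middle panel.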
The data is actually of exceptionally high quality. Switching gears a little bit: instead of a per-SNP quality assessment, we can look at a per-sample quality assessment. In the graph below, which I'll explain shortly, there are 9,900 points, one for each individual. The two metrics we looked at, which are on the axes, are the missingness rate, so, per person, how many genotypes are missing for that person, and, on the y-axis, the heterozygosity, so how many SNPs per person are heterozygous, meaning their genotypes are not the same on both chromosomes. For example, in the case I gave a while ago, I was a C and a T, so I would be heterozygous, and this is just a count of how many of the 800,000 SNPs are heterozygous. That's also a measure of quality, because very high heterozygosity, for example, indicates that there's an issue with the genotyping or with the DNA sample for that person. So in the plot below, the x-axis is the missingness and the y-axis is the heterozygosity. What we did was simply draw lines to define thresholds; any points that lie beyond these thresholds are considered outliers, and we flag those as potentially problematic. Also interesting, and as expected, the heterozygosity does vary among the self-reported ancestries within the CLSA, which you can see via the colour coding of the points. I'll briefly present some other population-level analyses that we've run, specifically with the intention of generating information that people can use in their downstream analyses. The first analysis was for familial relatedness. These are measures that are not recorded by the CLSA at assessment, but we can measure them using the genetic data. This information is useful for many analyses, either because relatedness may introduce bias or because you may actually want to analyze related individuals.
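The per-sample check described above can be sketched in a few lines: compute each person's missingness and heterozygosity rates and flag anyone beyond fixed thresholds. The thresholds and genotype encoding here are invented for illustration; the actual cutoffs are the drawn lines on the slide.

```python
# Minimal per-sample QC sketch: genotypes are two-letter strings like 'CT',
# or None when the genotype call is missing. Thresholds are illustrative.
def sample_qc(genotypes, max_missing=0.05, het_bounds=(0.1, 0.4)):
    """Return (missingness rate, heterozygosity rate, flagged?)."""
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    miss_rate = (n - len(called)) / n
    # A genotype is heterozygous when its two alleles differ, e.g. 'CT'.
    het_rate = sum(1 for g in called if g[0] != g[1]) / len(called)
    flagged = (miss_rate > max_missing
               or not (het_bounds[0] <= het_rate <= het_bounds[1]))
    return miss_rate, het_rate, flagged

sample = ['CC', 'CT', 'TT', 'CT', None, 'CC', 'CT', 'CC', 'TT', 'CC']
print(sample_qc(sample))   # flagged: 1 of 10 genotypes missing (rate 0.1 > 0.05)
```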
Here we used a software program called KING to assess familial relatedness, and we obtained the observed pairs of related people that you see on the right. The next analysis was to assess population structure. The intention here is to complement the self-reported ancestry within the CLSA, and also to provide a metric by which you can adjust downstream analyses. For example, genome-wide association studies, which I will present in a few slides, use the results from this analysis to remove bias caused by differences in population background between the individuals in the CLSA. Briefly, in the plot below, the X and Y axes are labelled PC1 and PC2. These are principal components: overall measures of the variation present across the genotype data across all the individuals, with PC1 and PC2 accounting for the most variation. When we plot PC1 versus PC2 and overlay the self-reported ancestry on top, we can see that this variation associates very well with self-reported ancestry, which again is as expected. The next analysis was to determine the largest ancestrally homogeneous subpopulation. Similar to the principal component analysis on the previous slide, this enables us to focus an analysis on a subpopulation of the CLSA that has a very similar genetic background, thus removing some of the bias associated with attempting to analyze people of mixed ancestry, for example. In the graph on the left, the orange points are the cluster we identified, the ancestrally homogeneous population. On the right is a measure of the variation left in that population in comparison to the entire cohort in grey, and you can see that the line is very low, which means there is very little residual variation left in their genetic data that could potentially bias downstream analyses.
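The relatedness assessment above was done with KING; as a much cruder illustration of the underlying idea, here is an identity-by-state (IBS) sharing score between two people over 0/1/2-coded genotypes. Real estimators like KING's correct for allele frequencies and handle population structure; this sketch does not, and is only meant to show why genetic data reveals relatedness at all.

```python
# Crude IBS sketch, NOT the KING estimator: average number of alleles
# shared per SNP (0..2) between two people, over non-missing pairs.
def ibs_sharing(g1, g2):
    pairs = [(a, b) for a, b in zip(g1, g2)
             if a is not None and b is not None]
    # With 0/1/2 allele counts, 2 - |a - b| is the number of shared alleles.
    shared = sum(2 - abs(a - b) for a, b in pairs)
    return shared / len(pairs)

identical = [0, 1, 2, 1, 0, 2]
print(ibs_sharing(identical, identical))           # 2.0: duplicate samples or identical twins
print(ibs_sharing(identical, [2, 1, 0, 1, 2, 0]))  # much lower for dissimilar genomes
```

Close relatives share systematically more alleles than unrelated pairs, which is the signal tools like KING quantify rigorously.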
Switching gears a little bit, I'll talk about the imputation of the CLSA. If you recall, there are roughly 800,000 variants available per individual in the CLSA; the goal of imputation is to increase that number from roughly 800,000 to many millions of genetic variants. How we do this is by using a reference panel, a completely independent set of individuals that have been genotyped for many millions of variants. The idea is that these many millions of variants will increase the power of any studies we do with these genetic tests, and also enable fine mapping, that is, looking for putatively causal variants, SNPs that actually cause the disease, and not ones that are merely associated with it. Below is a cartoon of how the imputation process works. For example, if we have genotype data for Rui and Corinne for three of 12 SNPs, and we have a reference panel of individuals that contains all 12 SNPs, it's basically a comparing and filling game. We compare Rui's genotype data to the reference panel, find an individual that matches or partially matches, and then fill in Rui's missing genotypes using the reference panel, and we do the same thing for Corinne. This is clearly a toy example; the algorithm is actually considerably more complex than this, but the idea is basically a compare-and-fill operation. The CLSA was imputed using the Haplotype Reference Consortium panel, containing 32,000 individuals and 40.4 million SNPs. We imputed the CLSA using an online service called the Sanger Imputation Service, resulting in the same number of variants as the reference panel. So we now have 40.4 million genetic variants for 9,900 participants. In the plot below, we simply compare the quality we obtained from the imputation in the CLSA to a cohort of similar size, in this case the UK Biobank.
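The compare-and-fill cartoon above can be written down as a toy program: match a person's observed genotypes against complete reference individuals and copy the missing positions from the best match. The real algorithms behind services like the Sanger Imputation Service are probabilistic and work on haplotypes, so treat this purely as the intuition, with made-up data.

```python
# Toy compare-and-fill imputation: observed genotypes are 0/1/2 allele
# counts or None where untyped; the reference rows are complete.
def impute(observed, reference):
    def mismatches(ref):
        """Disagreements at the positions we actually observed."""
        return sum(1 for o, r in zip(observed, ref)
                   if o is not None and o != r)
    best = min(reference, key=mismatches)   # closest reference individual
    # Keep observed calls; fill the gaps from the best match.
    return [r if o is None else o for o, r in zip(observed, best)]

reference_panel = [
    [0, 0, 1, 2, 1, 0],
    [2, 1, 1, 0, 0, 2],
]
rui = [0, None, 1, None, 1, None]           # typed at 3 of 6 positions
print(impute(rui, reference_panel))         # [0, 0, 1, 2, 1, 0]
```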
And you'll note that, for the UK Biobank, the X axis is the minor allele frequency of the SNP, so how common that variant is in the population, and the Y axis is simply a measure of the quality of the imputation. We can see that while the UK Biobank, with its full data set, has a higher overall quality, meaning the line is higher, if we take a comparable random subset of 9,900 individuals from the UK Biobank, we get nearly equal quality. This is simply because imputation improves when you give the algorithm more individuals. Again, that's a very technical detail, and I'm happy to explain why that is the case during the question period. Now I'll show you how we can use the imputed data to run a genome-wide association study and what that looks like. So briefly, what is a genome-wide association study? We simply take each SNP and associate the genotype of that SNP, across all the individuals, with a trait in question. If the trait is a binary trait, such as having a disease or not having a disease, it simply returns the increase in odds of disease per allele of the SNP versus the control group, which you can see below. For continuous traits, which is the graph on the right, it is simply a linear regression, so plotting a line through the points. On the x-axis, we take each individual and plot them in the left, middle, or right column according to their genotype for that SNP, and on the y-axis, I have a toy example using height. The height here is actually simulated; it's not real CLSA height. You can see that people who have the TT genotype are generally slightly shorter and people who have the CC genotype are generally taller. We just fit a line through this, and that is what we are trying to find when we're running a GWAS: a signal like this, basically a non-horizontal line.
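For a continuous trait, the per-SNP test described above really is just ordinary least squares of the trait on the allele count (0, 1, or 2 copies). A minimal sketch, using simulated heights in the spirit of the slide, not real CLSA data:

```python
# Per-SNP association for a continuous trait: OLS of trait ~ allele count.
def regress(allele_counts, trait):
    """Return (slope, intercept); the slope is the effect per allele."""
    n = len(allele_counts)
    mx = sum(allele_counts) / n
    my = sum(trait) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(allele_counts, trait))
    sxx = sum((x - mx) ** 2 for x in allele_counts)
    beta = sxy / sxx
    return beta, my - beta * mx

# Copies of the hypothetical 'tall' allele, and simulated heights in cm.
genotype = [0, 0, 1, 1, 2, 2]
height = [160.0, 162.0, 168.0, 170.0, 176.0, 178.0]
beta, intercept = regress(genotype, height)
print(beta)   # 8.0: each extra copy adds about 8 cm in this toy data
```

A horizontal line (slope near zero) means no association; a GWAS scans millions of SNPs looking for slopes that differ from zero far beyond chance.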
When we run the association study for height in the CLSA, we obtain the results seen on this slide. On the x-axis are 7.4 million SNPs ordered by their position in the genome. On the y-axis is simply a measure of the strength of association of each of those SNPs: the higher the points go on the y-axis, the more strongly we believe those SNPs to be associated with height. Here we see three signals at three genes, and these genes have previously been identified as associated with height. So it's an excellent positive control, showing that, A, the genotyping worked well and, B, the imputation also worked well. Next I'll show you another example, from a very recent project that's ongoing in our lab: the genomic prediction of osteoporotic fracture, so being able to predict whether somebody is going to fracture their bones using their genotype data alone. We have developed a model to predict bone density using genetic data, which you can see on bioRxiv in a recent paper. This model was developed in the UK Biobank, and we showed that it can predict fracture risk. The idea for using the CLSA in this study is to validate the results in an external cohort that has been assessed in a manner similar to the UK Biobank. In the graph on the left, for the 9,900 individuals in the CLSA, we have predicted their bone density in the same manner as we predicted it in the UK Biobank. What we would then like to do in the CLSA is to assess whether, when we apply this genetic predictor to individuals and compare that to their assessment for fracture risk without the predictor, the number of expensive bone density tests somebody would potentially have would be fewer, because we could screen out people with genetically predicted very high bone density and say that they're safe and do not need to be tested.
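Genomic prediction of this kind generally rests on a polygenic score: a weighted sum, over many SNPs, of each person's allele count times that SNP's estimated effect size. The weights and genotypes below are invented for illustration; the actual bone-density model is the one described in the paper.

```python
# Illustrative polygenic score: sum of (allele count x per-allele effect).
def polygenic_score(allele_counts, effect_sizes):
    return sum(g * beta for g, beta in zip(allele_counts, effect_sizes))

# Hypothetical per-allele effects on the trait, one per SNP.
effects = [0.12, -0.05, 0.30, 0.07]
person_a = [2, 0, 1, 1]   # allele counts at the same four SNPs
person_b = [0, 2, 0, 1]
print(polygenic_score(person_a, effects))   # higher predicted trait value
print(polygenic_score(person_b, effects))   # lower predicted trait value
```

People in the extreme tails of such a score are the ones who could potentially be triaged without an expensive clinical measurement, which is the screening idea described above.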
And therefore incur fewer tests for people who are very unlikely to fracture a bone, while still capturing the people who are at risk. The reason the CLSA is very useful for a study like this is, first, because it is a very large sample in a very relevant population, assessed for all fracture risk factors, including the questionnaire data the CLSA has collected and other measures. Also, and more importantly, bone density has been measured across the vast majority of individuals, so we can compare our predictor to a gold standard within the CLSA. I'll briefly go over how the data is formatted for release. The data is made available using established genotype file formats. These file formats are binary rather than text, both to reduce the size of the download and to offer rapid indexing of the data, so you can search for SNPs or particular individuals using software tools. The directly genotyped data, if you recall, the 800,000 SNPs, is roughly a two-gigabyte download in Plink format. The imputed data is substantially larger: a 36-gigabyte download in BGEN format. Again, I'm happy to explain all the technical aspects of this at the end if anybody has questions. The data can be manipulated, analyzed, filtered, and used to run GWASs, and so forth, using software programs such as Plink and BGENIX. Just to conclude: the current release contains 9,900 individuals, roughly 800,000 directly genotyped variants, and 40.4 million imputed genetic variants. All of the summary statistics I've shown you, including the quality assessment and the familial relatedness, have been provided in supplementary files. Moreover, what I did not demonstrate here is an additional data set containing imputation of the HLA alleles. A future release, which I expect to occur mid this year or later, will add 10,000 more individuals, for a total of roughly 20,000.
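As a small taste of working with the Plink-format release: alongside the binary .bed genotype file, Plink format includes a plain-text .bim file with one SNP per line (chromosome, SNP id, genetic distance, base-pair position, and the two alleles). A sketch of scanning a .bim for SNPs in a region of interest, with made-up records:

```python
# Scan Plink .bim records (chrom, id, cM, bp, allele1, allele2) for SNPs
# falling in a chromosomal region. Records below are invented examples.
import io

def snps_in_region(bim_lines, chrom, start, end):
    hits = []
    for line in bim_lines:
        c, snp_id, _cm, pos, _a1, _a2 = line.split()
        if c == chrom and start <= int(pos) <= end:
            hits.append(snp_id)
    return hits

bim = io.StringIO(
    "1 rs0001 0 1500 A G\n"
    "1 rs0002 0 250000 C T\n"
    "2 rs0003 0 1700 G T\n"
)
print(snps_in_region(bim, "1", 1000, 2000))   # ['rs0001']
```

In practice you would let Plink or BGENIX do this kind of lookup, since they index the binary genotype files directly, but the .bim layout is useful to know when inspecting the release.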
And so thank you; I'm happy to take any questions at this time. Well, thank you. That was a really great overview, and I'm sure that people have many questions on some of your slides. So we're going to open this up for a question and answer session now. If you have any questions, please type them into the chat box menu and we'll go ahead and discuss them with Dr. Forgetta. While we wait for people to type in some questions about the presentation, I'll go ahead and start the conversation. So how did the selection happen for the individuals who were genotyped? Was that a completely random selection, or was there some process that we didn't see? If I recall, I think it was completely random. And do you have any details on how that was done? I do not. Yeah, okay. I was a little bit further down the chain; I just received the genotype data, but I was on calls about the selection, and from what I recall, I think it was random. So they attempted to make it random? Exactly. And while you were talking about it, were there any tests that were done for any rare variants that would actually lead to known diseases? I'm thinking about the ethics and the consent issues with reportable diseases; I'm sure there was a large amount of thought and information gathering that went into the consent process for being able to do genotyping, and into protection for data and reporting those. The only analysis that we've done, which is in the QC document that's released along with the imputed data and the genotype data, and which I didn't have time to mention in my talk, is looking at the copy number variants, so the large insertions and deletions, particularly on the sex chromosomes, to confirm any discordance we see between self-reported sex and genetic sex. So to note, no actual disease classification for known disease-causing genetic variants was done?
Individual genetic variants, such as for Alzheimer's or things like that, no, we did not look at that. Okay, so there's no mandatory reporting if someone has, I don't know, a breast cancer gene? No, there is not. All the analysis that we've done so far is simply quality assessment. And so there's no mandatory reporting back to participants based on that? No, from what I understand, we are not, I wouldn't say allowed, but the intention is not that, right? Okay, just an interesting aside on the ethics question. Yes, exactly. So clearly, this is why it was not analyzed, right? Because we weren't given the mandate to analyze that, so we did not. Certainly. There is a question here from Dr. Prashay: please tell us the female-male percentage split. I do not recall, I'm sorry, but if I recall, it's mainly female, though not by a large margin, similar to the UK Biobank. Yeah, if it's random from the data, then it would probably be... Yeah, it's slightly more female. I'm sorry, that's a very simple thing I could look up, and I did not. And we've also typed into the chat box there where you can get further information. The data on the 9,900 participants in the Canadian Longitudinal Study on Aging is available at this link. If you go to our CLSA website under the tab for researchers, there's a data support document that deals with the genome-wide genetic data on this subset of participants. So I encourage you to search our website and look at the data support document that's been put together on this as well. From Laura Anderson: thanks for the nice talk. Could you describe the logistics of working with the data? Are they accessed through a remote server? And is the CLSA already participating in any networks of pooled data that need to be considered when proposing studies? Vince, do you have any thoughts on all of that? Thank you for your feedback.
So working with the data: the data is being made accessible through the CLSA as a download, so it's not analyzed on a remote server; you need to download the data onto your own computer. It's typically analyzed using fairly established software tools for genotype data, such as the ones I mentioned, Plink and BGENIX. A lot of these tools work in Linux; some of them work in Windows. For example, I know Plink works in Windows. At this point, the genotype data can likely be analyzed on a laptop; the data set's not that big, given the laptop has a sufficient amount of memory. But the imputed data likely needs some sort of server to analyze it, like a Linux server. And what about the second part of the question? Is the CLSA participating in networks? Oh, yes, I'm sorry about that. From what I'm aware of, for the genotype data, no. So it's not something that needs to be considered. Okay, it doesn't need to be considered. No. So, talking about the logistics of working with the data, one thing that occurred to me when you were going through it is how p-value thresholds were used for the quality of the data, and any tips or tricks for researchers who might be thinking about using it for GWAS studies? Any guidance on p-value thresholds? Yeah, so there are well-established methods in genome-wide association studies. We simply use a Bonferroni correction for the number of independent tests. The typical p-value threshold that's been used for a GWAS study that includes anywhere from one to two million SNPs has been five times 10 to the minus eight. However, now that the imputation panels are getting more dense, we're arriving at potentially seven million or 10 million SNPs. I'm happy to share a paper; we published a paper a few years ago where we computed the Bonferroni correction, and it should be 1.2 times 10 to the minus eight.
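The Bonferroni correction mentioned above is simply the alpha level divided by the number of independent tests; with roughly one million independent SNPs and alpha = 0.05, that is where the conventional genome-wide threshold of five times 10 to the minus eight comes from.

```python
# Bonferroni-corrected significance threshold: alpha / number of tests.
def bonferroni(alpha, n_tests):
    return alpha / n_tests

print(bonferroni(0.05, 1_000_000))   # about 5e-08, the classic GWAS threshold
```

With denser imputation panels the number of independent tests grows, which is why stricter thresholds (such as the 1.2 times 10 to the minus eight figure cited above) have been proposed.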
And that's what you used for the quality test thresholds as well? For the quality tests, it's a little bit different, because the number of SNPs is fewer. It's in the QC document that's released with the genotype and imputed data, but it's basically the number of SNPs, 800,000, times, if I recall, the number of tests, so four tests, and then multiplied by the alpha threshold I used, a very conservative alpha threshold of 0.005. Okay, thank you. I don't recall exactly what the p-value threshold was for the quality assessment, but it was also very low. Okay, so regarding the quality assessments, those four tests, the big graphical slide you were showing, and then the comparison with the UK Biobank: is it a true statement that rarer variants would be of lower quality? In general, yes, they are more difficult to impute. However, the vast majority of rare variants, even down to a minor allele frequency of 1% in the population, are still of very good quality. Yeah, so I think that's the takeaway. The threshold we usually use as a cutoff for discarding SNPs with low imputation quality is 0.3. So for the info value, the Y axis on that plot, the line would be at 0.3 for discarding SNPs. So we'll continue the conversation here a little bit longer, but I encourage anybody who has any questions or concerns to type them into the chat box now. If anybody has any technical questions, Vince can read them directly, so I encourage you to ask your questions now. So, when you did your genetic versus self-reported sex quality check, did anything interesting come from that? Yes, I mean, it's in the QC document: we found 12 to 14 individuals that have basically sex-linked diseases. Okay, anything about... And it corroborates with their self-reporting, right? Largely.
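The info-score cutoff of 0.3 mentioned above amounts to a one-line filter over the imputed SNPs; the scores below are invented for illustration.

```python
# Keep only SNPs whose imputation info score meets the cutoff (0.3 here,
# as mentioned in the discussion above).
def pass_info_filter(snps, cutoff=0.3):
    """snps: list of (snp_id, info_score) pairs."""
    return [snp_id for snp_id, info in snps if info >= cutoff]

scores = [("rs1", 0.98), ("rs2", 0.12), ("rs3", 0.45), ("rs4", 0.29)]
print(pass_info_filter(scores))   # ['rs1', 'rs3']
```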
We see that the people who self-reported having a specific sex-linked disease, we basically found it using the genetic data. For any kind of sex or gender studies, would there be any utility in using genetically determined sex versus self-reported sex and gender? Have you thought downstream about any of those issues? I mean, it really depends on the type of study. For GWASs, we usually use the genetically determined sex and not self-reported, just because we're looking at the biology, right, and not necessarily some sort of more environmental factor. Yeah, but you can see for some studies it might be kind of an interesting check. Yeah, definitely. And I think maybe that goes along with the familial relatedness issue. So you recommended removing biases by using subsets of populations that are homogeneous. But does that lead to any issues around minimizing genetic variation or looking across genetic variation? Well, for GWAS, the GWAS community tends to be exceedingly conservative when they do their analyses, and one of the ways of being conservative is to try to limit any form of bias that you can introduce into the analysis. But now that the algorithms are getting more advanced, and people are interested in association studies in other ethnicities, for example, there are software programs and researchers out there who are specifically targeting people of mixed ancestry, or other ancestries, and analyzing all of the data together, right? Certainly, yeah. So really, we just did the analysis trying to give people something that they'll find useful for their downstream analyses. But again, it's their choice whether to use these results or not, right? Certainly. At their discretion. I see there's a question in the chat.
Are there plans for whole-genome sequencing? I think there are plans for some whole-genome sequencing, from what I understand, but I don't have any timelines on that. Okay. Well, you did a great job of giving the overview and showing how interesting this new data set coming out is for people. Do you want to touch upon any limitations or cautions with the data set? No, I mean, people are free to use it. The only limitation is that if you don't have a lot of prior experience using genetic data, it can be rather daunting. If people have questions, I'm very happy to answer them if they need help with either the software or making sense of the data; I'm more than happy to help people. I'm not sure what the most efficient way to do this through the CLSA is. Maybe go through the access portal first, right? Yeah, I think so. Ask me questions there, so we can keep track and build an FAQ or something for people. Certainly, I think that's the plan. All right, a couple more quick questions here. Are there plans to break out any nutrition-related SNPs? By break out, meaning, no, there are not. The question's not very clear to me, but if you mean separate out SNPs and flag them, there is not. However, because we used the UK Biobank array, you can go to the Affymetrix or Thermo Fisher website, or to the UK Biobank, where they describe the contents of the array, and there might already be SNPs that have been flagged as nutrition-related. I see there's another question. In terms of phenotypic data, what is the general rate of missing information at follow-up? I'm sorry, I do not have that information. Is the first follow-up out yet? Carol, do you know? This coming data release will be the first one from follow-up. Oh, okay.
So I'm sorry, I don't have that information on the rate of missing information at follow-up. We'll try to type an answer to you, as well as guide you to the website when follow-up data is available so you can look at that specifically. There's also a report chapter for PHAC that talks about prediction of missing information at follow-up. Okay, thank you very much. I greatly appreciate your time, and I enjoyed your webinar very much. I think people, as they use the data, will have lots of questions. As you say, we'll start trying to get those questions answered, and maybe put together a question and answer sheet as we work together in the future. Thank you very much. Thank you. All right. I'd like to remind everyone that CLSA Data Access Request applications are ongoing. The next deadline for applications is February 25th. Please visit the CLSA website under Data Access to review the available data and for information and details on the application process. I'd also like to remind everyone to complete the survey located under the polling options. If you have any questions or concerns about the poll, write them into the chat box and we can help you complete this important survey to help us guide future webinars. And remember that the CLSA promotes this webinar series using the hashtag CLSAwebinar; we invite you to follow us on Twitter. Finally, our monthly webinar next month, in February, is on the characteristics of caregivers and care receivers in the Canadian Longitudinal Study on Aging, by Dr. Deborah Sheets from the University of Victoria. We look forward to that webinar, and we invite you to go to our website to register for this particular webinar and for our entire webinar series. We hope you will join us for the rest of our 2019 webinar series. Thank you again for attending today's presentation.