Our next speaker is Steve Qin from Emory University. Steve has done a lot of exciting work in statistical genetics, including his work in the International HapMap Consortium. So today he's going to talk about identification of non-coding risk variants associated with complex diseases based on multi-omics profiles. Steve.
Thank you. Thank you for giving me this opportunity. Today I'm going to tell you briefly about our recent work on disease-specific non-coding risk variant detection. As you know, in the past decade thousands of GWAS studies have been conducted, and tens of thousands of variants have been identified as associated, at the genome-wide significance level, with hundreds of different diseases and phenotypes. The big surprise here is that a majority of these variants are non-coding. So that's a big problem for us. Just 10 years ago, before ENCODE, our understanding of the non-coding region was very limited. So the huge challenge here is: what is the biology behind these trait-associated non-coding risk variants? And related to that, how can we identify these non-coding risk variants? And because of the hard work of the GWAS, the situation has now changed dramatically. Now we have a lot of epigenomics data from ENCODE and other consortia. So recently several methods have emerged to leverage this new resource for risk variant annotation, such as GWAVA from the Sanger Institute, CADD from Jay Shendure's lab, Eigen, GenoCanyon, and several other methods.
So today I want to introduce you to a method we've recently developed, which is called DIVAN: disease-specific variant annotation. As the name suggests, our method builds a unique model for each different disease and phenotype. So it's disease-specific, unlike all the methods I mentioned previously. In this method, we use the trait-associated SNPs identified by GWAS as training data. Specifically, we use data collected by the Association Results Browser as training data.
And as features, we use more than 1,800 epigenomics data sets from the ENCODE consortium and Roadmap Epigenomics. This includes ChIP-seq data, DNase data, FAIRE data, and so on and so forth. So the challenge we face in this project is that we have a huge collection of multi-omics data. And many of them are correlated: the same factor from different cell types, and different, related factors from the same cell types. Another challenge is that the training data can be very limited. For some of the diseases, we have fewer than 50 known variants. So it's a huge p-greater-than-n problem. And also the data types are quite diverse. So our workaround is the following. We use some machine learning techniques like feature selection and ensemble learning. And very importantly, instead of using a binary indicator of whether a variant overlaps with a peak, we use continuous read counts as the features, which we believe improves the performance of the method.
So, some results. We compared to four different methods: GWAVA, CADD, Eigen, and GenoCanyon. We use cross-validation on 45 different disease phenotypes from the Association Results Browser. And we also conducted an independent test on 36 disease phenotypes from GRASP, another GWAS database. So this is a partial list of the diseases we studied. You can see it's very diverse. And the number of variants can be as many as several hundred and as low as 50. So these are some of the results. These are the ROC curves from four selected disease types. And you can see our method, the black curve, outperforms the other methods. And overall, this is the performance of our method. This is the area under the ROC curve. And you can see it's not bad. All 45 diseases give an AUC greater than 0.6, and some of them are as high as the lower 80s, so higher than 0.8, which is pretty encouraging. And we also did an independent test on these GRASP SNPs.
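[Editor's note: the speaker contrasts a binary peak-overlap indicator with a continuous read count as the feature for each variant. The talk does not give the window size or any normalization, so the following is only a minimal sketch of that distinction. The function names, the 100 bp flank, and the toy coordinates are illustrative assumptions, not the DIVAN implementation.]

```python
from bisect import bisect_left, bisect_right


def in_peak(pos, peaks):
    """Binary feature: 1 if the variant position lies inside any peak interval."""
    return int(any(start <= pos <= end for start, end in peaks))


def read_count(pos, read_starts, flank=100):
    """Continuous feature: number of reads starting within +/- flank bp of the variant.

    read_starts must be sorted; flank=100 is an arbitrary choice for illustration.
    """
    lo = bisect_left(read_starts, pos - flank)
    hi = bisect_right(read_starts, pos + flank)
    return hi - lo


# Toy data (hypothetical, not real ENCODE signal): one strong and one weak peak.
peaks = [(1000, 1400), (5000, 5400)]
read_starts = [1100, 1120, 1150, 1180, 1190, 1200, 1210, 1230, 1250, 1290,
               5150, 5200, 5260]

variants = [1200, 5200, 9000]
for v in variants:
    print(v, in_peak(v, peaks), read_count(v, read_starts))
```

[In this toy case the first two variants get the same binary value because both sit inside a called peak, while the read count separates the strong signal from the weak one. That is the kind of information a peak indicator discards.]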
So these are the comparisons for four diseases. And again, our method seems to be slightly better than the competitors. Also, out of 36 diseases, in 22 of them our method shows the best results. So you might be interested to know which features are important in our studies. So this is the list. These are the numbers of features identified for each of the diseases. At this end is body weight, followed by stroke. And at this end is inflammation. Interestingly, you can see the number of features differs a lot, from more than 600 to as low as 50. And these numbers are highly correlated with the number of variants discovered, not surprisingly. Also interestingly, we found that histone marks are actually the most informative features among all the features we included in the test, with open chromatin coming in second. So several histone marks show up on top.
So in summary, our study demonstrated that disease-specific risk variant identification is feasible. But to do that, we need to proceed with caution. We need some advanced learning techniques. And our method, DIVAN, seems to outperform competitors. But on our terms. And also it seems to me that it's advantageous to use reads instead of peaks as the features. And we found that histone marks are the most informative class of features. So, some acknowledgements: most of the work was done by Li Chen, a graduate student who is graduating, and it's also a collaboration with Peng Jin's lab. So thank you very much.
Any questions for Steve?
Jason, are you talking about Mike Beer's or Olga's method? Have you compared to methods such as fGWAS and PAINTOR? They seem to be working on a more similar problem than the other methods you've compared to.
Well, we didn't compare with Joe Pickrell's paper. He has an American Journal of Human Genetics paper, right? And his method is more on identifying GWAS variants, if I understand correctly.
But we can talk about it offline later.
Okay, so that was my question. So is your method comparable to Mike Beer's or Olga's method? They predict the causal variants in disease.
Probably. We didn't look at other methods beyond these four.
Okay.
These four seem to me the state of the art in this field right now.
All right, cool. Thanks.
Thank you.
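[Editor's note: the talk reports performance as the area under the ROC curve (AUC), with all 45 diseases above 0.6. For readers unfamiliar with the metric, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen risk variant scores higher than a randomly chosen control variant, with ties counted as one half. The function name and the toy labels and scores are hypothetical and are not results from the talk.]

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    labels are 1 (risk variant) or 0 (control); ties contribute 0.5.
    This pairwise form is quadratic in sample size, which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 8 of the 9 positive/negative pairs are ordered correctly.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))
```

[An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the 0.6 and 0.8 figures quoted in the talk should be read.]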