I'm Shuji Kawaguchi of Kyoto University, not Sushi. Today, I want to talk about a method for predicting causal genes for Mendelian disorders.

For Mendelian disorders, identification of novel causal genes is very important for genetic diagnosis. However, the actual success rate of genetic diagnosis is still around 30%. Also, whole-genome sequencing technology has improved and become low-cost. To address this problem, we developed a method to predict novel causal genes by using whole-genome sequencing and AI technology based on IBM Watson, and we performed a feasibility study of the developed method on retinitis pigmentosa. Our final target is to improve the diagnosis rate to more than 70%, but our research is only halfway there.

This and the next slide show why detection of causal genes is difficult. By searching the whole-genome sequence, we can detect a variant specific to the case. So we set this gene as a candidate and then do advanced analysis, looking at family data, protein interactions, metabolic pathways, and ontology. By researching such information, we decide whether this candidate is causal or not. But there are many variants when we analyze the whole-genome sequence, so we must do the advanced analysis for all candidates and then decide whether each candidate is causal or unrelated. Unfortunately, almost all of the candidates are false positives that appear by chance. So analyzing all candidates is very time-consuming and expensive, and we want a breakthrough to identify the optimal candidates.

We use IBM Watson to solve this problem. There is Watson for Drug Discovery, WDD. WDD is one of the solutions of IBM Watson. WDD incorporates tens of millions of articles in MEDLINE and discovers relations between genes, diseases and drugs. Predictive analysis is one of the functions of WDD. Predictive analysis, PA, needs two lists: one is a known gene list and the other is a candidate gene list. Predictive analysis ranks the candidate gene list by using the similarity of the known gene
list to each candidate gene. However, WDD does not work when all genes are input as the candidate gene list. Appropriate selection of these lists is important for predictive analysis. So we created these two lists.

First is the creation of the known gene list. Several years ago, our group searched for the causal variants of retinitis pigmentosa (RP) by targeted exome analysis of 365 genes in 326 RP patients. We detected causal variants in 30 genes, and 122 cases, corresponding to 37.4%, were genetically diagnosed. There is another source of information, the RetNet database. RetNet provides information on genes and genetic loci causing inherited retinal diseases. In the database, 90 causal genes of RP were registered as of September of last year. This past study and the RetNet database are suitable for the known gene list, and they were also used to create the candidate list, as I will explain on the next slide.

This slide shows the creation of the candidate gene list. We used the whole-genome sequence data of 523 RP cases and 2,143 controls. We decided the criteria for variants so that they fit the known causal genes. The criteria are as follows. First, the variant is a stop-gain, splicing, or frameshift variant. Second, for other non-synonymous mutations, four or more of nine protein function prediction tools predict the mutation as damaging. Third, the minor allele frequency in RP cases is 1.5 times greater than in controls. Then, if at least one RP case is a homozygote or compound heterozygote for variants satisfying conditions one, two and three above, we pick up this gene.

1,022 genes satisfied this fourth condition. Among them, 34 genes were known causal genes, found in 68 RP cases and 1 control. So these criteria fit the known genes very well. However, in the other 994 genes, 435 RP cases but also 423 controls have homozygous or compound heterozygous variants. So these criteria also have a high false positive rate. But I think many true causal genes are included in these 994 genes. So we set these
genes as the candidate list and ranked them with WDD.

This slide shows the workflow to identify causal genes. First, we input the known information and set it as the known gene list. Then, by using the whole-genome sequence dataset, we create the criteria for variants and create the candidate gene list.

Shuji, how do you show that it's a causal gene? How do you prove it's a causal gene?

This is not decided by us. The known RP genes are recorded in the database, and this database was created from other papers and research results. So the known genes are already...

If you find a new one, though, how do you prove it's a causal gene? How do you demonstrate that a gene is causal for RP?

Can I answer for him? I think it comes at the end. Whatever you have as the top candidate, you will take it for genetic diagnosis.

But that doesn't prove it's a causal gene.

Some of it is done by checking the trio data or family data, to see whether it segregates, something like that.

Then we input the two lists into Watson for Drug Discovery. WDD calculates a score and sorts the candidates. We used two strategies for WDD, which I will explain later.

We tested the developed method by using the RP cases. Some of them are already diagnosed by known genes: there were 326 patients, and 37.5% of the cases are already diagnosed. We then performed whole-genome sequencing on 135 of the remaining cases, and the rest is ongoing. So we adjusted the diagnosis rate accordingly.

Here is the calculated diagnosis rate. The blue solid line is the ordinary use of WDD. The zero point shows the diagnosis rate using only the known genes, and the diagnosis rate increases monotonically. But with the ordinary use of WDD, after the top 50 ranked candidates, the false positive rate also clearly increases. I think this is because the lower-ranked genes are affected by the top-ranked genes.

So we used another strategy, called the recursive method. At first, WDD ranked the candidates and we picked up only the top 20 genes and removed these 20 genes from the candidate list. Then WDD ranked the remaining
candidate list. WDD again picked the top 20 genes, we removed these top 20, and we continued until the false positive rate reached some threshold. With the recursive method, the false positive rate stays very low further down the ranking. So by using the top 80 ranked candidates, the diagnosis rate improved to 52%.

Then we checked the top 50 ranked genes. Among the top 50 genes, 17 are causal genes of other retinal diseases. Indeed, many of the known RP causal genes are also causal for other retinal diseases in the RetNet database. And 4 of the 50 genes belong to the same gene families as known RP causal genes. We used the known causal gene list as of September of last year, and since then two of the genes were very recently added to RetNet. So I don't think we can say that all of the top-ranked genes are truly causal genes, but WDD seems to rank candidate genes correctly.

Here is the take-home message. We developed a prediction method for causal genes of Mendelian disorders by using whole-genome sequencing and Watson for Drug Discovery. We did a feasibility study on RP. We found that many top-ranked candidates share structural and functional features with known RP genes. Two top-ranked genes were registered very recently in RetNet, which suggests that our AI-assisted approach is useful. By including the top-ranked genes, the diagnosis rate increased to about 50% without increasing the false positive rate.

Finally, we want to introduce the vision for the developed method. Our group, sorry, okay, time is up. Our group is working to construct an integrated registry platform, and the developed method will also be integrated into this system. I want to say many thanks to my collaborators, and thank you for your attention.

Are there any questions?

Is it possible that your false positives are coming from penetrance issues? It is possible that the mutation exists but doesn't express itself as disease. This is called penetrance, the penetrance of the disease. Have you heard of that? Sometimes the mutation exists but it does not express
itself as disease. So even if the program works correctly, the result is optimistic.

Can I just say something about all these short talks? I asked the speakers of the short talks and the workshops to also present their abstracts. The idea is to draw you to their posters as well. So if you're interested in more discussion, they should also be there, staying at their posters.

Do you focus on exons rather than intergenic variation?

Yes, I only focus on exons.

It's another level of difficulty. The other thing that's weird is that in yeast, a lot of people are finding that synonymous mutations actually lead to a phenotype, something like 25%. So it's always intriguing to think of all the variation that we're ignoring when we do these studies, because it's so difficult to capture it all.

You've got a very good point. I'm going to elaborate on that here. For example, if you have a mutation in a very important part of a regulatory element, and the patient is heterozygous for that mutation and heterozygous for a protein-coding mutation, then only one allele can be transcribed, which is effectively like a homozygote. It's very important.

Just a quick one. I'm not sure whether you said it. How many cases of retinitis pigmentosa are considered to be of genetic origin now? How many? What fraction?

Can I answer the question for you? I'm sorry. It is considered that at least 50% are familial clusters or hereditary. About the other 50%, we don't know much. It can be a novel mutation. It can be because of the lack of family information. So we don't have a very clear number for what fraction of the disease is hereditary. But it is said to be more than 50%, which is generally accepted.

We have not yet properly checked the variants. This is a simulation result, assuming that the top-ranked genes are causal genes. We considered only a recessive model and did not include a dominant model, because the dominant model is more difficult to predict: the false positive rate increases greatly, since there are many variants if we assume the dominant model. But it
is future work.

Thank you very much, Shuji, for a great talk.

Thank you very much.
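
Editorial sketch of the recursive method described in the talk: rank the candidate list, keep the top batch, remove it, re-rank the remainder, and stop once the false positive rate passes a threshold. This is only a minimal illustration under stated assumptions, not the speaker's implementation. WDD is a proprietary service, so the ranking step is represented by a hypothetical callback `rank_fn`; the false positive estimate `fp_rate_fn`, the toy scores, the toy control counts, and the threshold value are likewise assumptions made for the example. The talk used a batch size of 20.

```python
from typing import Callable, List, Sequence

def recursive_rank(
    known: Sequence[str],
    candidates: Sequence[str],
    rank_fn: Callable[[Sequence[str], Sequence[str]], List[str]],
    fp_rate_fn: Callable[[Sequence[str]], float],
    batch_size: int = 20,
    fp_threshold: float = 0.05,
) -> List[str]:
    """Repeatedly rank, keep the top batch, and re-rank the remainder.

    rank_fn stands in for the external ranking service (hypothetical);
    fp_rate_fn estimates the false positive rate of a gene set (hypothetical).
    """
    remaining = list(candidates)
    selected: List[str] = []
    while remaining:
        ranked = rank_fn(known, remaining)      # re-rank only what is left
        batch = ranked[:batch_size]
        # Stop before accepting a batch that pushes the estimate past the threshold.
        if fp_rate_fn(selected + batch) > fp_threshold:
            break
        selected.extend(batch)
        accepted = set(batch)
        remaining = [g for g in remaining if g not in accepted]
    return selected

# Toy data, purely illustrative (not from the talk).
toy_scores = {"G1": 0.9, "G2": 0.8, "G3": 0.7, "G4": 0.2, "G5": 0.1}
toy_control_carriers = {"G1": 0, "G2": 0, "G3": 1, "G4": 5, "G5": 6}
N_CONTROLS = 10

def toy_rank(known, candidates):
    # A real ranker would depend on the whole candidate set; this one does not.
    return sorted(candidates, key=lambda g: toy_scores[g], reverse=True)

def toy_fp_rate(genes):
    # Crude proxy: control carriers over the number of controls, capped at 1.
    return min(1.0, sum(toy_control_carriers[g] for g in genes) / N_CONTROLS)

selected = recursive_rank(["K1"], list(toy_scores), toy_rank, toy_fp_rate,
                          batch_size=2, fp_threshold=0.2)
print(selected)
```

Because the toy ranker scores each gene independently, the recursive loop here gives the same order as a single ranking pass. The point made in the talk is that with WDD the lower ranks appear to be influenced by the top-ranked genes, which is why re-ranking after each removal can change the result.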