Great, good morning everyone. I realize I'm the last talk before lunch, but I also notice we have an hour of discussion, so if your stomach is rumbling, there are still snacks out there, I assume. I apologize, but I'm going to take my full time. Thank you very much to the organizers for inviting me to talk about this. I'm going to talk about PharmCAT, which is a project that I've been co-leading for about the last year and a half with Terry Klein from PharmGKB.

This project came about through a series of conversations. About two years ago, we started to think through implementing pharmacogenomics at Geisinger. I had just moved there; Mark had been there. There was an eMERGE project that was starting to do implementation of a few variants. And I said, well, let me ask around; surely there must be code out there. At this point the CPIC guidelines have been around for a couple of years, and lots of places are implementing pharmacogenetics, so someone must have written a script to take the guidelines and the dosing recommendations, take a VCF file, and annotate everyone, so that you know which people in your data set have the variants of interest and can just implement them. This has to have been done. So I started asking around. No one had done this yet, except for the one or two variants that they had implemented at their institution. So I contacted Terry, we were chatting about it, and I said, well, did PharmGKB do this? And she said, nope, we don't have the funding to do that yet. Okay, so I started asking a few companies. Companies have these algorithms for their products: for whichever variants are on the assay that they sell, they've done this. So, oh, well, could we get a copy of those scripts?
Nope, that's proprietary. Okay, so if a medical center or an academic institution wants to implement these, we either have to buy the assay from a company and get their output, or write our own scripts. And so Terry and I talked and said we should be able to do this. One, you shouldn't have to go to a company to get this done. And even if you did, and some of you will know this if you've done it, you might even get different results. I won't say any company names because I don't want to offend any of them, but if you run two different commercial assays, you might get different star allele assignments out. Well, why is that? It's because of the haplotype tables in the CPIC guidelines and which specific variants they use to define the haplotypes: whether they used all of them, which is what you should do, or whether they picked only a few, and that depends on the assay that they ran.

So Terry and I decided to embark on this project to create an automated pipeline for annotating these pharmacogenetic variants out of CPIC, and to make it open source and publicly and freely available, so that everyone who wants to implement pharmacogenetics can use the same source code to do so, instead of getting different results from different companies. The basis of the information or knowledge for our software comes from the CPIC guidelines. We are not creating any new guidelines.
We're not deviating from what's in the guidelines. Currently, our goal is to take the published guidelines and write computer code to annotate VCF files with that information and then produce reports, and I'll show you that in a few slides. There are currently 28 CPIC publications; those correspond to about 33 different gene-drug guidelines. The difference in the numbers is because some of the genes are associated with multiple drugs. And so this is the source of knowledge that we're linking into PharmCAT, and I'll show you the workflow of PharmCAT in just a few minutes.

So, a little bit more about the motivation and why we decided to do this. Automating this annotation process is really important for generating consistent reports, so that if you run the same VCF or similar genotype data, you get the same dosing recommendations every time. Why is this something that you'd want to do in an automated way? Well, first, the people who carry these pharmacogenetic variants of interest are a large part of the population. We did some work in the eMERGE Network as part of the eMERGE-PGx project. This was actually the first half of the data that we generated in eMERGE, so it's about 5,000 people. We used the PGRNseq sequencing assay, and then we looked at the CPIC level A guidelines: we just took the genes from the guidelines and those variants and asked how many people have one or more of the variants in these genes that we know are important for pharmacogenetics. And about 96% of the 5,000 people had one or more variants. Now, we did not go through the process of doing the haplotyping and figuring out what the recommendations were for all of these. The eMERGE sites have done that for the gene-drug pairs that they've chosen to implement. But by and large, almost everyone had at least one pharmacogenetic variant of interest. So this is not something that you're going to want to do manually in a health system. You can't go through everyone's genetic
data manually. You have to automate this part.

Now, we did talk earlier about how these pharmacogenetic variants only matter in the tails, and so this almost seems counterintuitive to that. Almost everyone in this room has a genetic variant in one of these pharmacogenes of interest. However, that variant is only important at the time that you're prescribed one of the drugs for which that variant is indicative of a response. If you never go on the medication, you just carry that variant and it doesn't do anything, or at least we don't think it does anything. So it's only the combination of the variant and the drug that becomes important, and that typically is in the tails of these population-based studies. Not everyone goes on all of these medications.

So the first reason you're not going to want to do this manually is that the majority of the people in your data set will have a variant in one of these genes. That's a lot of data to look through manually. And then second, the assignment of the haplotypes using those tables is actually pretty challenging. That has to do partially with the phasing of the haplotypes. Most of the assays do not provide phase, meaning which chromosome the alleles are on. If you run a SNP genotyping assay, you know the genotype: you know the person is an AA or an AC at that particular position. But at another position, you don't know if the alleles are on the same chromosome or the other chromosome. That can become an issue for some of these haplotypes, because it may be important when they're phased together. And then secondly, it's the number of alleles that you have to use to define the haplotypes. If you look through the CPIC tables, there are a few variants that are typically more common, and maybe not common like 30% (some of them are), but some could be more rare, like 5% or 1%. But then if you keep scrolling, there are lots of other very low frequency variants in some of the tables. And if you didn't genotype those, you wouldn't be
able to make the haplotype definitions that include those. So a lot of people just skip them because they're rare anyway. But if you read the guidelines, there are caveats, or asterisks next to some of them: some of those rare variants actually very much change the dosing recommendation. If you have one common variant and then this other rare variant, you might go from an intermediate metabolizer to a poor metabolizer. And so knowing those rare variants is really critical for making the definition and figuring out which dosing guideline to use. That would argue that doing some sort of sequencing would be preferred over a genotyping chip, so that you get all of those rare variants. We had a side conversation earlier about how we probably don't even know all of the rare variants in those genes that are important, and that's true. I'm sure we will continue to identify and discover new rare variants. But there are a number of them that we do already know, and so we could sequence these genes. That was part of the reason PGRNseq was developed: so that we could sequence these key pharmacogenes and pick up the common variants and the rare ones that are in the guidelines.

What Terry and I decided to do was to bring together the pharmacogenomics community to help us with this effort. So we held a series of meetings. We included stakeholders from the PGRN, CPIC, ClinGen, eMERGE, P-STAR, and PharmGKB. We talked a lot about how we would engage in this process, and we decided early on that we wanted this to be open source: all of the code would be posted on GitHub and shared publicly. It's already there, though.
It is in alpha mode, which means it's not ready to be used in real time yet. But all of the code is there, so you can see exactly where we are in development.

As I said, we had a few meetings. The first meeting was just to bring thought leaders together and discuss: is this even worth doing? Is this a problem, or have we assumed this is a problem? And everyone agreed that this is an issue and that creating a tool like this would allow pharmacogenetics implementation to be adopted much more readily. We then got a group of programmers together for about a week in a hackathon and had them just start putting the prototype together. You know, what are the pieces that we need? What would it take? We had a lot of conversations about, well, what if they have an exome? What if they have a genome? What if they have a GWAS chip? And we just decided we're using VCF files. We would require users to do whatever reformatting they need to do to put in phased VCF files. So that's the assumption, that it's a phased VCF file, and we have some text that we're putting together to explain to people where to go to do file conversions if you have, say, a PLINK-format GWAS chip file, or where to go to do phasing if you have unphased data. But PharmCAT won't do those two steps. It will just take the phased VCF. We got back together in May to talk about where we were and what else needed to be done, and then again in January of this year to see where we are and talk about the last steps towards releasing version one and applying for funding to keep this going.

So this is the general workflow of how PharmCAT works. We take the allele definitions.
So these allele definitions come from the CPIC tables. Now, if you've worked with these guidelines, you know that the CPIC tables on the CPIC website are Excel files. We can't compute directly on those, and so we wouldn't have been able to do this without all of those tables being in database format at PharmGKB. So PharmGKB has actually been the cornerstone of being able to write the software. We take the tables from the CPIC guidelines and process them into machine-readable allele data. We take the sample genotype VCF file and do some processing of that file to create a normalized file that then goes into the haplotyper. The haplotyper is the code that takes those haplotype tables from CPIC and the genotypes and does the annotation, and I'll show you an example of what one of those files looks like in just a minute. It makes these allele calls, and that's a JSON file that can go into the data reporter.

One other note that I want to make is that CYP2D6 does not get processed in the standard way. CYP2D6 data gets processed by Astrolabe, which is another tool that requires a separate license. It's released by Children's Mercy in Kansas City, and they've already done a tremendous amount of work to take those allele definition tables and turn them into something computable. We didn't want to duplicate the effort. They've already done that part, and so we've written it into PharmCAT to take the allele calls out of Astrolabe, and they go in along with the CPIC guideline recommendations to create the reports. The CPIC guideline annotations are also in the PharmGKB database. They are pulled directly from CPIC into the database so that they're computable, and we combine those with the allele calls to create the reports.

So the genes that we're including in version one are shown here. We started with the CPIC level A genes, but also needed to remove a few. CYP2D6, as I said, will be done through Astrolabe haplotype calls.
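As a rough illustration of the haplotyper step in the workflow described above, here is a minimal Python sketch. It is not PharmCAT's code: the gene name `GENE1`, the position IDs, the allele table, and the JSON field names are all made up for the example, and the matching rule (an allele matches a chromosome if it agrees at every position the sample has a call for) is a simplifying assumption.

```python
import json

# Hypothetical allele-definition table for a made-up gene, GENE1.
# Keys are star-allele names; values give the base each allele carries
# at each defining position. "*1" is the reference at every position.
ALLELE_DEFINITIONS = {
    "*1": {"pos1": "C", "pos2": "A", "pos3": "G"},
    "*2": {"pos1": "T", "pos2": "A", "pos3": "G"},
    "*3": {"pos1": "C", "pos2": "G", "pos3": "G"},
    "*4": {"pos1": "T", "pos2": "A", "pos3": "A"},
}


def split_phased(genotypes):
    """Split phased calls such as 'T|C' into one dict per chromosome."""
    strand_a, strand_b = {}, {}
    for pos, gt in genotypes.items():
        if "|" not in gt:
            raise ValueError(f"{pos}: genotype {gt!r} is not phased")
        a, b = gt.split("|")
        strand_a[pos], strand_b[pos] = a, b
    return strand_a, strand_b


def match_strand(strand, definitions):
    """Names of alleles that agree with the strand at every called position."""
    return sorted(
        name
        for name, defn in definitions.items()
        if all(strand[pos] == base for pos, base in defn.items() if pos in strand)
    )


def call_diplotype(gene, genotypes, definitions):
    """Build a JSON-ready allele call, listing positions with no data."""
    positions = sorted({p for d in definitions.values() for p in d})
    missing = [p for p in positions if p not in genotypes]
    strand_a, strand_b = split_phased(genotypes)
    return {
        "gene": gene,
        "haplotype1": match_strand(strand_a, definitions),
        "haplotype2": match_strand(strand_b, definitions),
        "missingPositions": missing,
    }


# Phased genotypes for one hypothetical sample.
sample = {"pos1": "T|C", "pos2": "A|G", "pos3": "G|G"}
print(json.dumps(call_diplotype("GENE1", sample, ALLELE_DEFINITIONS), indent=2))
```

With all three positions present, each chromosome matches exactly one allele in this toy table. If `pos3` is dropped from the sample, the first chromosome matches both `*2` and `*4`, which is the kind of ambiguity that missing positions introduce.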
HLA-B and G6PD are going to be in a later version, not in version one. This was a decision not because those genes are less important; they're just more difficult, and we wanted to release something for the community to start to test rather than waiting until we had everything. This is making sure that we don't make perfect the enemy of the good. So this set of genes can be processed in version one, or will be, and CYP2D6 can be used and generate reports as long as you have the license to Astrolabe.

As I mentioned earlier, it takes the CPIC haplotype tables, and this is an example of one of those Excel files from the CPIC website. So this is one of the background pieces of input that are used, along with the dosing recommendations. Again, these come from CPIC, so we're not rewriting any of the recommendations. The haplotyper takes the haplotype table from CPIC and your genotype files and processes them into this intermediate file. It will assign what alleles you have at every single position in the haplotype table. It will also indicate whether you are missing calls at any of those positions, so if you did do a SNP array and you don't have a few positions, it will indicate that. And then it combines with the CPIC guidelines to generate the reports. So this is an intermediate report here. You see all of the chromosome positions, the rsIDs if a position has one, and then the genotype calls at all of the positions. And as I said, if one is missing it will say no call, and that becomes very important in the interpretation of some of the guidelines. But this particular individual had all of the genotypes present.

This may be hard to see from a distance, but these slides will be available. This is an example of our current report format. We're still doing a little bit of tweaking, but you'll have each drug for which a particular gene is indicated.
So here's a set of drugs, the gene, whether or not there is a diplotype or haplotype at that location, the allele function, the phenotype, and whether there were uncallable alleles. The one of interest here is, I think, CYP2C9, which has warfarin: they're a poor metabolizer and they did not have any uncallable alleles, for example. Oh, and I should also say (you can actually see it better here) that the color of the drug is also related to whether or not there is something important in the report. Green means that they're a normal metabolizer, so it doesn't necessarily draw attention to it. Red means there's something to see here; you should look at this one because this one is not a normal metabolizer. We also have a lot of caveats and warnings. This particular one says, please check the allele calls. It wasn't that there was a no-call, but there was something related to this one that has an issue. There are others that have things like uncallable alleles.

So to summarize, PharmCAT stands for the Pharmacogenomics Clinical Annotation Tool. And I should say that this sprouted not only out of the conversations about how we hadn't developed this yet in the community for pharmacogenetics. One of the other things that I do at Geisinger is take the ClinVar tables and our exome data and annotate people's VCFs with what variants they have that are four stars in ClinVar, or three stars, or two stars, so that we can look through them and figure out which people have variants that we should be looking at. And I thought, well, why didn't we do this for pharmacogenetics?
We should do this for pharmacogenetics. And so it really sparked out of a lot of the ClinVar-related work as well. This pipeline will automate that VCF to haplotype to guideline and reporting process. I think this sits in between the work that Mary and Heidi talked about, these databases and resources (PharmGKB, CPIC, ClinVar are the knowledge bases), and DIGITizE, which Sandy talked about. In order to have the things to digitize, the goal would be that PharmCAT will do the annotations and generate the report that can then go into the EHR.

PharmCAT version one is in testing. Our goal is to release it to the community very soon to get feedback and have people start to test it and use it. The reports can then be adapted, as I said, for local EHR implementation. As I mentioned earlier, PharmCAT is on GitHub, so you can look there to see when we're releasing. We'll also post things on the PharmGKB blog and tweet about it once we do have the version ready for testing.

And lastly, I just want to acknowledge that this was very much a community effort. It's something that Terry and I have been leading, but we wouldn't have gotten to this point without the members of PGRN and CPIC and eMERGE, who not only came to the meetings to donate their thoughts, but some of them sent their programmers to donate their time to work on this. Maybe it's not a donation, because they are funded on the projects, but PharmCAT itself did not have a grant yet. Terry and I are working on that, but it's been funded as part of PharmGKB and P-STAR and CPIC, related to those.
I also specifically want to name Michelle, Ryan, Lester, and Catherine at PharmGKB, who have really been the workforce that's making this happen, in terms of writing the code and being the expert curators to make sure that when we run the code, what we get out makes sense and that those are the recommendations for those alleles. Their interpretation has been invaluable. And so I'll stop there and take questions if there's still time. Thank you very much. This is a photo of the programmers at the hackathon.

Question.

That was a wonderful talk, very, very interesting, and I hope a very productive approach. It could really speed things up a lot. So if I understand the architecture correctly, if I'm interested in just a few pharmacogenomic sites or loci, then I can create a VCF file that just has those variants in it. So something that's associated with a panel could simply fly through the same pathway?

Yeah, absolutely. Now, the report will list all of the genes that you have a no-call for, right? But that's fine if you're only interested in a few. We talked about whether it should start out as, well, you pick which ones it does. And I think our mindset is that the community is going to go towards sequencing or a panel. To Peter's earlier point, we shouldn't be testing them one at a time; we're probably going to start to test all of them. So we wanted to try to be forward-thinking and develop the tool to do all of them, which would still do the one-offs for the people who are still doing just a few.

Thanks. This is really great, Marilyn. For those of us that have had to do this in our homegrown systems, it's a tremendous amount of work, so thanks. The question is about if you have a key allele that's not called in a gene, something that's quite prevalent, say CYP2D6*4, something like that, or *10. Do you think it's enough to just report to the user that there are uncalled alleles?
And then it defaults to *1? Or should there be some higher alert there saying you're missing something and it could be a real risk to the patient, giving them a false sense that they have a wild type?

It's a good question, and I may ask Terry to also come up to the mic, because we talked about that quite a bit. I think what we've come to is saying, well, we will show in the report: given the data that you provided, this is the recommendation, caveat, and then add to the report, these are the things that you didn't have that would change that. And, you know, I don't remember how strongly we worded whether we recommend that you genotype or sequence those other alleles. Terry, can you speak to that?

So, where we do know that the information exists and you haven't tested, we're pretty strong in the caveats. And this is because, as you know, most companies that do this kind of testing default to wild type, or *1/*1, which is what we're completely trying to avoid. If you go in with limited data, and we know for a fact, for example, that a deletion has a huge effect, we just couldn't overlook that aspect of it. So the caveats out of PharmCAT, you may not like them from the standpoint of reading the report, but it has everything in it. It's almost like, if you have this then you have this, and if you don't have this then you've got this, and in some ways you almost need a computer to read it. But it has all the caveats that you need, and I think that's what's critical. And it'll be critical for things like ClinVar and ClinGen, because that way, if you're pulling something out of one of these other sources, you know what you're looking at and where the holes are, right?
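To make the point about caveats concrete, here is a small Python sketch of the idea of attaching explicit caveats to a call instead of silently defaulting to *1. It is only an illustration under assumptions: the allele table, position IDs, function names, and message wording are hypothetical and are not taken from PharmCAT.

```python
# Hypothetical table: for each non-reference allele, the positions and bases
# that define it. "*1" is whatever remains when no defining variant is seen.
DEFINING_VARIANTS = {
    "*2": {"pos1": "T"},
    "*3": {"pos2": "G"},
    "*4": {"pos1": "T", "pos3": "A"},
}


def untested_caveats(tested_positions, defining_variants):
    """List alleles that cannot be ruled out because a defining position was never assayed."""
    caveats = []
    for allele, variants in sorted(defining_variants.items()):
        missing = sorted(p for p in variants if p not in tested_positions)
        if missing:
            caveats.append(
                f"{allele} cannot be ruled out: no data at {', '.join(missing)}"
            )
    return caveats


def report_call(call, tested_positions, defining_variants):
    """Pair the call made from the available data with its caveats."""
    return {
        "call": call,
        "caveats": untested_caveats(tested_positions, defining_variants),
    }


# A hypothetical panel that assays only two of the three defining positions.
panel = {"pos1", "pos2"}
print(report_call("*1/*1", panel, DEFINING_VARIANTS))
```

In this toy case the *1/*1 call is still reported, but it travels with a statement that `*4` could not be excluded, so a downstream reader or system can see where the holes are.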
It's the idea of, if you think about an Excel template or a cell, when you have a blank cell, you don't really know what that means. Does that mean that there's nothing there, that someone forgot to put it in, that it's not applicable? You know, that range. So we were very explicit all the way throughout.

Actually, related to that, the VCFs are not variant-only VCFs. A lot of people, because the files are so large, will take anybody who's ref/ref and just remove those, so your VCF is non-reference alleles only. We specifically say we want every allele. And because it's a small set of genes, the VCF won't be huge if you subset it to only the genes in here. But having those ref/ref calls is important, because then it's not that they're missing: you know whether they're reference or not.

So I wanted to follow up on what Marilyn and Terry said. From the clinical implementation perspective, I don't see the intent of PharmCAT as being able to be plugged in and used clinically. I think it's going to be a knowledge resource that's going to be really important. But there's going to have to be clinical decision support, much as Terry referenced. It's not going to be something where you're going to be able to look at the report and probably be able to utilize it. It's going to be something that ideally would feed into a clinical decision support tool that would then have logic built in to say, if there's a caveat there, our clinicians have decided that this is what we're going to do to message the clinician about how to deal with this particular piece of information. But to get to the point where we actually have the data where we can use it for clinical decision support, to quote someone very famous, is huge.
I think it's going to be twofold, though. I agree with you from a clinical decision support perspective, particularly when we talk about a health system. Absolutely, you're not going to hand someone a report. But I also think that these reports are being written in a way that, when you're talking about annotating a single genome, an individual could read it. And so I think it's a bit of both, but you're absolutely correct, particularly as we think about these larger systems.

Great, so thank you, Marilyn. We're going to move into the discussion phase.