I'm Layla Bouzoubaa, and thank you for attending my lightning talk on DOPE, the Drug Ontology Parsing Engine. If you've ever had to look up information on a specific drug like methylphenidate, also known as Ritalin, you've probably visited sites such as the National Library of Medicine, or a drug database like DrugBank, or, if your drug in question was illicit, the DEA's website. You've also probably realized that it's a little tricky to get out the information you need. After a couple of hours you eventually get what you wanted, but how was your experience? Most likely it was boring. You may also have realized that key pieces of information are spread across the resources: one piece may be available on one resource and another on a different resource. Frustratingly, sometimes the resources can be plain contradictory. Enter DOPE, the Drug Ontology Parsing Engine. The frustration that came with searching these sites for specific bits of information, like "what are the street names for Ritalin?" or "what classes of drugs does Ritalin belong to, anyway?", eventually motivated the development of DOPE, an R package designed not only to provide a comprehensive database of over 4,000 drugs, but also to parse free text and identify known drugs. Before we get into a brief demonstration of how DOPE's functions are used, the following slides provide a high-level overview of some of the key functions. The parse function, at its core, takes in a corpus of text and returns known drugs. For example, this messy data frame with the text-drug vector includes problematic characters like slashes and dashes, but it also contains information that I personally don't really need, like "few days on, few days off" and milligram dosages. So parse is just going to strip all of that out.
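The parse step just described might look like the following. This is a minimal sketch based only on the talk: the function name parse() is assumed from the narration, and the example strings are invented to mimic the "messy" input described; check the package reference for exact signatures.

```r
# Sketch of the parse step, assuming DOPE exports a parse() function
# as described in the talk; the example strings are invented.
library(DOPE)

text_drugs <- c(
  "ritalin 10 mg / few days on-few days off",
  "started with percocet and vicodin",
  "speed + dope"
)

# parse() strips noise (dosages, slashes, dashes) and stopwords,
# then applies a rule-based NLP model to return possible drug names
possible_drugs <- parse(text_drugs)
possible_drugs
```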
When I pass the messy text-drug vector into parse, I am returned a vector of 11 possible drug names. The parse function takes in the text-drug vector, filters out any stopwords, and then uses a rule-based natural language processing model to identify and extract the drugs in a phrase. The stopwords combine three domain-independent lexicons from Julia Silge and David Robinson's tidytext package with a set of domain-specific stopwords that we have established. What we have as a result is a vector of 11 possible drugs. Then there is the family of lookup functions. Once you obtain your vector of possible drugs, it's time to look them up. DOPE contains one main lookup function and two helper functions. lookup can take either a vector of drug names or multiple strings, and it looks for any possible matches within our comprehensive lookup table. Because some drugs belong to multiple classes and categories, they can have several hundred possible synonyms; heroin alone has over 500. For this reason, we have developed a function called compress_lookup. In this example, I want to look up "speed" and "dope", two individual strings. The lookup function returns the original word plus its class, category, and the synonyms associated with those classes and categories. If I pass that lookup table to compress_lookup, I actually get the same results, because "speed" and "dope" only belong to three unique classes or categories; the only column that's missing is the synonym column. Finally, what if you just want a vector of synonyms? We have a function for that, too. lookup_syn takes in a category and returns all the possible synonyms within that category. And if you accidentally type in a synonym instead, the function will return a message suggesting a different query using the category or class. In my example, I try passing "dope", which is actually a synonym, into lookup_syn.
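The lookup family described above can be sketched as follows. The function names lookup(), compress_lookup(), and lookup_syn() are assumed from how they are spoken in the talk; the calls mirror the narrated example rather than documented signatures.

```r
# Sketch of the lookup family, assuming the function names as spoken
# in the talk (lookup, compress_lookup, lookup_syn).
library(DOPE)

# lookup() matches drug names against the package's lookup table,
# returning each word's class, category, and associated synonyms
tbl <- lookup("speed", "dope")

# compress_lookup() drops the synonym column, leaving one row per
# unique class/category pair (useful when a drug has hundreds of
# synonyms, e.g. heroin's 500+)
compress_lookup(tbl)

# lookup_syn() takes a category and returns its synonyms; passing a
# synonym such as "dope" instead yields a message suggesting a query
# by class or category
lookup_syn("marijuana")
```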
The function returns a suggestion that maybe I want to look up heroin or marijuana instead. After looking up marijuana, I get a lot of synonyms, so in this example I've taken just the top five. Now it's time for a brief demonstration. In my demo, I've gone ahead and loaded the DOPE library, which you can install from CRAN or from GitHub, and the magrittr package. My story in question states: "I was at a party and I started with some Percocet and Vicodin. I think I had a bunch of Ambien. My buddy Keith took alprazolam, 25 milligrams, and he snorted zip. Now I'm trying to get a few NX." Let's see what parse spits out, and then the lookup functions. The first thing I'm going to do is pass the story to my parse function. If you look over at my console on the left-hand side, you see 11 potential drugs returned. I see Vicodin and Ambien; these I know are drugs. "Started", "Keith", "snorted", I'm not sure about those; they could potentially be synonyms. Let's find out. The ideal way I would actually use these functions is in a sequence. I would take the story and pipe it, yes, pipe it (DOPE actually supports the pipe) into parse, then take that resulting vector and pass it into lookup. And because I suspect that there may be a lot of synonyms for some of my drugs, I'm going to go ahead and pass that into compress_lookup, and for simplicity, pass the results into View. When I run this code chunk, what I get are two known narcotics, Percocet and Vicodin, a depressant, a possible stimulant or cannabis, as it belongs to either category, and a treatment drug. We envision DOPE as a tool to enhance hypothesis generation in substance use research. DOPE has the potential to analyze large batches of free text collected from sources like social media and public forums, where information from those seeking information on substance use can be extracted to catalyze further investigation.
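The piped sequence from the demo might be written like this. Again, a sketch under the talk's assumptions: the function names are taken from the narration, the story text is the one read aloud, and the magrittr pipe is used as mentioned.

```r
# Sketch of the demo pipeline, assuming the DOPE function names as
# spoken in the talk; the story text is quoted from the demo.
library(DOPE)
library(magrittr)

story <- paste(
  "I was at a party and I started with some percocet and Vicodin.",
  "I think I had a bunch of Ambien. My buddy Keith took alprazolam,",
  "25 milligrams, and he snorted zip."
)

story %>%
  parse() %>%            # extract possible drug names
  lookup() %>%           # match them against the lookup table
  compress_lookup() %>%  # collapse the synonym column
  View()                 # inspect the results in the viewer
```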
In a future release of DOPE, we will include several new capabilities, including allowing the user to define their own list of stopwords for the parse function, and partial matching for misspelled drug classes and categories. Finally, we hope to build a web app as a front end, with the DOPE package at its core, utilizing more sophisticated natural language processing and machine learning algorithms to more accurately extract actual drug references from a corpus of free text, like that found in clinician notes. The possibilities are endless, and we are always open to collaboration. Before I wrap up this presentation, I want to take a moment to give a shout-out to my team: Dr. Raymond Balise at the University of Miami Division of Biostatistics and Dr. Gabriel Odom at the Florida International University School of Public Health, Department of Biostatistics. They have been awesome to work with, and this package couldn't have been formulated without them. I also want to take a moment to acknowledge the Clinical Trials Network for their support of CTN-0094. It provided not only support but also the timeline followback files that motivated the development of DOPE. Thank you so much for tuning in, and I'm happy to take any questions.