Hello, my name is Sophia Shalhout, and I'm a Principal Investigator at Mass Eye and Ear and Harvard Medical School. Along with David Miller, a medical oncologist and dermatologist at Massachusetts General Hospital, I'm excited today to present Storyboarder, an R package and Shiny application designed to visualize real-world data from clinical patient registries. We hope to present the package we created and discuss its applications.

Here's an overview of our talk today. I'll begin by describing the unmet need that inspired us to create Storyboarder. Next, I'll provide a general schema overview of the app, and then I'll hand off the presentation to Dr. Miller. We'll discuss how to install Storyboarder from GitHub. Next, we'll give background on the data capture instruments in Storyboarder and show how to install them into REDCap. This will be followed by a demonstration of the application using a built-in data set. Then he will discuss the data science principles behind Storyboarder and conclude with limitations and potential solutions.

Okay, let's get started with the unmet need. Tumor registries are a rich source of real-world data, which can be used to test important hypotheses that inform clinical care. Exploratory data analysis at the level of the individual patient, when coupled with interactive data visualizations, has the potential to provide novel insights and allow us to generate new hypotheses. Unfortunately, software that can be integrated into an oncological data collection effort and generate interactive data visualizations of a subject's cancer journey is very limited, and that's what inspired us to create Storyboarder.

So let's move on to the general Storyboarder schema. Storyboarder takes as input clinical data that's structured and captured in the electronic data capture system REDCap.
Storyboarder is designed to wrangle, transform, and ultimately provide a visualization of the cancer patient's journey using the data stored in REDCap. The generated visualization is a timeline that displays the key elements of that journey, drawn from the structured data in REDCap.

In reality, it's a little more complicated, because Storyboarder functions not only as a standalone package but also as a Shiny application. Again, the input to the Shiny application is clinical registry data captured in a structured electronic data capture system, and users interact with Storyboarder in a web browser. That's where they control which patient's record they would like to create the interactive timeline for. The server side of the Shiny application executes the code of the R package, which then wrangles, transforms, and visualizes the clinical data. The data visualization output is an interactive plotly graph that displays a timeline of the key elements of the patient's journey.

So now I'll hand it off to Dr. Miller to start with how to install Storyboarder from GitHub.

Thank you, Dr. Shalhout. Now we're going to provide background on the data capture instruments incorporated in Storyboarder and show how to install them into REDCap. I'm aware that most of you are familiar with REDCap. For those of you who are not, it is a secure web application for building and managing online surveys and databases. It's been implemented by over 5,000 institutions across the world.

Storyboarder incorporates the following 11 data collection instruments that capture information across the patient journey: subject status, patient characteristics, presentation and initial staging, lesion information, pathology, surgery, radiotherapy, systemic antineoplastic therapy, adverse events, lab results, and genomics. These data capture instruments are used by the Mass
General Brigham Merkel cell carcinoma patient registry. They incorporate data elements from mCODE, the minimal Common Oncology Data Elements, and from Fast Healthcare Interoperability Resources, or FHIR. They also include disease-specific variables.

In an effort to aid in the implementation of these data collection instruments, we've created a series of posts and placed them on the Miller Lab website. These posts provide insight into the individual instruments, for example the lesion information instrument. We also talk about some of the limitations and solutions. We recognize that different investigators use different ontologies and nomenclatures when they create their data collection instruments, so to help with implementation we've created lookup tables and other tools to map to other nomenclatures.

We've also made this data dictionary freely available to all of you. We spent over four years working on it, and it has over 4,000 data elements that you can collect. We've provided it to help you use Storyboarder. It's on our GitHub, in the Miller Lab Storyboarder repository, in the data-raw folder. You can click on the CSV file and upload it into your REDCap project.

To aid in implementation, we've also created a fake data set of five simulated patients, which is available within Storyboarder when you load it into your R session. It must be said that any similarity to a real patient is completely coincidental. All the data for these simulated patients are completely fabricated.

We've incorporated these data collection instruments into our clinical informatics ecosystem, which includes the electronic health record or electronic data warehouse, an electronic data capture system, and an integrated development environment. Our overall clinical informatics pipeline is probably pretty similar to that of many of you who are using Epic, REDCap, and RStudio.
We've also created several R packages. To help augment data abstraction, in years past at R/Medicine we've talked about eLAB and GENETEX, and we've now built packages to help augment data analysis: BodyMapR and Storyboarder.

Now we're going to provide a demonstration of Storyboarder with the built-in data set. Once you've installed it from GitHub and loaded Storyboarder from your library, it will load dependent packages such as those found in the tidyverse, plotly, shiny, and shinydashboard. We're going to use the embedded data set, Storyboard underscore data set, and store it as an R object, DT. Let me take a look at that R object. This simulated data set has 115 observations for those five patients.

Next, to launch Storyboarder, you use the function launch Storyboarder. This function takes two arguments: a data argument and a date shift argument, which defaults to false. If you want to shift all of the dates in a systematic way in order to protect certain protected health information, you can change this to true. But we're going to leave it as false and load the DT object. Once launched, the application opens in an embedded browser, and we're going to go ahead and click that so we have our Storyboarder Shiny application open in Safari.

What you'll see in the sidebar are two options. The first option, which is the default, is the subject dashboard. This allows the investigator to select the patient for whom a storyboard should be generated. The first display is a tabular view of relevant patient characteristics: for example, demographics, initial staging, the subject status of the patient, and their burden of disease. We also display genomic information. Perhaps the patient had next-generation sequencing, which is kind of a standard of care in 2023. Perhaps that information was collected using the GENETEX package that we created. That is displayed in this dashboard here.
Furthermore, the dashboard contains any therapeutic intervention that a patient may have had and that has been captured by investigators. This is very useful: a lot of information right at your fingertips. But we think the main attraction of Storyboarder is the interactive data visualization. When you click on Storyboard in the sidebar, the information from REDCap gets displayed on a horizontal timeline. Right out of the gate, there's a lot of information in the data visualization, which you can see here: elements of the patient's disease journey, the type of staging they had, various treatments, and their overall subject status. But better yet, we have a lot of information stored within the hover text.

So let's take a look at simulated patient one. This simulated patient's journey began with a left ankle skin primary tumor. In this simulated data set, the patients all have Merkel cell carcinoma, so this simulated patient presented with Merkel cell cancer on their left ankle, and that was histologically confirmed on the hypothetical date of May 9th, 2019. This patient underwent next-generation sequencing using the MGH Snapshot next-generation sequencing panel. Here in the hover text, it displays the alterations that were detected by that Snapshot analysis.

Following standard-of-care procedures, this simulated patient went on to have surgery, and embedded in the hover text are the details of that surgery. You can see what type of surgery it was. Did the patient have an excision? A lymph node dissection? You can see the margins the surgeon used and the outcome. This patient had an R0 resection, which means that the tumor, thankfully, was completely removed.
As a result of that tumor being completely removed, we see that the patient's subject status, which is an mCODE data element (a minimal Common Oncology Data Element), is NED, or no evidence of disease, which is what we want every patient to be throughout their journey. Unfortunately, this patient, like many with Merkel cell cancer, had a recurrence of their disease, with left calf and shin in-transit metastases. Therefore, the patient underwent and completed definitive radiotherapy, and we see the details right here in our storyboard: the patient had 30 fractions and received a total dose of 6,000 centigray. This patient was then rendered NED until, unfortunately, the disease recurred once again with a left groin lymph node metastasis. Because of that, simulated patient one received one of the FDA-approved options for Merkel cell cancer, the anti-PD-L1 monoclonal antibody avelumab. Storyboarder displays each and every dose that a patient receives, if that information was collected in REDCap. This patient, fortunately, went on to have a complete response, so their subject status was no evidence of disease on March 12, 2019. This simulated patient then went on to receive maintenance dosing.

Storyboarder allows you to create that dashboard and storyboard for every single patient in your registry. Here are three storyboards that were created. We use these Storyboarder data visualizations to help us generate new hypotheses by looking at patient-level data. It gives you insight and gives you ideas. It also has other potential applications. We've heard people talk about using this, for example, at a tumor board, where a case is being presented to clinicians who might not know the patient's overall journey. This is a way to get a quick data visualization and a sense of that patient journey.
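
A note on the date shift argument mentioned in the demonstration: the talk does not say how Storyboarder shifts dates, so what follows is only a minimal base-R sketch of one common approach, not the package's code. It assumes a single random offset per patient, subtracted from all of that patient's dates, so that the intervals between events are preserved. The function name `shift_dates`, the column names, the offset range, and the example rows are all hypothetical.

```r
# Hypothetical sketch (not the package's implementation): shift every date
# for a given patient by one patient-specific random offset, so that the
# intervals between that patient's events are preserved.
shift_dates <- function(df, id_col = "record_id", date_col = "date",
                        max_days = 365, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ids <- unique(df[[id_col]])
  # One offset (in days) per patient, drawn from 1..max_days
  offsets <- setNames(sample.int(max_days, length(ids), replace = TRUE), ids)
  # Look up each row's offset by patient id and subtract it from the date
  row_offsets <- unname(offsets[as.character(df[[id_col]])])
  df[[date_col]] <- df[[date_col]] - row_offsets
  df
}

# Fabricated example events (not from any real patient or package data set)
events <- data.frame(
  record_id = c(1, 1, 2),
  value     = c("Biopsy", "Surgery", "Biopsy"),
  date      = as.Date(c("2020-01-10", "2020-02-09", "2021-06-01"))
)

shifted <- shift_dates(events, seed = 42)
```

With a per-patient offset, the 30 days between the two fabricated events for record 1 are unchanged after shifting, which is what keeps a timeline readable even when the absolute dates are obscured.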
We hope that all of you who are interested in this type of research can use Storyboarder and customize it to meet your research needs. So I want to discuss a few of the data science and software engineering principles behind Storyboarder to help you customize it.

These are the functions that Storyboarder incorporates, and we're going to talk about them. We think about them as three different levels of functional programming. First, we've created a series of base-level functions. These are functions that are specific to each data collection instrument. So you see here we have functions for subject status, staging, lesions, surgery, antineoplastic therapy, and so on. What all of these base-level functions do is parse the data and wrangle it to create the same data frame: a data frame with five vectors, record ID, description, value, date, and hover.

Now let's look under the hood a little more, using our Storyboarder data set. First we're going to call the lesion function. The lesion function takes information from the lesion instrument, which holds a lot of information about, for example, the left ankle skin primary that simulated patient one presented with. This instrument captures information such as the date the lesion was detected, its location, its size, its response to treatment, and various other data elements of interest. The first thing the function does is replace the values in the registry with the appropriate strings from the data dictionary. So instead of just displaying numbers on the storyboard data visualization, it's going to display relevant text. The next thing it does is select the variables from the instrument that you want displayed on the data visualization. Some of these instruments have hundreds of variables.
It's not useful to have all of those displayed on the data visualization, so we've selected the relevant variables. You are able to select whatever variables you want, again to enhance customizability. And finally, the lesion instrument function returns a data frame of five variables: record ID, description, value, date, and hover.

Let's take a look at those. Here we see our embedded data set after we call lesion. Record ID is pretty self-explanatory. Description is the instrument from which the data come. Value is the information, the strings that actually appear on the timeline. And then we have the date and the hover text that we saw earlier. For example, for our primary, the lesion tag is concatenated with the date on which that lesion was detected.

As another example, when we call the surgery function, it selects the relevant variables from the surgery data collection instrument, replaces the numbers with the appropriate strings, creates hover text, and then returns those five variables: record ID, description, value, date, and hover. This is what the data frame looks like when you call surgery with the embedded data set.

So those are our base-level functions. Our next function, the mid-level function, is Combine Storyboarder. It calls each individual base-level function, stores each result as an R object, and then concatenates those into a single data frame with those five variables: record ID, description, value, date, and hover. So it's taking the output of all of the different base-level functions and concatenating them, with all of the information from each patient in the registry. This data frame then becomes the argument for the final, top-level function, Storyboard Plot, which creates the data visualization that we walked through earlier.

Now, every package has limitations, and so does Storyboarder.
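
Returning briefly to the base-level and mid-level functions described above, the pattern can be sketched roughly as follows. This is a minimal base-R illustration under stated assumptions, not the package's actual code: the function names (`label_values`, `lesion_rows`, `surgery_rows`, `combine_rows`), the input column names, the coded values, and the toy data dictionary are all hypothetical, and the example rows are fabricated. It assumes coded values are mapped to labels through a lookup taken from the data dictionary, that hover text is built by pasting fields together, and that every instrument-specific function returns the same five columns so the results can be stacked with `rbind()`.

```r
# Toy data dictionary: coded value -> label (hypothetical codes and fields)
dictionary <- list(
  lesion_tag   = c("1" = "Primary", "2" = "Metastasis"),
  surgery_type = c("1" = "Excision", "2" = "Lymph node dissection")
)

# Replace coded registry values with their labels from the dictionary
label_values <- function(codes, field) {
  unname(dictionary[[field]][as.character(codes)])
}

# Base-level function for a lesion instrument: label, select, build hover
# text, and return the five shared columns
lesion_rows <- function(df) {
  tag <- label_values(df$lesion_tag, "lesion_tag")
  data.frame(
    record_id   = df$record_id,
    description = rep("Lesion", nrow(df)),
    value       = tag,
    date        = df$lesion_date,
    hover       = paste0(tag, ": ", df$lesion_site, " (", df$lesion_date, ")"),
    stringsAsFactors = FALSE
  )
}

# Base-level function for a surgery instrument, returning the same columns
surgery_rows <- function(df) {
  type <- label_values(df$surgery_type, "surgery_type")
  data.frame(
    record_id   = df$record_id,
    description = rep("Surgery", nrow(df)),
    value       = type,
    date        = df$surgery_date,
    hover       = paste0(type, ", margins ", df$margin_status,
                         " (", df$surgery_date, ")"),
    stringsAsFactors = FALSE
  )
}

# Mid-level function: call each base-level function and stack the results
combine_rows <- function(lesions, surgeries) {
  out <- rbind(lesion_rows(lesions), surgery_rows(surgeries))
  out <- out[order(out$record_id, out$date), ]
  rownames(out) <- NULL
  out
}

# Fabricated input rows for one simulated patient
lesions <- data.frame(
  record_id = 1, lesion_tag = 1, lesion_site = "Left ankle",
  lesion_date = as.Date("2019-05-09")
)
surgeries <- data.frame(
  record_id = 1, surgery_type = 1, margin_status = "R0",
  surgery_date = as.Date("2019-06-01")
)

timeline <- combine_rows(lesions, surgeries)
```

Because every base-level function returns the same five columns, supporting a new instrument in this sketch only means writing one more function of the same shape and adding it to the `rbind()` call. A top-level plotting function would then take the combined data frame as its single input, for example placing `date` on the horizontal axis and showing `hover` as the hover text.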
Storyboarder is meant to be used in conjunction with the data collection instruments of a clinical registry, such as the Merkel Cell Carcinoma Patient Registry. Therefore, it functions optimally when those forms are installed in a REDCap project. As a solution, we have made the data dictionary freely available so that others may adopt a similar platform.

So to summarize, tumor registries are an important form of real-world data, and the data they contain can be used for both hypothesis generation and hypothesis testing. Storyboarder is an R package with a Shiny application front end that produces an interactive data visualization of patient-level data. When built around a core data capture system, such as a cancer registry, R-based packages like GENETEX, eLAB, BodyMapR, and Storyboarder can combine to form a powerful data informatics ecosystem that both augments data abstraction and facilitates data analysis, with the goal of accelerating time to action for patients with rare tumors. With that, I'd like to thank all of you for your attention and the organizers for inviting us to give this presentation. Thank you.

That was really fantastic. I can see how that would be very useful for visualizing patient histories. We have a couple of questions in the Q&A that maybe you wouldn't mind answering live for everyone to hear. The first says: I see a lot of info is encoded in the hover text. Have you guys looked into accessibility software, for example, for blind users? I think hover text can cause issues for those types of users. Are you able to unmute and answer that, David? Does he have access?

All right. I think I've been freed. Sorry about that. No, it's a great point. Honestly, I was ignorant about that software, and I think we should absolutely look into it. It was a blind spot on my part not to think about visually impaired users. So thank you, community, for bringing that to my attention.
Awesome. And there's one other question about whether you can produce storyboards for multiple patients at a time, for example, if you wanted to see patterns of treatment.

In the same data visualization? I haven't optimized that yet, but we can definitely work on that. I think that would be a nice way to add some more texture to a registry, for example. It's another great suggestion.

Okay, great. Well, thanks again, and we'll move on to the next talk. Thanks, everyone.