So first of all, thank you so much to the organizers for having me here today, live from home, to talk to you about some of my work with the All of Us Research Program. My name is Michelle Holcoe. I'm currently a Presidential Innovation Fellow, detailed to the NIH to work with the program, specifically on digital health technologies. I'm gonna talk to you about one of the projects I've been working on there since about December of last year.

So first I'm gonna give you an introduction to the All of Us Research Program. I'm gonna run pretty fast through those slides so that we can get to the digital health technologies portfolio more broadly in the program. Then I'm gonna spend the most time on the Fitbit bring-your-own-device data: the analysis and the participant demographics. And then we'll close with some opportunities and considerations for digital health technologies in the All of Us Research Program.

So, to give you an introduction to the program in case you're not familiar with it: the All of Us Research Program grew out of the Precision Medicine Initiative, which was founded several years ago in the previous administration and has $1.5 billion in congressional funding to get the program started and to keep it running. The mission of the program is to accelerate health research and medical breakthroughs to promote individualized prevention, treatment, and care for all of us. There are three core values that are really important to us. The first is to nurture relationships with one million or more participant partners from all walks of life, for decades; diversity is really critical to the program there. The second is to deliver the largest, richest biomedical dataset ever and make it easy to use for a wide range of researchers. And the third is to catalyze a robust ecosystem of researchers.
So the All of Us Research Program truly is an innovative and ambitious research effort, and here are some of the reasons why. First, diversity at the scale of a million people or more, focusing on capturing the diversity that makes America America. Second, a focus on participants as partners: not just taking data from them, but also giving back to them and continuing to engage with them throughout the length of the program. The longitudinal design of the program means that we have the ability to re-contact participants, which we've already done in some cases around COVID. And multiple data types: so diversity in participants, but also diversity in data, including electronic health records data, survey data, baseline physical measurements, biospecimens, genomics, and digital health technologies, which is what I'm going to be talking about today. And finally, creating a national open resource that is broadly accessible to all researchers, with open source software and tools, and building security and privacy safeguards for all participants.

So, some current progress. We opened the doors nationally on May 6th, 2018. So far, we've enrolled over 271,000 participants who have not only completed the initial enrollment step, but have also completed what we call the full protocol. And this represents individuals from all 50 states. Furthermore, greater than 80% of these participants represent one or more of the underrepresented in biomedical research categories, which include racial and ethnic minorities, living in a rural region, and many more. I'm going to go through those in more detail a little bit later. And greater than 50% of the participants are racial and ethnic minorities. We've built a significant infrastructure to support the program, with more than 100 academic groups working with us, along with the VA, the FQHCs, which are the federally qualified health centers, and technology and community partners.
We have over 320 clinics enrolling participants, and that number is still growing. We also have a bilingual website, so the website is available in English and in Spanish, as are the participant portal, app, and call center. The Biobank at the Mayo Clinic supports 24-hour shipping and processing with capacity for 335 million plus vials. And we also have an interactive mobile exhibit that travels the country. The picture is on the bottom right of the screen. It's certainly not traveling right now, but usually it is, and it's actually wonderful because it can reach more rural areas where we don't have partners.

So here's just a snapshot of enrollment to date. You can see enrollment over time in the graph on the right, and then the bubbles tell you where we are right now. So like I said, we have 271,000 who have completed the initial steps of the program, which include consenting, agreeing to share your EHR data, completing the first three surveys, providing physical measurements, and donating at least one biospecimen. We have many more than that, 353,000, who have completed some part of the program but haven't completed all of the initial steps.

This is what the current protocol looks like. The first step is to enroll, to provide your informed consent, and to authorize access to your EHR. It's really important that we ask for the EHR authorization up front at this initial step. Right now we're only recruiting adults who are 18 years and older. We do have plans to expand to children, but not yet. It's also important to note that at this initial enrollment we have both in-person and digital enrollment capabilities. Of course, right now in-person enrollment has been paused. So in the interim we've still been able to accept participants joining the program, but only through the digital method. Then you can answer surveys.
There are many different surveys already available, including the basics, overall health, lifestyle, healthcare access, immunization, family medical history, and personal health history. And we also released a survey around the coronavirus pandemic called the COPE survey, which focuses not only on signs, symptoms, and exposure to coronavirus but also on the social and economic aspects of the pandemic.

The next two steps usually go hand in hand. Usually, if you're coming to us through one of our healthcare provider organizations, you'll provide physical measurements, which are all listed here: blood pressure, heart rate, height, weight, BMI, hip circumference, and waist circumference. At the same time you'll provide a biosample. Usually that's blood, which is the biosample of choice. However, if the blood draw is unsuccessful, or if you're coming to us as a direct volunteer digitally, we can send you a kit to give a saliva sample. Very few participants give urine specimens, but we are collecting some of those. And then finally, digital health technologies, which is what I'm gonna be talking about today. We have a bring-your-own-device program already set up with Fitbit and Apple HealthKit connections. We're also developing integrated apps to track mood and cardiorespiratory fitness, among other things to come.

So now, shifting from the participant to the researcher: how are we going to make these data widely available to the research community in a usable way? That sounds very easy, but it's not. On the left is the traditional model, where a researcher would come to the program and apply for the data, and then the data would basically be shipped out to them. But this model certainly discourages sharing of research. It is also a very weak link in terms of security: you have no control over the data once they leave your facility.
You also need a lot of infrastructure to be able to support that, to pay for multiple copies and all of that. So what we've decided to do is to create a cloud-centric approach where we bring researchers to the data. All of the data are hosted in the cloud, the tools are being built in the cloud, and then researchers are able to get access to it. In this way it facilitates collaboration. There are centralized security controls. We can make it accessible to all researchers, and we can even think of ways of making different types of data accessible to different types of researchers depending on the need.

So with this cloud-centric model, you can also think of providing the data in tiers. We certainly have a public level of data sharing that's happening: publicly accessible, no login needed, shared on the website. The enrollment bubbles and the graph of enrollment over time that I showed you are one of those snapshots. We've also created a registered tier, which requires identity verification, signing a data use agreement, and completing research ethics training. There's also the idea of potentially developing a controlled tier, which would give people even more access to the data. So at each step the security goes higher and we institute more controls, because you're getting more access to the data. With the registered tier, there are some specific privacy controls in place that would not be in place in the controlled tier, so we would need an additional level of trust with the users of that data.

So here's the Researcher Hub website. You can go to researchallofus.org. I encourage everybody to do that. From there, you can see the data snapshots, the data browser, and the survey explorer. The data browser that's available publicly does not let you drill down to individual-level data. To do that, you have to move into what's on the right here, the Researcher Workbench. So this is just how the Researcher Hub components fit together.
On the left, you have the data browser, data snapshots, and survey explorer. And then in the middle is this data passport model, which is how we are making the data available to researchers. We don't require you to tell us what specific project you're gonna be working on. We just verify your identity, build a level of trust with you as a researcher, and then give you access to the data. The Researcher Workbench includes the data dictionary, concept sets, cohort builder, notebooks, and help desk.

And one exciting piece of news to share is that the All of Us Research Program Researcher Workbench is open for beta testing. So it's a very important moment for us. We want to build this platform together with the research community, and we absolutely need the input of researchers to make it more robust and to learn how well the data, tools, and policies are working. The current components in terms of data include physical measurements, survey data, and some EHR data. Much more is to come here, including genomics and digital health technology data. The tools include a dataset builder, a cohort builder, and Jupyter notebooks with R and Python supported. And the help resources are wide and varied, including FAQs, sample notebooks and workspaces with code snippets, documentation, and a help desk.

So on access and security, just a moment here. Again, All of Us employs a data passport model for access to the registered tier, to grant researchers broad permission to explore the data for a wide range of studies. Beta testing is available to researchers who have an eRA Commons account. This is what we're using for identity verification, so you have to have an eRA Commons account. Most researchers do, and if not, you can look into the process for creating one. And then your institution also has to have a signed data use agreement.
So the institution that houses the Data and Research Center signs an agreement with your institution, then we verify your identity through the eRA Commons account, and that's how you're able to get access. Again, to facilitate collaboration, the Researcher Workbench is hosted in a cloud-based system. Data downloads onto your local machine are not allowed. And during the beta testing phase, another really nice perk is that researchers will receive free credits to cover computing and storage costs.

So here's just a peek at what the workspace looks like when you come to the Researcher Workbench. First, you create a workspace, and this is where everything will be collected. Then you can create cohorts, concept sets, cohort reviews, and notebooks to run your analyses. This is just a look at what the tools look like in terms of the cohorts, datasets, and concept sets. And then just one more summary of what is included in support. I have to say the support features are really extensive, so even someone who is not a coder should be able to get up and running. There's the user support hub, featured workspaces, reusable code snippets, demonstration projects, and then a help desk form that can give you one-on-one help.

So just to summarize this section: we are providing a robust data ecosystem with survey data, physical measurements, and structured EHR data, with plans to add additional surveys, digital health technologies data, unstructured EHR data, linkages to other datasets, additional assays and biospecimens, and genomics.

So moving on to the topic of the day, digital health technologies in the All of Us Research Program. What is the vision, really, for digital health technologies?
Well, digital health technologies data from things like fitness trackers, smartwatches, apps, and other types of devices can capture health-relevant data from individuals outside the hospital or clinic, which can really supplement current medical information. Another feature, especially for fitness trackers and smartwatches, is that the data are longitudinal in nature, which can give researchers an unprecedented ability to better understand the impact of lifestyle, environment, and other factors on health outcomes, and ultimately to develop strategies for keeping people healthy in a very precise and individualized way. The program is currently collecting data from participants in a bring-your-own-device model. So we are not distributing devices to participants, although that is something that we are planning for. We currently have connections to Fitbit and Apple HealthKit. So again, just bring your own device. We are also considering developing pilots of smartphone-based apps in the future.

So what is the vision or impact specifically for the Fitbit bring-your-own-device data in the Researcher Workbench? I believe that All of Us truly has a unique opportunity to release one of the largest and most diverse digital health technology data sets. We currently have data from over 9,000 participants, from the beginning in 2009. In addition to the current survey, EHR, and physical measurement data, future data releases will also include genomics and other data sets. So when you think about the power of combining all of these different data sets together, it's truly amazing. This can promote research to develop new and improved disease prevention capabilities, remote patient monitoring for connected care, and other opportunities for high-cost diseases.

So the approach that we are using for the Fitbit data in the All of Us Research Program is a minimum viable product, or MVP, approach. The first step of this is to obtain the data.
And we currently have all of the data through May 2020, which is all going to be curated and released in the next data release. Second, we've developed scientific research use cases around the Fitbit data, just to make sure that the way we're curating the data is going to make it useful to the research community. Third, we are now assessing the data set for its longitudinal nature, missingness, and trends; basically, doing exploratory data analysis. Then, based on steps two and three, the scientific research use cases and the EDA, we are defining the priority data elements for inclusion in the Researcher Workbench. Again, the timeline is to release it in the next data release, which is slated for late fall 2020, currently anyway. And the last step is to develop and implement a Fitbit curation schema, coordinated with the security and privacy team and other approvals as needed.

So this is currently what the Fitbit MVP looks like. You can see the data elements table over on the right. The data elements that we are currently pulling from Fitbit include heart rate (minute-level heart rate and heart rate summary data), intraday steps, which are also minute level, daily activity summary, sleep, daily weight, daily fat, food, and water. It's important to note that daily weight, daily fat, food, and water are all user-supplied elements, so just by nature of that, there is much less of those data. The others are parameters derived from the device. So what we've decided to do is to provide a series of parsed data tables in BigQuery, including heart rate summary by zone, heart rate at the minute level, activity daily summary, and activity intraday steps at the minute level.

So now let's dig into the data a little bit more, as well as some of the participant demographics, which are important for a number of reasons I'm going to get into. These are characteristics of the current Fitbit data set in terms of the data elements.
So we have 9,255 Fitbit bring-your-own-device participants. And again, this is through the May 2020 timeframe. 8,535 of these have primary and electronic health record consent, with activity measures as early as 2012 and heart rate as early as 2015. So that is quite a lot of data if you think about it. The data elements that we are pulling from the Fitbit API are shown on the right, with the device-derived parameters in green (daily activity, intraday activity, heart rate, and sleep) and the user-supplied ones in blue (daily fat, daily weight, food, and water). When you look at the data corpus as a whole, element by element, to see which elements have the most data in them, intraday steps and activity summary have the most. Heart rate and sleep summary have about 40% less, and the user-supplied daily weight, daily fat, food, and water intake have greater than 90% less data than the intraday steps. So in terms of what is gonna be most relevant or useful to give to the research community, in my mind it's definitely the intraday steps and activity summary, as well as heart rate.

Then we wanted to assess the completeness of each participant's data. Percent completeness was assessed within each of the 8,535 participants. The way we did this was a comparison against a theoretical maximum, which is based on the start and end time of data donation. So if a user started donating data and then stopped, and that span was about a hundred days, then the theoretical maximum number of data elements for the daily elements would be 100. For the minute-level elements, it would be 100 times the number of minutes in a day, which is 1,440. Then, by dividing the number of data elements that you actually have for each of those data types within a person by the theoretical maximum, you get the percent completeness.
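The completeness calculation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's actual analysis code: the function names are hypothetical, and counting the first and last donation days inclusively is an assumption the talk does not spell out.

```python
from datetime import date

MINUTES_PER_DAY = 1440  # possible minute-level records per day


def theoretical_max(first_day: date, last_day: date, per_minute: bool = False) -> int:
    """Maximum possible record count for one participant and one data element.

    Assumption: the donation window counts both the first and last day.
    """
    days = (last_day - first_day).days + 1
    return days * MINUTES_PER_DAY if per_minute else days


def percent_complete(observed: int, first_day: date, last_day: date,
                     per_minute: bool = False) -> float:
    """Observed records as a percentage of the theoretical maximum."""
    return 100.0 * observed / theoretical_max(first_day, last_day, per_minute)


def share_above(percentages, threshold):
    """Fraction of participants whose completeness exceeds the threshold."""
    return sum(p > threshold for p in percentages) / len(percentages)


# A hypothetical participant who donated over a 100-day window.
start, end = date(2020, 1, 1), date(2020, 4, 9)
daily_max = theoretical_max(start, end)                    # daily elements
minute_max = theoretical_max(start, end, per_minute=True)  # minute-level elements
```

With per-participant percentages in hand, `share_above(percentages, 60)` gives the fraction of participants above a chosen cutoff.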
And we're looking at two thresholds, greater than 60% complete and greater than 80% complete. Because intraday steps, daily activity, and heart rate are all device-derived data types, you would assume that they're all relatively similar: if the person is wearing the device, you wouldn't get steps and not heart rate, unless the device just didn't have that capability. So it's actually a good control to see that these numbers are very similar. For intraday steps, daily activity, and heart rate, about 77 to 79% of the participants who are donating data have more than 60% of the theoretical maximum amount, which is pretty good. That means this is a pretty complete set. When you look at 80%, it's slightly lower, but it's still very good: around 62 to 65% of the participants have more than 80% of the maximum possible data available.

An interesting thing to note, though, is what happened when we compared participants donating data for one year or less with participants donating several years of data. Participants with data available for only one year or less had the lowest percent completeness, and completeness of participant data increased as the duration of Fitbit BYOD data donation increased. So this just goes to show something that I think we've all noticed: if you've had a device for more than a year and you wear it regularly, it becomes integrated into your life, and so the chance of having a more complete data set goes up.

So before I move into the next section, where I talk a little bit about the demographics of the participants in the Fitbit BYOD program versus All of Us participants overall, I wanna first give you a summary of how we think about diversity in the All of Us Research Program.
Diversity is critical to doing precision medicine research, because if we want to be able to deliver healthcare that's specific to a particular type of person, we need to have research on all of those various types of people to be able to find those differences. So this is how we think about the groups that we call underrepresented in biomedical research, or UBR, and their definitions. Race and ethnicity: the participant identifies as Hispanic, Latino, or Spanish, or as a race other than white. Age: the participant is under 18 (again, we're not doing that in this program yet) or over 65, which is really what we focus on. Sexual and gender minorities: the participant's reported sex is different from that assigned at birth, or sexual orientation is something other than straight. Income: the participant lives below the federal poverty level, which is less than $25,000 a year. Education: the participant has less than a high school degree. Geography: you live in a rural zip code. Access to care: you can't readily access the healthcare system or can't pay for care. And then disability: some type of physical and/or mental disability.

So just to start, I wanted to show you the breakdown of age groups in the Fitbit BYOD participants versus All of Us core participants. The Fitbit BYOD participants are in the dark bars and All of Us are in the lighter bars. What you can see here is that the distribution of age is fairly similar. You do lose a little bit of the 75 to 85 group, and this does fall into that UBR category. You do gain a little bit in the mid-20s to mid-30s. But overall, this is pretty comparable.

In this next chart, I tried to highlight where we're seeing some of the gaps, where we're not able to capture UBR groups in the same percentages with Fitbit participants as with All of Us participants. So you can see that the vast majority of Fitbit participants, over 78%, are white.
Whereas with All of Us, only about 47 to 48% are white. And then in All of Us you have 20% Black or African-American and 16% Hispanic or Latino, whereas in Fitbit BYOD you only have four to five percent in each of those two UBR categories. Another feature of the Fitbit participants is that over 36% have an advanced degree. In All of Us, only 18% have an advanced degree. In terms of high school only, about 21% in All of Us have a high school degree only, and in Fitbit participants it's only 6%. In terms of living below the poverty line, not surprisingly, only 8% of the people donating data are from that group, whereas among All of Us participants it's 31%. So I wanted to really drive this point home: even though this program is so focused on diversity and we're trying really hard to capture it, we can't do it in this kind of bring-your-own-device model, because it's probably reflecting the people who already have the devices, right? So we're gonna have to be a little bit creative in order to capture data from other groups.

So just to summarize, there are specific diversity gaps in the Fitbit BYOD participants when compared with All of Us core participants. Skipping down to the two bulleted points at the bottom: there's a definite bias toward white versus other race categories, and Fitbit participants are skewed toward higher education and income levels.

So why is diversity important? Just one more piece of evidence. This is a very popular paper in Nature from 2016, which surveyed the data that had been used to do GWAS, or genome-wide association studies, from 2009 to 2016. What it shows is that in 2009, 96% of GWAS studies were done using samples from people with European ancestry. That is a tremendous bias, right?
I mean, this is something to be really, really careful about. By the time this paper was published in 2016, it had gone down to 81%, and it looks like people with Asian ancestry in particular are becoming more represented, but we're still doing a terrible job at getting diversity in these data sets. And my concern is that if we're not careful in how we do research using these digital health technology data, we're gonna be in the same situation. That's certainly not something that we want, because these types of gaps are what lead to disparities in healthcare.

So what are the opportunities and some considerations for DHT in the All of Us Research Program? The All of Us Research Program truly has a unique opportunity to address some of the current challenges in DHT research because of the following core values of the program. First, the size and diversity of the cohort. Again, we're trying to get to a million, and we've currently got more than 350,000. And the focus on diversity means that we really do need to focus on diversity not just at the level of the participants, but also at the level of the data type. It's also important to the program to capture data from those UBR groups. And then there's the goal to develop an open access, rich, and complete biomedical data set that's not biased, and to make it available to the research community. So that's really key.

Because diversity of participants and of data is of critical importance to the All of Us Research Program, what we've learned from the Fitbit BYOD analysis will be used to plan future digital health technologies efforts, potentially piloting giving out devices to people, to try to mitigate the gaps among participants traditionally underrepresented in biomedical research.
So really the question that I wanna get to, that I want people to think about and that will hopefully spark some discussion, is this: how can the All of Us program leverage DHT to provide unique opportunities for our participants and for the research community?

So just some other key considerations for All of Us and digital health technology. Again, diversity is key; I can't say it enough. Even in the All of Us Research Program, where there's a huge focus on diversity, it is a challenge to get that diversity in the digital health technologies data. But it's critical for eliminating bias from results. Next, because the All of Us Research Program has the opportunity to release one of the richest and largest data sets with the Fitbit release, it's an opportunity to democratize access to this data type. There are a lot of research studies that have just a small number of participants, and the larger studies with lots more data are mostly in the private sector. So how can we safely enable a larger number of researchers to get familiar with these data types and start to do innovative things with them? The other thing is that, in a lot of cases, these are novel data types. So a lot of what has been built for privacy and security is unknown until proven, right? Certainly data egress is something that the program guards against, and that is critical. And then with privacy, re-identification of participants is another consideration that we need to make sure we are protecting against.

So I'm just gonna show you these slides to let you know that it really does take all of us. These are all of the All of Us Consortium members beyond the community partners. You may or may not be able to read all of these organizations. It's truly massive. These are the community partner and provider networks. And here's a picture of all of us in there somewhere. It really is a monumental effort.
So please feel free to find out more information, shoot me a note, ask a question, discuss, and thank you so much for your time.