I'm honored to introduce our keynote speaker today, Jeff Leek. Jeff is Chief Data Officer, Vice President, and J. Orin Edson Foundation Chair of Biostatistics at the Fred Hutchinson Cancer Center. Previously, he was a professor of biostatistics and oncology at the Johns Hopkins Bloomberg School of Public Health and co-director of the Johns Hopkins Data Science Lab. His group develops statistical methods, software, data resources, and data analyses that help people make sense of massive-scale genomic and biomedical data. As the co-director of the Johns Hopkins Data Science Lab, he helped to develop massive online open programs that have enrolled more than 8 million individuals and partnered with community-based nonprofits to use data science education for economic and public health development. He is a fellow of the American Statistical Association and a recipient of the Mortimer Spiegelman Award and the Committee of Presidents of Statistical Societies Presidents' Award. I actually remember Jeff fondly from his keynote at our rstudio::conf 2022, so I'm excited to hear him present to you today "DataTrail and its extensions: how you can build a local pipeline of talent for your medical center." Jeff, it's all you.
Great, thank you very much. I really appreciate the intro and really appreciate the invitation to speak here today. I'm going to share a screen, and then if somebody can let me know with a thumbs up if it actually appears. Let me see. Okay, great. I'm very excited to talk to you here about DataTrail, which has been a really personal passion project of mine for more than four or five years now. It's something that I'm really excited about growing, so I'm really eager to tell you about what we've been up to and to look for opportunities to collaborate and build on what we've already started.
I was really inspired by Peter's introduction, showing something that they had just started working on and then it turning into something even better through collaboration with the community, and I really hope that we can do the same sort of thing with DataTrail here. We're at what I still call the early phases of building this, and I think it would be an awesome opportunity to collaborate with all of you.
So I'll briefly do my disclosures. I'm a founder and board member of a couple of startups that spun out of my group, and I'm a co-developer of Coursera and Leanpub courses that produce revenue for the various universities and institutes I'm a part of. So I just wanted to mention those.
I just wanted to say I'm incredibly excited to be here today. R/Medicine is a conference that I've admired from afar for a long time, and it's just never worked out for me to be able to be a part of it, but I'm excited to be here now, and I'm looking forward to being a part of the R/Medicine community for a long time going forward, especially in my new role.
Before I get started on telling you about everything that we've been up to, I wanted to take a moment to thank all of the people that are part of this project. This is the largest collaboration that I've ever done. It involves university staff, nonprofit collaborators, students, postdocs, administrators, companies, and collaborators at multiple institutions. I wanted to particularly shout out Shannon Ellis and Aboozar Hadavand, who are the two postdocs that started this project with me. They took a really big gamble to do this while they were postdocs, and it really paid off. And then I wanted to thank some of our core team members, Ashley Johnson, Simone Sawyer, Davon Person, and Candace Savonen, who've put in a huge amount of effort to make this possible for the communities that we work with.
So anything good you hear is definitely due to one of these folks, and anything you hear that's a little bit wrong or misguided is definitely my fault.
I wanted to talk a little bit about how I came to have this new role. It's a fancy role with a fancy title, but it didn't start that way. This is a picture of me back when I was a nerdy kid from Idaho in my freshman year at Utah State University. I was just trying to figure things out and didn't really know what I wanted to be when I grew up, I think like a lot of kids, and didn't really know how to get from one place to the other. So how did it happen that I got from where I was to where I am now? I think it is an instructive exercise, and it was a thing that Shannon, Aboozar, and I sat down and did right at the beginning when we were thinking about DataTrail. We wanted to understand the steps or the components of being successful and becoming a data scientist, or of succeeding in your career.
You have to know that data science, or your area, is a real thing. You have to have the income security to be able to get the schooling that you need. You need access to an expensive computer. They say a data scientist is a statistician working on a MacBook Pro, and a MacBook Pro is pretty expensive, so you need access to it. You need access to appropriate educational programs and instructors. The right jobs have to come up at the right times. And then you really have to rely on your community and your network to be able to get connections. Again, I was really excited to see Peter's description of the job placement and connection services that are being provided by R/Medicine, because I think that's such an important piece of helping young people get careers.
Even more importantly than that, I had a number of people that really made the difference for me. My undergraduate advisor, Jim Powell, took a chance on me. My graduate advisor, John Storey, worked with me even when I didn't know anything about statistics when I got started. Giovanni Parmigiani rescued me from a difficult situation and supported me in a job. Rafa really helped me figure out what it means to be a biostatistician in day-to-day practice. And then Karen Bandeen-Roche was the chair that hired me, and the list goes on and on. There is a whole collection of people that made it possible for me to arrive at the place I am now.
I just wanted to give everybody a brief moment. I know that there's live chat, and you can put it in chat or you can think to yourself. Think about all of the people that helped you get to where you are today, if you've landed in a place you're happy with. Who did it take? Who is the community that helped lift you up and get you to that place? I think if you're anything like me, you'll realize it's a huge community of people that really helps get you to the place that you're going. So it's really important to build that community for other people as you move up in your career.
The whole hypothesis around DataTrail is that talent is equally distributed. This is our second cohort of DataTrail scholars. They all live really close to Johns Hopkins, or did at the time that the program ran, near where I went to work every day. They are all fantastic, brilliant young people who are now either doing data science, going off to med school, or pursuing other careers, and it's really exciting to see what they can accomplish. But an important caveat to talent being equally distributed is that opportunity isn't always equally distributed.
So this is data from the Opportunity Atlas, which was produced by Raj Chetty and folks at Harvard. They looked at the data from census records and from IRS tax records. I'm highlighting the neighborhood right around the Johns Hopkins Bloomberg School, and they found that for a person who grew up in this neighborhood, the median family income in their mid-30s is $18,000. So that's a stunning lack of opportunity, and the red indicates the scale of the lack of opportunity. I've taken to calling these opportunity deserts, and you find them all over the United States. I've moved to the Fred Hutch Cancer Center, which is in an incredibly wealthy part of Seattle, and despite that there's still this opportunity desert for people who grew up in the neighborhood right around the Fred Hutch. If you grew up in this neighborhood, you again have a low median income at the age of 35. And it's not just the Hutch and it's not just Hopkins. Basically every major medical center is often placed in a location of need, and when they're placed in those locations there are often opportunity deserts directly adjacent to them.
So one of the core reasons we actually started this was thinking: we have this group of talented young people in this opportunity desert, and it's right next to a billion-dollar medical center with tons of need for data professionals at every level. Can we make these two things match up, and try both to help create economic opportunity and to solve an area of major need for medical centers?
And this isn't just a problem of economic inequality, it's also a public health problem. This is another incredible data set out of Raj's group at Harvard. It shows income percentile on the x axis, so how much money your parents made on a percentile basis: zero means your parents made very little money and 100 means your parents are very wealthy. Life expectancy is shown on the y axis for males and females. This is real data, and you can see that the linear correlation between income percentile and life expectancy is incredibly tight. Correlation doesn't imply causation, but this is such a compelling correlation that we wondered: if you moved people from one income quartile to another, would that have a substantive impact not just on their economic conditions but on their life expectancy and their health?
So what does that have to do with education? Education is still the best treatment we have for this sort of economic and opportunity inequality. I'm showing you here a graph of the different colleges around the United States. On the x axis is the access rate, which is how hard it is to get in: what's the chance that you get into that college if you're from the first quintile of income, the bottom 20% of income? The colleges over here on the right are very easy to get into, and these are very hard to get into if you have low income. On the y axis is the success rate, which is the chance you're in the top quintile of income after you leave that university. Johns Hopkins, where I used to work, and also lots of fancy universities are up in this corner, where it's very hard to get in, but once you get in you have a very high chance of having a high income at the end. And there's this L-shaped pattern, which is that the ones that are harder to get into typically produce higher incomes afterward. The really interesting thing about these data is that they show that income mobility is relatively independent of your parents' income. On the x axis here I'm showing the parents' income quintile, which also goes from very poor to very wealthy.
And then on the y axis is the fraction of the kids in the very top quintile after they finish at a university. This comes from the data that they collected. They labeled the universities in these various buckets, and you can see that within a bucket the curve does go up a little bit to the right, but it's relatively flat. What does that practically mean? It means that no matter what quintile your parents' income came from, if you go to this university you have a very reasonable chance of ending up in a pretty well-off situation.
So they define something called a mobility rate, which multiplies the fraction of students whose parents' income is in the bottom quintile by the fraction of those students who, given that they get into that university, end up in the top quintile of income by the age of 34. They use this as a measure of how well that college or university moves people from the lowest quintile to the highest quintile of income. If you look at the top colleges, they tend to be large public schools with high access rates. So you can get into them, and they're very large, so there are a lot of students that go to them, and they tend to have really reasonable success rates. When you multiply these two things together you get a mobility rate of around 10%, which is really good by national standards. One of the things that's interesting about this is that you have this huge collection of people studying lots of different things, so the chance of them all being very high income is very low. But if you pick specific areas where there's high income potential at the end, you could imagine increasing that mobility rate.
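As a rough illustration of the mobility rate arithmetic described above, here is a minimal Python sketch. The function name `mobility_rate`, the two college labels, and all of the numbers are hypothetical and chosen only to show the multiplication; they are not figures from the data set shown in the talk.

```python
def mobility_rate(access: float, success: float) -> float:
    """Mobility rate as described in the talk: access rate times success rate.

    access:  fraction of students whose parents are in the bottom income quintile
    success: fraction of those students who reach the top income quintile
    """
    if not (0.0 <= access <= 1.0 and 0.0 <= success <= 1.0):
        raise ValueError("access and success must be fractions between 0 and 1")
    return access * success


# Hypothetical (access, success) pairs, made up for illustration only.
colleges = {
    "large public school": (0.33, 0.30),
    "highly selective school": (0.04, 0.60),
}

for name, (access, success) in colleges.items():
    print(f"{name}: mobility rate = {mobility_rate(access, success):.1%}")
```

With these made-up inputs, the large public school lands near the 10% figure mentioned above even though its success rate is lower, because so many more of its students come from the bottom quintile. Raising the success rate, for example by training people for a high-wage field, raises the product directly.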
Data science is one area that we all know is a high-growth, high-wage job category, and data science actually means a lot of different things. I think that we're just at the beginning of defining the spectrum of what it means to be a data scientist, all the way from entry-level data processing and data entry jobs, which require some thinking but are fairly basic jobs, up to very advanced machine learning and complicated modeling, and I think we need to fill that entire range of jobs. In particular, this is a job growth category within the field of medicine. I know it's true here at the Fred Hutch, and it was definitely true at Hopkins where I was before, that there is always a shortage of people who can be clinical data managers for groups, whether it's managing trials or managing data from large epidemiological studies. So there's this increasing demand for clinical data managers across the entire biomedical enterprise. I'll come back to this in a minute.
We've seen this demand firsthand because we developed, at Johns Hopkins previously and now to some extent here at the Fred Hutch as well, massive online open data science programs that have enrolled millions of people all over the country and the world. The enrollment in these programs is incredible because there's a lot of interest in getting into these high-growth job categories. So this is an incredible opportunity to leverage this education to improve the mobility rate around the country. This is one of the papers that launched this whole project: Aboozar wanted to study, for the MOOC programs, whether students who took those courses really improved their employment prospects and their income.
And it turns out that MOOCs could. He did quite a nice propensity-weighted analysis. I'm not going to go into the details, you can read them in the paper, but it essentially showed that you get a pretty significant percentage increase in your income just from taking and completing these online courses, which was quite exciting.
One of the problems, though, is that MOOCs benefit the already well educated. With these massive online open courses, the hope was that if you build it, they will come: people will just show up, start taking these courses, and use them to improve their life situation. The reality, not just for our MOOCs but for every data science MOOC or bootcamp, is that the folks taking advantage of them are already well educated and already have employment prospects, and they're using the courses to move up in their careers. This is data from our program, but there's been a series of papers showing that this is reflected across the enterprise of data science training: it benefits the already well educated. One of the interesting things about our data science courses on Coursera is that for a long time the modal user was a white, middle-class, male software engineer from Silicon Valley. We definitely want to educate those folks, and we think it's really important that everybody gets access to data science education, but it's definitely not leveling the playing field or distributing opportunity across the entire spectrum of people in the United States.
We really wanted to design DataTrail to see if we could target folks who have a limited educational background and limited knowledge of data science, train them to be very entry-level data managers, data analysts, and data entry folks, get them into those kinds of employment prospects, and in particular connect them to Johns Hopkins, where we were working at the time.
So that was the goal of the project. But the reality is that there are significant barriers to entry. Just as I described at the beginning, you have to know what data science is, and you have to have access to an expensive computer, income security, appropriate programs and instruction, the right jobs, and connections, and you might not have all of those things if you come from an opportunity desert. Even if you don't come from an opportunity desert, you have to be lucky and get help and support to be able to do these sorts of things. So it's not unique to the individuals that come from the communities we're trying to support, but it is a challenge for those communities, and we wanted to see if we could systematically address those barriers and knock them down for people.
So the first thing is knowing about data science. We all know about data science and R, the ways it can be used, and how powerful it is. But our first partner was the Historic East Baltimore Community Action Coalition in Baltimore, which was right next to Johns Hopkins. They have a Youth Opportunity training program where they train folks to get their GED, which is a high school equivalency diploma in Maryland. When we were first designing the program and starting to figure it out, we would spend a lot of time down there and talk to all the young people, and none of them had even heard that data science was a thing. Lots of them would get excited about it once we started talking to them, but it was a new thing. So just communicating and getting the idea out into different communities, about the opportunities and the different fields that they may not have heard about previously, was a big thing.
We're seeing this now as we do a lot of work with community colleges and tribal colleges, and in rural areas as well. It's not just in cities that these opportunity deserts occur. Near where I grew up in Idaho there's a similar opportunity desert, so really getting this idea out there is a critical piece of moving people forward.
As we all know, the computers that you need to do data science can be pretty expensive. So our intention from the beginning was to design a program that you could do on a public library computer or on a Chromebook. This was an intentional design choice: if you had an internet connection and a web browser, you could do the entire program. At the time we first started developing this, that was a little bit hard. Since then, especially with the pandemic and everything else, there's been so much development in cloud technologies that it's become increasingly easy to be a person who just operates in the cloud. At the time it was RStudio Cloud, and now it's called Posit Cloud. This is an old slide, but it came from the alpha phase of RStudio Cloud. I reached out to Tareef and JJ at RStudio while they were working on this RStudio Cloud project, and they gave us access very early on. Robbie and his team gave us a lot of support to basically build our entire program on RStudio Cloud. We built the entire program there before it was even really launched as a beta product, because we knew that we needed a tool that would let people compute on the cloud and not just on their local laptop.
And then we started thinking about how you do data science on a Chromebook. We wanted to eat our own dog food, as they say, so I started doing all of my work on a Chromebook. In fact, I had been on a Chromebook for the last seven or eight years, doing my data science entirely on that, until I moved to the Hutch, where I was issued a Mac, sort of against my will. So it's possible to do it. It's sort of like writing a haiku: the first few times you write a haiku, it's very uncomfortable to get all the syllables right and get the design of the poem right, but as you go on it becomes a lot easier to do.
So that was the computer and knowing about data science. Then there was income security, and here we really relied on our partners. The Johns Hopkins Bloomberg School of Public Health, the Abell Foundation, Bloomberg Philanthropies, and most recently Posit have contributed money to allow us to pay people to complete the courses, so they treat it like a part-time job. That's an absolutely critical piece of the program, because a lot of the young people we work with don't have enough income to be able to concentrate on this if they can't treat it like a job. So we've been thinking about ways to make this training a part of their income stream so that they can focus on it.
And then they need access to instruction and appropriate programs. A lot of programs aren't really designed for folks who are just coming out of their GED. They're usually designed for people who are more advanced in their education.
So we needed to design programs that would work there, and this is where it's really relevant to the R/Medicine conference: we picked R. We thought about Python, and there are some real advantages to Python, and we needed to think about other languages to include as well, but R really stuck out. The reason why is that we wanted to minimize the time to magic. We've been thinking about how you quickly get people to do things that they'll get excited about. The code on the left produces the website on the right, which is absolutely incredible. (And I just saw the haiku pop up in the chat, which is amazing, by the way. I'm definitely using that.) So the goal is to minimize time to magic. We want to make it possible for people to write a little bit of code and produce a website, or write a tiny bit of code and process a gigantic amount of data, like you can with the bigrquery package, or with flexdashboard or Shiny. The code on the left produces a full interactive dashboard on the right. So again, it's how fast you can get them to happiness, and the faster you can, the more engaged and excited they'll be, and R is incredible at that.
So we built a program that was originally called Cloud Based Data Science. I'll tell you about the latest iteration in a few minutes. We covered everything from the basics of Google and the cloud, to how to organize data science projects, version control, R, and the general data tidying and data analysis process, and then some quote-unquote soft skills like written and oral communication and getting a job in data science. The idea was really to figure out a way to take someone all the way through this life cycle, not just focusing on their technical skills but focusing on the whole person.
The last one was access to connections. We need to be able to get them connections to jobs, and I think this is again a place where the R community shines in an incredible way. Whether it's R-Ladies, the Bioconductor group, R/Medicine, Data Carpentry, or Openscapes, there's this incredibly supportive, friendly developer community. I think Hadley and JJ and RStudio, now Posit, really deserve a lot of credit because they've pioneered this, and Bioconductor is also really big on this. All these groups, Data Carpentry and R-Ladies included, are incredible at supporting people, not making them feel dumb when they ask questions, and not making them feel silly for wanting to learn how to do things. I've been a part of a lot of technological communities and most of them aren't like that. If there is a superpower of the R community, it's that it makes new users feel welcome and makes them want to contribute, and that's an unbelievable force multiplier across the entire community. So I just really wanted to shout out all of you and everybody else who participates in this, because it's really amazing.
I'm going to tell you about the pilot, but we've had about an 80% success rate at getting people through the training. Then I'm going to tell you about some of the challenges we ran into as well, because our original intention was to connect them to jobs, and there are some real challenges in making that happen. We've been working with Urban Alliance, and now we're going to be working with additional partners, to help identify job opportunities and make sure that people can build on the training that we give them, which is only 14 weeks.
It's really introductory. They need a bunch of on-the-job training to solidify those skills and start to adapt to the workplace, so we have internships that we place people in immediately upon completing the program.
So I'm going to tell you a little bit about the pilot project that we started in Baltimore. This is what it's felt like the entire time that we were doing this project. We've always felt like we're fixing a bug in production as we go, in the sense that there's always something on the train tracks, something we're trying to fix, and usually our solution is something like this, where we're trying to stop something from going off the rails right before it happens. Most of the time those problems are self-inflicted, and most of the time they're my fault. Our team has been really good at knocking the barriers down and keeping things on the rails, so it's been really fantastic to work with them. Again, you're going to hear about some mistakes we made, and those are all my fault, and then you're going to hear about great things, and those are all our team's doing.
The way this worked was that in 2018 Shannon, Aboozar, and I started thinking about this. We started building a little bit of a data science program that might be focused on this sort of community, the community of people who haven't really been included or involved in data science to start. In April we had our first meeting with YO and HEBCAC. We were excited and they were excited, and they said, can we start the program immediately? Can we start it today? And we said, well, we don't really have a program yet. We managed to convince them to wait until May 21 for the learning to start. So we had the conversations in April, and by May 21 we had to have a program ready for these young people to take.
Mind you, we were going to build this entire program, and none of it existed yet at the time that we were talking to people. So this was me immediately after our meeting with HEBCAC, stressed, trying to figure out how we were going to possibly do this. Again, credit to this incredible team, the content actually got built in time for us to run this program. Shannon Ellis and Aboozar Hadavand were by far the leaders on this and deserve huge amounts of credit for the nights-and-weekends level of effort they put in to make that happen. Folks contributed content, and we developed these courses on GitHub, really openly and collaboratively. We had a whole procedure that Shannon developed for moving courses along the production cycle, and we were able to build the courses for the program.
Oops. Wait, what happened? I skipped to the wrong place. Sorry, everybody. Let's go back. I don't know what button I pushed. This is the problem with the Mac right here, everybody. There we go. All right.
So the challenge is that a lot of our courses are built on GitHub and on modern open source technologies. This is awesome because we can get people to magic very quickly. It's also very scary because things can change really quickly. An example of this is a Getting and Cleaning Data course we developed. This is not for the DataTrail program, this is from our previous Coursera courses. There are over 35,000 repositories on GitHub that were built from this course, so it's a pretty widely subscribed course. I released this course more or less the same day that the dplyr package was released by Hadley. If you look on the x axis, this is the number of scripts from our class that use a particular package.
And on the y axis is the number of times a particular library is called, and you can see that dplyr is the tool you would definitely use in a getting and cleaning data project. So our class was obsolete the minute we released it, because Hadley released this dplyr package. That was very stressful, and we had to redo the whole thing. It was also amazing, because what Hadley released made everything that we were teaching people to do the old, hard way so much easier and so much better. So it's been an interesting exercise in trying to figure out ways to automate and speed up updating the classes, and I'll tell you a little bit more about that work towards the end of the talk today.
So we wanted to build this program and get it done for people, which we did, and then our pilot began on May 21. The first question that they asked us was, okay, how long is this program going to take for people to complete? That is a totally reasonable question to ask, and one to which we had absolutely no answer, because we had just built the program the week before. We had no idea how long it was going to take people to complete it. We decided it was going to be a part-time program, that we were going to have them do 20 hours a week of work for which we could pay them, and we said, I think it's going to take about three months. That was a number that was totally made up.
We had two pilot folks that did the program with us, two young people. They were like astronauts landing on the moon. They had to deal with us updating the projects and the courses all the time. They had to deal with it being our very first time teaching this to this kind of cohort. They had to deal with just a series of nightmare scenarios. We had projected that they would finish by August 31. Again, a totally made-up number, and it turned out it took until October 5.
A large fraction of that delay was that we would get to the point where they were supposed to work on a course, realize that the course was totally broken, and have to delay them a week while we fixed and updated it. It ended up being about 16 weeks for them, and since then we've settled on about 14 weeks for the program. So it wasn't a totally terrible estimate, but it was a testament to the incredible talent of these first two young people, who have since gone on to do some pretty incredible things. I'll talk about one of them in a minute.
One of the problems that we immediately ran into is that we tried to get them jobs. Anybody who's ever written a job ad or seen a job ad will know this story. The real responsibility of a data scientist is often to automate some horrible practice that currently takes everybody a million years to do by hand, and then to write some ad hoc SQL or tidyverse code as needed to solve problems for people around the place. But the experience listed will be something like 15 years of deep learning experience in Python, a PhD thesis on vision modeling, and 10 years of creating Hadoop clusters from scratch. I think that's partially because we're still at the early stages of people understanding what data scientists do, so they don't really know what experience to list. The practical implication is that for most of the jobs people might be interested in applying for, they feel like they don't have the expertise or the required levels to get past the filters and into those jobs.
Not only that, but there's a real legal challenge as well for the data scientists we're training. I ended up learning a lot about legal definitions of employment for this project, which is not a thing a biostatistician usually has to learn about.
The first thing I learned about was the Fair Labor Standards Act, which defines exempt and non-exempt roles, and by exempt I mean exempt from the minimum wage and overtime rules. If you're a salaried worker, you're probably an exempt worker. So they can pay you a salary, but that means they're exempt from paying you overtime and minimum wage. And there are sort of six categories of people that can be exempt from the law that sets the minimum wage. They include people that are managers, people that have advanced degrees, people that are artists, and people that are computer programmers. And it turns out that a data scientist doesn't quite meet the definition of any of the exemption categories, unless they have a bachelor's degree, and then they qualify for the advanced degree exemption. So the folks that we're training have to be hourly employees. And there aren't a ton of data scientist jobs that are hourly employee jobs. At the time there were like zero jobs that you could apply for at Johns Hopkins that were hourly data scientist jobs. Even the entry-level data entry jobs were all for exempt employees. And so universities couldn't hire them without some changes. So we went and looked into this. We talked to many, many lawyers to try to figure out how we actually design the program. And it turns out that what we needed to do is really work with employers, not to pigeonhole them into a particular exemption, but rather to figure out a way to create new kinds of jobs that are available to these people. So we were this close. And again, we're doing this all on the fly, and we have two people that are about to finish the program, and I don't have a job for them. So I did what any, you know, normal person would do, and I started a company.
And so what I did was I reached out to my friend Jamie McGovern, who is the husband of Rebecca Nugent, who's quite a good friend of mine and a well-known statistician. She's the chair of statistics at Carnegie Mellon. They're really close family friends, and Jamie is a really well-known consultant. And so he founded this data science consulting company with me called Problem Forward Data Science. And we would take consulting work from a variety of different high-profile clients around the country. And we would have most of the work done by consultants who we hired from this data science training program, so we could give them their first data opportunity. And they were hourly employees with overtime, so that we could really focus on providing that first foot-in-the-door opportunity for people. While that was going on, we started working on the thing that we really wanted to do, which is get them jobs at Hopkins, because Hopkins is right next door to where we're training people and has a lot more opportunity than I can generate in a little startup company. And so we worked for a year, and this is really credit to Ashley Johnson, who was our program administrator. She really moved this through the system, working closely with the HR department at Johns Hopkins, and they invented a whole new job code specifically for data scientists, programmers, and data specialists who are non-exempt employees. They're hourly employees, and they can work at any of the campuses around Johns Hopkins. This was a huge effort and a really important contribution from Ashley, because it allowed us to start to hire our data analysts and data scientists at Johns Hopkins, and we have now hired several of them at Hopkins, which is really exciting. This also provides a way forward if you want to create such a job category at your institution. It's sort of a model and a template, and you can point at Hopkins as a place that's already done it.
If you want to try to create these kinds of opportunities where you're at. So the interesting thing about this is we had to deal with data science problems and community relationship building problems and course development problems and HR problems, all the problems really. And we have this phrase that we like to use all the time, Shannon and Aboozar and I. When we were working on this project we said, hard things are hard to do. If you really want to do something important and you really want to make a big change, it's often going to be a hard thing to do. And so, again, I'd like to just give you a brief moment, whether you want to drop it in chat or think about it, you know, quietly to yourself. Think about something that was hard that you accomplished. I wanted to take a minute to reflect on the really hard things that you've done that have made a difference for people in the world, and in particular in helping other people. And so I just wanted to give everybody an opportunity to think about that and to reflect on how much you're capable of if you really put your mind to it. I think one of the things this project has illustrated for me is that if we really put our mind to it, we can do incredible things for the community and for the world. I'm sure you're all doing the same, and I want you to take a chance to sort of pat yourself on the back and think about something hard you accomplished. Great. I'm going to tell you a little bit about the impact that this program has had next. And so the program has had a pretty significant impact on the young people that we worked with, and I think that that has been the most rewarding part of this. I mean, there have been many parts to it.
But the first is that, using that job code that Ashley created, we were able to hire one of our graduates, I think it was from cohort three, Davon Person, to be the curriculum lead for DataTrail. So he's currently working at the Johns Hopkins Bloomberg School of Public Health. He mentors students through our training program, and he helps design and build our curriculum with Candace Savonen. And it's been really amazing to see the development of Davon. We knew that we wanted him to be the curriculum lead when we would have these parties at our house in Baltimore for all of the scholars and all the staff. Davon would come and he would spend the whole time hanging out with my son, teaching him how to play chess, and my son would eat this up. Davon was an incredible teacher, and we just realized he's the kind of person that could really help our program and really develop the young people who were part of our ecosystem. And so we really have appreciated him, and he was the first pilot candidate that got hired under our new data science hiring program at Johns Hopkins. Since then there have been others. And then another incredible outcome of this is that one of the young people who was part of our very first cohort of DataTrail scholars was working at that company, Problem Forward, that we founded, and one of the contracts that company had was to analyze data for HBO. And for this HBO special, they analyzed data about economic opportunity, and in particular the redlining that has happened historically in Baltimore, and how the historic redlining of communities in Baltimore has carried through 60-plus years and has set the geographic stage for housing prices and housing quality and housing support in the area.
And basically our young person from our program was actually credited. The manager, Jamie McGovern, who was the co-mentor, and one of our first two DataTrail scholars both appeared in the credits of the HBO documentary. It's really cool, and you should definitely check it out if you want something to do this evening. The other thing that was really great, and we're working on a paper that we're going to publish about this, is that Simone Sawyer, who was our community scholar advocate, has really been talking to our young people and asking them about what the program meant to them and what it did and didn't do for them. And what was really interesting is what they said. This is a quote from one of our scholars: it gave me a new direction and taught me that you can be different. Go a different route than your parents, not always hanging on to what society tells you, you can follow your own protocol. So it just gave me a sense of like a new world or a new field and new life that I got to see. It's incredibly exciting to see the young people who have come out of the program thrive. Not all of them actually go on to be data scientists. Some of them go on to different jobs. They go on to different careers. You know, when you were 18 you probably didn't know exactly what you wanted to do. I know I didn't. So sometimes they stay in the field and sometimes they don't, but regardless, it's been so fun to see how this program has helped be a stepping stone for them to whatever it is that they want to take on post-program. It's just been a really fantastic, powerful component of the program to have this community of scholars that's been developing. We've learned some pretty important lessons as we were doing this. It was definitely an exercise in running through walls and learning how to do things. You know, I'm a biostatistician by training.
This was my first community-based project, so I had a lot to learn, and the good news is I had really great mentors and teachers in our nonprofit partners, who helped us learn how to do things well. But there were still a lot of things we had to learn. One of the things that was immediately apparent from the first cohort, and has been true throughout the entire experience of working on DataTrail, is that the data science training is actually the easiest part. So, you know, the young people we were working with, they're young people, they're good with computers, they pick this stuff up fast. Often they end up teaching us about R packages we've never heard about and all that sort of thing. But we really need to be able to support the whole person for them to be able to go through the program, especially when you're working with communities where there isn't a lot of economic opportunity and there are a lot of economic stressors and financial stressors and so forth. We need ways to support the whole person, whether that's providing laptops, whether it's creating connections to the community for mentors, whether it's paying them to complete courses so they can focus on it as a full-time job, or whether it's providing online support and job search assistance. We've also had to realize that we needed to hire a community scholar advocate within our organization, and their whole job is just making sure the young people have what they need: that they're housed, that they have access to food, that they have access to income, that they have a pass when they need to get somewhere. And we've been able to keep people in the program, I think, largely by virtue of not just thinking of it as a data science training program, but as a whole-person support infrastructure to try to help them get through this training. And building trust takes a lot of time. So, when we first started going down to meet with folks at the YO center.
In fact, we actually approached 20 other nonprofits about whether they would work with us, whether they would do this training program with us, and we were frequently turned away because we hadn't built up any trust. We didn't have any credibility. It really took the HEBCAC folks taking a flyer on us for us to be able to work on this program. And the thing that we learned right from the start is that, you know, being there is the most important thing. We started to spend a lot of time down at the Historic East Baltimore Community Action Coalition, just being present, being there. Our training didn't all happen at Hopkins. We would go to HEBCAC to train people so that we were present there. And with all of that, I think we built some trust over time, but I think the thing that has maximized the trust building has been Davon. And really, you know, this is a person that came from their community, that they trust and know. He's been a member of their community. And he can talk about his experience of going through the program, getting a job, being a leader now for us. And I think that's produced a lot of value for our relationship with not just the nonprofits but also with the young people that we work with. And really thinking about how you develop those connections and how you develop that relationship has been so incredibly meaningful and important for this project. And then we need advocates. So one of the things that's really interesting about this project is thinking about, okay, you're going to take a young person, you're going to put them through a 14-week data science training program, and you can all probably appreciate the shortness of that program and how little background they had in data science before we started.
They're just at the very beginning of their data science career. They know a few things, they know a few packages, they can do a little bit of work, but they're not fully formed yet. You know, they're still learning on the job. And so what does that mean? It means we need to identify supportive employment opportunities where the mentors understand what kind of community we're working with and understand that this is the beginning of a journey. Think of it as a long-term investment in that employee instead of a short-term return. And that's taken a lot of work, and also a lot of work with organizations to help them understand what it is that you need to do when you're trying to hire people from these diverse communities, from these economic opportunity deserts. And so we've kind of had to do both. We work with the sort of official bureaucratic processes, HR and finance and all that. And we work very closely with the mentors and the employers to try to help them figure out how they work with this. I think we're doing an okay job at this, but we have way more work to do on this side. I think we've devoted a lot of attention to how we build up the young people through the data science training, and we're really at the beginning of our journey of how we actually create a program for people who might hire people from our DataTrail program, and how we get them to understand how to interact with these young people as well. So then. Oh, and there's a good question about evaluation in the text here, which is: do we have a structured way to assess barriers, and what metrics do we use to assess future program outcomes? Yes. So we have a data collection and capture strategy. And we have a lot of educational data about them, because all of the methods we use to do the training are high-throughput educational methods. We also have access through the APIs to all the questions they asked and all of that.
And then we can collect standard, you know, financial and demographic background data, as well as economic outcome data. What's interesting about defining success here is: what does it mean to be successful out of a program like this? We don't want it to be that you have to be a working data scientist six months out, a year out, two years out. One of the young people in our program is going to college simultaneously. He has an intention of getting a degree and then going to medical school. If he becomes a doctor, that's a totally successful outcome. He won't be a data scientist, but we want that to be considered a success. So I think defining success has been a real challenge for us, actually, and again this is not my main area of expertise, so I would love to talk to people who know more about this and have ideas about it. And again, we need mutually intensive learning. So, you know, we provided social support, financial support, and data science training to the data science learners. But now to the experts we need to provide, you know, anti-racism training, mentorship training around what it is that we expect from them, some support for HR best practices, and so forth. So the right-hand side of this is not really built up yet, but it definitely needs to be built up for us to be able to be more successful. One of our big learnings is that you have to sort of train both sides, and both sides have to learn together for this to be a really successful relationship and a successful mentorship opportunity. The future of DataTrail. I've told you a little bit about the past. The DataTrail model really is a college or university or medical center providing tutors and educational support, nonprofit community partners providing social and emotional support and financial support, and then hiring opportunities. So, for example, in the case of the Baltimore DataTrail program.
We have the Johns Hopkins Data Science Lab providing the tutors and the educational support. We have HEBCAC and, more recently, HeartSmiles being our nonprofit community partners, helping to identify and support young people. And then I'm putting Posit here because they've been an incredibly supportive partner in supporting some of these young people. And there are a number of other companies and organizations that have also taken on young people and scholars out of the DataTrail program. So, you could also get involved in DataTrail, and this is the part of the talk I'm most excited about. Just as we talked about at the beginning of the presentation, this is a project that's evolving and ongoing, and we would love contributions from anybody, and there are various levels at which you might participate if you're excited about this at all. So, tell other people about DataTrail. We want to get the word out. We want people to know that this is a program that exists and that they can use and leverage at their institution. You can basically find all of the details about our program at datatrail.org. So if you go and check that out, it has information about our curriculum and the program and the design philosophy and all that. We also encourage more inclusive hiring practices at your organization. So we have a job description, which we share with anybody that wants to have it, that they can use to try to design similar job descriptions for non-exempt employees at their organizations. This takes a lot of advocacy work, but it is a really rewarding way to create diversity within your organization. One way that you could help out is to recommend data sets for learning examples. I think the R/Medicine community could be a really incredible resource for this. It's one that we're constantly working on.
Our examples are starting to get a little stale from our original program, and so we really would love to have some modern, exciting new medical examples. The data examples are project templates. They're R Markdown based whenever possible. They either shouldn't require too much background knowledge, or you should be able to describe the background knowledge within the context of that R Markdown. And the data needs to be publicly available, or okay to be made public, so that we can share it with people. And then we create data projects on Posit Cloud, which we then share with people so that they can actually work on them and practice their skills. Having projects to practice those skills on is such an important part of becoming a data scientist, so we would love to work with any of you if you have any ideas on projects. You can find all of our code on GitHub. So this is the code for the courses. If you go to github.com/datatrail-jhu (the issue was that datatrail itself was taken, so we took datatrail-jhu), you will find our entire curriculum. You can also add an issue there or send a pull request if you notice a thing that you want to fix. That would be really helpful. And then there's hosting a DataTrail graduate as an intern. You can either do this with your own funding or with funding that we provide. We've already placed interns at a few different places. These are a couple of examples of the places where we've placed interns out of the program. And what is required? We have funding available to pay for some of these interns if you don't have the funding to pay for one. You need an entry-level data science project that you have in mind. We can help you determine if it's a good fit, but it needs to be pretty basic and pretty introductory. And then you need some time to mentor the intern. You can meet once or twice a week.
A postdoc or senior grad student could be the direct mentor, as long as they have some experience mentoring. And preferably it's eight weeks or more in duration, so they can really get some experience, ideally with some culminating publication, and that could just be a GitHub project that's put up. If you have any interest in this at all, you can email me at jtleek@fredhutch.org, or Candace Savonen at csavonen@fredhutch.org. She's sort of leading a lot of the DataTrail program at this point with me from the Fred Hutch side, and could connect you to potential projects that you might do if you were hosting an intern. And then you could create an extension course, and this is where I'm really excited about R/Medicine and the potential here. I've talked to several different groups about this, and so far the project hasn't really gotten off the ground. So this is me, you know, throwing this out into the world and hoping the universe will respond. We would really love to tailor an add-on course or an elective course that's focused on clinical data management. So all of the content from our original program is online here at this URL. And I can also paste it. Maybe somebody can paste it in the chat for me if they get a chance. And so you can go read what they actually learned previously, and what they would have already had experience in. But we don't really focus on clinical data. We don't focus on REDCap. We don't focus on any of the things that a clinical data manager might be interested in. But we would really love to add an add-on course so we can focus on that area of data science. And if anybody's even remotely interested in collaborating, I hope they will reach out to me about that.
You can see the self-taught version of our courses on Leanpub. We put them up there so that they can be available for free. You can pay a suggested price, but it's more like a donation than a payment, because you can tune the slider down to zero. So you can take the course and get a DataTrail credential from taking the course. And then we have a whole infrastructure for designing these courses and releasing them on Leanpub and Coursera and on the public web. It's called the OTTR project. You go to ottrproject.org. So if you decide that you have any interest in working with us on a clinical data management course, we can collaborate via this OTTR project, which is basically a GitHub-based course development framework that's really useful for publishing to multiple places simultaneously. And so, finally, the heaviest lift that you could potentially get involved with is starting your own franchise. So, we have decided that the model for scaling DataTrail beyond Baltimore is what we're calling a franchise model. And by franchise model we mean that we give you everything that we have for free, and then you run your own program, if you're interested in running your own program. So, this has already been done a couple of times. The longest-running one is at Mount Sinai Health System, which is running a DataTrail program. But we would be very interested in supporting you if you decide that you want to run such a program yourself at your institution. If you're going to do that, you need a nonprofit training partner to help you identify students from the community and provide that wraparound support. You need funding for some kind of Chromebooks and a stipend. So, typically this is now more on the order of four or five K per student.
It was three K when we started. And then you need a team to support the students through the program. That's a program leader, typically somebody who's supervising and managing, and typically 10 to 20% effort from two to three tutors to help people do the program. And then you run the cohorts as sort of 14-week cohorts. Then there are the initial hiring partners, again ideally for eight weeks with significant mentorship. This could be your institution, or it could be corporate partners. We've developed a whole collection of materials: course materials, instructor guides, you know, templates for hiring, advice around fundraising, etc., that we can provide to anybody that's interested in setting up their own DataTrail franchise. If you have any questions about this, you can email Candace at csavonen@fredhutch.org or me at jtleek@fredhutch.org, or go check out our website at datatrail.org. So the last thing I'll leave you with is thinking about paying it forward. I think this is a really useful feeling that I've had, and I hope you'll feel the same way about how exciting it can be to pay it forward. I think that we can help each other, and DataTrail is just one way we can help each other. I'm just so happy to be a part of the R community, and I know the R/Medicine community is the same, because these communities always help each other out. And sometimes it's helping each other out on something like DataTrail, which is a big ongoing project. Sometimes it's just helping each other with a little bit of code. Sometimes it's mentoring or helping pull up somebody that's behind you on the ladder. And I hope that you will continue to do that, because I love being a part of this community that you're all creating, and I want to continue to be a member of this community.
I just wanted to highlight a quote from my advisor Jim Powell, who was my undergrad advisor. I was a goofy kid from Idaho who had no idea what he wanted to do, and Jim took me on and helped me get involved in a couple of research projects, and honestly, I really probably wouldn't have had the career that I've had without Jim's advice. And I emailed him after I finished. Actually, it was right after I won this big stats award. I sent an email to Jim just to kind of explain how grateful I was for everything that he gave me and all the support he gave me, and how he had no reason to do it other than just being a person that likes to help other people. And he wrote back: mentorship is a debt you don't pay off, you pay it forward. And then he said, thanks for helping me pay off my karmic debt. And so I feel like I'm paying off karmic debt to all the people that helped me along the way. And I know we thought earlier today about those people that helped you along the way. Think about that karmic debt, and I hope that you will continue to be the amazing, supportive community of people that helps others that you have been this whole time. There's almost nothing more rewarding than paying it forward and helping other people to be successful, especially exciting young people who know a lot more than you do and will figure out things you won't be able to figure out. With that I'll just thank you very much for your attention, and I'm happy to take any questions that you might have. Thanks very much. Thank you so much, Jeff. We have some questions that came in through the chat, and we'll start with Jeff Curtis, saying this is a tremendous effort. How do you screen people for the program who are serious and will be committed? Some of the typical achievement metrics used for employment evaluation and in education may not apply.
That's an absolutely fantastic question. So the way we've done it is we're doing scalable education and really non-scalable community effort work. On the education side we put everything online. You know, we run thousands of people through these courses. But for this program, we work closely with nonprofit partners who have been part of the young people's training throughout the entire process, and they actually recommend people. So rather than having one metric or one screening criterion, which we have found excludes people, like maybe they're bad at testing or maybe they, you know, get anxiety, we have found that the best way to identify young people who are committed and will show up is to work with the nonprofit partners who've worked with them for years and can easily identify who's going to show up and who's not going to show up, who's going to work hard and who's not going to work hard. I wish I could say it was really quantitative, but it's more, you know, deep partnership with the nonprofit partners, relying on their expertise really. Excellent, thank you. A question from Hannah Hill: do graduates of the DataTrail course receive a certification? They do. They receive a certification on Leanpub, which is a massive online open course provider. The original CBDS program has a Johns Hopkins logo on it. We're working on getting DataTrail so it has a little Johns Hopkins logo on it. It's not an official Johns Hopkins credential. It's a Johns Hopkins massive online open course credential, which is a different thing. But they do get a credential that's like a diploma that comes out of that, and it's the same one that anybody would get if they took the program for free online by themselves.
So that's one interesting design choice that we made: we didn't design a program that only these young people take. Literally thousands of people have taken these training programs and gotten these same credentials, and it would be indistinguishable whether they went through our training program or whether they did it themselves online. And so we think that improves the value of the credential, because it means there are lots of other people, software developers, who have taken the training program and have that credential, and then also our young people, and it sort of elevates the value of the credential. So yeah, they get a credential. It's not a full university degree. So one of the things we're working on comes out of another project we work on, something called the Genomic Data Science Community Network, and I'll drop the URL for that in the chat here. This is a group of community colleges, HBCUs, and tribal colleges, and their faculty at all those places, who are focused on data science education. And we've been talking a lot, especially to the community college groups, about creating stackable credentials, where you could get some community college credit for having completed the DataTrail program, which would allow them to shorten their time to a community college degree. We do really think that more education is necessary for a lot of our young people, and so how do we shorten that and reduce costs for them by having them get credit for the DataTrail program? So that's future work, lots of future work. Another question from Peter: have you started to try to clone this in Seattle, and how hard is it to find the community partners and build trust? Great question. Yes, we have started to build it in Seattle. We have a couple of philanthropic donors who are interested in supporting it, which is our first step. Then we're working on finding community partners right now.
I'll tell you it's going to be exactly what it was at Hopkins, which is they won't trust me until I've been there for like multiple years and I've shown them that it will work. You know, it's like anything, right? You're building a relationship with a group, and we're at the very early stages of trying to build those relationships here, and it's going to take some time. But yes, we're building it here. I'm excited to run it here. It's going to be great, but it's going to take a couple of years, like it did at Hopkins, before I feel like we have really strong relationships with those groups. Because, I mean, in their defense, right, I'm just showing up out of nowhere to try to do this. You know, who knows if I'm trustworthy or not? They'll have to verify that I'm trustworthy before we really can get going with them. And then a related question: do you have some specific examples of that trust-building process? Yeah, I mean, there's a variety of them. One is when you actually place young people in jobs. That's a pretty important, long-term trust-building thing, when they see a member of their community succeed. Like, Davon goes back now all the time. He's got a job at Hopkins, and it's a really well-paying job, and he's doing really well, and he gets to go back and tell them that this is a real thing, that he did go through our program, he did work with us, and he did get a job. I think that's the long-term one. The shorter-term one for me was actually just spending time. I would sit down at HEBCAC and just do my work and just kind of hang out and talk to young people. You know, I looked very different than a lot of the people that were hanging out at HEBCAC, so I was a curiosity, and people would come and talk to me. I was probably distrusted quite a bit, and then I kind of became part of the background. You know, I was down there enough that they saw me, and I think that just took a while, just showing up, being there.
And so it is a lot of work. I guess I don't want to make it seem like this is a thing you can do really easily. It is a lot of work, and you have to care about it a lot if you want to do this kind of program. But for me, at least, it's been one of the most positively impactful things I've done in my career.
And a question about being a freelancer: would it be possible to start a franchise coming from that perspective?
I think so. I mean, we're happy to provide the materials to anybody that wants them. It helps a little bit to have some infrastructure and support, because you're going to need to make partnerships with community-based nonprofits and partnerships with employers and things like that. And we can help provide a ton of advice. It might be easier if you at least have some friends. If you're a freelancer, this would be a very hard thing to do by yourself. I would not recommend it by yourself. But if you had, like, a little cohort of people that wanted to do something like this, then yeah, it could probably work.
And another question, from Peter: how hard was it to convince HR to let people hire folks without a college degree? We have trouble doing this for study coordinators.
A huge amount of work. That was probably one of the hardest things in this whole project. I presented that as kind of one slide among many, but Ashley spent a year doing meeting after meeting after meeting, and showing evidence, and, you know, building the job code for them, and all this kind of stuff. So it was a nontrivial operation. I ultimately ended up pulling rank a little bit and getting some senior leadership involved, and that helped. But the HR department actually ended up being quite supportive in the end. So at first it was a little frustrating, but once we got over that hump, they actually ended up being a real collaborator with us.
So if you can find your people in HR, they can help you out. And the one thing that's an advantage is that now we have a job code that's officially listed as a Johns Hopkins job code that has no degree requirement. And it might be useful. We had to point to other institutions where things like this existed, and so it might be useful for you to point to Hopkins and say this exists there, as a way of making inroads with your HR department. But yeah, it just takes work to make it happen. Good question, Peter.
And we'll take this one last question, in the interest of time, from Robbie Niroko in the Q&A. Thank you so much for the talk. My name is Robbie, from Ghana. I want to find out, is the program open for someone like me to join? Also, I had my undergraduate studies in nutrition biochem, and I've been looking for graduate studies opportunities. I'm currently an intern as a data analyst. One barrier I face in my applications is that I have to have a statistics or mathematics background to pursue a bio data science related course. How can I get around this?
Yeah, so, I mean, this is a fantastic question and I really appreciate it. So our program, the full wraparound program that we run, currently only runs in cities in the US. We are talking to a group in Nigeria about running it and a group in Ghana about running it. That hasn't actually gotten off the ground yet. So the full wraparound program doesn't exist there yet, I don't think. That being said, all of the courses that we offer are 100% free and online, and so you can go start doing them today. There are no prerequisites, no background required. If you just go start the courses, you can start them right now. But we don't have the sort of wraparound services for you yet, and I think that's something for us to think about: how do we make it more global as we go?
So the good news is there are no barriers to entry. The bad news is, it's not like you're on your own, but there are a little bit fewer wraparound support services for taking these courses at this time. And you can do it with your background, because you'll at least have some R programming background, and there's a little bit of statistics in the program as well, like some linear modeling and things. So you would get some of that background from taking the course, and there are no mathematical prerequisites other than, you know, basic math, algebra, and things like that.
Thank you so much again. Fantastic talk. If there are additional questions, please feel free to continue the discussion in the chat, but we'll go ahead and move on to our next speaker. Thank you.