Hi, I'm Carlos Bustamante. I'm a professor of biomedical data science and genetics at Stanford. I was born in Caracas, Venezuela, in 1974.

What got you interested in science early in life?

So I guess as a child, I was always interested in the intersection of history and science. I remember wanting to learn everything I could about the ancient Egyptians. And there was a series of comic books, actually, that came out when I was five, six years old, maybe 26 volumes, that walked through the history of humankind. It started with evolution and then walked through the australopithecines and Cro-Magnon, and I just thought it was incredible, right? And I remember when we came to the States, my grandfather gave me three bound volumes of this, and I would read it all the time, right? It's something I still have today, and it got me really, really interested in how that happened, right? How do we tell the history of our species? In high school, I was pretty much set on becoming a lawyer. I did speech and debate, had done drama, and loved argumentation, so I thought, this is what I want to do. And my high school nominated me for an NSF Young Scholars Program, which would have been about 27 years ago; it was 1992, between my junior and senior years of high school. And I threw away the application, thinking, well, I'm not going to do this. Then I'm cleaning out my room and I find the application, which had fallen behind the wastebasket. And so I said, oh, you know what, I'm going to fill it out just in case. We were in Miami at the time, and this was to go to Florida State. I thought, there's no way my parents are going to let me go to Florida State. But on the back, the essay was, you know, why do you want to go to this program? And I wrote it out longhand, and it was basically: I'm pretty sure I want to be an attorney. I just want to make sure I'm not making a mistake.
Being a faculty member now, I can imagine someone reading this and saying, well, if we can save one soul, right? We've got to get this person in. So I spent the summer in a physics lab, learning modern physics and applied math. I was just blown away. I said, oh my God, this is the coolest thing in the whole wide world. And the other part was that it brought together other kids who were interested in this, right? So it was the first time I was really surrounded by a cohort of people who were just passionate about science. I jokingly call it math nerd camp, right? And it was just so cool. I came back and said, you know what, I'm going to change my mind and go into science. I started volunteering at a lab at the University of Miami. My dad was a physician, is a physician. And there was a six-year BA/MD program that I applied to and was honored to get admitted into. And my high school biology teacher, my AP bio teacher, gave me a bunch of books when I graduated. This was his gift to me. They were books by Stephen Jay Gould on evolution and a book by a guy named Richard Lewontin called The Genetic Basis of Evolutionary Change. It was a beat-up old copy. It was his copy from the lab. I spent the summer reading that. I thought, wow, my God, this is incredible.

Tell us about your undergraduate experience. Who were some key mentors?

So I landed at UM in Ken Spitze's lab. He needed someone to wash the glassware, so I started washing the glassware in his lab. He was a recent arrival from Indiana, working on the population genetics of Daphnia. And one day he turned to me and said, well, you know, I could really use somebody who knows how to program computers. And I said, well, I know how to program computers. And he said, I want to build a model with all the evolutionary forces. I said, well, yeah, I've read Lewontin's book. I can probably piece some of this stuff together. So I wrote a kind of first simulation program. I wanted to compare it to data.
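[Editorial illustration, not part of the interview. The program described here is not reproduced anywhere in the transcript, so what follows is only a minimal sketch of what "a model with all the evolutionary forces" might look like: a haploid Wright-Fisher simulation with selection, two-way mutation, and genetic drift. The function names, parameter names, and the haploid fitness scheme are assumptions made for illustration.]

```python
import random


def next_frequency(p, s, mu, nu):
    """Deterministic change in the frequency of allele A over one generation.

    Hypothetical haploid model: A has relative fitness 1 + s, the other
    allele has fitness 1. Mutation runs A -> a at rate mu and a -> A at rate nu.
    """
    mean_fitness = p * (1 + s) + (1 - p)
    p_after_selection = p * (1 + s) / mean_fitness
    return p_after_selection * (1 - mu) + (1 - p_after_selection) * nu


def simulate(p0, pop_size, generations, s=0.0, mu=0.0, nu=0.0, seed=None):
    """Return the allele-frequency trajectory, including the starting value."""
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        expected = next_frequency(p, s, mu, nu)
        # Drift: the next generation is pop_size gene copies drawn at random.
        copies_of_a = sum(1 for _ in range(pop_size) if rng.random() < expected)
        p = copies_of_a / pop_size
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    # Weak selection in a small population: drift can still dominate.
    print(simulate(p0=0.1, pop_size=100, generations=10, s=0.05, seed=42))
```

Running the same parameters with different seeds gives a spread of trajectories, which is the kind of output one could compare against observed allele frequencies.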
And so that kind of got me started. Doug Futuyma came that January to give a series of lectures at UM. And, you know, I was doing research, so I felt I should go to these lectures. And Ken was hosting him, so we went to dinner at Ken's house. And Doug and I got to talking. He said to me, what are you interested in? I said, I'm really interested in population genetics and its relevance to medicine. Dean Hamer had just published his paper on potential genetic links between Xq28 and homosexuality. And I just thought, wow, this is just so cool. And Doug says to me, you know, you should really think about transferring to Harvard and working with Richard Lewontin. And I said, oh, that guy's still alive? You know, the book was from when I was born. It was literally 1974, the same year I was born. And I said, oh, this is incredible. So I went home and wrote my application. And this time I typed it out. You know, the essay was, why do you want to come to Harvard? I said, well, you know, I'm in this six-year BA/MD program. I'm pretty sure I want to be a doctor. I just want to make sure I'm not making a mistake. And I'm interested in potentially working with this guy named Richard Lewontin and doing population genetics. That must have caught somebody's attention, and they admitted me. And so I said, OK, you know, I'm going to leave this program. I land on campus my first day, unpack my stuff, you know, put my stuff away. Ten minutes later, I walked over to the Science Center and stopped the first person I saw. And I said, I'm looking for Richard Lewontin. It turned out to be one of his graduate students. It was like the proverbial alien landing, you know, take me to your leader. It was Peter Goss, and he took me to Lewontin. And Lewontin had this incredible ability to take everybody super seriously. In fact, the more junior you were, almost the more seriously he'd take you. So he said to me, you know, come into my office, tell me what you're interested in.
He spent two hours talking about population genetics and evolution. And he took me super seriously, though I don't know if I knew anything back then. He turns to me and goes, well, you know, our research interests are aligned. It'd be wonderful to have you in the lab. So I started in Dick's lab, and it's been an incredible experience.

Can you talk a little bit more about Richard Lewontin's lab and his influence on you?

It was incredible. The Lewontin lab in the 90s was just an incredibly exciting place to be. The Bell Curve got published when I was in Lewontin's lab. And I took his evolution course as a sophomore the following year. Because he had started to whittle down the lab, he actually didn't have any graduate students to teach. And he said, oh, I'm not sure this is a good idea, but would you be interested in potentially, you know, being the teaching fellow for this class that I teach with Steve Gould? And I'm thinking, oh my God. Are you kidding me? I'm not sure. And at the time, you know, I actually ran a separate section. So I ran a section on population genetics and we went through Hartl and Clark. I mean, the whole thing was kind of nuts. But Dick had this incredible ability of just taking you seriously, taking your ideas seriously. At the time, Andrew Berry was in the lab and Susan Albert was in the lab and Kristen Artley was in the lab. And it was just incredible. Every Thursday we'd have the population genetics lecture, and there was tons of work with the Hartl lab. It was this amazing confluence of people interested in population genetics, people doing methods development, and really thinking hard about where the field was going to go.

How did debates in population genetics at the time influence where you went to graduate school?

At the time, you know, population genetics was in this complete kind of schism, right?
People were writing papers like, you know, the neutral theory is dead, long live the neutral theory. We felt we'd already gotten sufficient data over the last 30 years, first from allozymes and then from DNA sequence, to pretty much understand that all these evolutionary forces are important, that the details matter, and that different species and different populations have their own history. And, you know, the devil is going to be in the details. The human genome was still 15 years away, right? Because Celera hadn't started, and the idea that you would use population genetics in human medicine was honestly laughable. Like, I was applying to MD-PhD programs and said, oh, I want to bring population genetics into medicine. People said, well, this has got nothing to do with it. You should really go study molecular biology and developmental biology. Like, that's the real genetics you need to know about. I said, oh, that's not what I want to do. I want to do population genetics. And so when I was applying to graduate school, I said, well, you know, clearly MD-PhD programs aren't going to be the right thing to do. So I'm going to apply to graduate school in population genetics. I said, you know, the two places that have critical mass are Chicago and Harvard. If I don't get into one of those places, I'll go work somewhere for a while. And luckily, you know, I got into both and had to kind of figure out what I wanted to do. I decided to stay on and work in Dan Hartl's lab, partly because everybody in Chicago had trained with Lewontin, right? And it was so influenced by the work that he had done and his worldview around just really incredible statistical rigor in how you interpret and overinterpret your results. And in the four years I was in Dan's lab, the world changed on a dime, right? In my first year of graduate school there was one job in population genetics, and Hiroshi Akashi got the job, and it was in Kansas.
And everyone was like, oh my God, he's so lucky. Like, I'd love a job in Kansas. It'd be an awesome place to be a professor. But, you know, odds are I'm not going to land a job. And I remember, I wanted to be a theoretical population geneticist. And even Lewontin said to me, Carlos, I think that's a terrible idea. He said, this has gotten really, really hard. You have to be an applied mathematician. You're good at math, but you're not that good at math. You know, you should really try to keep a foot in the experimental world. And I'm not sure what's going to happen in population genetics.

How did the draft HGP announcement change your academic career?

Well, between '97 and '01, right, Celera just gunned to compete with the federal government on the Human Genome Project. And you could just see the excitement mount and mount and mount and mount. And so I literally graduated three months after the announcement at the White House, the Clinton White House, of Francis and Craig saying, yeah, we've sequenced the human genome. We tied. It was just wild. And I defended my thesis on a Friday. The following Monday I was interviewing for faculty jobs, right? That's how quickly the world changed. Like, if you had some bioinformatics skills and you were interested in going into human genetics, the world was open, right? It was just completely new terrain. I went to Oxford and spent time with Peter Donnelly. Again, just sort of thinking, I need to get that kind of incredible statistical background to do this, right?

How was that transition to a more math-intensive terrain at Oxford?

You know, math is a little bit like weightlifting, right? Like, we all start a little weak and then just get better and better and better, at least most of us, right? I mean, there are some people who are just born with a mind that kind of snaps things together. I'm certainly not one of those people. And the one thing, sort of looking back, right?
I wish I could tell you this was very strategic, but it wasn't. I just kept taking classes in statistics. And it started with, you know, probability theory. In fact, I taught statistics with Lewontin partly to learn it. And I just kept taking classes at the stats department. By the end, I actually could have done a joint PhD in statistics just from all the work that I had done. So I felt pretty prepared, because I was doing computational work too. So it wasn't, you know, proving theorems about the asymptotics of where distributions would go, but rather it was like, let's bring the computer in to solve these complicated Bayesian models. It was the beginning of Markov chain Monte Carlo. And you could sort of see that it was going to be useful in population genetics, because the incredible thing about population genetics is that it gives you a theory on which to hang the data, right? We actually don't have a lot of that in biology, right? Or rather, we have a lot of theories in biology, but we have few theories that are as predictive as population genetics about what they're trying to predict, right? So, you know, Hardy-Weinberg is pretty damn amazing, right? Like, many, many, many markers you look at really are in Hardy-Weinberg equilibrium, which, you know, you're kind of almost shocked to find. And then the ones that aren't in Hardy-Weinberg equilibrium are often not at Hardy-Weinberg for pretty good reasons. And like that, you can just continue to tack on theory. And so, you know, it was sort of an evolution of where the data was going and what we needed to do to really build the infrastructure. The community was also incredibly small, right? There were maybe a couple hundred people interested in human population genetics. And after Oxford, I was at Cornell, and one of the first meetings I went to as a Cornell professor was at deCODE genetics.
And in the auditorium of deCODE genetics, you pretty much had everybody on the planet who was interested in this stuff. And it was so amazing, right? Like, the world was new. You were seeing this stuff for the first time. David Reich was still a graduate student, you know, talking about the stuff that he was starting to do on selection and trying to identify patients without phenotypes. I remember this from his lectures, like, we'll look at selective signatures and we'll be able to identify these outliers. So as the world evolved, you know, you just began to see how what we'd spent 70 years of population genetic theory developing was now highly, highly relevant to what was the most important program, I think, of the 2000s, namely figuring out how we do complex disease genetics at scale.

How did the post-HGP emphasis on SNPs bring together human geneticists and population geneticists?

The population genetics Trotskyites. The true believers, right? We'd say, a SNP, come on, that's just a polymorphism, it's just a variant. Why do we need a new name for this, right? It's like, the human geneticists have just discovered population genetics, get out of here, right? Because they were totally different fields for a long time. And so for us, I mean, maybe it was just haughtiness and arrogance, right? It's like, well, of course this is what we have to weigh in on. It's like the one thing we know, right, that the world actually now cares about. And so we felt just extraordinarily well-prepared in many ways, even though we were young, right? Most of us were in our 20s when this stuff was happening, telling Francis Collins, no, Francis, this is why we've got to do it this way, right? So you have this cadre of people like Gonçalo Abecasis and others who were part of that cohort that really began to lay out the theory and then the experimentation to see, are we doing this the right way? And I really credit the community for seeing the importance of this, right?
They could have said, you know what, you geeks get out of the way, right? Like, we're doing serious big science. But in fact, it was like, no, we really want you in the community, please come and help us get this right, you know? And Peter, in fact, was one of the first to have a HapMap grant, right? Because he sort of cornered Francis and said, look, we need theory to guide this stuff. I felt super lucky, as I was at Cornell with Andy Clark, who's like the world's best population geneticist in my mind, at that time if not today. And Andy was just building an incredible team out there to think about these problems. And we also had the really interesting advantage of not just working in humans. So my PhD thesis actually had nothing to do with humans. It was on Drosophila and Arabidopsis, comparing patterns of variation in model organisms and trying to figure out what we can learn about selection in these different regimes. As the human population genetic data came out, we wanted to pounce on it. And Andy had advised Celera. Celera turned out to be a great business model for ABI, right? Namely, sell sequencers to both the federal government and to your own subsidiary. They never got the data business right, and unfortunately went out of business. But they had accumulated a ton of data, including data on 39 exomes, in 2002, 2003, and shared that data with us. It was 39 people who had been sequenced. We had the very first exome data at the same time as we were starting to play around with the SNP data. Again, seeing things for the first time, in terms of doing the first human-mouse alignment. That was an incredibly important paper that Andy and Rasmus led. We then said, okay, what would happen to our understanding of human amino acid variation and the selective forces? And those were the papers that ended up making my early career, right? So I just had a series of these incredibly interesting ways of tackling the problem.
And the public efforts really filled in so much of what we didn't have from those exome sequencing data sets. So we felt, you know, we were really humming, being able to make use of both. It also gave me a real understanding of why industry is so critical. Because they can do things at a scale that is often very hard for academic groups to reach. And why hybrids like the Broad or the WashUs, these large genome centers, which are effectively run as companies, you know, they're non-profits, but they have to operate with the same kind of mentality, have just been critical to producing the data that we all rely on. And as the data accumulated, what we began to see is that, you know, the Lewontinian worldview, which I had always adhered to, right? That, you know, we're all 99.9% identical. So what does variation from population to population really matter? Probably just noise. And so we could probably learn a lot from just doing a couple of populations deeply. I think that turned out to just miss, you know, that the devil's in the details, right? The differences that do exist across populations. Many of them aren't important, but there are a handful that are actually critically important. And that got me very worried about this question of how we design these things going forward, right? HapMap was never meant to be anywhere near comprehensive; it was a pilot, right? Let's get out and begin to get data at scale in the human genome. We'll get zillions of markers. Many of these markers will be shared across populations, and we'll use them to map. And again, to a first-order approximation, that was right, right? You could use those reagents and samples of several thousand cases and controls and begin to pick up the low-hanging fruit.
But as we began to look across populations, it's like, wait, this is happening in this population but it doesn't necessarily replicate in that population. Is there a real lack of replication, or is that a power issue? And how do we think about really layering on the scaffolding that we need to do trans-ethnic and multi-ethnic mapping?

When did you realize that HapMap was going to be a much more complicated project than anticipated?

Well, I think 2007, 2008, when we were designing 1000 Genomes, we had HapMap 3 data, which included additional populations. And again, partly because I had seen the Celera example, I began to work with industry on data they had. GlaxoSmithKline had invested early in the Affymetrix 500K. They'd bought like eight different versions of the Affy 500K array and genotyped tens of thousands of people looking for pharmacogenomic response. And they did this on a global scale. They had populations from like 80 different countries. It was just a massive effort. It turned out not to be the best experiment for Glaxo. They shuttered it, and they gave us the data, saying, well, maybe you guys can get something interesting and meaningful out of this. So in collaboration with colleagues at GSK, we began to look very closely at population structure and deep population structure, including, say, population structure in Europe. And I mean, that was really the moment. That's the 2008 Novembre paper, where you look at individuals with four grandparents from a given country. You use almost off-the-shelf tools like principal component analysis, which are the opposite of what I had spent my early career doing, right? I spent my early career building models and saying that what's important are the parameters. We need to estimate those parameters really well. We can do that with small amounts of data if we're really thoughtful about how we set up the statistics.
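[Editorial illustration, not part of the interview. For readers unfamiliar with principal component analysis on genotype data, the sketch below shows the basic idea on simulated data. It is not the analysis from the paper mentioned above: the two populations, their deliberately exaggerated allele-frequency differences, and the function names are all assumptions made for illustration, and the first component is found with plain power iteration rather than a library routine.]

```python
import random


def simulate_genotypes(n_per_pop, n_snps, rng):
    """Simulate 0/1/2 genotypes for two hypothetical populations.

    Each population gets its own independent allele frequency per SNP,
    which exaggerates differentiation compared with real human data.
    """
    freqs = [[rng.uniform(0.1, 0.9) for _ in range(n_snps)] for _ in range(2)]
    genotypes, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            # Genotype = number of copies of the alternate allele.
            genotypes.append(
                [(rng.random() < p) + (rng.random() < p) for p in freqs[pop]]
            )
            labels.append(pop)
    return genotypes, labels


def first_pc_scores(genotypes, rng, iterations=50):
    """Project individuals onto the first principal component."""
    n = len(genotypes)
    means = [sum(col) / n for col in zip(*genotypes)]
    x = [[g - m for g, m in zip(row, means)] for row in genotypes]  # center SNPs
    v = [rng.gauss(0, 1) for _ in means]  # random starting direction
    for _ in range(iterations):
        # Power iteration on X^T X: v <- X^T (X v), then renormalize.
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        v = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(len(v))]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return [sum(a * b for a, b in zip(row, v)) for row in x]


if __name__ == "__main__":
    rng = random.Random(0)
    genotypes, labels = simulate_genotypes(n_per_pop=20, n_snps=300, rng=rng)
    for label, score in zip(labels, first_pc_scores(genotypes, rng)):
        print(label, round(score, 2))
```

In this toy setup the first component splits individuals by population without ever being told the labels, which is the property that makes the method useful for exploring structure in genotype data.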
When you get to GWAS data at this scale, like, you can't learn models from it; the SNP chip data is ascertained in all kinds of wonky ways. So it's not pure, good population genetic data. So you kind of throw your hands up and say, what are we going to use? We bring in principal component analysis. All of a sudden, it just screams at you, right? The principal component analysis of the genetic data mirrors the geography of Europe. You can almost overlay them. And so that was, I mean, a super cool parlor trick, right? Yep, I can get that to work. But what are the implications? So in that paper we looked at trends from the World Health Organization and others on disease incidence and other phenotypes. You know what? They mirror the same gradients of genetic variation. And because we'd been working on plants and other organisms, we said, wait a second, this is a serious issue we need to address, right? In plants, population substructure can kill you, right? I mean, in the sense that if the trait is fixed in one population and fixed in the other, and there's a bunch of genetic markers that are fixed between the two populations, you can't map it, right? You need co-segregation. You need to be able to really bring those things together. And this became this whole thing from 2006, '07 on, of how do we really begin to do population-substructure-aware genetic mapping and bring those principles in. And so I said, look, if this is Europe, right, and you can't get a single group that you're, you know, like we've got isolates, fine, I'll take those as true, good populations to use for mapping. But what about the rest, right? You know, how are we going to do this on a global scale?

What were some issues of representation raised in the early meetings for the 1000 Genomes Project?

So we had, at that time, the beginnings of the meetings for the 1000 Genomes Project. And we're at Cold Spring Harbor.
And there's a group of us, maybe 80 people, 100 people in the meeting, to build a coalition of the willing, as David Altshuler called it. And the proposal was, let's try to do 1000 genomes. How would we allocate them across representative populations? We're not close to trying to be comprehensive. But let's make sure we include several European populations, several African populations, and several Asian populations, with the idea that you're going to pick populations that are going to be close to each other so that there's a better bet at getting alleles that might be relevant across all of them. And a group of us said, wait a sec, this is going to be an issue, right? Look, we're sitting here in Cold Spring Harbor, and we're really going to do a project that includes nobody from the Americas? That's a massive missed opportunity. And many of us realized, look, dealing with vulnerable populations is a very tough problem. And for populations who choose not to participate in biomedical research, you obviously want to respect and honor that. At the same time, number one, it doesn't get you off the engagement problem. And two, if our ultimate goal is to enable medical and disease genetics at scale, then we don't need the perfect, sort of platonic models of how populations have split over time. We actually just need to engage potential patients. In that case, let's go ahead and study African Americans and Hispanic/Latinos and just be very practical. And as a side consequence, we will get admixed genomes that will require us to think about admixture. And there was a group that said, well, you know, but these are admixed, and I'm like, so what? Like, they offend your Victorian notions of race? Get out of here, you know? Admixture is a fact of life, man, you know? And so we've got to really embrace that. And what was fascinating is, like, here's a project that's got 400 each from three populations.
It's already 1200 genomes, like, underpromise and overdeliver, you know, because it's the 1000 Genomes Project and we've actually got 1200. And a group of us is saying, you know, in fact, we've got to add more. And so folks were concerned, well, can we sample them on time? And how would you do this? I said, well, let's put together the best scientific plan we can based on the data at hand. We'll take the GlaxoSmithKline data, we'll take the HapMap 3 data. And it became evident that, you know, if we did 500 or so genomes from the Americas and we sprinkled them in the right way, so instead of doing four, let's do seven, and we'll do some in trios, we've probably got a pretty good hedge on the alleles that might be relevant, that we'd likely see at higher frequency in the Americas and that may be absent in other parts of the world. At that time, it was a hypothesis, right, that there are going to be population-specific alleles that may be relevant to medical genetics. It hadn't really been proven out at any kind of serious scale. There were anecdotes or examples, you know, we've got sickle cell and we've got CCR5 delta 32 and others, but that's not a catalog.

What were the arguments against including populations from the Americas in 1000 Genomes?

Well, I mean, there were arguments like, this is going to delay the project, and how are we really going to do this? You know, does anyone have samples in hand? And they'd like to have samples that can get used, that have the right consents, obviously, and the ability to have cell lines made and so on. And, you know, we don't have money for sampling, right? I said, okay, don't worry about it. We got a group together. We sort of self-funded the collections and said, we're going to translate the standard 1000 Genomes consents and begin to work with partners in Peru and Colombia and Puerto Rico.
We've already got the African-American samples from Norman, Oklahoma. They're a part of HapMap 3. We have the Mexican samples from LA that were part of HapMap 3. We can include those. And I have to say I'm very proud, because one of the four populations from Europe was the Spanish population of blood donors, and we were able to scramble fast enough in the Americas that we actually got our samples in first, so that 1000 Genomes phase two actually had almost all the admixed genomes before they had the rest of the European genomes done. And the other thing that happened is, of course, our friends said, well, if we're going to do the Americas, like, what about the Indian subcontinent? I'm like, yeah, what about the Indian subcontinent? It's a billion people, right? How do you not include people from the Indian subcontinent? And Southeast Asia. So we luckily got enough traction to add another 500. So the project went from, like, initially 1,000, which is really 1,200, to 2,500 samples, partly because some of us got together and said, look, this is important. And the data is telling us it's important, right? We made extraordinarily rational scientific arguments. These weren't political arguments. They weren't even ethical arguments. I mean, you could make very strong ethical arguments, but these were from a purely scientific point of view, right? You know, from the anthropologist-from-Mars point of view, you need to include these in order to make sure that the medical genetic studies that you're powering in the next phase, in the 2010s, are going to be properly powered in these understudied populations. So it became, you know, this kind of mission for me and my group and the people that we attracted. You know, we began to transition out of doing plants and animals. We continued to do dogs for a while. And dog genetics has been a passion of mine.
And, you know, part of the reason I moved from Cornell to Stanford was that Stanford's Department of Genetics has always been anchored at the medical school. And Mike Snyder, who was then coming in as chair of genetics, sort of committed to saying, look, I think this is really important. We want to do these kinds of studies at scale. Come work with me and we'll recruit the people to make this happen. It was an extraordinarily great move in terms of really building the kind of cohort of people that would move this ball forward. And today, you know, it's just incredible to see that the people who trained with me and others as part of these efforts are now leading ancestry efforts at 23andMe and leading ancestry efforts at Ancestry.com. Eimear Kenny, who was a postdoc in the lab, is now leading efforts at Mount Sinai School of Medicine. Chris Gignoux, who was in the lab, is now at the University of Colorado with the biobank. And so it's this sort of model, that we really need to get the next generation, who's sitting at that interface and understands where we're going in terms of the data streams, to bring their opinions to bear and the data to bear to power the next set.

Why do you think GWAS across multi-ethnic populations weren't used earlier?

Yeah, I mean, some of it is just success built on success, right? So I look at deCODE. deCODE was just extraordinarily good, right? So most of what we learned, not most, half of what we learned in the 2000s about complex disease genetics, you know, came out of Iceland and came out of deCODE, right? You kind of can't fault them for that. And I think NIH invested, as it tends to, in the tried and true, right? So we knew WHI. We knew these consortia that had, you know, the set of samples. And so you start with what you've got. And, you know, those cohorts, especially of older Americans, are going to reflect the demography of the country 60, 70 years ago. They're not going to reflect the demography of the country today.
But there was also, I think, a growing appreciation that we had to get this right. You know, NHLBI in particular, I think, took this mission on, to say, look, it's our mandate. And because of the health disparities that exist in heart, lung, and blood across African-Americans, Hispanics, Caucasians, and East Asians in the United States, we've got to do something about this. So to their credit, I mean, they started laying down the right bets. And NHGRI, as I like to say, has always just been kind of like the special operations force of NIH. They kind of bring them in, you know, to do very strategic, tactical things that the rest of NIH just was never designed to do. And so that was really, I think, smartly deployed to bring far more budget to bear on this problem than NHGRI had on its own, right? And so I credit Francis and Eric in many ways for having prioritized, you know, with Terry obviously at the helm, placing the right bets in terms of what we're going to see in the next five years. And in fact, when the large-scale sequencing program was renewed, this sequencing of cases and controls with diversity as a focus, you know, became kind of front and center, and of course now the Precision Medicine Initiative, in fact, is oversampling underrepresented groups, including rural Americans, you know, who also have been underrepresented in a lot of this kind of work. So I'm very proud that, you know, as a community, I think we've marshaled the right scientific arguments, the right critical pieces of evidence that policy makers needed to say, look, for the next phase, we've got to do it this way.

Do you think the availability of better structural variant data across populations will help us answer more questions?

So structural variation I would put into the broader context of an understanding of genome organization. We've, of course, known that structural variation is critically important in disease, right?
Huntington's. Like, first gene you map, guess what? Triplet repeat, probably going to be important, right? And so I don't think people have left structural variation out because we don't think it's important. It's just been a very practical issue. SNP chips? Great for their time. You know, short-read sequencing? Great for its time. You know, now we're getting sequencing technology that is beginning to reach the price point that you can begin to deploy at scale. And I would almost leap a little bit from structural variation, as I mentioned, into genome organization and think about the beautiful insights we're gaining from Hi-C data and others, particularly even at a sort of level of organizational understanding that also brings tissue and cell type to bear, okay? So one real possibility is as we're building cellular maps of the human population, right? Because it's not even of the human genome. Rather, there's variation from individual to individual and tissue to tissue. You might be able to really hone in on some of these incredibly quixotic differences you see across populations, okay? So in thinking about, for example, the work of my colleague Esteban Burchard on FEV1 and asthma and how rates of asthma are higher in Puerto Ricans than in African-Americans, than in U.S. whites, and the people with actually the lowest incidence of asthma are Mexicans, right? So you say, wait, how can Puerto Ricans and Mexicans, who are both sort of Hispanic, be at different ends of the continuum here, like, how does that work? And one hypothesis is that, in fact, because of potential gene-environment sort of triggers, the differences arise, but they may arise with components that have to do with differences in background ancestry, but play out through differences in cell types that are produced in response, right? 
So if you're just producing more dendritic cells versus something else due to a toxin, then you might respond very differently, and that could underlie some of the differences that we're seeing. So immune response is one that we're very interested in, and the degree to which a much better understanding of genome architecture and variation can be relevant to that. And then, of course, in the context of monogenic disease, one of the things that has just stood out in ClinGen and other efforts is that databases for adjudicating pathogenicity also have these kinds of biases due to the individuals who've been screened and participated, but we're also seeing the importance of, you know, again, genome organization. We've known, of course, about the importance of genome organization in monogenic disease, and so are you going to be able to tease that apart now as well in terms of the context of what's different from population to population? What would you like to see in genetic medicine in 10 years? So I would say, say what you will about GWAS, at least it was reproducible, right? And I'm a big GWAS fan, but, you know, the most important thing about GWAS is we learned to do it really well, okay, and could just go into population after population, and we were well-powered and knew how to do that well. Medical genetics has always been a rather artisanal process, okay, and partly because it required just incredibly good diagnosticians. I mean, that was the kind of real magic, right? How do you clone Victor McKusick, right? Tough problem, right? Victor's mind had a kind of map of how these things were organized, that was a trade secret to Victor McKusick, but of course, you know, it's sort of come through all the folks that he trained. That doesn't scale, right? So in order to be able to get to a real comprehensive understanding of the molecular basis of disease, you know, genetic testing is one important component of it, but it's nowhere near all we need to do. 
So in my mind, the real missing piece is going to be comprehensive cellular catalogs and models at scale for disease, right? And ENCODE has taught us how to begin to do that. I don't think it's resolved every aspect of it, but it's begun to teach us how to do this kind of functional work at scale. The cell atlas is another. So, you know, what would I love in 2030? Well, it's a cross between the cell atlas and ClinGen, where you have just a massive database of alleles, ideally tied to phenotype and records of individuals, but with also the ability to get cellular and biomaterials that allow you to study the biology and really integrate it in ways that do enable some of the emergent properties of machine learning to come to bear, so that while I can't clone Victor McKusick, I can have something close to a Victor.ai that could be useful at least in beginning to triage much of this that today is still, again, an artisanal slash heuristic, you know, slash ad hoc process. Given the rapid pace of technology, what questions are you most excited to answer in the near future? Yeah, so I've become, like many in population genetics and quantitative genetics, enamored of the idea that therapeutics can be developed, and it's obvious the degree to which genome editing has now begun to enable curative therapies, which are magical, right? I mean, obviously, if we could get to curative therapies, that's the holy grail. For many monogenic and non-monogenic diseases, curative therapies may be much tougher than we'd like. You know, I'm thinking here, for example, to make the parallel with early GWAS. First big GWAS, of course, was complement factor H and age-related macular degeneration. A hundred cases, a hundred and some controls. Boy, it would have been great if that's all we needed. Turned out, that was like an N of one. It was complement. Maybe blond hair in Melanesians was like the other one. 
There are some things that kind of screech out at you, and they're the obvious things, but then the tail sort of drops off fast. In what we're seeing in curative therapies, we look at Luxturna. Holy cow, right? I've got literally a cure for blindness for a subset of people who have a particular mutation. And I'm so confident in that, in fact, that Spark Therapeutics, and we'll see if this stands the test of time, today pays out on outcomes. So it's like curative therapy with a money-back guarantee. Right? If it doesn't work, you know, you don't have the right outcomes at 30 days, 60 days, 180 days, then the payments, they go back to the payer. Rebates, they go back to the payers. I don't know if that's like complement factor H, or that's like Wellcome Trust Case Control, and we'll just get much better and better and better at doing this. I mean, certainly capital is betting that you're going to get better and better at it. You know how many curative-therapy companies have spun up in the last couple of years. I do think what is tried and true are the long-term therapies like PCSK9 inhibitors and other monoclonal antibodies that, just on the logic of trying to mimic protective loss-of-function mutations, have allowed us to make a dramatic impact in the management of dyslipidemia. We're still arguing about how we pay for it and on what scale we should pay for it and in what order they should be deployed, right? Nobody jumps onto PCSK9s if they haven't failed statins, right? Why would you do that? Nonetheless, I think we will get to that kind of wiring diagram in the next decade for sure. There are probably 10 therapies under development now that are all monoclonal antibodies for different dyslipidemias. 
And is that like complement factor H, or is that a roadmap for how we could build out additional strategies from the extremes of the human phenotype distribution to get these incredible superhuman mutations that some of us are lucky to have into a therapeutic form that all of us can have? And that would be an incredible dream, right? If a decade from now there are 100 new therapies out there that are allowing superman mutations or superhuman mutations to be mimicked in people. And some of these may go beyond curative therapies, right? Like my colleague Rasmus Nielsen has this beautiful paper on mapping a gene in a population of divers that creates this sort of magical spleen that allows them to just dive deeper in ways that, you know, it's hard to think about the medical application. Maybe there is a medical application, but you can imagine, right, if I made a potential mimic of that biologically, that might be very useful and interesting. So I do think you're going to get all kinds of interesting things arise that in fact are outside of what our usual blinders are, right? I think if you were to ask somebody in 2004, 2005, by 2019, where will the majority of people with genetic and phenotypic data be? Few would have said, oh, they'll clearly be in 23andMe and Ancestry. That's where they are today. I think we've got maybe 2 million in dbGaP, you know, and there's another half a million that are going to be sequenced as part of the large-scale sequencing program, UK Biobank, and still you can't get anywhere close to what's happened in the direct-to-consumer space. On the medicinal space, it really will be an open-ended question about how we choose to pay for these therapies, how quickly we can get to them, because if we can get to them really quickly and do it super efficiently, then you can also drop the price because you're not trying to recoup a $2 billion investment over failed drugs. Can you bring the drug to market for $50 million? It would be amazing, right? 
Then you could sell it far cheaper, make it far more accessible. We'll see, right? I would be thrilled to see far more competition in this space too, because that's the other way you drive down price and improve access. Those are my goals.