So, the title of my talk is Team Irrigated: Progress Towards a Modernized, Integrated and Collaborative Breeding System. Breeding as an exercise is really just a commandeering of an existing natural system, and what we want to do is leverage the components of that system in order to push populations, in this case rice, towards phenotypes that are useful for mankind, rather than simply letting natural selection take its course.
To start off this talk, I want to make clear that this title is not a declaration. I would not claim that we are modernized, integrated, or collaborative at this juncture. We are working towards that. This is an aspirational title, and it's also extended by invitation, so that we can begin to work together across platforms and across disciplines to think about how we can move forward in a unified way. Again, not for the sake of collaboration, since forced collaboration sometimes becomes difficult, but with an eye towards synergy. What I mean when I say that is that our aim as a breeding team is to create a transparent breeding system that allows a user to come and simply see the whole process: what data we collect, how we collect that data, how we use that data to make breeding decisions, how we measure our success, and how we objectively measure whether or not we're making progress in that system. So I hope by the end of this talk you can see in your minds the breeding system that we're trying to build.
Just a few token slides in the beginning. The importance of rice is not lost on this audience. This comes from National Geographic's Seven Billion project, and the point I like to make, especially with non-rice audiences, is that more people live inside that circle than outside of it, and that to me is often surprising. You'll also notice the warm colors inside the circle, indicative of lower incomes, and most people in the circle get most of their calories from rice.
And so rice is an extremely important component of my life and what I do and why I'm here at IRRI. Irrigated rice in particular accounts for 75% of global rice production, so it's an incredibly important ecosystem to be focusing on.
One thing we often point out when we talk about the state of irrigated rice in the world is the simple fact that varieties are old and yields are more or less stagnating, and we need to change that. There are lots of different variables that drive that. Breeding is one of them. This comes from Gary Atlin: the area-weighted average age of rice varieties in farmers' fields in Asia is 28 years. And so the big question on everybody's mind is, why are farmers not turning over varieties faster? Why are they not gravitating towards newer varieties that have been released in recent years and leveraging the benefits that come from those?
There are probably a million different reasons why a farmer chooses or doesn't choose to grow one variety over another, but from a breeding perspective the question on everybody's mind is, what's our rate of genetic gain? In theory, if we're making genetic gain, then the question is why that genetic gain is not making it into farmers' fields. If we're not making genetic gain, or not making genetic gain for yield fast enough, then perhaps there's some responsibility borne by the breeding systems to increase that rate of genetic gain in order to increase the adoption rate of new varieties in farmers' fields.
So we've taken some opportunity over the last several months to try and estimate what our rates of genetic gain are in the breeding program. These are published papers, all of which attempt to estimate the rate of genetic gain in rice. They all more or less coalesce around a similar metric, but they are based on what we call era trials, and I'm a little bit uncomfortable with the way era trials attempt to estimate genetic gain.
Part of the challenge is that with an era trial you take varieties that are old and varieties that are new, you grow them together in the same field in the same year in the same season, and then you measure how different they are from each other. Most of these publications are underpowered. There are not a lot of seasons of data underlying them, say two seasons at one location. They only use released varieties, so you only ever get to see a sample of the breeding program, and that happens to be the sample that graduated. You never get to see the entire breeding program. And often these are regressed on the year of release, and there are lots of variables that influence whether and when a variety makes it to market. It seems more appropriate to regress on the year the variety was born, the year the cross was made.
So that's what we've tried to do in order to overcome the limitations of what's currently in the literature on measuring genetic gain. We've attempted to measure genetic gain in the breeding program using all of the data that we had available to us. Jessica was instrumental in helping us put together this analysis. We basically went into our historical data. If any of you ever question the value of having your data in B4R, this is it. If all of this data were scattered across Excel spreadsheets and shared folders and other things, there's no way I could have brought it together and done this analysis. So there's value there.
We pulled 102 studies from the last five years, and that included 17,216 irrigated breeding lines tested in over 23 different environments. Jessica put together a two-stage model that allowed us to estimate the genetic value of these lines. She can give you the details on how that model is constructed if you're interested.
But the thing that's really important here is that we included the pedigree relationships in that model. So we're not just looking at performance. We're looking at performance and relatedness to other lines. What that means is, if you're a rice variety in the analysis, we get information from your performance, but we also get information from the performance of your siblings and your cousins and your parents and your grandparents and your descendants. And that allows us to make better inferences about your genetic value as opposed to your raw phenotypic value. We often refer to these as breeding values, when we create a BLUP that's estimated using the pedigree.
What we've done is take all of those breeding values from all 17,000 of those lines and regress them on the year the cross was made, not the year the variety was released. And what you see is kind of a curvy pattern of genetic gain historically throughout the breeding program. In fact, we have had conversations among ourselves about the idea that the term genetic gain itself is a bit of a misnomer. Genetic trend is probably a better description because, again, this is just evolution. You're just pushing a population in one direction or another. And of course, you can go in any direction you want.
So if we look from 1960 all the way up to 2014, and you can see from the error variance that there's less data here, not surprisingly, and more data up here, the genetic trend is not necessarily linear. It's not the nice straight upward line that we would want it to be. And this is not surprising at some level when you sit back and think that yield has not always been the primary driver of selection decisions in the breeding program. Sometimes it's been grain quality. Sometimes it's been disease. Sometimes it's been plant type.
But if you were to take this data and fit a straight line across it in order to estimate some global rate of genetic gain between 1960 and 2014, that straight line gives you about 13 kilograms per hectare per year, which is a rate of genetic gain of about 0.3%. So genetic gain in the breeding program has been historically low. We started to get clues about this when Jessica was working with J.K. Ladha and looking at the yields of different varieties over time in the long-term experiment. This analysis basically uses all of the breeding data that we have available to us in the breeding program, and it came up with a very similar estimate and a very similar picture. So we feel fairly confident that this is what the reality is.
And so now we begin to ask ourselves, why isn't this a nice strong linear trend upward? There are lots of different reasons for that. As we begin to dig into this, the first question we need to ask is, is the breeding working? Genetic gain as a concept in plant breeding or in evolutionary biology is not made every year. Genetic gain is made every cycle. Every time you have parents, and they have kids, and the kids become parents, that's a full breeding cycle. So what we wanted to do was estimate how much genetic gain had been made per breeding cycle in the history of IRRI, at least in the irrigated program. When we regressed the data on what Jessica calls the equivalent crossing generation, which is a proxy for breeding cycle, we saw a fairly positive linear trend of about 92 kilograms per hectare per cycle.
So breeding works. We're turning over cycles, and every time we turn over a cycle, we are making gain. This is also evidence that we're measuring yield with reasonable accuracy. If we weren't, then we wouldn't be making gain. So this is strong evidence that we are making gain per cycle in the breeding program. So why hasn't the genetic gain per year been higher?
Part of the reason is that we haven't been turning over breeding cycles as quickly as we could have been. In the beginning, when people were crossing with Dee-geo-woo-gen and Peta and creating IR8, and then recycling IR8 to create new varieties, and then recycling IR36 to create new varieties, breeding cycles were fairly short. But when you take these breeding cycles and regress them on the crossing year, you see that there's a period of time here where we had, at some point, lines that were reaching cycle five and cycle six, but then we more or less stayed in this zone until the early 2000s. In the early 2000s, we started to cross with newer material again, and the breeding cycle began to accelerate. So we had short little cycles here, then a 20 to 30-year period where we didn't really advance the breeding cycle, and then in the last 15 years we've completed another breeding cycle.
The conclusion we feel we can draw from this data is that, since we know we are making genetic gain every breeding cycle, we just need to turn over breeding cycles more quickly if we want to make more genetic gain and change the trend in that initial graph, so we can start to see more positive linear trends over years. So we've been breeding for 45 years, and we've completed about six breeding cycles. If you average this, which probably isn't appropriate because they're not quite evenly distributed, it's about 10 years per breeding cycle, and we basically want to cut that in half in the breeding program.
So now that we have an understanding of where we are with our data and with our germplasm, what we want to do is begin to characterize that germplasm and then act on it in a way that allows us to push the genetics of the breeding program forward. I like to use the word galaxy. Other people don't like it.
I've discovered that in the last year, but I still like it, and I still want to be an astronomer when I grow up. And so we've created a galaxy of lines. These are those 17,000 lines. The spatial distance here is genetic distance.
We have a target from the RICE CRP documentation that says that by 2020, or maybe it's by 2022, you'd have to look it up, we need to be making 1.3% genetic gain per year. That's what it says in the CRP, which means we need to quadruple our rate of genetic gain compared to what it currently is. The CRP doesn't specify for how long we need to sustain that rate of genetic gain, so I've arbitrarily declared 10 breeding cycles to be our primary target. In the new strategy, which we'll go into in a moment, a breeding cycle takes about four years to complete. So if we want to complete 10 breeding cycles, that's about 40 years of breeding.
And if you talk to Gary, or to others who are familiar with the global landscape of rice production, the next 40 years are going to be the ones that are really the most important. 40 years from now, people will be eating less rice. Water will be more expensive. The earth will be two degrees hotter, and the whole system will be different. And we will have to change with that as it changes. But for the next 30 to 40 years, we really need to do a better job of increasing the rate of genetic gain in farmers' fields. So 10 cycles, for me, is what we're aiming for. It looks like, based on some reasonable analyses, we can probably go out further than that.
So here's our data. Here are the breeding values that we calculated on the entire breeding program, 17,216 lines. The average yield is 4.4 tons per hectare. Our variance, or rather standard deviation, is 0.4 tons per hectare. Our heritability is about 0.4, and we have an effective population size of 39. None of these things were terribly surprising.
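As a rough check on how the figures quoted so far relate to each other, here is a minimal arithmetic sketch. It assumes the percentage rates are expressed relative to the 4.4 t/ha program mean (the talk does not state the denominator), and it assumes gain per cycle would stay at 92 kg/ha if cycles were shortened to four years. Both assumptions, and all variable names, are illustrative only.

```python
# Figures quoted in the talk
mean_yield_kg_ha = 4.4 * 1000      # program mean yield, 4.4 t/ha
gain_kg_ha_year = 13.0             # historical linear trend, kg/ha/year
gain_kg_ha_cycle = 92.0            # estimated gain per breeding cycle, kg/ha
target_rate_pct = 1.3              # target rate of genetic gain, % per year
years_per_cycle = 4                # cycle length in the new strategy

def pct_per_year(kg_per_year, mean_kg_ha):
    """Express an absolute yearly gain as a percentage of the mean (assumed denominator)."""
    return 100.0 * kg_per_year / mean_kg_ha

# Historical rate: about 0.3% per year
historical_pct = pct_per_year(gain_kg_ha_year, mean_yield_kg_ha)

# How many times faster the target is than the historical rate: a bit over 4x
fold_increase = target_rate_pct / historical_pct

# Absolute gain the target implies, per year and per four-year cycle
target_kg_year = target_rate_pct / 100.0 * mean_yield_kg_ha    # about 57 kg/ha/year
required_kg_cycle = target_kg_year * years_per_cycle           # about 229 kg/ha/cycle

# Assumption: gain per cycle stays at 92 kg/ha while cycles shorten to 4 years
projected_pct = pct_per_year(gain_kg_ha_cycle / years_per_cycle, mean_yield_kg_ha)

print(f"historical: {historical_pct:.2f}% per year")
print(f"target is {fold_increase:.1f}x the historical rate")
print(f"target implies {target_kg_year:.0f} kg/ha/year, {required_kg_cycle:.0f} kg/ha/cycle")
print(f"92 kg/ha per 4-year cycle would be {projected_pct:.2f}% per year")
```

Under these assumptions the historical 13 kg/ha/year does correspond to roughly 0.3% of the mean, and the 1.3% target is a little more than four times that rate.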
In fact, an effective population size of 39 is pretty typical for most breeding populations that I've seen in other crops and in other places. And so now that we've identified our germplasm and begun to give it a bit of an identity, I want to talk a little about why I believe the breeding philosophy that we're putting forward on population improvement is going to be effective.
Again, the title of my talk last year was Nothing in Plant Breeding Makes Sense Except in the Light of Evolution. What we want to do is take this galaxy of lines and drive evolution. We want to drive change in allele frequency over time so that we can change the average value of the phenotypes that we care about in that population. I'll mention, too, before I go into the details, that I'm focusing on yield because that's the data we have that's robust in B4R. Yield and flowering time are the two data sets that are more or less collected all the time, every year, without fail. But we are also interested in the mean and variance of grain quality and the mean and variance of quantitative disease resistance. For anything that is of interest to the farmer, and by virtue of that of interest to the breeding program, we can do the same thing. We're just using yield as an example because that's where our highest quality data is.
At the end of the day, like I said, this is just evolution. These are Darwin's finches, right? Darwin's finches get blown by a hurricane to the Galapagos Islands. They begin to proliferate, and without the introduction of new finches from the mainland, recombination takes place in this intermating finch population. Selection is applied by the environment instead of by a breeder. And that selection changes things like beak shape, all the way down to big fat beaks that allow them to break seeds.
And those big fat beaks came from ancestors that did not have big fat beaks. So this is what evolution does. This is how evolution changes quantitative phenotypes in populations over time.
There's evidence for the mechanism of why this works when you look at human height data. Human height has a heritability of about 0.8, which is really high. It means that it's under heavy genetic control. Using over 10,000 individuals, researchers have mapped about 50 significant QTL for human height. But when you take into account the alleles at all 50 of those significant QTL, they're still only able to explain about 5% of the variation in human height. When Yang et al. used a genomic selection model, which basically uses all of the SNPs and all the markers, not just the significant ones, they were able to explain 45% of the variance for human height.
And the reason why is this. This is a figure from a paper that Jonathan Pritchard's lab published earlier this year. What you see here is all the markers in the human genome. Every circle is a SNP, and they are ordered by their P values. So the most significant SNPs for human height are over here, and everything else in the genome that more or less doesn't have a measurable effect on human height is to the left. What he shows is that for the significant QTL in the human genome, the average effect on human height is fairly high. Any one of them will change your height by 1.4 millimeters. But when you look at all of these small-effect QTL, for which we don't have enough power to truly estimate effects, the median effect size of these SNPs is 0.14 millimeters. So any one of them is 10 times less impactful on human height. But taken in aggregate across the genome, all of these small-effect QTL explain more of the variation in human height than all of the significant SNPs combined.
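The aggregate argument above can be illustrated with a minimal sketch of additive genetic variance. For a biallelic locus with per-allele effect a and allele frequency p, the standard additive variance under Hardy-Weinberg proportions is 2p(1-p)a². The effect sizes (1.4 mm and 0.14 mm) and the count of 50 significant loci come from the talk; the count of 20,000 small-effect loci, the allele frequency of 0.5 at every locus, and the assumption that loci are independent are all hypothetical choices made only for illustration.

```python
def additive_variance(effect, freq):
    """Additive variance from one biallelic locus (per-allele effect, allele frequency)."""
    return 2.0 * freq * (1.0 - freq) * effect ** 2

def group_variance(n_loci, effect, freq=0.5):
    """Total variance from n_loci independent loci that share one effect size."""
    return n_loci * additive_variance(effect, freq)

# 50 significant loci at 1.4 mm each (figures from the talk)
large_effect_var = group_variance(50, 1.4)

# Hypothetical: 20,000 small-effect loci at 0.14 mm each
small_effect_var = group_variance(20_000, 0.14)

share_small = small_effect_var / (small_effect_var + large_effect_var)

print(f"variance from large-effect loci: {large_effect_var:.1f} mm^2")
print(f"variance from small-effect loci: {small_effect_var:.1f} mm^2")
print(f"share of genetic variance from small-effect loci: {share_small:.0%}")
```

Each small-effect locus contributes 100 times less variance than a large-effect one (the effect is squared), so under these assumptions the small-effect group overtakes the 50 large-effect loci once there are more than 5,000 of them.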
And we believe that the same is true, for example, for yield in the rice genome. There are 39,000 genes in rice, and all of them probably have a minuscule effect on yield. We might be able to find a few QTL for important traits that are related to yield, but those won't explain much of the variation. What we want to do is manipulate, in aggregate, all of the small-effect QTL. And the only way to do that really is through recombination and phenotypic selection, and eventually with genomic selection. So this is the genetic mechanism underlying why we believe a population improvement approach to breeding is effective.
This has been effective in corn. I showed you these graphs last year. Corn adopted this strategy early on and has been moving forward with it. It is effective in soybean too. Both of these crops have been bred for yield, consistently making gain every year, and have never hit a yield plateau. Even in soybean, where genetic variation is notoriously low: if you look across all North American soybean, across all the companies, Pioneer, Monsanto, everyone, all of their soybean germplasm descends from just 35 lines. And there's enough genetic variation in those 35 lines to consistently drive improvements in yield year in and year out, in soybean's case for at least 40 years.
In case you believe that rice is different, this is a figure from a paper from Brazil that was just published in October. Here's their population. Just like we have our galaxy, they had theirs. This is yield on the x-axis. You can see that when they started, they had a very large variance and a very low mean for yield. They selected the best lines and allowed them to create progeny. That tightened the variance up quite a bit, but it improved the mean. They selected the best lines again and allowed them to create progeny. Measuring those progeny, these are now the grandchildren of the original lines.
You see that their variance has stayed more or less the same from cycle two to cycle three, but their mean yield has improved quite a bit. So from 2005 to 2015, in just 10 years, they were able to increase the average yield of their breeding program from about seven and a half tons to nine tons. This dotted line here is a check variety that they grew each time. They say that before they started doing population improvement, 70% of their program was lower performing than this particular check. After three cycles of selection, more than 50% of their breeding program was higher yielding than that check. And this came out last month from Brazil. So we really do believe that this isn't even a philosophy. It's a law of evolutionary biology, and all we're trying to do is commandeer that principle to be useful to a breeding program.
So we've done the same thing. Here's our galaxy again. Here's our mean, 4.4. We then pulled the tail of these lines, cross-referenced them with what's available in the seed inventory, and filtered out some of the less reliable estimates. We ended up with a set of 86 diverse rice parents in the elite breeding program with a mean yield of 5.26 tons. So the mean yield of the lines we've now picked is about 0.8 tons higher than the program mean. I can't really call this cycle one. It's probably more like cycle six, because cycle one was really back in 1960, but what we're doing is a more concerted population improvement starting now.
These 86 parents have a higher mean yield. Their effective population size is about 33, so we still have quite a bit of variation among those lines, and our intention is to consider these to be elite founders, if you will. We'll cross them with each other. We'll create progeny.
Those progeny will then be crossed with each other to create new progeny, and so on and so on. And over time, we expect that the average yield of the program will go up.
I mentioned this a little bit in Bangladesh, though I didn't get a chance to talk about it as much as I wanted: the beauty of the average comes down to the law of averages. My boss at Pioneer, when I was at DuPont, said something to me once that I never forgot. He said, you know, Josh, most things are average. That sounds funny when you say it, because that's the definition of an average. Most of anything is average. But when you think about it, when you sample randomly from a program, you're going to hit the average most of the time. So there's a real beauty in changing the average performance of the entire breeding program. In this example here, after cycle three, any breeding line selected at random from the breeding program in 2015 would perform as well as the best of the best breeding lines available to the program in 2005. That does not mean they did not have lines that yielded this much back then. They did. But because the average of the entire breeding program has moved, any line selected at random is going to fall somewhere near that average. And since you still have significant variance, you still have the opportunity to continue to select and continue to improve, whether it's yield or pick your favorite phenotype.
So this is where we're at. We have our 86 parents. The biggest criticism I usually get when we talk about this is, Josh, that's crazy. You're going to run out of genetic variation. I said, geez, that's right, let's have a look at that. And so Juan David, who is currently working in Tobias's group, was helping me do some breeding program simulations using a simulation tool that Jessica found, which was published by Jean-Luc Jannink and a group from Japan, I think earlier this year.
This is a breeding program simulator based in R. It's super easy to use. We've been playing with it quite a bit. What Juan did was take starting populations with effective population sizes of 10, 20, 50, and 100, keeping in mind that our effective population size is 33, and simulate the breeding program. He asked, how many cycles can we do before we start getting so much inbreeding among these 86 lines that we run out of genetic variation? He showed that it's actually a fairly straight, linear trend right up until cycle 10. From cycle 10 to cycle 20, we're still getting a fairly linear trend. And then maybe out here by cycle 30 or 35, we might begin to see some kind of an inflection point. Keep in mind that a breeding cycle takes us four years to complete. So to even get to this point, where we would start to see a significant plateau as our genetic variation runs down, we're already talking about 80 years into the future.
The other thing that's really encouraging here is that it is almost independent of how much variation you start with. We started with an effective population size of 10, and that's the orange line here. You can see it's not that distinguishable from a much more variable population. So given that we have an effective population size of 33, we believe very strongly that if we're able to drive population improvement in our current selection among these elite lines, we'll be able to see a response to selection in the breeding program similar to this trend.
What I'm hoping to do, and I think this is going to be really important moving forward, is to give some kind of meaningful identity to the elite breeding germplasm. Up until this point, suppose you asked any breeder what the elite breeding germplasm is: give me the most elite lines you have.
Well, he or she is going to give you the most elite lines they have, based on the information they have available. But in the absence of a closed breeding system, where you're doing population improvement based on a foundational set of genetics, what's elite changes all the time. And there isn't really a strong structure around what's classified as elite versus what's classified as non-elite, diverse, or exotic.
What we have here is the indica component of the 3,000 genomes material. And what's white here is the elite breeding germplasm as we've defined it in the irrigated program. You can see that this galaxy is different from this galaxy. There are yellow dots in here, so there are elements of the gene bank that overlap in genetic space with the elite breeding germplasm. But there are some fundamental differences between these two germplasm resources. I think both of them have tremendous value that IRRI can leverage as an institute. Obviously, I can't overstate the value of the gene bank. But having this elite breeding germplasm as well means we should be putting the same level of investigation, characterization, and inquiry into this germplasm as we are into that germplasm.
There are some fundamental differences here. This one is static. In fact, we aim for it to be static. That's the point of ex situ genetic conservation: to avoid genetic changes and to conserve what you have. This galaxy is dynamic. It is under active selection pressure and under active evolution. Alleles are changing, and frequencies are changing. Some alleles are going up in frequency because they have value. Some alleles are going down in frequency because they don't have value or they have negative value. Some alleles we lose just due to drift and inbreeding. The whole genetic constitution of that population is going to shift and change over time.
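The kind of closed-population simulation described earlier, and the shifting allele frequencies just described, can be sketched in a few lines. This is not the R tool used for the analysis in the talk. It is a deliberately simplified stand-in: fixed lines are vectors of 0/1 alleles, all loci are unlinked with equal additive effects, selection is on a noisy phenotype with the heritability of 0.4 quoted in the talk, and the population size, number of loci, and number of selected parents are hypothetical.

```python
import random
from statistics import mean, pvariance

def simulate(n_lines=200, n_loci=100, n_selected=20, n_cycles=10, h2=0.4, seed=1):
    """Recurrent truncation selection in a closed population of fixed lines.

    Returns a list of (mean genetic value, genetic variance), one entry per cycle,
    starting with the unselected founders (cycle 0).
    """
    rng = random.Random(seed)
    # Each line is homozygous; 1 marks the favourable allele at a locus
    pop = [[rng.randint(0, 1) for _ in range(n_loci)] for _ in range(n_lines)]
    history = []
    for cycle in range(n_cycles + 1):
        values = [sum(line) for line in pop]          # additive genetic value
        var_g = pvariance(values)
        history.append((mean(values), var_g))
        if cycle == n_cycles:
            break
        # Environmental noise sized so that var_g / (var_g + var_e) equals h2
        sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
        phenos = [v + rng.gauss(0.0, sd_e) for v in values]
        ranked = sorted(range(n_lines), key=lambda i: phenos[i], reverse=True)
        parents = [pop[i] for i in ranked[:n_selected]]
        # Each new fixed line takes every locus from one of two random parents
        new_pop = []
        for _ in range(n_lines):
            a, b = rng.sample(parents, 2)
            new_pop.append([a[k] if rng.random() < 0.5 else b[k] for k in range(n_loci)])
        pop = new_pop
    return history

history = simulate()
for cycle, (avg, var) in enumerate(history):
    print(f"cycle {cycle:2d}: mean value {avg:6.2f}, genetic variance {var:6.2f}")
```

Because every locus has the same effect here, the mean genetic value divided by the number of loci is the average frequency of the favourable allele, so the printed means also show allele frequencies moving under selection while the variance is gradually used up.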
And that's because the genetic diversity in this galaxy is primarily under the control of recombination, not mutation. We're getting new combinations of alleles all the time, every year, and those new combinations of alleles are giving us different phenotypes. That's not happening over here, because there's no active population crossing going on in the gene bank. There's tremendous value to be captured in rare alleles of really large effect in the gene bank. But in the elite program, in this elite galaxy, the value captured is not rare alleles but rather the enrichment of elite haplotypes through recombination. And this is the genetic mechanism that allows us to improve the average performance of all of the lines in the elite breeding space over time.
Partha and Rosalind, who should be in here somewhere, are working together to define something that we're calling the irrigated core panel. We're trying to create a set of lines that we have reliable seed sources for. We want to sequence them. We want to phenotype them. And we want them more or less to be a proxy for the elite breeding galaxy. This is a data set, a germplasm resource, that people can then sink their teeth into to compare how the elite breeding germplasm differs from the diverse germplasm. We can also compare elite breeding germplasm over time to see how it's changing in this dynamic galaxy.
We've put together a workflow where we took those 86 top breeding lines that we selected based on breeding values, filtered to only the breeding values that have decent accuracy. We pulled all of the lines in the pedigrees of these 86 lines. Then we put them through this workflow, where we eliminated the landrace founders, because in this particular workflow we're looking for the IRRI breeding lines that were founders of these 86.
We filtered all of those for the lines that contributed most to those 86. We filtered them to just the fixed lines. We filtered out the genetically redundant lines. And we ended up with a panel of 373 IRRI breeding lines that contribute disproportionately to these 86 lines, which have the highest breeding values in the program. We call these our elite irrigated founders. Our 86 lines here are our current IRRI elite parents.
The other thing we did is go back to these pedigrees and take them all the way out to the pedigree endpoints. We asked ourselves, all the way back in 1960, before the breeding program had started, what were the breeders crossing with, and which of those lines contributed to what we now know to be the most elite germplasm available to us? We've identified 185 landraces with GIDs in the gene bank that have contributed disproportionately to these 86 lines. And if you consider only the landraces that contributed to 5% or more of them, so that would be four or more of these 86 lines, that number drops down to 88.
So we're still in the process. All we've really done right now is identify who these lines are. We're now hunting for seeds that will allow us to put together a meaningful panel that can be sequenced, phenotyped, and investigated, and that can act as a proxy for the elite breeding galaxy, at least as it exists in 2017. This will have to be updated, I would say, at least once every five years, which is roughly once a breeding cycle, because unlike the diverse germplasm, our galaxy is under active evolution and is constantly changing.
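Tracing pedigrees out to their endpoints, as described above, can be sketched with a small recursive function. This is only an illustration of the idea, not the workflow used in the program: it assumes each parent contributes an expected half of a line's genome, ignores selection, and skips the fixed-line and redundancy filters. The toy pedigree and all line names are hypothetical.

```python
# Hypothetical toy pedigree: line -> (female parent, male parent).
# Lines that never appear as keys are pedigree endpoints (e.g. landraces).
PEDIGREE = {
    "X1": ("LR_A", "LR_B"),
    "X2": ("X1", "LR_C"),
    "E1": ("X2", "X1"),
    "E2": ("X2", "LR_D"),
    "E3": ("X1", "LR_A"),
}
ELITE = ["E1", "E2", "E3"]

def founder_contributions(line, pedigree):
    """Expected genome fraction a line traces to each pedigree endpoint."""
    if line not in pedigree:
        return {line: 1.0}              # an endpoint contributes all of itself
    out = {}
    for parent in pedigree[line]:
        for founder, frac in founder_contributions(parent, pedigree).items():
            out[founder] = out.get(founder, 0.0) + 0.5 * frac
    return out

def founders_used_by(elite, pedigree, min_lines):
    """Endpoints that appear in the pedigree of at least min_lines elite lines."""
    counts = {}
    for line in elite:
        for founder in founder_contributions(line, pedigree):
            counts[founder] = counts.get(founder, 0) + 1
    return sorted(f for f, n in counts.items() if n >= min_lines)

for line in ELITE:
    print(line, founder_contributions(line, PEDIGREE))
print("in 2 or more elite pedigrees:", founders_used_by(ELITE, PEDIGREE, min_lines=2))
```

With 86 elite lines, a threshold like the one mentioned in the talk would correspond to `min_lines=4`. A real pedigree with thousands of entries would likely need memoization of `founder_contributions` to stay fast.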
Just to whet your appetite a little bit, these 15 lines here are the landraces that we identified as contributing most to the 86 lines that we've found to be the highest yielding. Not surprisingly, Dee-geo-woo-gen is at the top, which is the parent of IR8 that contributed the Green Revolution semi-dwarf allele. So we have this list, and we're anxious to work with folks in the strategic innovation platform to understand more about the lines that we see in the pedigrees of these 86 high-yielding lines and begin to characterize them a bit further.
Knowing your germplasm is one thing, and having a handle on how much variation you have and how much progress you think you can make is important, but none of it means anything unless you have a breeding strategy. So what I want to do now is walk you through the breeding strategy that we've set up in order to drive the improvement of these lines over time.
I think of a breeding program like a funnel. It starts out wide, with a lot of material in there, then it narrows over time as you impose selection, and a few things pop out the back side. In our funnel, not surprisingly, you have to make crosses, and so we make our crosses here. It takes about a year to go from a cross to an F2. We are working with PB operations and Tobias's group and others to make sure that we have the capacity to verify that what goes into the breeding program are true F1s and not selfs of the female. Again, there is a renewed focus on elite-by-elite parents. These 86 lines are in the crossing block right now. We have created combinations with them, and we're excited to continue to create those combinations over the next couple of years.
After you've made your crosses, you have to go from an F2 to an F6 at some point. This is the line fixation process. We use the rapid generation advance greenhouse, which means it takes us another year to get to a fixed line that we can realistically evaluate.
Coming out of the greenhouse, we have what we call LST. I don't really like the name, but other people use it, so if I change the name, it causes confusion among collaborators. So we keep the name, but basically what it is is a seed amplification. You get these fixed lines, these F6s, and you just grow out one row from an entire panicle. And it's in this spot right here, this is the mouth of our funnel. These are the recombinants that are coming out of the program. It's at that point that we impose our marker-assisted selection. And we also do some phenotyping for highly heritable traits like maturity and plant height and maybe some grain type and plant type. That's year three. Year four is the OYT class. This is our stage one yield trial. This is probably where most of the action is happening. We have about 2,000 lines in this class. Every line that gets into this class is targeted for genome-wide fingerprinting. We attempt to do genomic prediction at this level. We're looking for superior yield in a multi-location trial. Our OYT class this season was grown at two locations. We intend for it to be grown at three locations next year. At this point we also connect with Nese and the Grain Quality Lab to begin to characterize these lines for their grain quality characteristics. And this is also the point at which, after we have all that data, we begin to select parents for the next breeding cycle. Once we've done our OYT class, then we have an AYT class, but here's a big difference. So here are two of the differences that you'll notice if you were familiar with the previous program. There used to be a PYT class. It was sort of a yield trial in between OYT and AYT. We've eliminated that class because we don't feel like it's necessary to test for that many years before entering into the variety release pipeline.
But rather than coming up with one AYT class that's based on performance in the Philippines, what we want to do is divide the world into distinct breeding zones and then send out breeding-zone-specific AYT classes. In other words, I want to send the material to Bangladesh that's most likely to perform well in Bangladesh, and the correlation between performance in Bangladesh and performance in the Philippines may not be very high. And we want to make sure that we account for that. Obviously the Philippines is another breeding zone, and these breeding zones can proliferate consistent with our funding. In these breeding zones, we're looking for superior yield, appropriate grain quality, and resistance to local diseases in each of the different breeding zones. And how then do we take this OYT class and find out who are the best lines for Bangladesh? Who are the best lines for the Philippines? Who are the best lines for, pick your favorite breeding zone? This is where genomic prediction comes in, and we're interested in using predicted performance in the breeding zone of interest to then enrich that AYT class with OYT lines that are likely to do well in that geography. And the way we are currently thinking about implementing predictions is, if this is our OYT class of 2,000 lines, we want to pull an unselected set, what we call an estimation set, out of those OYT lines. These estimation sets also get sent out to the different breeding zones. So we would use the Bangladesh phenotype data from this estimation set and the genotype data from this estimation set. We would use the genotype data from this OYT class, and we would then be able to identify which members of this OYT class are most likely to perform well in Bangladesh, for example. So there are multiple different entry points for genomic selection. We'll talk a little bit more about that in a minute, but this is probably the most significant one.
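The estimation-set idea can be made concrete with a small sketch. The talk does not name a prediction model, so this assumes ridge regression on marker genotypes coded 0/1/2 (similar in spirit to RR-BLUP) as a stand-in; the function names, the `ridge` penalty, and the toy numbers are all hypothetical. The model is trained on the estimation set's genotypes plus its phenotypes from one breeding zone, then applied to the genotypes of the OYT lines.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_zone_model(est_genotypes, zone_phenotypes, ridge=1.0):
    """Estimate marker effects from the estimation set grown in one zone."""
    n, p = len(est_genotypes), len(est_genotypes[0])
    marker_means = [sum(row[j] for row in est_genotypes) / n for j in range(p)]
    z = [[row[j] - marker_means[j] for j in range(p)] for row in est_genotypes]
    y_mean = sum(zone_phenotypes) / n
    y_centered = [y - y_mean for y in zone_phenotypes]
    # Dual form: solve an n x n system (lines) instead of p x p (markers).
    kernel = [[sum(a * b for a, b in zip(z[i], z[k])) + (ridge if i == k else 0.0)
               for k in range(n)] for i in range(n)]
    alpha = solve_linear(kernel, y_centered)
    effects = [sum(alpha[i] * z[i][j] for i in range(n)) for j in range(p)]
    return y_mean, marker_means, effects


def predict_zone_performance(genotypes, model):
    """Predicted zone performance for lines that have genotypes only."""
    y_mean, marker_means, effects = model
    return [y_mean + sum((g - m) * e for g, m, e in zip(row, marker_means, effects))
            for row in genotypes]


def select_for_zone(line_ids, genotypes, model, n_select):
    """Rank OYT lines by predicted performance and keep the top n_select."""
    predictions = predict_zone_performance(genotypes, model)
    ranked = sorted(zip(line_ids, predictions), key=lambda pair: pair[1], reverse=True)
    return ranked[:n_select]


# Hypothetical toy data: 5 estimation-set lines, 2 markers, yields from one zone.
toy_model = fit_zone_model([[0, 0], [2, 0], [0, 2], [2, 2], [1, 1]],
                           [10.0, 12.0, 9.0, 11.0, 10.5], ridge=1.0)
toy_top = select_for_zone(["oyt_1", "oyt_2", "oyt_3"],
                          [[2, 0], [0, 2], [1, 1]], toy_model, n_select=1)
```

Fitting one model per breeding zone from the same estimation-set genotypes gives one ranking of the OYT class per zone. A production pipeline would more likely use an established mixed-model package than a hand-written solver.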
So this is it, this is our breeding system, and this is how we want to drive a product development system that's gonna allow us to create germplasm that's specifically adapted to the places where IRRI's mandate is most relevant. The other thing I want to mention here is that this system has a product cycle of five years. So it takes us five years from the day we make a cross to the day we submit lines to MET, and we have a breeding cycle here of about three and a half years. So about this point here, after three and a half years, is when we start to select parents. And we think that the acceleration of our breeding cycle is gonna be one of the biggest drivers of increased genetic gain over time. This is just to highlight how germplasm moves. So the idea is that new germplasm goes out to the breeding zones of interest. Data is collected, that data funnels back into IRRI, estimation sets are developed, and then sent back out again. And this cyclical model is what we feel is gonna be the most successful. Other people call this a hub-and-spoke model for breeding. So this is more or less the breeding system that we're trying to build, which we believe is gonna allow us to drive population improvement in the elite breeding germplasm. But having a strategy, having a system, isn't good enough. We need to innovate along the way. The one innovation that everyone here has undoubtedly heard the most about is RGA. This is one of the things that Bert Collard was really strong in initiating when he was in this position. RGA is basically accelerated single seed descent in the greenhouse, and it allows us to take this line fixation process from five years down to one year. And that acceleration in the breeding cycle is gonna be a big driver of genetic gain. The other thing is just cost. Before, when we were doing pedigree selection, we would grow out F2s and F3s and F4s on hectares and hectares of land. We would select those lines.
Doing it in the RGA allows us to do that at a much lower cost. And it increases our selection accuracy. Your ability to identify a genetically superior F2 is almost zero. Your ability to identify a genetically superior F6 in a multi-location yield trial is very high. So by evaluating fixed lines and not segregating lines, we've also increased our selection accuracy. All of those things are part of the breeder's equation. So again, the motivation for the RGA is that it's cheap, it's short, and it's accurate. And we can get about three and a half generations a year, but that's not an innovation that we created. That's an innovation I inherited from Bert. What we want to do, in addition to replacing all the pedigree selection with RGA, is speed it up: with the system we have set up, we've really only been able to get about three and a half generations per year, especially with the medium- and late-maturing material. And so we took a hard look at our protocol. In the current protocol, we would plant the seeds in these tiny little trays. We would allow them to flower. Then we would wait about 35 days until they're mature, and then we would harvest them and reseed them. What we found out was that we can harvest them 14 days after flowering or 21 days after flowering and the germination rate is exactly the same as when we harvest them 35 days after flowering. And so we've saved ourselves about 12 weeks a year by harvesting the seeds early and then recycling them. And so now we can get four generations pretty squarely in a year in the rapid generation advance greenhouse. Another innovation has been what we're calling this modified field nursery. Some people call it a field RGA. I'm trying to shy away from that term because there's not necessarily anything rapid about it. It's just cheap.
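For reference, the breeder's equation mentioned above is commonly written as expected gain per year = (selection intensity × selection accuracy × additive genetic standard deviation) / cycle length in years. The sketch below only shows how the terms interact; the function name and every input value are hypothetical and are not estimates from this program.

```python
def genetic_gain_per_year(selection_intensity, selection_accuracy,
                          additive_sd, cycle_years):
    """Expected response to selection per year: (i * r * sigma_A) / L."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return selection_intensity * selection_accuracy * additive_sd / cycle_years


# Hypothetical inputs: identical intensity, accuracy and variance,
# with only the breeding cycle length changed.
gain_slow = genetic_gain_per_year(1.4, 0.5, 0.3, cycle_years=7.0)
gain_fast = genetic_gain_per_year(1.4, 0.5, 0.3, cycle_years=3.5)
```

Because cycle length sits in the denominator, halving it doubles the expected gain per year when everything else is held constant, which is why shortening line fixation matters so much. Raising accuracy by evaluating fixed lines acts on the numerator.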
And so the Bangladesh Rice Research Institute, for example, instituted a field nursery because they didn't have the resources, the funds, the capital to set up a big greenhouse like we have. So they said they wanted to do it outside. The challenge with BRRI, though, is that they do it all with manual transplanting, and there's just no way I could have people manually transplanting 200,000 seedlings at two-centimeter spacing. And so what we needed to do, if we wanted to move this outside and out of the RGA greenhouse, was to look at different establishment methods to try and figure out what was going to work best for us. We did that. The pictures are here. We looked at mechanical transplanting with the Minoru trays. This was actually manually transplanted because we don't have this transplanter, but we transplanted them at the same distance as the transplanter would have. We had mechanical transplanting with the Kubota. We had some manual broadcasting. We created raised beds, and we also looked at direct seeding. What's crazy is when you look at the actual cost to create an F6 line in the RGA greenhouse, that cost is $2.34. By doing this outside and by eliminating all of the overhead associated with the structural components of the greenhouse, instead of $2.34 per F6 line, if we look at raised beds, which was the establishment method we liked the most, we're looking at about 10 cents. So we have taken the cost of creating an F6 line and reduced it by 96%, which is going to have a huge impact on our ability to take that cost savings and reinvest it in other places in the breeding program, particularly in genotyping, which is where costs tend to go high. But we do lose some speed. Doing it outside doesn't give us the advantage of doing it rapidly in the greenhouse.
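The 96% figure follows directly from the two per-line costs quoted above. A quick check, where the number of lines used for the savings estimate is a purely hypothetical round number rather than a program figure:

```python
def percent_reduction(old_cost, new_cost):
    """Cost reduction as a percentage of the old cost."""
    return 100.0 * (old_cost - new_cost) / old_cost


GREENHOUSE_COST_PER_F6 = 2.34  # USD per F6 line in the RGA greenhouse (from the talk)
RAISED_BED_COST_PER_F6 = 0.10  # USD per F6 line on raised beds (from the talk)

reduction = percent_reduction(GREENHOUSE_COST_PER_F6, RAISED_BED_COST_PER_F6)

# Hypothetical volume, only to show how the saving scales with line count.
hypothetical_lines = 10_000
saving = hypothetical_lines * (GREENHOUSE_COST_PER_F6 - RAISED_BED_COST_PER_F6)

print(f"reduction: {reduction:.1f}%")
print(f"saving on {hypothetical_lines} lines: ${saving:,.2f}")
```

The reduction works out to roughly 95.7%, which rounds to the 96% quoted.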
But what we did find is that if you look at these different methods, so the raised beds are right here, on average the raised beds flowered about 20 days earlier than their manually transplanted counterparts. So we believe that once we have the system in place, some agronomic interventions will allow us to shorten the breeding cycle even further, to where we can get three or maybe four generations in a year. So the hope is that we can capture all of the speed of the RGA but maintain the 96% cost reduction. In addition to fixing lines, we're also interested in innovating in testing. We have this little team we put together called the Precision Testing Initiative. We're looking at the effects of establishment, drone phenotyping, and border rows on selection accuracy. And the idea here is that we'll eventually be able to decide on a standardized mechanism for yield trialing that gives us all the right information at the lowest possible cost. In particular, we're excited about working with Steve to collect phenotype data using drones and then use that data to make breeding decisions, which I understand the GRC group has done quite a bit of work on. They have tremendous variation in their material. So what we wanna do is look and see if we can find the same kinds of patterns and accuracies in the breeding germplasm, which has a much narrower range of variation. We figured out that at any given time in the breeding program, we are housing about 18,000 different genetic entities that all have different values, depending on what information we have about them. And so she's putting together a seed storage protocol that allows us to make sure that we have enough seed of the right lines and that the seed is stored for long enough that we can recover the value from it if we need to. Markers and molecular breeding are last but not least. We are working tirelessly to make sure that our MAS is effective. So we do marker-assisted selection in that LST class.
Part of making sure that we have effective MAS is making sure we're enriching the right alleles at the right loci. So we've done a lot of work. Part of the BSWAS has done a lot of this work to identify, working with Ricardo and with Bo, the alleles and the loci that are going to be the most effective in the geographies that we care about, and we've put together what we call trait packages. So the idea is that xa5 and xa13 together in a single genome is a trait package, and that trait package is effective in different areas. Likewise, Xa23 and Xa7 is a different trait package, and it might be effective in different breeding zones. So we try to target marker-assisted selection to enrich the alleles for the genes that are going to be the most important to us in our breeding zones of interest. The HTPG project, with Aang at the lead and Tobias helping to interface with Intertek, has given us a SNP genotyping platform. For most of these alleles we have SNPs at Intertek, and we simply sample the LST class, send that material to Intertek for genotyping, and then use the genotype data that they send us for marker-assisted selection. Damian has done a lot of work to develop what we call marker QC metrics that allow us to understand the relative value of a marker. You can think of a marker as a diagnostic tool, and what that means is there's a certain false positive rate and a certain false negative rate. These are the false positive and false negative rates that we've calculated for the markers that are on the 10 SNP set that we would be using in the program. We're interested, obviously, in improving some of these where there might be problems, but you can see that an error rate of 0% is actually really good. That's the positive metric. And then utility here is basically how useful this locus is in the breeding program.
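Treating a marker as a diagnostic test, its false positive and false negative rates can be computed by comparing marker calls against an independently validated status for the same lines. This is a minimal sketch: the function name is hypothetical, and it assumes a trusted reference (for example a validated phenotype or a functional assay) exists for each line, which the talk does not spell out.

```python
def marker_error_rates(marker_calls, validated_status):
    """Return (false_positive_rate, false_negative_rate) for one marker.

    marker_calls: True where the marker calls the favorable allele present.
    validated_status: True where the allele is truly present (trusted reference).
    A rate is None when its denominator is zero.
    """
    if len(marker_calls) != len(validated_status):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(marker_calls, validated_status))
    false_pos = sum(1 for call, truth in pairs if call and not truth)
    false_neg = sum(1 for call, truth in pairs if not call and truth)
    n_negative = sum(1 for _, truth in pairs if not truth)
    n_positive = sum(1 for _, truth in pairs if truth)
    fpr = false_pos / n_negative if n_negative else None
    fnr = false_neg / n_positive if n_positive else None
    return fpr, fnr


# Hypothetical calls for six lines at one marker.
calls = [True, True, False, False, True, False]
truth = [True, False, False, False, True, True]
fpr, fnr = marker_error_rates(calls, truth)
```

A false positive carries a line forward that lacks the allele, while a false negative discards a line that has it, so the two rates cost the program in different ways.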
So if Xa21, for example, which is a really important bacterial leaf blight gene, is at low frequency in the breeding program, what that means is it's extremely useful to us because we can begin to select for it. And you can see here these utilities. This would be the inverse of frequency. If they have 100% utility, that means their frequency in the breeding program is basically zero. So there is a lot of low-hanging fruit that we can leverage to begin to increase the frequencies of these key genes in the program. And Damian and Rulix are putting together what we've been calling a TI pipeline, or a trait integration pipeline, that allows us to take some of these here, for example, that are basically at zero frequency in the breeding program and backcross them into the most elite parents based on our breeding values. And then we can use those converted parents in our crosses, which then allows us to do forward breeding and get those alleles down in this area, which is preferable. And then we can begin to work with them in a forward breeding context in the breeding program. Last but not least, genomic selection is a big part of what we wanna do. We feel like genomic selection can help us make several different kinds of breeding decisions. We talked about that in the creation of these estimation sets. This is it. This is estimation set 01, the first estimation set we've ever created. It's been genotyped for us. And you can see here that each of these different colors is a different family. And this estimation set has been grown in the Philippines wet season and the Philippines dry season. We're sending it to Bangladesh to be grown there. And estimation set 02 is right on its heels. Each year we'll pull a new estimation set out of the breeding program. We've used this 1K amplicon panel that Tobias and Juan and others in that group have put together for us. It's a thousand SNPs across the genome.
You can see the relative positions here across the genome. The gold and the green are SNPs that are at really high frequency. The purple and the blue are SNPs that have lower, I'm sorry, lower information content. And so you can see that we have gold and green and purple and blue scattered pretty evenly across the genome. When you look at the 11 families individually in estimation set 01, even this family here, which is actually pretty closely related and so would not be expected to have a lot of polymorphism, even that family has 300 markers across the genome that are polymorphic, which is more than enough for us to do genomic selection. So we believe that this genotyping set, the SNP set that's been put together for us, is gonna be useful. It could be useful per se, we can just use those SNPs, but I think it'll be more useful in an imputation context: we'll be able to use that data to impute to a much larger number of SNPs and get some pretty decent prediction accuracies. Those accuracies will come in next year's presentation. We're also working closely with the B4R team and the GOBii team to basically set user requirements for those software tools, so that we make sure that the software tools that we're creating are aligned with the kinds of activities that we do in the breeding program. And the GOBii team just showed me the other day their attempt at making the kind of dashboard that we visualized in August, with a bunch of different tools all in the same place that a breeder can go to and use for different activities in the breeding program without having them scattered across different places. Lastly, we are working actively to disseminate this information across our national partners, and this is done through TRB extension efforts, which have been largely led by Sanjay Katiyar in India and Rafiq in Bangladesh. And we've received really good responses from the people that we've brought this kind of information to. So that's it.
My closing statement is simply that the landscape is changing. The scientific landscape is changing. The physical, the political, the funding, and the need landscapes are changing. And we need to change with them if we want to keep up.