Well, good morning, everyone. Thank you for your indulgence. We delayed this a little for some technical reasons, in part because we want to make sure that we are capturing this entire day for video archiving purposes, because we're completely convinced that many people will be interested in watching this after the fact. And so we wanted to make sure that all the video feeds were appropriately working, which they are now. So my name is Eric Green. I'm the director of the National Human Genome Research Institute, and I want to thank you for coming. I hope you enjoy this day to learn about the practical, how-to aspects of whole exome sequencing and, in particular, the analysis of whole exome sequence data. This is purely an effort on the part of our institute to educate and to help facilitate researchers using some of the most cutting-edge components, technologies, and approaches in genomics. This is the first time we've put on this course. We hope this day is helpful, and it's certainly something we're going to be very interested in getting feedback about, because it's the kind of thing that we would like to do in the future, and so we're very open to refining it. The credit for this course, in terms of thank-yous and acknowledgments, mostly belongs to Les Biesecker, who you will hear from later today. Les is a branch chief in our intramural program and a very prominent human and medical geneticist, and is somebody who's quite passionate about seeing these tools of genomics being applied to problems in human and medical genetics. And as a practitioner of this, he felt very motivated to organize this course to help really see this technology reach an even wider audience. And Dave Kani both in our intramural program were very helpful in putting together the registration system for the website and some other logistical details associated with putting on this day's workshop. I'm going to spend the first 10 minutes or so as, basically, your warm-up act.
I'm just here to soften up the audience a little to get them ready for the real science, but I did think it was worth setting a context for this day, especially related to, as you can see from my title slide, what really is a profoundly changing landscape of genome sequencing. And for those of you who may not fully appreciate where we've gotten in a relatively short period of time, I thought it was worth doing a little bit of a historic review. And again, this is all to try to put into context where we are today. I can't help but start just a little over a decade ago, with the publication of the draft sequence of the human genome by the Human Genome Project. It was this data resource that in many ways changed the face of human genetics and genomics, because it provided reference information about our own genetic blueprint. That draft sequence was then finished to high quality, published, and released, and a year and a half later came the end of the Human Genome Project with the completion of a reference human genome sequence. Back in 2003, when the Human Genome Project ended, our institute wrapped up a strategic planning process which asked the obvious question: having now generated a sequence of the human genome, what's next? What's next in genomics? And we published that vision for genomics in this article in Nature in April 2003, which articulated many things that we predicted were going to happen, and that we hoped to help facilitate, in genomics in the coming years. I really want to emphasize one aspect of what we wrote about, because I think it's so relevant to what you're going to hear about today. Among the many things we wrote about in this strategic document was the call for technological leaps that seem so far off as to be almost fictional, but which, if they could be achieved, would revolutionize biomedical research and clinical practice.
And those of us who actually wrote this had enough chutzpah to actually go on and give a specific example: the ability to sequence DNA at costs that are lower by four to five orders of magnitude than the current costs, allowing a human genome to be sequenced for $1,000 or less. Now, to realize how audacious it was that we put this call in print in Nature back in April of 2003, you have to appreciate what the cost of sequencing was right around then. So to put it in perspective, generating that first genome sequence as part of the Human Genome Project took something on the order of 10 years or so and ended up requiring thousands of people around the world, working on hundreds and hundreds and hundreds of sequencing instruments, with a price tag approaching a billion dollars. And here we were, calling for the technological leaps that would successively shave zeros off this number, eventually delivering a human genome sequence for something like $1,000, which we actually thought was a nice round number and had a ring to it. It also seemed like a very reasonable cost for a clinical test, for example, that one might imagine ordering on a patient some day. This became a bit of a battle cry for the field of genomics. In fact, the phrase "the $1,000 genome" was born out of that publication. And with that came significant developments on a number of fronts. Number one, NHGRI put out a whole series of granting opportunities for scientists around the world to come and bring us their brightest ideas for new technologies for sequencing DNA. And in doing so, we gave out many, many grants over the last eight or so years, all packaged under our genome sequencing technology program, en route to the $1,000 genome. The good news is that this was actually met with a significant amount of private sector interest, such that multiple companies have been formed and lots of new ideas came up in the private sector, along with the infusion of additional resources.
And I think both of those things in combination have led to some fairly spectacular technological advances. And in particular, those technological advances have resulted in the commercialization of not one, two, three, four, five instruments, but more than that. Shown here are pictures of some of the many new technologies that are available, many of which are actually commercialized or can be acquired by some means. And the great news here was that we were not beholden to a single technology, but rather had multiple technologies that seemed to be leapfrogging over each other with each little nuanced advance. The other great thing, from my perspective and the position that I hold in overseeing much of the funding that we at our institute think about with respect to technological advances, is that this is not the last slide we will show. In fact, I very much feel like someone who's sitting in an air traffic control tower at an airport. I have like nine planes on the ground, very safe, they're all good. But if I look out at the horizon, I know there's another technology that's coming and going to land probably in six months, maybe another one a year later, maybe another one in two years and three years and so forth. The horizon is filled with yet newer technologies that in fact might even supplant the ones that are on the ground today. And all you have to do is follow the literature to realize that there are new technologies being reported at very early stages in journals like Nature and Science. And these are ones that might be hardened over the next few years and that will supplant all the technologies that we are currently working with today. So be prepared for that: whatever you learn today might be antiquated within three or four years. But that's pretty exciting if true, because it means that the cost will continue to go down and many attributes associated with those technologies will only improve.
Now, has this resulted in a significant decrease in cost en route to the $1,000 genome? Here we at least have some data to go along with this enthusiasm. The fact of the matter is that at NHGRI we fund several very large sequencing centers, and in exchange for the resources we give them to sequence genomes, they tell us every three months what their costs are by reporting how much money they've spent and how many genomes they've sequenced. We've cataloged that for about the past 10 years or so. And we have all that data available to us to see what the trends are. So let me show you that data. But before I show you the actual graph, let me remind you of one concept: Moore's Law. This might come up today in other speakers' analogies. Moore's Law is a law in the computer industry that basically says that compute power doubles every 24 months. And if you can keep up with Moore's Law, technologically that means you're just rocking, you're doing great. So that's the goal, if you can only keep up with Moore's Law. So here's the data. This is the graph, and the white line depicts Moore's Law. Notice that the y-axis is logarithmic. In orange is the cost of sequencing a human genome as generated at the large sequencing centers that NHGRI supports. And this dates back to 2001, during the Human Genome Project. Note that during the genome project, and even in the time beyond the genome project, those centers were using that old, old, old, old-fashioned technology of dideoxy chain termination sequencing. Remember Sanger-based sequencing, the technology that actually got us the first genome sequence? Well, even then they were keeping up with Moore's Law. And right here they switched over to next-generation sequencing technologies like you're going to hear about today. And ever since then, they have blown Moore's Law out of the water, and they continue to do so to the present time.
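To make the Moore's Law comparison concrete, here is a minimal Python sketch of the arithmetic. It is not something shown in the talk. It takes the 24-month doubling described above, treats it as a halving of cost every 24 months, and asks how long the four to five orders of magnitude called for in 2003 would take at that pace. The function name `years_at_moores_pace` is made up for this illustration.

```python
import math

MONTHS_PER_HALVING = 24  # Moore's Law pace as described in the talk


def years_at_moores_pace(orders_of_magnitude: float) -> float:
    """Years needed to cut a cost by 10**orders_of_magnitude
    if the cost halves once every MONTHS_PER_HALVING months."""
    halvings = orders_of_magnitude * math.log2(10)
    return halvings * MONTHS_PER_HALVING / 12


for orders in (4, 5):
    years = years_at_moores_pace(orders)
    print(f"{orders} orders of magnitude: about {years:.0f} years")
```

Under these assumptions the answer comes out at roughly 27 to 33 years, which is why a cost curve that falls faster than the Moore's Law line is so striking.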
Now you may be asking me, where are we en route to the $1,000 genome? And what I would tell you is that we're just about there. We're not quite at $1,000, but boy, have we shaved a lot of zeros off of the original price tag. It depends who you ask, it depends how you calculate, and it depends what kind of quality, but the common numbers that are given for a whole genome sequence are $5,000, $10,000, $15,000, something in that range, depending upon details. But of course, what you're going to hear about today is actually a shortcut. The shortcut is: let's not sequence the whole genome, but why don't we first focus on sequencing the part of the genome that actually encodes protein, the exons in the genome, otherwise known as the exome. So whole exome sequencing has become a shortcut, which allows you to focus on maybe the one and a half to 2% of the genome that we actually know how to interpret more readily and that is also where all the protein-coding sequences are. And that indeed has dropped very close to, if not below, $1,000. Now, what is happening then with respect to sequencing, and where is all this happening? Well, certainly sequencing is happening in a profound way at medical centers and at research institutions like the NIH. But I would just point out that what is driving this as much as anything in a reduced-cost environment is the view that genome sequencing becomes a bit of a commodity. And by that, I mean a lot of this might end up being performed in the private sector. If you go to the web, you'll find companies, a couple of which are shown here, though there are others, that will offer complete genome sequencing. If you go to the literature, as I'm sure you do when you open up Science and Nature and sometimes see some of the advertisements, you will see whole genome sequences now being offered for under $5,000, and whole exome sequences being offered for under $1,000.
And so I think the commoditization of genome sequencing indeed has begun. But the fact of the matter is that if all this was easy, we wouldn't need this workshop. If it was just about generating the data, well, then you guys could write a check to get the sequence data. You'd figure out how to use these technologies yourself in your own laboratories or outsource it to a local sequencing center. That's not the bottleneck at all, generating the data. Rather, the current bottleneck in genomics, which I think is what probably drew a lot of you to this particular workshop, is the realization that these machines just spew out huge amounts of data so fast and furious that it becomes extremely difficult to actually assimilate all of that information. And in fact, this is very new for biomedical research and is certainly being driven by these new technologies, where big data has now become a key part of what we need to deal with. And dealing with that becomes quite a challenge. Now, there are all sorts of obstacles associated with big data: just handling that much data, storing it, processing it, and pushing it from site to site. But one of the additional bottlenecks that you face, which I suspect is going to be of great interest to many of you and which you will hear about from the speakers being featured today, I refer to more as an informational bottleneck. The informational bottleneck is more related to these new technologies being able to sequence whole human genomes quite effectively, or whole exomes even more easily. And from that, you can deal with the data as you must. But it's when you get down to filtering that data to come up with your list of variants in your sample of interest compared to a reference sequence that it then becomes an issue of, well, what do those variants mean? How do you go from thousands of variants, in some cases millions of variants, to trying to figure out, in an individual human, what does that mean?
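The narrowing-down step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` record, its fields, the toy data, and the 1% frequency cutoff are all hypothetical, and none of them come from the talk or from any particular analysis pipeline. The idea is simply to go from a long list of differences from the reference to a shorter list worth pondering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical record for one difference from the reference sequence.
    chrom: str
    pos: int
    ref: str
    alt: str
    in_exon: bool            # falls within protein-coding sequence
    changes_protein: bool    # alters the encoded protein
    population_freq: float   # fraction of a reference population carrying it


def filter_variants(variants, max_freq=0.01):
    """Keep rare, protein-altering variants in exons (illustrative criteria only)."""
    return [
        v for v in variants
        if v.in_exon and v.changes_protein and v.population_freq <= max_freq
    ]


# Toy data, invented for this example.
variants = [
    Variant("chr1", 1000, "A", "G", True, True, 0.0005),
    Variant("chr1", 2000, "C", "T", True, False, 0.0002),
    Variant("chr2", 3000, "G", "A", False, False, 0.2),
    Variant("chr7", 4000, "T", "C", True, True, 0.35),
]

candidates = filter_variants(variants)
print(f"{len(variants)} variants in, {len(candidates)} left after filtering")
```

Even with a filter like this, what remains still has to be interpreted in the context of the individual, which is the informational bottleneck the talk is pointing at.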
And you ponder and you ponder and you ponder. And I suspect some of you are already involved in sequencing exomes of individuals that might be patients here in the Clinical Center, or individuals that you have studied or have DNA from. But certainly when you actually go to look at these variants in the context of an individual patient, many of you probably feel like this. And I think that might be what has drawn you here today, to find out from the practitioners at NHGRI how they are dealing with analyzing that data and trying to make sense of it. So I can't help but quote Harold Varmus, the previous director of NIH and now the head of the Cancer Institute, who wrote, in a commemorative article in the New England Journal celebrating the 10th anniversary of the genome sequence, about where we are now, which is that physicians are still a long way from submitting their patients' full genomes for sequencing (and I think you could substitute even their whole exomes for sequencing), not because the price is high, but because the data are difficult to interpret. So it was with this in mind, seeing these incredible opportunities for using these types of technologies but also recognizing the hurdles that are right before us, that our institute seven months ago updated our strategic vision. I told you about the one from 2003. We published a new one just seven months ago in the 10th anniversary issue of Nature that commemorated having the draft sequence of the human genome in hand for a decade. And we talked about lots of things in this article. If you haven't read it, it is freely downloadable at our website. And there's the URL if you wanna quickly jot that down. But we very much are looking at these opportunities in genome sequencing as what, more than anything else, is driving the future of genome exploration. And I'm not gonna describe the strategic plan.
This is sort of the organizing figure from it, if you will, which really talks about a progression of going from basic knowledge of how genomes are put together to how they work, and then using knowledge of genomics to understand the biology of disease, which I suspect many of you are interested in. And then the more clinically oriented of you in the audience or on the web are thinking about how to use that knowledge of genomics to advance the science of medicine and eventually to actually improve the effectiveness of healthcare. And we've depicted in various stages when these various activities might be taking place, either in the past or in the future. And everything in our view, at least our motivation, is to push things to the right as much as we can, so that eventually we really can use genomics to improve the effectiveness of healthcare. But specifically, what I am imagining many of you are interested in, and certainly NHGRI is very interested in, is actually applying this and thinking about opportunities for the future. We talk about this in the strategic plan in a little text box, which we called Imperatives for Genomic Medicine. In this particular box, in the examples we gave, we really predicted what we think will be some of the earliest fruits of genome sequencing and genomic medicine in the future, some of the lowest-hanging fruit, if you will. And I suspect that some of you are very much involved and very much interested in some of these same areas. What are some of these areas that I know you're gonna hear about today in some of the talks, and that I suspect many of you are working on as well? Well, it's using these fancy new technologies, for example, to study Mendelian or single-gene diseases and traits. We've made tremendous progress understanding several thousand of them now at a molecular level, but there still remain several thousand for which the molecular basis is not known.
And increasingly, just by sequencing exomes or sequencing genomes of individuals with very rare genetic disorders, you can identify, using methodologies you're gonna hear about today, the genetic basis for those disorders. In contrast to rare genetic disorders, we also know about the more common genetic disorders, which are genetically complex. But there have been great strides here because of other technologies that have been developed, methods that allow you to do genetic association studies that have now provided literally thousands of candidate regions in the human genome that we now know likely harbor variants conferring risk for very common, medically important disorders. And now, of course, there's great interest in interrogating those regions in greater detail, doing a complete inventory of all variants in individuals with these different disorders in different parts of the genome, or probably doing it in a genome-wide fashion, and trying to figure out what the genetic basis of complex genetic disorders is, and this is certainly gonna require the kinds of technologies you'll hear about today. And I believe some of the lowest-hanging fruit, indeed, using genome sequencing, and some of the earliest clinical applications, will come in the arena of cancer genomics. At least one speaker, I know, will be talking about this later today. This is exemplified by The Cancer Genome Atlas, which is NIH's major thrust in terms of a large-scale program exploring different types of cancer and the genomic rearrangements associated with those cancers. But there are many projects being conducted around the world by many funding agencies and many institutions where, again, the power of genomics is being brought to cancer biology, I think, in very productive and profound ways. Well, in thinking about these technologies, it doesn't just end with generating the data on these diseases, be it cancer, be it rare diseases, be it common diseases.
With this is gonna come all sorts of other issues. These are issues that we at NHGRI are certainly very interested in, but that I think the research community is increasingly gonna get more and more interested in, especially as this gets closer and closer to clinical deployment. And we're thinking about these things, and you're gonna hear about some of these things. Just dealing with communicating all this information about genetic variants to individuals, when it's about their family members or themselves, and how that information is gonna be communicated by the healthcare system, is of great interest, even at a research level now, to pave the way for implementation down the road. Another is thinking about how these sequencing technologies might be used and applied to newborn screening, which is routine now for a battery of genetic disorders. One can imagine a more efficient path whereby individuals might end up getting their exomes or their genomes sequenced shortly after birth and having that information guide future medical decisions for such an individual. And of course there is the whole arena of pharmacogenomics, where we are increasingly learning about the genetic basis of drug response, perhaps leading to yet more examples whereby genetic testing might pave the way towards more rational selection of medications for individuals. And there's a whole research agenda here that one can think about that's being greatly accelerated by whole exome sequencing and whole genome sequencing. But finally, we must recognize that if all this data is being generated eventually in a clinical setting, and for now even in a clinical research setting, that's a lot of information that's gonna come spilling out of these machines, that'll come spilling out of the computers, and that eventually will be available to practicing healthcare providers. And whether they can assimilate all that and understand it is a big question mark.
So we certainly have heard loud and clear during our strategic planning process that we very much need to help facilitate the development of more intelligent information systems that will utilize the knowledge that's being rapidly accumulated about clinical genomics and about the genomic basis of disease, and make it available in an environment that will allow practicing healthcare providers to access it and help guide what they tell patients and how they treat patients based on that information. This will very much intersect with an increasing reliance on and interest in electronic health records, and one can certainly imagine genomic information feeding very nicely into electronic environments such as electronic health records. But with that must also come electronic decision-making tools that would allow very busy healthcare providers to get the most accurate information about variants that are being uncovered in their patients, knowing which ones are actionable, what to do about them, and which ones are perhaps not actionable. Very little of this currently exists, but it is certainly something that we at NHGRI and other institutes are beginning to think about. So that's what I wanted to tell you. I would just wrap up by pointing out that when we published our strategic plan in February, in the editorial that Nature wrote, they said very many nice things about the field of genomics. They did remind us that the 2001 human genome sequence was always a milestone on the journey to better medical care. And I suspect everybody here is involved in research activities that are very much pointed towards eventually having genomics play a role in improving medical care. But really, where we were in 2001, and in some ways where we are right now, is very much a milestone, not a destination.
It's all pointing forward, but we must recognize that there's a tremendous amount of additional work that's needed, certainly needed then, even needed now, if we're gonna practically use these kinds of technologies to improve medical care. But in the interim, I know we very much want to see these technologies used to advance biomedical research at places like the NIH. And we want to see these technologies and the use of these data reach many individual investigators and trainees and people around the world, which is why we try to get all this information out: the data, the technologies, the methodologies, the protocols, and the analytical pipelines. And so what we're gonna tell you about today, through a series of talks, are many of these steps along this continuum, including some of the issues, the ethical issues and consent issues, around doing genome sequencing with human subjects, because that also becomes very important to consider in the broader context of what is going on. So that's what you'll hear about today. And with that I'm gonna turn this over to the first speaker, who is Jim Mullikin, who's gonna now tell you about some how-to aspects related to whole exome sequencing. So thank you for your attention, and please enjoy the day.