Well, thank you very much for inviting me to speak at this exciting meeting. It's been really interesting to hear the outcome of many of the microbiome projects and hear the summaries from the people who did the work. So my title is Composition and Dynamics of the Human Virome. I'll briefly introduce the planetary virome, the human virome, and a little bit about approaches you can take to studying this problem. Then I'll focus much of the lecture on what the composition of the human virome is and how it changes over time, with an emphasis on gut. And then at the end I'll touch on some open questions and future directions, introduce a little more data, and talk about challenges going forward.
So the global virome is remarkable. In seawater there are judged to be something like 10 to the 7th viral particles per mL. Viruses outnumber their hosts by a factor of something like 10 in seawater, and near the end I'll be arguing that that's probably true in gut also. Multiplying this out, estimates suggest there are something like 10 to the 31 viral particles on Earth, which makes viruses numerically the most successful biological entities on the planet, and also some of the coolest. This picture has been developed by our own Lita Proctor of HMP, Forest Rohwer, Curtis Suttle, and many other workers. And this picture shows an EM of some of the viruses that Lita caught in one of her studies of marine phages.
The human virome is similarly gigantic. Perhaps most familiar will be the persistent latent infections that are extremely common, herpesviruses for example, or papillomavirus. Many of the herpesviruses infect almost all of the people in the US population. Papilloma is also very common, and HIV and HCV are in something like 1% of the population. We're also very familiar with the viruses that infect us transiently, like cold viruses or flu. Sometimes we deliberately infect ourselves with viruses, like vaccinia, the live smallpox vaccine strain. But still, this is really the tip of the iceberg.
The human genome itself is composed of something like 8% fragments of DNA that are recognizably remnants of retroviruses that infected the primate lineage on its way leading to humans. And so here's a bit of one of the human chromosomes. Here are some genes. This track that says LTR is the long terminal repeats of retroviruses, and you can see there are many, many of them in this bit of the human genome. So our genomes themselves are in part viral. And on top of that, our bacteria and archaea have enormous numbers of viral predators, bacteriophage predators. Counts suggest something in the range of 10 to the 10th to 10 to the 11th viruses per gram of human stool. And although these numbers, I think, are kind of loose, there's no question that the size of the communities is totally gigantic.
So obviously one of the things that's really launched this area has been the development of the new deep sequencing methods, and you can apply those to studies of viruses in a variety of ways. One is to hunt for new pathogens. Another is to watch viral evolution, for example HIV quasispecies evolving in response to the introduction of a new antiviral agent and developing resistance. You can characterize where integrating viruses are integrating into genomes. And you can characterize complex uncultured communities of viruses. What I'll be emphasizing for much of the talk is this last one: using deep sequencing, mostly Illumina HiSeq, to characterize the human virome and see what's out there. Many, many strong labs have contributed these kinds of studies: Forest Rohwer, Suttle, Lipkin, Beatrice Hahn, and many others.
So what is the composition of the human virome and how does it change over time? We've been studying this in the context of our human microbiome demonstration project focused on diet, genetic factors, and the gut microbiome in Crohn's disease.
It's a pleasure to acknowledge my co-PIs, Gary Wu and Jim Lewis. Gary is here. Our main project, our main longitudinal study of Crohn's disease in pediatric cases, is just finishing up and we'll tell you about that soon. But in this talk I'm going to focus on a set of papers from a really strong graduate student, Sam Minot, characterizing the virome piece of the gut microbiome. And we've been fortunate to have funding from NIDDK, HMP, and others.
So we've been purifying viruses from stool, and it's kind of a general principle in these sorts of microbiome studies that exactly what you do has a very strong effect on the interpretations you can make at the end. Your choices have a lot of effect on the outcome. The way we've been doing this, we've been highly purifying viral fractions. We've almost certainly been losing some of the viruses along the way. I'll be telling you about DNA viruses; for the most part, we haven't put much effort into studying RNA viruses. So we're only looking at a slice of the population, but because it's highly purified, we can point to each read and say that it probably came from a virus. Even if it doesn't look like anything we've seen before, even if it doesn't look like anything in the databases, that's probably still something that was inside a viral particle.
So what we've been doing is taking stool, homogenizing, several filtration steps, then purifying either by banding in cesium chloride or by Centricon ultrafiltration, then chloroform to disrupt membranes, then degrading unprotected DNA with DNase to get rid of contaminating DNA, and then breaking open the capsids to get out the DNA from inside, followed by either 454 or HiSeq metagenomic sequencing.
So this shows an example of one of Sam's studies. This is 12 healthy subjects, cross-sectional, about 40 billion bases of HiSeq data, assembled using one of the de Bruijn graph methods, and then we compare the contigs that we see.
So here's a contig spectrum, with the length of the contigs here and the sequencing depth on the y-axis. You can see we got lots and lots of contigs. This is all 12 subjects, with some contigs out to something like 100 kilobases. Some are circular. They close as circles, which suggests we probably got all of the genome for those guys. So you can see we've got a lot that are probably complete. The rest are a mixture of complete and partial.
When we try to align these to databases, we see very little resemblance to anything that's in the databases for the vast majority. That's illustrated down here at the bottom. We did find one animal cell virus, a human papillomavirus, type 6b, not unexpected, at very high coverage. Everything else we saw had either low resemblance to bacteriophages in the database or no resemblance to anything at all. This shows some of the phage data, where we're aligning our reads and you can see little bits of alignment. Gray means a perfect match, red means a mismatch. You can see we're getting patchy, sketchy matches, and these were the best of them. So really, most of what we're seeing in these kinds of samples is new. And between humans, we see very little resemblance also. So, really striking: mostly phage among what we can identify, one eukaryotic virus, and then a lot of stuff where we don't know for sure what it is, though we're guessing most of it is phage. Something like 500 to 1,000 types per individual.
Okay, so we can do a little better assigning the genes if we align genes within this dataset. We've got so many genes now from the HiSeq data, open reading frames, that we could ask who resembles whom. We could find that something like 25% of phage ORFs have a match to a database ORF with a very permissive threshold. But within the datasets, 58% have at least one match at 30% identity or better. So that allowed Sam to then ask, well, how about arrangements, multi-genic regions?
Do we see conservation of gene order and gene type in cassettes, which is well known to be a structural feature of bacteriophage genomes? And indeed we do. On the right, you can see several of these cassettes where multiple gene types are similar, and these are derived from different subjects. The subject numbers are on the left here. And here's a cassette with several different gene types seen over lots of subjects. The mean proportion of contigs covered by cassettes looks like something like 27%. So when we look within the deep sequencing data itself, we can start to see some forms of order.
We've also got several cases where we've sequenced whole stool DNA, mostly bacterial, assigned the genes, and looked at the kinds that are there. We then did the same for viruses, or virus-like particles: assembled, assigned ORFs to the ontology, and then compared the two. That's shown here. You can see that bacteria, in yellow, often devote a lot of their coding capacity to carbohydrate metabolism, amino acid metabolism, translation, ribosomes. Viruses, very little if any. Viruses, on the other hand, in red, show a high proportion of their gene content devoted to replication, recombination, and repair. So viruses are parasites, and you can really see it coming through in the metagenomic data in a comparison like this.
Okay, so one thing we wanted to investigate was how the human gut virome changes over time. In part this is investigating at the same time the question of why humans are so different from each other. What is going on with the gut virome that might explain this? So we studied one human individual for two and a half years by a dense time series analysis of stool samples. That's diagrammed here. We have a whole bunch of time points we studied. And importantly, Sam studied a number of time points twice.
He took the same stool sample and did two separate purifications of viruses from it, then sequenced and analyzed both. So we have an internal measure of within-sample variation, or within-time-point variation, and we can compare that to between-time-point variation and ask if that's larger. So we purified viruses, got 57 billion bases of sequence from HiSeq, and assembled with a de Bruijn graph method. We also did HiSeq analysis of stool DNA for three widely spaced time points, so we have a look at the bacterial communities at least at three time points. After assembly we get something in the range of 500 contigs, with an average of 82-fold sequence coverage.
The contig spectrum is shown here on the left. Again, it's contig length by fold coverage. For some of these small circular guys we now get up to million-fold coverage. We're now well over 100 kilobases for some of these viruses. And now a larger proportion seem to be circular, so we're getting complete sequences for at least some of them.
The middle panel shows the Jaccard index on the y-axis, which is asking about resemblance in community membership, versus time interval, with long intervals between time points on the right and short ones on the left. You can see that even across our longest intervals, in the range of two and a half years, we're still seeing something like 80% of the community membership the same as at the earlier time points. So for the most part, the virome is hanging around. We're seeing the same forms all through the time series studied.
We did a form of rarefaction analysis here. We took our contigs and then asked how many reads, or how many samples, it took to get those contigs. At least for the major contigs we were picking up, we seem to be pretty saturated. But I now think this is a little misleading. I don't think we did as well as this might imply.
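As a minimal sketch of the Jaccard comparison described above, assuming each time point is reduced to a set of detected contig IDs: the function name and the IDs below are made up for illustration and are not data or code from the study.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of contig IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention chosen here: two empty communities count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical contig IDs detected at an early and a late time point.
early = {"c1", "c2", "c3", "c4", "c5"}
late = {"c1", "c2", "c3", "c4", "c6"}

# Four contigs are shared out of six seen overall.
shared_fraction = jaccard_index(early, late)
print(round(shared_fraction, 3))
```

The same function applied to the two replicate purifications of a single stool sample would give the within-time-point baseline against which between-time-point values are judged.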
And the reason is that we've recently done sequencing of these same communities with another method, the wonderful Pacific Biosciences single molecule sequencing approach, which brings in its own set of issues. But at least the biases are different compared to Illumina. We acquired 138 megabases of single molecule sequencing data, and when we compared to the Illumina contigs, we found only 30% overlap. This is just in the last few weeks, and we're still trying to put this all together into a single picture. But it seems quite clear that our numbers for the different kinds are headed upward as we layer in more sequencing methods. This just shows that some PacBio contigs can link Illumina contigs, in this sort of dot matrix representation, and we have examples where Illumina contigs have linked several PacBio contigs. So the picture of this viral community, I think, is getting better. But it really does illustrate that how you measure has a pretty strong effect on what you find. So we're trying to put together a hybrid assembly with this, and who knows, maybe further methods, to try to get a more complete picture.
So with these contigs in hand and this longitudinal data, we're in a position to ask, well, how did these communities change over the two and a half year period studied? One way is simple accumulation of base substitutions. We broke it out into different viral groups. Some changed very little. The temperate phages didn't seem to evolve very fast. But this group, the Microviridae, which includes phiX174, evolved very fast. These are small genomes in the five kilobase range, single-stranded circular DNAs. It turns out it's known that single-stranded DNA viruses evolve more like RNA viruses, really fast compared to the double-stranded DNA viruses. And this shows a phylogenetic tree of some of the Microviridae we caught.
And these show time series for several of these, where the first sample is at the top and time proceeds downward. You can see accumulation of base substitutions in the genome. Now, the champions here were over 4% substitution over the time series studied, and that's after subtracting out the within-time-point variation. This is an increase over time. And so this is sort of cool. In the Microviridae taxonomy, we see some species separated by smaller values: 3.5% base substitution differences distinguish species for some members of this group, and we had a couple of viruses that changed more than that. So you could say we're coming into the range where we were watching speciation events in the gut virome over the 2.5 years that we studied this. Oops.
So we could also see longitudinal changes associated with the CRISPR systems. Remember, we have bacterial DNA sequences and we have phage. We could see that six viral genomes were targeted by bacterial CRISPRs. This is just an example of one contig, where CRISPR spacers from bacteria targeted the virus in several positions. Probably most of you are familiar with the CRISPR system. It's akin to RNAi. The bacteria have spacer sequences that are transcribed and then used as recognition elements to destroy incoming genetic parasites. We had one example of a possible viral escape mutant, where we have a CRISPR targeting a sequence in blue. That sequence goes away in the population, and an orange sequence takes over that has a point mutation in that CRISPR recognition site. So that might have been an escape event. That's one form of change associated with the CRISPR systems that we're seeing.
Another: the bacteriophages, the viral contigs themselves, have CRISPR systems. Some of the phage encode CRISPRs themselves. And in one case, we had a CRISPR spacer in one phage that targeted another phage in the same person.
So it was as though phages were fighting it out with each other using the CRISPR system, which was pretty cool to see. And we could see longitudinal change associated with the CRISPR arrays also. We could watch them change over time for a couple of the phages.
A third form of variation is associated with the diversity-generating retroelements. These are amazing reverse transcriptase-based targeted hypermutagenesis systems, first discovered by Jeffrey Miller and co-workers. In Bordetella, phage BPP-1 has a problem: Bordetella changes its surface coat periodically. But the phage hypermutagenizes the gene for its tail fiber recognition moiety, so that once in a while, when the Bordetella phase-varies, there are phage in the population that can actually recognize the new protein and then proliferate. And we saw systems that looked like Miller's, where there's the major tropism determinant (MTD) gene with its hypervariable region, the tail fiber protein. Nearby is a template region, a nearly identical region, and nearby is a reverse transcriptase. It turns out the mechanism involves transcription of the template region, error-prone reverse transcription of that transcript to make a DNA copy, and incorporation of that DNA copy into the MTD locus.
So in just model-independent studies, where we tracked along these phage genomes looking for regions of high variation, these things really fell into our laps. This shows the reads in color. If there were no base changes, the columns would all be the same color, as in these template regions, but in these variable regions you can see extreme levels of variation, so extreme that in those short regions every read is different from every other read in some cases. And these were associated with this kind of target-template structure and a nearby reverse transcriptase gene.
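The kind of model-independent scan for highly variable regions described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the reads are taken to be already aligned and of equal length, the function names and the 0.3 threshold are hypothetical, and the toy reads are invented rather than taken from the study.

```python
from collections import Counter


def column_mismatch_fraction(reads):
    """Per-column fraction of reads that differ from that column's most common base.

    Assumes `reads` is a non-empty list of equal-length, already-aligned strings.
    """
    n = len(reads)
    fractions = []
    for column in zip(*reads):
        # Count of the most common base in this alignment column.
        top_count = Counter(column).most_common(1)[0][1]
        fractions.append(1 - top_count / n)
    return fractions


def variable_columns(reads, threshold=0.3):
    """Indices of columns whose mismatch fraction meets the (arbitrary) threshold."""
    return [i for i, f in enumerate(column_mismatch_fraction(reads)) if f >= threshold]


# Toy alignment: columns 5 and 6 vary between reads, the rest are constant.
reads = [
    "ACGTACGTAA",
    "ACGTACCTAA",
    "ACGTAGATAA",
    "ACGTATTTAA",
]
print(variable_columns(reads))
```

A real analysis would also have to separate true hypervariation from sequencing error and would then look for the nearby template repeat and reverse transcriptase gene, which this sketch does not attempt.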
So we saw systems that looked like Miller's MTD, and we saw others that were hypermutagenizing other types of coding regions, including Ig family proteins, which is pretty cool. The vertebrate immune system is hypermutating such genes to make the T cell receptor and immunoglobulins. Phage are hypermutagenizing these using a completely different method based on reverse transcriptase. And we could see these reverse transcriptases. They're a specific subset of the reverse transcriptase family. We could see them quite abundantly in these phage and associate them with these hypermutagenesis mechanisms.
One of these seemed to be active in the longitudinal data that we got. So we could look at these DGR, diversity-generating retroelement, systems, identify them, and for one of them we could say that we were seeing it be active over the two and a half year period that we studied. For others, it's not clear whether we didn't have enough power to be sure they were active or whether they were in fact inactive, which raises an interesting question of whether the mutagenesis mechanism may be biologically regulated, and maybe there's something to figure out there.
So circling back, why do humans harbor such huge viral populations, and why are humans so different from each other? Well, one of the most central messages from many of these microbiome studies has been that humans have different microbes colonizing them, with a great deal of individual variation. So naturally the predators on those microbes, the phages, are likely to be different also, for that reason at least in part. But something we can add from this study is that at least some of the phage seem to be changing really, really fast, so that when a virus colonizes a human it diversifies pretty quickly, and over the lifetime of an individual there'll be a lot of diversification. The phage populations do seem to be pretty stable.
So we think that this rapid change may be another piece helping to understand why human viral populations look so different from each other.
So let me now go on to the open questions and future directions. One key thing in these sorts of viral studies, over many labs and many related issues, is the question of finding new viruses and associating them with diseases. A number of really strong labs have carried out a lot of these sorts of studies: Lipkin, Skip Virgin, David Wang, Relman, many others. This is a big challenge, because just because some virus is there, you don't know about cause and effect, as several speakers have mentioned. Did it cause a disease? Did the disease state make the individual susceptible to the virus? Did some third thing cause both of them? You really don't know. But obviously there's a lot that could be done to streamline and develop this process.
So this just shows an example from our work, where we have fecal shotgun data on two severe combined immune deficiency kids and one healthy kid. One of these kids was having GI problems. And when we sequence whole stool DNA, we find 20% of the reads are this little-studied bocavirus, a parvovirus. It was only discovered in 2005, and it's benign as far as anyone knows. But boy, this is one heck of a lot of virus in this kid's gut, and the kid was having GI problems at the time. So this is just one of many possible illustrations of the general problem of associating a virus that you see by molecular methods with a disease state.
And then other questions: what are the relative abundances of phages and their hosts in human feces? And the dynamics of predation following from that. So what I'm going to tell you about now is sort of a sketch for a calculation. We're trying to work our way through this. A provisional picture would be as follows. From purified phage DNA in one human, the deeply studied one I described, we can see most of the viruses, or at least a lot of them.
So we can recognize them even though they mostly don't look like anything you've ever seen before. We also have whole stool DNA from this individual. So we can ask what proportion of the whole stool DNA is made up of the viral DNA seen in the purified sample. We find something like 5% of the total DNA looks like phage DNA, though this is headed upward. We haven't added in the PacBio data yet. However, the genomes of the bacteria and phage are very different in size, maybe differing by a factor of 100. So multiplying by that, we infer that the phage outnumber their hosts in gut by about five-fold, at least as a provisional first look. And again, we have more work to do to make this all real.
So if that's true, we can start to think about predation rates. Material moves through the gut continuously. So if phage are at a constant abundance, then they must be created at a rate that replaces the ones getting washed out as material moves through the gut. Let's say the transit time for this individual was one day. I'm making this up. And estimate that there are 10 to the 10th bacteria per gram of stool; then there must be something like 5 times 10 to the 10th phage per gram. If a phage burst contains 100 phage, then you have to kill off 5 times 10 to the 8th bacteria to supply the number of phages that you're measuring, which do seem to be at a steady state. That would say that something like 5% of all bacteria are killed per day by phage predation. Again, all these numbers are very soft, and an ongoing project in the lab is to try to make these numbers more real. But it gives you a sense that maybe a substantial fraction of all the bacteria in the gut are getting killed off by phage daily.
And then the last open question, or new direction, I want to introduce is what we could call virome epigenetics.
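The back-of-the-envelope predation estimate above can be written out step by step. This is only a sketch of the arithmetic as stated in the talk: the input values are the speaker's soft or made-up numbers, the function and parameter names are hypothetical, and the phage-to-host ratio follows the talk's approximation of multiplying the phage DNA fraction by the genome size ratio.

```python
def predation_estimate(bacteria_per_gram, phage_dna_fraction,
                       genome_size_ratio, burst_size, transit_days):
    """Steady-state estimate of the fraction of bacteria lysed by phage per day."""
    # Phage per bacterium: DNA fraction scaled by how much smaller phage genomes are.
    phage_per_bacterium = phage_dna_fraction * genome_size_ratio
    phage_per_gram = phage_per_bacterium * bacteria_per_gram
    # Bacteria that must burst per transit time to replace the washed-out phage.
    lysed_per_gram = phage_per_gram / burst_size
    fraction_killed_per_day = lysed_per_gram / bacteria_per_gram / transit_days
    return {
        "phage_per_bacterium": phage_per_bacterium,
        "phage_per_gram": phage_per_gram,
        "lysed_per_gram": lysed_per_gram,
        "fraction_killed_per_day": fraction_killed_per_day,
    }


# Values quoted in the talk, all described there as very soft.
estimate = predation_estimate(
    bacteria_per_gram=1e10,   # assumed bacterial density in stool
    phage_dna_fraction=0.05,  # about 5% of whole-stool DNA looks like phage
    genome_size_ratio=100,    # bacterial genome roughly 100x a phage genome
    burst_size=100,           # phage released per lysed bacterium
    transit_days=1.0,         # gut transit time, explicitly made up in the talk
)
for name, value in estimate.items():
    print(f"{name}: {value:.3g}")
```

Because every input enters multiplicatively, a two-fold error in any one of them shifts the final fraction two-fold, which is why the result is best read as an order-of-magnitude estimate.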
As you'll all very well know, the human genome is subject to CpG methylation, and recently hydroxymethylation was discovered and generated a great deal of interest. Phage totally dwarf that. There are dozens of kinds of covalent DNA modification that have been reported in prokaryotic viruses. This lists a few of them. They get very exotic, including alpha-putrescinylthymine and dihydroxypentyluracil. Here's glucosylated DNA, which is a characteristic of phage T4. There's a gigantic zoo of DNA modifications present in bacterial viruses. They're known to block attack by nucleases and contribute to gene control. We're guessing they have additional functions also.
So what's cool with the PacBio method is that you can read out some of these modifications by their sequencing method. This just illustrates their technology. It's single molecule, with an immobilized polymerase, and the template tracks through. What they notice is that if you have a modification, this red base here, you can get a characteristic change in the kinetics of incorporation. And different chemistries seem to have different effects on the interpulse interval and peak height. So they have a way of reading out at least some of the modifications that are present in these viral genomes.
So we've started to look through this for the deeply sampled subject I described. We're seeing various kinds of modifications, shown by the bars on these sequences. For some of them, we can assign recognition sites. It looks like we've got at least one form that's new, because it's on G and none of the other ones I showed you were on G. And as a general summary, looking over all the contigs, it seems like 80% of all the viral contigs are showing signs of covalent DNA modification. So this looks like a really, really cool area to begin to explore and to try to understand the functional significance.
OK, so that concludes what I wanted to tell you. I introduced the global virome and the human virome.
There's one heck of a lot of viruses associated with our bodies. In the human virome, most viruses hang around longitudinally in the carefully studied individual we looked at. But some specific viruses changed a lot over time, and that may help us understand why humans are so different from each other. And the open questions and future directions: assigning viruses to disease efficiently using metagenomic data, dynamics of predation, and then lastly viral epigenetics. And so it's a great pleasure to again acknowledge my colleagues Gary Wu and Jim Lewis, and Sam Minot, who led a lot of the viral work that I described. And Tyson Clark at PacBio has helped us a lot also. So thank you very much for your attention.
OK, so quickly I'd like to ask you your current thoughts on the role viruses play in horizontal gene transfer in the microbiota.
Oh, viruses are big players in horizontal gene transfer. There are typically three main mechanisms in prokaryotes: transduction, transformation, and mating. So temperate phages move genes around. They're known to be medically important because integrating phages can carry toxins and adhesins that modify the phenotype of their bacterial hosts. So they're one of the major agents, but not the only major agent.
OK, let's thank Rick and all of the speakers from this morning. We just have a few quick announcements that are important. The first is that at 1:30 sharp, the afternoon session will begin. So we're now into the lunch and poster session. Speaking of the posters, today are the even-numbered presenters and tomorrow are the odd numbers, and I think Friday are prime numbers or something. One of the benefits of being here physically in person is that you get to vote on what you consider the best poster after you've seen all of the posters. And there's an NSA-proof secret compartment on the back of your badge with a green piece of paper that is to be used for the voting. So again, I want to personally thank the speakers.
I had a really good time hosting this morning's session, and we'll see you at 1:30.