 Right, so this is supposed to be the title of my short talk. I thought I would start by introducing myself and telling you why you might want to catch me later and talk to me about the miscellaneous things that I do. I'm kind of a weird species of scientist; you'll see why. Perhaps I'm most interesting not as a scientist at all, but as a guinea pig. I think I can substantiate the claim that my own genome is the best-characterized human genome on the planet among the genomes which are available. It has been selected by the American National Institute of Standards and Technology as a standard reference genome. And if that piques your curiosity, I'm happy to comment some more. So you are the one like you. This is the cover of my dissertation, which was done in the field of machine learning and artificial intelligence. And you see that nowhere does it talk about biology. And so my one, maybe best-known contribution from machine learning to biology is this paper, which I'm a lead co-author of, and I'm happy to say it's been cited something like 7000 times in 7 years. Pretty good. So Rudy and, I think, a couple of other people mentioned the system PolyPhen. In 30 seconds: this is a server and a method which takes a human protein and tells you whether a given amino acid substitution is going to change the function of that protein. So it's the best, the most used system. I can tell you it's only slightly better than a coin flip, but it's the best. So at some point, still being a machine learning guy, I ran into Marc Kirschner, whom I believe most of you know, and he suggested I come to the department, spend a few months, and find an application for machine learning. It's been many years. I'm still at the Department of Systems Biology at Harvard Medical School. And I actually do some of my own experiments, which all look very much the same. I do in vitro fertilization, usually with frogs, sometimes other species.
And I stand there with a stopwatch, very carefully timing and killing embryos at certain points, and then I ask: what are those made of? RNA, protein, other molecules. And I try to reason about systems. So this is, I think, a two-year, maybe three-year-old story now. But again, I just wanted to give you a glimpse of it. It's already... So there are two interesting points here which are going to become relevant later. If there is a system where, for several time points, you measure RNA and protein in parallel, then just from those measurements alone, with no isotopic labeling, you can fit a simple two-parameter model and recover the synthesis rate and the degradation rate. We did this at genome scale at that point and are redoing it now. I don't think I need to tell you that genome-wide synthesis and degradation rates for proteins are important and useful in many ways.
One kind of systems embryology application that I'm proud of: I don't think there are many questions in biology where you can ask a question whose answer is a number, right? So here's the question. If I take an embryo, which is a tadpole, a swimming, breathing fish with a beating heart, I put it in the mixer, I reach in and take a molecule: was this molecule of protein synthesized in the embryo, or was it deposited by the mother? And this graph here gives you an answer: 50 hours after fertilization in this organism, about 30% of protein molecules are newly made. All right, so this tells you something about the system: how the embryo prepares protein versus makes protein on the fly, different proteins specialized for some cell type versus housekeeping, and so forth.
How do you measure the new?
I'm sorry?
How do you measure the new?
That will take me about an hour to explain. So yes, please come to my poster for this.
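As a rough illustration of the two-parameter fit described above, here is a minimal sketch. It assumes the common first-order form dP/dt = k_s * R(t) - k_d * P(t), which the talk does not spell out; the function names, the integral-form least-squares approach, and the synthetic numbers are all illustrative assumptions, not the method actually used in the work.

```python
import math

def cumulative_trapezoid(t, y):
    """Running trapezoidal integral of y over t, starting at 0."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out

def fit_synthesis_degradation(t, rna, protein):
    """Estimate (k_s, k_d) for the ASSUMED model dP/dt = k_s*R - k_d*P.

    Integrating the model gives
        P(t) - P(t0) = k_s * integral(R) - k_d * integral(P),
    which is linear in the two parameters, so ordinary least squares
    (solved here with 2x2 normal equations) is enough.
    """
    a = cumulative_trapezoid(t, rna)       # integral of RNA
    b = cumulative_trapezoid(t, protein)   # integral of protein
    y = [p - protein[0] for p in protein]  # protein change since t0
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(x * z for x, z in zip(a, b))
    say = sum(x * v for x, v in zip(a, y))
    sby = sum(x * v for x, v in zip(b, y))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("RNA and protein integrals are collinear; cannot fit")
    k_s = (say * sbb - sab * sby) / det
    k_d = -(saa * sby - sab * say) / det   # coefficient on integral(P) is -k_d
    return k_s, k_d

# Synthetic example (made-up numbers): RNA rises linearly, and the protein
# curve is the exact solution of the model for k_s = 2.0, k_d = 0.1, P(0) = 5.
times = [0.5 * i for i in range(21)]  # hours
rna = [1.0 + 0.5 * t for t in times]
protein = [-80.0 + 10.0 * t + 85.0 * math.exp(-0.1 * t) for t in times]

k_s_hat, k_d_hat = fit_synthesis_degradation(times, rna, protein)
print("synthesis rate:", k_s_hat)
print("degradation rate:", k_d_hat)
print("protein half-life (h):", math.log(2) / k_d_hat)
```

With real measurements the time points are sparser and noisier than in this toy case, so the recovered rates would carry more uncertainty than the sketch suggests.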
Then, this is to say I've been fortunate to be part of the team that developed the so-called inDrops, a droplet barcoding protocol which allows you to look at the expression of RNA at the single-cell level. All right, so when you take a system and you get a mixed population of cells, what comes out of this experiment, whatever you do it with, is a giant matrix of cells by genes. What we do with it is a big question. You could try to reason about clusters in this population, but they're not necessarily clusters. You could try to reason about the whole manifold, but what kind of manifold? Those are questions which are ripe for mathematics, I think. So we were very lucky to get Caleb Weinreb, the student who developed the system for representing such data that really enabled this field, I could say. The first thing I ask you to take a mental image of is a system called SPRING, where every point is a cell, and two cells which are very similar in the space of gene expression are linked by a spring. So it relaxes and becomes a manifold representing something.
So what did we do with these tools? Well, we went back to my favorite kind of experiment. We just took time points of a developing embryo. We stuck them into the single-cell profiling setup, and I must say this is the teamwork of two very talented students and two lab heads, and you get this sort of data. So what does this messy slide tell you? There are ten time points in development, where each time point has many thousands of cells. It's a giant data set, I think probably the biggest of its kind to this day, of 130,000 individual cells. What do we do with this data? So one idea is that you take adjacent time points and connect them into a developmental tree. Very roughly it works like this. You start at the last time point. You find a state. You choose a parent state in the previous time point, and then you iterate. That gives you a tree. So that's great, the tree is very helpful.
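The tree-building steps just described (start at the last time point, pick a parent state in the previous time point, iterate) can be sketched roughly as follows. This is a toy under stated assumptions: each state is reduced to a single made-up average expression vector, and the parent is simply the closest state by Euclidean distance. The talk does not say which similarity measure or clustering was actually used, and `build_tree` and the state names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_tree(states_by_time):
    """Link every state to its most similar state one time point earlier.

    states_by_time: list ordered from earliest to latest; each entry is a
    dict mapping a state name to its average expression vector.
    Returns a dict: (time_index, state) -> (time_index - 1, parent_state).
    """
    parent = {}
    # Walk backwards from the last time point to the second one.
    for t in range(len(states_by_time) - 1, 0, -1):
        previous = states_by_time[t - 1]
        for name, centroid in states_by_time[t].items():
            best = min(previous, key=lambda p: euclidean(centroid, previous[p]))
            parent[(t, name)] = (t - 1, best)
    return parent

# Toy data: three time points, two "genes" per state (numbers are invented).
toy_states = [
    {"early": [1.0, 1.0]},
    {"ectoderm": [3.0, 1.0], "mesoderm": [1.0, 3.0]},
    {"neural": [5.0, 1.0], "epidermis": [4.0, 0.0], "muscle": [1.0, 5.0]},
]

tree = build_tree(toy_states)
for child, par in sorted(tree.items()):
    print(child, "<-", par)
```

Because every state gets exactly one parent, the result is a tree by construction; the real analysis works on clusters of thousands of cells per time point rather than on hand-written centroids.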
But you can also take all 130,000 cells, put them into SPRING, and get this kind of manifold representation. We can stop doing embryology with this data. Because if you think about this, all sorts of reverse engineering of biology can come out of this data set, and this is what I'm going to be busy doing for the next 15 years, I think. Because first, there's the time dimension. You can see that a stem cell starts somewhere here and then over a few hours goes out on the bridge and differentiates. You can ask which genes are important. You can extract cascades of transcription factor chains and so forth. You can look at the branching point and ask how decisions are made. You can simply ask which genes co-vary, which are absent and present together. You can begin to get all sorts of systemic information, like protein complexes, out of this. Finally, thinking back to the beginning of my talk, you can get synthesis and degradation rates and turn RNA, which we can measure, into protein, which we cannot and will not be able to measure at the single-cell level for decades, probably.
So with this in mind, we thought we could spend 15 years analyzing this data, but it's already a very rich resource. Just a month ago I organized what we called a single-cell jamboree. Janelia HHMI generously sponsored this meeting, where we invited 26 mostly very advanced, senior people in the field of embryology, specifically embryology of the frog, to come, be trained to use our tools, and look at this data. So there are jamborees which happen for genomes, when people have just sequenced Drosophila, for example, and look to understand what the genes mean. Now this is the first ever similar effort, where an expert in kidney, in blood, in neurons went and said: all right, I recognize some genes, I don't recognize others, and annotated and gave us bona fide sets of cells which are differentiated or differentiating. And so again, in itself it's going to be a very rich resource. Just very briefly, the whole effort was organized like this. So this is our tree.
This is a giant poster that we had on the wall there, and each expert received a small subtree here, took all of the cells, which are tens of thousands of cells, just between two adjacent time points, popped this substructure out in the browser in order to understand that particular snapshot in the process of differentiation, and wrote a short essay about it. So what are the recognizable markers, what are the novel markers, are there any new cell types: this sort of information falls out of that. As if frog data was not enough, we compared this whole tree in frog to a matching effort, also from our department, in zebrafish, and asked about conservation: are cell types conserved, are the same genes used in the same way in this process, do cell types fall out of the tree through the same route, and so forth. So again, there are some surprises here.
So at this point I think it's clear that, with all of this information, my talk looks to you like the kind of paper which unfortunately, I think, is becoming too popular in major journals now. I call it "revolutionary technology enables unprecedented, deep and expensive data set which confidently reveals a new depth of our ignorance about embryogenesis." So I would appreciate a chance to convince you otherwise, but that will take some time and effort at the poster today. I think I can open for questions.
On your previous slide, were the zebrafish and the Xenopus data independently processed? In other words, are they independent trees?
I don't understand the question. Independently dissociated, independently collected, independently went through the setup, oh yes, absolutely. Independently analyzed through the process which I sort of illustrated here. So you just cluster every time point, from the marker genes in each cluster you understand which cell type it is, you connect backwards in time, and you get two trees.
So both trees show a perhaps surprising characteristic of very early divergence, in other words there's...
This was one of the surprises, absolutely right. So there was a huge effort by the community to create this kind of ontology, and we compared. It is mostly consistent, but many cell types look to be emerging much earlier than we thought.
So one of the goals of the new Chan Zuckerberg Institute is to find the human cell atlas, to find all the cell types in humans, and it seems to me that maybe you've already done that with Xenopus and zebrafish?
Yes, I think so. Also, mind you that they're working with dead people; that creates a certain bias in cell types.
So how many cell types... I mean, how... I guess it depends how you define a cell type.
This is Kirschner's question. I really hate it. This is like asking how many colors there are, right? You can zoom in and things would cluster, and cluster some more, and cluster some more, depending on which genes you look at. So you can very rapidly get lost in this universe of representation, so I think it can be defined rigorously. We are talking about 300 cell types. Mind you, this is early development, this is gastrula and neurula, and so, you know, there are probably thousands of cell types we don't see here.
So how do you know the linkage between the cells? I didn't get that. At different stages, how can you link them?
Oh, so about any two cells or any two groups of cells you can just ask: are they similar in the space of gene expression? So you just say, this sample, which was taken two hours later than that sample, contains a few cells which look a lot like those cells, yet different, and that's how you make this decision.
Okay, so in this analysis you lost the location of the cells in the embryo, right?
Yes, we did.
It's super important. There are several labs aggressively working on ways to inject a plasmid which will allow you to reconstruct the true lineage by barcoding, but it is lost in this data. You can recover a lot of it, because there's a very rich set of in situs for these embryos, and so just going by markers you could register the cell based on its expression profile using several in situs.
So let's suppose that you have an asymmetric division; you have asymmetric division going on already at this stage. Since it's asymmetric, the properties are also asymmetric, so how can you link now to the clone?
I don't think we can do it perfectly. I mean, this is a good example of things we're not going to see until we sequence deeper, and until I take samples not 2 hours apart but maybe 20 minutes apart, so that there are enough cells which are similar enough. I think that's what it boils down to. I don't believe that an asymmetric division will create two cells which are completely different from one another. Most of the genes are probably going to be similar, yet there are going to be principal differences in transcription factors and signaling molecules. Maybe not.
So why did you pick Xenopus and not Drosophila? In Drosophila you can actually back everything with genetic analysis. It's impossible to do genetics in Xenopus; it's only descriptive.
Well, I can give...
We can do genetics in Xenopus.
Would you like a polite or an honest answer?
Give me an honest answer.
OK, seriously speaking, large cells are important. We're looking at only about, let's say, 1 to 5% of the transcriptome; the rest is getting lost in the pipes. And so giant cells are one important reason.
You can do morpholinos and CRISPR knockouts in Xenopus.
You can do a lot. I don't think Drosophila is in any way superior to Xenopus, but that's being a groupie.
When you are dissociating the cells, how does that impact the profiles?
I could talk for hours about the adventures of finding a dissociation protocol, so that's very important. I think it introduces very little bias, and we had to work it out three times; it took a year to work out in the three different species that we've done. I didn't mention the third species at all. A brief answer is: you dissociate rapidly, and within minutes everything is on ice, so most of the processing happens on ice. There is probably some response, but...
So I realize your analysis is in progress, but can you give us a little bit of a flavor for... I mean, you should start seeing blood.
I'm not supposed to say this, but I think it's coming out in Science pretty soon.
OK, so what's the difference in your map versus the classical hematology?
As far as it's... Classical hematology?
Well, as far as the differentiation of various lymphocyte subtypes and such.
We're not that far in development at all to even begin to see those things, but there's the classical Xenopus atlas, which we did compare to, and as I mentioned it matches beautifully. Surprises are mostly in terms of how early things begin to be defined.
I guess I don't know what stage 22 means.
Oh, there was a little picture there. It's a little fish; it's sort of like a little fish-looking sausage which has not even gotten its first heartbeat.
How early can you see neuronal differentiation, brain differentiation?
You ask about brain differentiation?
How well... when do you start to see...
It depends on your definition of brain. We have a lot of types of neurons where we recognize familiar markers, but all of this is RNA, and so if you want to know about the functional thing, I cannot say anything yet. But we do see types of neurons at the latest stage, so if you go to the poster I'll zoom in with you and look at different neural types.
You said between the two you have around 5% of the transcriptome measured in the two species. How much of this is...
No, I'm saying every single cell shows me about 1-5% of the RNA molecules that are in the cell; the rest is lost. But as soon as
I've taken several cells of a certain kind and averaged, we have a very good representation.
Okay, and comparing the two species, how much of this transcriptome is shared at different stages?
I mean, first of all, something I really don't like to admit is that zebrafish has much smaller cells but seems to show as high a percentage for every cell. You are asking how similar the expression is across tissues. We can discuss a million metrics of this comparison, but to my taste it is surprisingly not conserved. You recognize what you recognize, markers for neurons and muscle, but those are the ones we have been studying for 20 years, because they show up everywhere very easily. But if you dig a little deeper, to my taste, again, there is very little conservation.
What is the minimum number of cells that you need for this analysis?
Tough question. Thousands. Well, we did the first round of analysis with 50,000 across 10 stages, and we saw much more when we added another 80,000, so I don't know how to think about the minimum.
No, because this could change the data depending on how many cells you have, because when you have a small number of cells you lose some of the cells with a certain signature.
But we will only know this retrospectively; after we have done 10 million cells we will know how things change.
Okay. You were supposed to go further, and all questions could be asked during the panel discussion, so maybe... Actually, the short talks are meant to just draw people to the posters, and of course to continue discussion during the final session. Either way, we will move to the next speaker.