So, thank you to the organizers for inviting me. I really enjoyed the meeting, and sorry the previous talk ran over, but it was my fault for asking some of the questions. And thank you most of all to the AV guy for getting my talk to show up. So, I asked about what to talk about, and I decided to send in a title about work that we've done on synthetic yeast genomes. But after hearing the earlier talk about organoids, I decided to throw in an organoid bonus on work that is going on in organoids in my lab in collaboration with Andy Ewald. So, Saccharomyces cerevisiae 2.0. The introduction to this is that in my previous life, before I was an academic at Johns Hopkins, I worked at a biotech company. The last product I worked on there was the 454 genome sequencer. So, it was really exciting at the time. This is a curve from 2003 showing that our machine was the world leader in DNA sequencing, only for a couple of years, however. So, then Illumina took over. So, let's see. Not there. So, this was our productivity, and then Illumina took over, and we had a couple more years on the market. But the curve I'll talk about is this yellow one, productivity in writing DNA. So, the same way that DNA sequencing has been increasing in efficiency, so has DNA synthesis. These curves are from Rob Carlson, who's a biotech writer. And that stimulated a project to make a synthetic version of the yeast genome called Saccharomyces cerevisiae 2.0. So, in order of appearance in the team, the overall concept for the project really goes back to Jef Boeke and Srinivasan Chandrasegaran, who were having coffee one day and talked about the possibility of maybe making synthetic yeast chromosomes. And shortly after that, when Jef was having trouble using DNA Strider to design entire chromosomes, he asked me and my group to get involved. So, we've been responsible for the front-to-back informatics for the project.
I got involved because I thought there would be some really interesting scientific questions. However, most of the time we spent writing workflow software. So, we did all the software to design the synthetic yeast genome, to do all the ordering, to run the laboratory, to automate as many processes as we could, and then to analyze the synthetic cells that we got out. Synthetic in that they have synthetic DNA. And then working with me were Sarah Richardson, who's now CEO of a synthetic biology biotech, MicroByre; Giovanni Stracquadanio, who's now faculty at the University of Essex; and Kun Yang. And then other participants: Romain Koszul, who's in the back and spoke earlier today, has been working on comparing the 3D conformation of the chromosomes that we've synthesized with the native chromosomes, and then there are partners all across the world making different chromosomes. So, about a year ago, there were a series of papers. This figure was generated with data from Romain's lab, showing the synthetic chromosomes organized in the nucleus in the goldish color, with the remaining wild-type chromosomes in the whitish-grayish color. So, we've made them, the cells work. Here is progress to date. So, yeast has 16 chromosomes. And in addition, we've designed and are making a 17th neochromosome that has all of the tRNAs on it. I'll get through the types of changes we made in the genome, but one of the changes we made is taking tRNAs from all of their native locations and putting them all together on one special chromosome, with the idea that possibly the other chromosomes will be more stable, because tRNAs are a place of DNA instability because of the high transcriptional rate, and then, I guess, DNA replication forks run into the transcriptional apparatus and you get DNA strand breaks.
Also, there's the idea of orthogonalization. A lot of synthetic biology has been driven by electrical engineers who think about designing orthogonal components, and I don't think that idea has really gotten very far, but nevertheless, we've decided to orthogonalize the genome by putting all the tRNAs onto one separate chromosome. So, there you go. So, in blue are the chromosomes that have been completely synthesized, integrated into yeast cells, and are able to support yeast cell life with fitness that is pretty similar to wild type, so no real fitness defects. Part of the process I'll talk about is the fitness defects that we found and what they were due to. In yellow are chromosomes that have not yet been completed, but all the DNA has been synthesized, most of it's been assembled, and some of the chromosomes are completed but have fitness defects that we are still tracking down and getting rid of. We have single cells that have up to three synthetic chromosomes at this point. There's been no real barrier to getting synthetic chromosomes into a single cell. The barrier is more that it goes through a meiotic division, which causes recombination between the synthetic chromosome and the wild type, and then we have to do backcrosses. We, as in Jef Boeke's lab, do backcrosses to get a fully synthetic chromosome. So quickly, the changes that we've made. So, housecleaning: we've taken off the original telomeres and we put on a universal telomere sequence. There's a lot of pseudogenes that accumulate in the subtelomere. We've gotten rid of them. We've removed transposons and repeats, removed introns. So as I mentioned, we've got this neochromosome that has all the tRNAs. So this is an interim version of the neochromosome. We're maintaining the copy number of the tRNAs in the wild-type genome in the tRNA neochromosome, and as we're doing the deletions, we're adding the genes to the neochromosome.
So that work is being led at the University of Manchester now by Patrick Cai. So we're also doing synonymous recoding, for two reasons. One is that part of the synthesis and assembly technologies have restriction enzyme requirements: restriction enzymes are used to do part of the synthesis or assembly, or for late-stage, I guess, restriction-ligation reactions. So sometimes we have to add and remove restriction enzyme binding sites, and we do that with synonymous recoding of protein-coding regions. We're also putting in watermarks that are called, in our hands, PCR tags: we take a protein-coding region and we change the DNA sequence so that it is essentially unique in the genome. We have two recoded tags that are close enough together, a couple hundred base pairs, that we can easily make a PCR amplicon to check quickly that the DNA content is synthetic as opposed to wild type. So this is actually one of the areas where we had to pull out some algorithmic work and make this run fast enough to be able to solve the constraint problem of getting these to be unique in the genome. All right, so new capabilities. We've done TAG-to-TAA recoding so that eventually we might be able to introduce a 21st amino acid into the genetic code. And we've put in loxPsym sites. These are symmetric versions of loxP sites that, when Cre recombinase is around, permit recombinations in the genome to give inversions, transpositions, deletions. And all together there's about one million base pairs difference between the DNA we designed and the wild-type genome, as well as this additional chromosome. Why the 21st amino acid? Why did you put in the 21st amino acid?
Oh, so this is getting rid of the TAG codon, so then we can put in a TAG codon and introduce a tRNA synthetase coupled with an unnatural amino acid, and then have an additional functionality in the protein code. So there are wilder, science-fiction-y type things to try, to get all that apparatus working inside a cell, and Farren Isaacs has done similar work to get E. coli that have been recoded to free up codons. Why would you remove the introns? So it turns out we can't remove all the introns. We thought we could, but some give fitness defects. So we removed them; most of these genes don't have introns, and we thought it would be cool to get rid of the ones that are there. Because they seem to be sometimes clustered in important genes, right? So there's been other work, and I don't remember the name of the group, that did a project to systematically delete introns. Some introns you can't delete, but not because of the intron itself; it's because there's a non-coding gene in the intron. And, I'd have to go back and check my notes, but if you move that non-coding gene somewhere else then it's fine. So don't hold me to that. But there are other introns that people have deleted, and for a couple of them, I think there are five or six, when you delete the intron you get a fitness defect. For most introns you don't get a fitness defect when you delete. Part of the idea about deleting the introns was seeing whether some of the splicing apparatus then became non-essential: is it just there for splicing of introns in protein-coding genes, or does it have other functions? So that was the general idea. So I'm not really going to talk about the scramble experiments. So, on the computer we take the sequence, we break it down into smaller and smaller pieces, eventually they get ordered, then we receive them.
At Johns Hopkins for a while we had a factory running with undergraduate labor, taking the oligos, received at that point from IDT, and then doing PCR reactions to build them up to 600-mer pieces and then build those up in succeeding stages of assembly. So these undergrads completed the synthesis of chromosome 3, published in 2014, so that was really exciting, but the course is over now because all the DNA has been synthesized, so it was fun while it lasted. So here's a picture of what the design looks like, if you want to make a figure for a paper. So these are different parts of chromosomes, and these arrow things are protein-coding genes, the green bands are the PCR tags we put in, the green diamonds are the loxPsym recombination sites, the red is an essential gene, these purplish ones are auxotrophic markers, and so there you have the design. But we made some mistakes. So it is a challenge to do all of this design, and so, yeah, mistakes were made, and I guess really they'd be my fault because my group did the computational design. So there are a lot of problems. One is we don't really have any good models that map from sequence to fitness. We just heard a talk trying to build up regression models that say here are the variants and here's the fitness. So those are still not there; you can't really do that. Also, there's trying to identify synonymous sequences that obey synthesis and assembly constraints; those algorithms didn't really exist. We weren't sure exactly whether what we implemented was going to work well. Also the execution, just putting all of this together in a production pipeline.
So when the human genome sequencing project started, there were lots and lots of groups writing software to first look at gel images and get DNA sequences, and then do all the assembly and sequence validation, and here instead it was a couple of people in my lab doing all the design. And then risk-benefit. So this to me is the most interesting part, talking through the design choices. So we just heard, why get rid of all the introns, and one answer is, you know, why not? But there are other changes that we were thinking about making where we weren't sure, pushing the design goal of being able to test wilder things you can do with the genome versus worrying that we'll get dead cells out and won't have a way to figure out how to get them back healthy. So we wanted to, yes. You didn't mention the ribosomal RNA. You would reduce the copies, right? The number? I don't know. I don't think we've really reduced the ribosomal arrays. That's my question. Reduce the number of those and make it a smaller chromosome. So I'll get to this later, but I'll answer it now. Actually, one of the fitness defects that we get with the tRNAs is that we don't have enough tRNAs in the cell, because we're deleting them, but often we haven't added back the tRNA neochromosome. I'm talking about the ribosomal. No, no, no. So what we see is up-regulation of ribosome components to compensate for the depletion of tRNAs, because we don't have as many copies of the tRNAs as we're supposed to have. So if we further reduced the ribosomal arrays, I think we'd have even more reduced fitness. So we haven't been reducing the ribosomal arrays. I don't think we'd want to reduce the ribosomal arrays, definitely not until we have the tRNAs in. And then I think that would be a reasonable thing to try, but for now we're not reducing the ribosomal arrays. So how many copies do you have?
Maybe we'll move this way. No, this is a quick question. Okay. We have as many as there are in the wild type. So we want to end up with a beautiful, well-functioning, classic yeast cell, but maybe what we're going to make is more like the Balor. So there you go. So all right. So here we go. So what mistakes were made? And if we knew then what we know now, what would we have done differently? So if present-day me went back to 2007, 2008, there were lots of mistakes that I wouldn't make. But in terms of yeast design, what would the difference have been? So here's this first point about the ribosomal RNAs. So we actually made one blunder, and it was not my fault, because Giovanni warned Jef about it: there's a single-copy tRNA, and Giovanni said, it's a single copy, are you sure we should delete it? And we deleted it, and then there was a fitness defect and we had to add it back. So we told them that we thought it would be a problem and it got deleted anyway. Fitness defect, we added it back. So that was a blunder. And as I mentioned, the tRNA copy number variation, where we transiently have reduced copy number, ends up with ribosomes compensating in the opposite direction. So if you look at those 2017 papers, each yeast strain that has a synthetic chromosome has mRNA-seq done. And pretty much the only significant differences are that ribosomal genes are over-expressed. All right. So now here is actually a real systematic problem that we had. So I mentioned that we replaced the telomeres with a universal cap. And what we found is that the telomere is silenced, and then there's an insulator that keeps the silencing from going into the subtelomere and then into the regular chromosome. And what is happening in the synthetic chromosomes is that the silencing is extending further than it should. So that's happening for two reasons. One is that probably our silencing sequence in the telomere cap was not sufficient.
The other is that we got rid of a lot of the subtelomeric junk, or I guess maybe it's not junk; we got rid of a lot of the subtelomeric pseudogenes. And they were sort of a buffer in the wild-type chromosome: if silencing extends into the pseudogenes, that's okay. But since we got rid of that, it's easier for the silencing to extend into the real protein-coding genes that are used. So probably what's going to have to happen is systematically replacing the so-called universal telomere cap with a new and improved universal telomere cap. So that's on the way. So another problem that we have... Oh, did someone just... So these loxPsym sequences that we put in for recombination, they're completely stable once things are integrated in the chromosome. But the problem is that one of the steps for integration and assembly is homologous recombination: putting in synthetic DNA, and it's supposed to land where it's supposed to land and get rid of the wild-type DNA. But the loxPsym sites can seed homologous recombination. And then what we get is a misassembly. And these are easy to catch. But the problem is that we didn't put catching them into our workflow until they got further along, so we didn't catch them as quickly as we should have. So this isn't really something that we could have fixed, because just having loxPsym sites in there causes this to happen, and we wanted them in there. So it's more that our process should be improved to look for off-site homologous recombination. It turns out, however, that there are two problems with the loxPsym sites that we found, from where we put them. So loxPsym sites were inserted into the 3-prime UTR of every non-essential gene. So there are about 1,000 inserted to date. And two of them affect the promoter of a neighboring gene through a mechanism that we're not really sure of, except that we know it's because we put the loxPsym site there, because when we take it out, the problem goes away.
So it's unclear how to predict these. So unfortunately, knowing this wouldn't really have changed how we did the design back then. But would it change the way that you go forward? No, because we've tried and tried and tried, but we haven't found any good way to predict why it is that, out of these 1,000, two of them have this effect. So there's nothing... Oh, the effect is, I think, a fitness defect due to down-regulation of the neighboring gene. So it's transcriptional... You pick it up on the profiling? So we pick it up two ways. We pick it up one way in that we see the cell has a fitness defect. We pick it up seeing that the transcript is expressed below the level it's supposed to be. And then we cure it by getting rid of the loxPsym site. And that was when we decided that was enough work done on the project. So, yeah. Did you correct the... Yes, yeah, yeah. Because these fitness defects, they're not big. No, I mean the mutant genes, like HAP1, that are in the reference strain. Did you correct those as you went along? Oh, the mutant genes that are in the reference strain. There's about, you know, a dozen or something. And I'll have to ask Jef. That's a good question. No one's ever asked me that before. I never thought about it. They make a huge difference to the fitness of... So the reference strain is a mutant. So we're not... So this isn't based on the reference. I think this is based on a BY? Yeah, that's what I'm talking about. BY4741? Yeah. I'll have to ask Jef. I should make a note of that. No, it's very helpful. Yes, very helpful, but we need to... I'll still... I'll bring it home by 4, don't worry. All right. So, synonymous recoding. So here again, these are the watermarks we put in. So we recoded 60 kb synonymously. And out of the 60 kb, so 60,000 bases, three of them caused problems. For one, it's completely unclear to us still why it causes a problem.
In another, we created a stem loop in the PRE4, P-R-E-4, mRNA. So possibly we could have avoided it, but if we worried about every stem loop, I don't know if it would have been worth the trouble. And then here, this is a real one where we probably should have been smarter. We created a Rap1p binding site in a transcript, which causes the transcript to be repressed. So possibly we should have screened more for creating Rap1p binding sites. So do you put the watermarks inside the coding region? Oh yeah, that's the only place we put them. Because we don't know anything about gene regulation. All we know about is the genetic code, and that if we have synonymous codons, we'll get the same protein. We tried to avoid putting PCR tags into the very 5-prime region of a gene, because there's lots of evidence that codon selection there is chosen to melt structure and make it easier for the ribosome to have an on-ramp, or whatever people say. So we try to avoid that. But otherwise we just go wild on the protein-coding region. So going forward, maybe we're going to incorporate these checks in the next design, but looking backward, there's a trade-off between experimental and computational effort, and probably it ended up being more efficient just to do what we did and then have the wet lab people fix the couple of bugs that sneaked through. I say that as the computational person. So, our total bug count. loxPsym sites: a thousand added, two bugs. PCR tags: 60 kb of tags, three bugs. Stop codons: no fitness defects in the stop codons. Synonymous recoding: 5,000 bases, no bugs. tRNA deletions: no unexpected bugs. And then repeat deletions, which I didn't even mention: so far no fitness defects from getting rid of any of the repeats. So to me what this means is we should have been more aggressive in our design. Synonymous recoding has a five times ten to the minus fifth bug rate.
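Two points from this part of the talk can be sketched in a few lines of Python: a PCR tag only swaps codons for synonymous ones, so the protein is unchanged, and the quoted bug rate is just bugs divided by recoded bases. This is a minimal illustration, not the project's design software. The function names, the toy sequence, and the "most different synonym" rule are assumptions made for the example; the real design also had to make tags unique genome-wide.

```python
# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(cds):
    """Translate an in-frame coding sequence, stop codons shown as '*'."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))

def synonyms(codon):
    """Other codons that encode the same amino acid (sorted for determinism)."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa and c != codon)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def recode_window(cds, first_codon, n_codons):
    """Hypothetical tag maker: swap each codon in the window for the synonym
    that differs at the most positions. Codons with no synonym stay as-is."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for i in range(first_codon, first_codon + n_codons):
        options = synonyms(codons[i])
        if options:
            codons[i] = max(options, key=lambda c: hamming(c, codons[i]))
    return "".join(codons)

def swap_tag_stop(cds):
    """Replace a terminal TAG stop codon with TAA."""
    return cds[:-3] + "TAA" if cds.endswith("TAG") else cds

def bug_rate(bugs, units):
    """Bugs per designed unit (per base, or per inserted site)."""
    return bugs / units

toy_cds = "ATGGCTTCTAGAGGTTTGTAG"            # made-up sequence, not a yeast gene
tagged = recode_window(toy_cds, 1, 5)        # leave the start codon alone
assert translate(tagged) == translate(toy_cds)
print(tagged, swap_tag_stop(tagged))
print(bug_rate(3, 60_000))                   # PCR tags: 3 / 60,000 = 5e-5 per base
print(bug_rate(2, 1_000))                    # loxPsym: 2 per 1,000 sites
```

The assertion is the whole point of the design rule described above: the DNA changes, the translated protein does not.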
So the design choice that I was pushing for, that Jef said no to, was to pick some of the other low-frequency codons, so low relative synonymous codon usage, and get rid of them also. So for a typical low-frequency codon, fewer than 10% of the codons for that amino acid use that codon. So that's 30,000 to 70,000 occurrences of low-frequency codons. So for each of these codons that we recoded, we would have maybe introduced three to five fitness defects genome-wide, which I'm sure we could have gotten back. Yes. Why is it not a sharp number? Why what? Why do you have a range of occurrences? Oh, because it depends on the amino acid. So you have to multiply the RSCU by the number of occurrences of that amino acid to get the number of occurrences of that codon. And they're all about 10%. So I'm not telling you what codon it is or what amino acid it is. That's just the typical range. There are three or four of them, maybe five of them, in the yeast genome. So something that really shocked me is how few execution errors we had, in terms of getting an email from Jef that we have to spend money on nucleotides by the end of the month and we need to quickly get this order done, and a lot of code that was written once and run once and then never touched again. There was actually a very low human error rate, much lower than I would have expected. Annotation errors. So something that caught us at the very end, when we were trying to get papers submitted, is that we tried to upload our synthetic genome sequence and annotations to GenBank and it got rejected. And it didn't get rejected because of anything we had added. It got rejected because of the legacy annotations from the original yeast genome annotation: in between when that first genome sequence was submitted and when we did this, GenBank improved its syntax checker and then rejected some of the reference stuff that it had accepted before.
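Going back to the codon-usage estimate earlier in this part of the talk, the arithmetic can be sketched as follows. Strictly, RSCU is the observed codon count divided by the count expected if all synonymous codons for that amino acid were used equally; the talk uses it loosely as the fraction of an amino acid's codons, so the sketch reports both. The function names and the toy input are assumptions for illustration, and the last step is a crude linear extrapolation of the 5e-5 per-base figure, not a result from the project.

```python
from collections import Counter

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def codon_usage(cds_list):
    """Per-codon count, fraction within its amino acid, and RSCU.

    Assumes each entry is an in-frame coding sequence over A, C, G, T."""
    counts = Counter()
    for cds in cds_list:
        for i in range(0, len(cds) - len(cds) % 3, 3):
            counts[cds[i:i + 3]] += 1
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TABLE[codon]] += n
    usage = {}
    for codon, aa in CODON_TABLE.items():
        n_syn = sum(1 for a in CODON_TABLE.values() if a == aa)
        total = aa_totals[aa]
        fraction = counts[codon] / total if total else 0.0
        usage[codon] = {"aa": aa, "count": counts[codon],
                        "fraction": fraction, "rscu": fraction * n_syn}
    return usage

def expected_defects(occurrences, rate_per_site=5e-5):
    """Crude linear extrapolation: recoded sites times an assumed bug rate."""
    return occurrences * rate_per_site

usage = codon_usage(["GCTGCTGCTGCA"])        # toy input: four alanine codons
print(usage["GCA"])                          # fraction 0.25, RSCU 1.0
for n in (30_000, 70_000):                   # occurrence range quoted in the talk
    print(n, expected_defects(n))
```

With the quoted range of 30,000 to 70,000 occurrences this linear estimate comes out at roughly 1.5 to 3.5 defects per codon removed, the same order of magnitude as the three to five mentioned in the talk.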
So probably we should have been using the GenBank tbl2asn checker. So that wasn't really our mistake, but it's one of these things. So, as Charlie mentioned, and I wasn't aware of it, there are mutations in the reference sequence; well, here's a mutation in the reference annotation that we had to fix. So we haven't fixed the sequence maybe, but we have fixed the annotation. So, challenges in genome design for mammalian. So now there are a lot of people interested in doing similar projects in mammalian cells. And so we're interested in that also. But mammalian is much more challenging for a lot of reasons. My only insight based on yeast is that, after designing all these yeast chromosomes and seeing how well they worked, I wouldn't really stress out too much about getting the design perfect. I think it's better just to start and see what works. Yes. What I understand is you clean up the yeast sequences a little bit. I don't understand where the new stuff is. What new stuff? What you did is you took the natural sequence. Oh, the stuff I'm not talking about is the loxPsym sites and having the chromosomes programmed to rearrange in different environments and explore diversity space by generating lots and lots of very closely related genomes that then wander around in genome space. Yes, I'm not talking about that. And if you would put this new yeast in a natural environment, would they out-compete? Well, I'm not sure they would really have much of a fitness defect compared to regular yeast. The value is that it's very fast-evolving in terms of copy number. So they're very useful to evolve pathways or live in stress conditions: you turn on this recombination system and then it will quickly increase the copy number of things that should be increased. It will decrease what should be decreased, and then it will be locked into the genome. So it's been very useful for that. More important, can you still bake with it? So the answer is yes, but the IRB won't let us eat the bread.
So there are schools that have iGEM teams to do synthetic biology. One of our teams put beta-carotene synthesis into yeast, to have yeast that was going to be enriched for, so carotene starts with a C but unfortunately it's vitamin A. So they're enriched for that. And when you bake the bread, it smells a little like carrots, but the IRB told us specifically, no tasting the bread. It's got DNA in it. When you measure fitness, it's in a specific environment. Yes, so usually we're talking about fitness. Well, fitness was measured in three or four standard environments, like regular temperature, I think higher temperature to look for higher-temperature fitness defects, a couple of other conditions, but we didn't probe it over a ton of fitness conditions. How long have they been growing? Ah, since, so it's different for different strains. I think the first full chromosome was 2014 maybe. Are they evolving? No, no, no. I mean, they have the same mutation rate as a regular cell, which is usually like one mutation per generation. And the loxPsym sites are completely stable in the absence of Cre expression. They seem no less stable than the wild-type yeast, and hopefully they're more stable because of the tRNA neochromosome, hopefully. All right, so thanks to funding, NIH DARPA NSF on Erison Bio, and now I want to tell you, in the 15 minutes, is it really 15 minutes? Because I started late. And there are some questions. You have a little bit more yet. All right, I want to... But still we think about one hour of this question. Oh, yes. So together with... That's right. But 15 minutes, it's okay. All right, I just want to show some math slides.
So this is a cancer project. I did not know before working on this project that for breast cancer, it's not the tumor that kills people. It's the metastasis. And therapies to get rid of the primary tumor are not really effective at all if the tumor has already spread. So five-year survival for local or regional breast cancer is very good, but for metastatic cancer, once it's spread, it is only 26%. Metastasis is very difficult to study. So here's a picture of what metastasis means: there's a group of cells in a tumor. The cells are colored suggestively to show different cell fates in a tumor. So the same way that normal cells in the body have different cell fates, there's a growing theme in cancer biology that different tumor cells have different fates that may have to do more with changes in epigenetics and transcription factor circuits as opposed to somatic mutations. So the same way that we have different cell fates in our normal cells because of cell fate choice, there seems to be similar cell fate choice in tumors. Part of it might also be differences in somatic mutations; it's an open question. So here, colored suggestively, are the blue cells that lead an invasion and then the red cells that are more proliferative and follow along, and then they seed a secondary site, and then the more proliferative cells outgrow and the more invasive cells are still there. So that's the type of picture, and there's some evidence for this that Andy and I published recently using different protein markers. Yes. I just want to point out that K14 may be required for the dissemination, but it's not required afterward. Actually, it disappeared after the tumor settled in the secondary site.
Yes, so that's exactly what this picture is supposed to show. These are colored blue, and in his experiments they are expressing K14. Circulating tumor cell clusters look like they're mixtures of K14-expressing and K14 non-expressing cells, and then as it grows out we see the outgrowth of the non-K14-expressing cells. There are still some K14-expressing cells, and it's not clear whether it's this cell outgrowing or whether there's a switching between different states. I think that's a very interesting question. So metastasis has been challenging to study in vivo for a lot of reasons. You can... But with recent funding we're really excited about studying it in organoid systems. So particularly, there's the NIH Office of Cancer Genomics, the Cancer Target Discovery and Development Program, and then seed funding from the Ted Giovanis foundation, and then the Breast Cancer Research Foundation, and some supplements and seed funding we had. So here's my experimental partner, Andy Ewald, who's in cell biology and oncology. We'll be recruiting 25 breast cancer patients a year for the next five years, working with the director of breast surgery, who consents the patients on the way in, and the pathologist who hands off the samples to us, Edward and Ashley. And then the most important thing about the trainees working on this is that in blue is every French connection. So Andre, a postdoc in my lab, trained at Université Pierre et Marie Curie, Paris 6; LOD and Eloise have just joined the group from Bordeaux and from Nice. And then Hildre and Matthew, they're not from France, but they're from the next best place, Canada. So that's the team. And the method that we're using is organoids, this intriguing model that is very complementary to doing in vivo work in an animal model or doing cell line work, say with human cells. It's very nice. It's a human model. It is clumps of cells grown in a 3D matrix that then behave like mini organs.
So here's a picture of a mouse organoid prep, taking the mouse mammary gland, and then you... So these are 300 to 500 cells, so that sort of size. In a normal individual, for breast tissue, they self-organize into what looks like a mini milk duct with a luminal layer and a basal layer. And if you look at them under a microscope, from normal tissue, they just sit like that and don't do much. But what got me so interested in this project is looking at these movies of organoids invading. So this is an assay where this group of cells is put into 3D media and it starts to invade. It just caught my interest because I always thought about cancer as cell division, not about this. And so I really wanted to understand this phenotype. And for me, understanding the phenotype means I'd like to put a number on it. That makes it easier for me to do any sort of statistics. And up to now, the numbers have been someone looking under a microscope and saying, oh, that's non-invasive, that's a plus, a plus-plus, a plus-plus-plus. So I wanted to do something more quantitative. Also, I've looked at, I don't know, probably several thousand organoid pictures. You see different patterns with your eye. And I'll show some of the patterns later on. But looking at the patterns, it just seems like if we could use machine vision and machine learning methods to do clustering of the patterns, then different patterns probably correspond to different types of pathways that are being activated. And then we can dissect different things that are going on. And looking at these thousands of pictures, I could sort of see patterns, but I wouldn't really trust myself to put things into groups.
And if we had a quantitative set of features to say, here in numbers is what this sort of picture looks like, then we could roll out all of the tools of machine learning, statistical analysis, and whatever other word of the moment we want, like deep learning. We could use deep learning on it. All right. So the quick math interlude is what we're doing to characterize the shapes quantitatively. Oh, so this is sort of washed out. But imagine, if you would, the contour that should be shown. If there's a sort of smooth contour for an organoid, and you put a point in the center and then trace the radius as a function of the angle, you get something that's almost constant. If it were a circle, it would be constant. And then it's periodic, because if you go around more, it repeats. So that means you have to Fourier transform it. And then if you Fourier transform that, you'll get, if it were completely flat, only a zero-frequency component. But if it's a more complex shape, if it's really complex, you don't get a function anymore, because if you have a really complex shape, a ray from the center can intersect the boundary at multiple points. So instead, the trick is to make a parametric curve of the x and the y components separately as a function of the contour length. And then, so I was really excited when I found that with Keynote you can use LaTeX. But, I don't know, for any of the mathematicians, this is baby math. Anyway, so what we do is, so Vina, Andy's high-energy, like really patient graduate student, spends hours tracing the organoid boundaries, because segmenting the organoids is a pretty hard problem for us. And I don't have to solve that problem because we have Vina. So I use those traces, and then I take those points, I interpolate them, I do the Fourier transforms. The zero-frequency component of the transform is just the center of mass, so I ignore that.
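The parametric Fourier step just described can be sketched in a few lines of NumPy. This is only a minimal illustration under assumptions, not the code used in the talk: the function names `resample_closed_contour` and `fourier_modes`, the choice of 256 samples, and the use of linear interpolation are all hypothetical.

```python
import numpy as np

def resample_closed_contour(points, n_samples=256):
    """Interpolate hand-traced (x, y) boundary points onto a grid that is
    uniform in contour length. `points` is an (N, 2) array-like tracing a
    closed curve once, without repeating the first point at the end."""
    pts = np.asarray(points, dtype=float)
    closed = np.vstack([pts, pts[:1]])            # close the loop
    seg = np.hypot(*np.diff(closed, axis=0).T)    # length of each segment
    s = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative contour length
    s_new = np.linspace(0.0, s[-1], n_samples, endpoint=False)
    x = np.interp(s_new, s, closed[:, 0])
    y = np.interp(s_new, s, closed[:, 1])
    return x, y

def fourier_modes(x, y):
    """Fourier transform x(s) and y(s) separately and return the two
    spectra plus the combined power per mode."""
    n = len(x)
    fx = np.fft.fft(x) / n
    fy = np.fft.fft(y) / n
    power = np.abs(fx) ** 2 + np.abs(fy) ** 2
    # Mode 0 holds the mean position (the center of mass of the samples),
    # so shape analysis would skip power[0].
    return fx, fy, power

# Example: a traced circle of radius 2 centered at (5, -3).
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
trace = np.column_stack([5 + 2 * np.cos(t), -3 + 2 * np.sin(t)])
x, y = resample_closed_contour(trace)
fx, fy, power = fourier_modes(x, y)
print("center of mass:", fx[0].real, fy[0].real)
print("fraction of shape power in the lowest mode:",
      (power[1] + power[-1]) / power[1:].sum())
```

For a round boundary nearly all of the non-zero-frequency power sits in the lowest mode; a ragged boundary would push power into higher modes.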
And then we get these images. So in light blue, you can see that's a round shape in real space, and then in Fourier space, that's pretty much just a flat line, no spectral power. And then here's a real organoid boundary that's like a spaghetti monster. And then when you transform it, you get higher-frequency components. So we do other steps. We normalize it. So what that means is, if you zoom in or out, then the Fourier components change with the zoom level by a constant scale. And we don't want to say something's more invasive just because we had a higher number on the microscope. So we scale out the size by normalizing to the first Fourier component. And then we do a smoothing operation. So when I was a graduate student, I used cosine filters all the time, but I never knew where they came from until I worked on this project. So what happens is that if you have a zoomed-in picture of something, you see pixelation. And then if you trace a boundary with pixelation, you don't get a smooth boundary, you get like a jagged staircase. So you can fix that by remapping each pair of points to the center of the segment between them. So then if you had a staircase, you get a 45-degree ramp. So that's the smoothing operation. And that in Fourier space gives you a cosine filter. So we slap on a cosine filter. And then, something else you can think about. So mapping the two points to the center of the segment, that's an average. And sort of the opposite of taking an average is taking the difference between the two. The difference, it's like a local derivative. And it turns out in parametric space, if you take the difference, it ends up giving you the curvature. So we essentially weight the Fourier modes by a factor of k squared. But if you do it right discretely, it's a sine filter. So we throw on a cosine filter and a sine filter, get a weighted spectral power. And now if you give me the image of an organoid, I can tell you with a number how invasive it is.
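One way to read the normalize, cosine-filter, sine-filter recipe is sketched below. The exact weights, the normalization, and which modes are summed are not stated in the talk, so everything here is an assumption: averaging neighboring samples multiplies mode k by cos(πk/N), differencing neighboring samples multiplies it by a factor proportional to sin(πk/N) (which grows like k for small k), the weights are applied to the power as squares, and the score sums modes with |k| ≥ 2 relative to the power in the first mode. The name `invasion_score` is hypothetical.

```python
import numpy as np

def invasion_score(x, y):
    """Weighted spectral power of a closed contour sampled uniformly in
    contour length. Assumed recipe, not the talk's exact pipeline."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Combined power of the separate x(s) and y(s) transforms.
    power = np.abs(np.fft.fft(x) / n) ** 2 + np.abs(np.fft.fft(y) / n) ** 2
    k = np.rint(np.fft.fftfreq(n, d=1.0 / n)).astype(int)  # integer mode numbers
    # Normalizing to the first mode removes the dependence on zoom level.
    first = power[np.abs(k) == 1].sum()
    # Midpoint averaging -> cosine filter; neighbor differencing -> sine filter.
    cosine = np.cos(np.pi * k / n) ** 2
    sine = np.sin(np.pi * k / n) ** 2
    higher = np.abs(k) >= 2  # skip center of mass (k = 0) and overall size (|k| = 1)
    return float((cosine * sine * power)[higher].sum() / first)

# Example: a circle versus a five-lobed boundary, and the same lobed
# boundary at three times the magnification.
t = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
r = 1.0 + 0.3 * np.cos(5 * t)
print("circle:", invasion_score(np.cos(t), np.sin(t)))
print("lobed: ", invasion_score(r * np.cos(t), r * np.sin(t)))
print("zoomed:", invasion_score(3 * r * np.cos(t), 3 * r * np.sin(t)))
```

Under these assumptions the circle scores essentially zero, the lobed boundary scores higher, and zooming leaves the score unchanged, which is the behavior the talk asks of the number.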
So that's great for me, because that means that I can do all the sorts of work that people have done for complex traits. So all the classic statistical genetics that people have done, now we can do for this phenotype. And people always talk about tumor heterogeneity. And so here we're actually characterizing it. So this is just like population genetics, except that instead of looking at heterogeneity of different individuals in a population, we're looking at heterogeneity of different little groups of cells within a tumor. So I'm so excited. So here are the results from 800 organoids from 52 different tumors grown for six days. And I learned how to... So when we only had seed funding, it was me and my Python interpreter doing all the work. So I learned how to put together these thumbnail pictures. So I'm very proud of figuring out how to do that. So each of these little blocks is a different organoid in false color, because if you give Python black and white, it says, oh, you gave me black and white, but color is much prettier, so it does it in color. But if you give it a color image where you have all three color planes and they're the same, it gives you grayscale. So that was very confusing for me for a little bit, that all of my grayscale images were coming out in color and all my color images were coming out in grayscale. But I figured it out. All right, so here what I'm doing is these are the... So each of these columns is organoids generated from a different person's tumor. So there should be 52 columns, one for each tumor. And what I've done is I've stacked them from least invasive to most invasive. And if you got up closer, I think you would agree, for the most part, with the way that my algorithm ranks them, that at the top they look more invasive, at the bottom they look less invasive, and then this is just a false color scale. So within each tumor, we're seeing heterogeneity.
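The grayscale-versus-color confusion matches how matplotlib's `imshow` behaves, assuming matplotlib is the plotting library in question (the talk only says "Python"). A two-dimensional array is treated as scalar data and pushed through a colormap, which is not gray by default, while an array with three identical color planes is drawn as literal RGB and so looks gray. A small sketch, with the array contents made up for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt

# A made-up 8x8 single-channel image with values between 0 and 1.
gray = np.linspace(0.0, 1.0, 64).reshape(8, 8)

fig, axes = plt.subplots(1, 3, figsize=(9, 3))

# 1. 2D input: drawn in false color using the default colormap.
im_default = axes[0].imshow(gray)
axes[0].set_title("2D array, default colormap")

# 2. Three identical color planes: interpreted as RGB, so it looks gray.
rgb = np.stack([gray, gray, gray], axis=-1)
im_rgb = axes[1].imshow(rgb)
axes[1].set_title("3 equal planes (RGB)")

# 3. 2D input with an explicit gray colormap: gray on purpose.
im_gray = axes[2].imshow(gray, cmap="gray", vmin=0.0, vmax=1.0)
axes[2].set_title("2D array, cmap='gray'")

fig.savefig("thumbnail_colormap_demo.png")
print("default colormap:", im_default.get_cmap().name)
```

Passing `cmap` explicitly, as in the third panel, is the usual way to get whichever rendering is wanted regardless of the array shape.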
We have within-tumor heterogeneity: in a given tumor, some organoids generated are less invasive, some are more invasive. And then what we'd want to do eventually is to take these organoids from a single individual and say what's different between these less invasive and more invasive organoids. Let's characterize them by RNA-seq, look at gene expression differences. You know, maybe we'll do whole-genome or exome sequencing to see if there are somatic mutations that make them different. Also, we have between-tumor heterogeneity. So what I've done is, I've essentially used the geometric average, because that seems to give a little nicer ranking, to order the tumors from overall less invasive to more invasive. So this individual's tumor organoids are less invasive, these are more invasive. What we want to do eventually is to see if these correlate with patient outcomes. Breast cancer survival, even though on the whole it's bad, still from primary tumor removal, it's five or ten years to get good end points. So we're going to be doing those correlations to see if the organoids are a proxy for survival, but those outcomes won't be in for five to ten years. Also, I can't guarantee that this study is powered to do that. But we're thinking about other studies that would be more powered that are... Question? Ah, if only I were a biologist. If you take a red spot, you disperse those cells and restart the organoid. Do they repeat the phenotype? Do they repeat the phenotype? Well, so what we're doing for that is... So the phenotypes are not... I'm sorry, the organoids are not highly proliferative. So they're not... There might be one or two cell divisions in the six days at most. So we don't really have experiments like that up and running. However, we have mouse genetic models where the phenotypes from one mouse genetic model to the other are very reproducible. So at this point then you're sampling different bits of tissue? Yeah, for some reason.
What you're doing here for heterogeneity of the tumor, you're taking different parts of the tumor and getting organoids that behave differently. Yes. But I think that every organoid has heterogeneity in itself. Yes. So you will be able, if you do single cell on this organoid, you will be able to identify the population of cells within the organoid that metastasizes. I agree. That is in the plan. So I was talking to, I think, the person who spoke earlier in this session about the C. elegans single-cell stuff, about whether we can do single-cell mapping back from the single-cell sequencing results to where, spatially, it came from in the organoid. So what we're doing as a start is we've got plans to do laser capture microdissection. So you can actually, eventually, enrich for these cells that metastasize and put them back into organoids or put them back into nude mice, for example. So I think we have experiments like that planned, but we should probably talk after. We did that in the mouse and it's amazing. You can see clearly what population is the metastasizing population of cells. So I'm really interested in circulating tumor cells. So I think that is probably one of the directions we want to go. EMT markers, for example. Yes. Or not. Or not, right. EMT, the ones that metastasize. You will find proteases that induce. So what subtypes are these? You said that, no? Oh, I didn't say what cancer subtypes they are because I'm not. You use all of them. Yeah, we're just using one. It doesn't matter, triple negative. I'm not. Because they are very different. Yes, they are different. I understand that. So these experiments were done with pilot funds where we were sort of taking tumors on the way in. Now that we have real funding, we can do it more systematically. Invasiveness is not a surrogate for metastasis. I agree. Carcinomas are invasive by definition. That's what makes a carcinoma different from an adenoma.
An adenoma forms an organoid that is completely spherical. And an adenoma is defined as a noninvasive tumor. Carcinomas are invasive, but most carcinomas are not necessarily metastatic. And in breast cancer, you can have metastasis occurring 20 years after treatment, which is due to dormancy. And there's no evidence that the metastasis originates from the circulating tumor cells. A lot of it is totally hypothetical. You can address that. Okay. You can take... I'm the math guy! You can address that easily, actually. You can address that easily. If you find an organoid that's invasive, you can put it back into nude mice. And then you will see if they metastasize. Oh no. In a mouse. Doesn't mean they metastasize in a mouse. Well, this is the best you can do. If you're done, we can just continue this discussion. Or otherwise, maybe we will allow you to... Let him talk, please. To talk. You have like three minutes to... All right. I will only... Your talk and then... Okay. I will show like this and then one fun slide. All right. So here... So I completely... So I agree invasion is not metastasis. There's also dissemination... So we can't get metastasis in 3D culture. Instead, the closer phenotype might be... Might be dissemination. Also, Andy is working on intravasation phenotypes where it's a co-culture with endothelial cells that look like a blood vessel, and seeing the tumor cells squeeze in. So that's like... That's his work that's going on. What... Since we mentioned K14 earlier, what we're starting to do is standard population-genetics-type tests, to do within-tumor and between-tumor tests of association. So here actually are phenotypes of invasion without dissemination. Either it's a single column or a collective invasion. Here actually is more dissemination without invasion. So this is... I think this is the... I forget exactly how you say it, C3(1)-Tag, where individual cells are pulling off.
And so this sort of organoid actually has a very round boundary, not invasive looking, but it's like water molecules boiling off of a water droplet as the individual cells crawl off. But what's constant in all of these is that the leading cells are expressing K14, possibly pulling themselves to the matrix. And so we... So we actually have the... So Andy's staining for K14 because he's so interested. And then we did regression tests of K14 versus our invasive phenotype. And so here's a correlation for an individual tumor, measuring the difference in K14 versus baseline and the difference in invasion versus baseline, and a strong correlation, p-value of 10 to the minus 42. So what we want to do now is do this genomically for transcripts, proteins, and also do things that are sort of really more tied to the biology. All right. Yes. I think the K14 is true only for the ER negative. I think we should... So I have to go see whether we actually are consented to look at the ER status of the organoids that we've done so far. I mean, we're consented for some things, but not for a lot, and actually one of the things that I have to be working on is our IRB. All right. Two more slides? Two more slides. So first, so thank you. So I want to show some of the family's French connections. So this was our... This was my wife's dog actually before we got married, but then the dog adopted me, born in France. So there's my family. So thanks to my family for letting me come here. So my wife has a maîtrise from Panthéon-Sorbonne, admitted to the French bar. So Ezra was born in the 17th. So these two kids are being brought up francophone. My daughter's favorite food is pain au chocolat, and this is this dog, a three-quarter French poodle, and that's our new puppy, Ophilly. So there. Sort of washed out. So she's only three-quarters. She's a Labradoodle. So she's one-quarter Labrador, three-quarters. Yeah. And there we go. All right. So thank you. And they let me travel anyway. Okay. So please.
So metastasis is an extremely important, difficult problem. As a math guy, it seems to me that working backward from the genetics of all the metastases is the better approach, because some part of the genetics of the metastatic tissue is reflecting the process of metastasis, maybe only a small part. So there are two problems with that. One is, say that there's EMT, then with EMT comes MET, so the back transition. And so what if what you're looking for is actually a transient that is only expressed by the cells as they're metastasizing, but then when they regrow, either you lose that cell type because it's a small cell population or you lose it because there's an MET transition. So looking at the metastasis might not be very helpful. The argument is fair, but looking before metastasis essentially opens you up to any possible interpretation. So that's why I'm so excited about methods to look at cells as they're traveling through the body. However, there's also a very low success rate for circulating tumor cells for reseeding. So it's like, if it were an easy problem, it wouldn't be open to work on. I have a question. Well, the biggest problem is actually a tactical problem, because when patients die from metastatic tumors, they hardly do any resection of metastatic tumors. So what's available in the lab for people like you and me is the primary tumors. So yes and no. We have a great program like that at Johns Hopkins in pancreatic, where in pancreatic they, at least in my understanding, take out anything else that they find at autopsy. Ralph Hruban, he mostly has primary tumors, because when patients die from metastasis, there are very few places that actually pay for the autopsy, and you have to do a special program, a warm autopsy, to really excise the metastatic tumor. So we have a program like that in pancreatic. The problem in pancreatic is that by the time 90% of the patients are diagnosed, they already have metastasis.
And we still don't know what the genetic difference is between metastatic and non-metastatic tumors. Despite so many years of research, and so many cancer genomes having been sequenced, we still don't know that. Yes, I'm so surprised. I always thought, oh, there's this wealth of data, but almost all of it is primary tumor. And also the sample numbers: my picture was, oh, these days everybody who has a cancer diagnosis gets their tumor sequenced. Yeah, and the answer is they don't, the answer is no, and maybe they'll run like the Foundation panel of like 50 or 100 genes, and even that ends up really not being so helpful for therapy choice. And so that's why I have to get hard at work on our IRB to be able to, yeah. And also usually the consent doesn't allow follow-up. So it's not as far along as I had imagined when I started. By the time somebody is dead, there's nobody to sign the consent form. Questions? We will thank again our speaker.