Great, so we're now going to move into a panel discussion. This panel is focused on basic biology and the use of modENCODE for understanding basic biology. It's going to be chaired by Gary Karpen, and there will be additional speakers whom Gary will introduce. So, welcome.

Up there we go. Okay, so first, as Elise said, we are having a discussion, which means your involvement. As you'll see, we've come up with some questions, but that's pretty much it for us; we have no more ideas. Maybe I should just speak for myself. Please wave as I say your name. I'm Gary Karpen, the moderator; the fact that Elise thought I could do anything moderately was cute. Dave MacAlpine, Brenton Graveley, Valerie Reinke, and Susan Mango.

Okay, so to start out, the way we're going to organize this is around four questions. Yes, sorry, that's maybe somewhat of an in-joke; you have to be from New York, or Ashkenazi Jewish, or something like that. Anyway, we have these four questions, and we're going to spend roughly 10 minutes on each. Each of us will get up and present the context for a question, and then we'll have a discussion, limited to about 10 minutes in total. We're giving you the questions in advance so that if you really like question four and have some ideas, you can wait until then and jump in. The four questions are: How can we quantitatively analyze chromatin protein localization dynamics genome-wide, at high resolution, in single cells? How do we develop tools and approaches to quantitatively describe protein-DNA interactions and occupancy? Have we comprehensively characterized the transcriptome? And what are the conceptual and practical barriers to effective use of modENCODE data by the research community? That's for those of you who can't actually see the bottom of the slide. As you can tell from the questions, we're not really here to talk about the beauty of modENCODE and what we've produced, but to look forward: what's needed to advance the field, and to advance our understanding of chromosome biology and the other areas that modENCODE touches.

Okay, so I'll start out and give a very brief introduction to the question I want to ask, about dynamics. Since we have other institutes here, I thought I'd point out that this comes from a project funded by NIGMS, the general medical sciences institute. Within cells you have, as we've heard already, a couple of different types of chromatin: heterochromatin, here in green, and euchromatin, in red. Heterochromatin is marked by proteins like Heterochromatin Protein 1 (HP1), discovered by Sally Elgin a while ago. This shows that if you look dynamically, if you do live analysis with, for example, HP1-GFP, the HP1 domain in the nucleus gets bigger (the cell is moving around here). After you irradiate, this domain enlarges dramatically: its volume increases 1.5-fold. The expansion starts very quickly, as soon as we can start to image, within three minutes after irradiation, and it lasts about five hours. You also get all these dynamic protrusions, which have now disappeared from the movie even though it was supposed to keep looping. So, the last little bit of data: you see these dynamic protrusions and this 1.5-fold expansion over the normal volume, but when you do a ChIP experiment for HP1a, comparing untreated cells with cells 30 and 60 minutes after irradiation, you still have HP1 in heterochromatin.
This is good: it means this is an expansion of the domain, not spreading of the protein into euchromatin. But it also tells you that if I were just doing a ChIP experiment, I'd say nothing had happened, when in fact a lot has happened. So this gives rise to the point that ChIP and the different Cs (Hi-C, 5C, 4C, I don't know what's going to be next) really only provide static maps of cell populations. This is a real problem looking forward, in thinking about how we're going to understand function using the kind of data modENCODE produces. So this raises the question: should we, or can we, quantitatively analyze chromatin protein localization dynamics on a genome-wide basis, at high resolution, in single cells? Susan's discussion, I think, bears directly on this. But take the kind of data we've generated for modENCODE, for example histone mark distributions: how can you really analyze that dynamically, at the single-cell level? So I guess we'll open it up for a few minutes of discussion. Again, I would encourage not just our panel members but people in the audience to get up to the microphone. Manolis. And if you have other questions, these are just launching pads for whatever you'd like to discuss.

Gary, I have to question the question, since you asked me to. modENCODE has done so much with technologies that were developed before modENCODE, and I think what you're asking for is a huge break in technology, like transitioning from being on Earth to being in space. It's laudable, it's visionary, and it would be extremely useful. But what I'd like to ask is: could you get there with existing technologies? In other words, is there anything you can do with chromatin immunoprecipitation and the Cs and so on to get at aspects of this question that you could then reconstruct, perhaps computationally or by integrating different data sets? So I'd like to turn the question around, or preface it with: in the absence of breakthrough, non-continuous technological development, can you envision a path to components of this question with existing technologies?

Okay. No, I think the point is that we can't do it with existing technologies. We can get there partly: Steve Henikoff has done the CATCH-IT type of analysis for H3.3, so you can map that and other kinds of variants across the genome. What I'm mostly trying to point out is that if we want to understand genome function, which is the goal of the ENCODE project and of much of our own individual research, we need to go beyond what we currently have. The problem I see with current technologies is that we can do a lot with respect to populations of cells (ChIP for histone modifications is the best example), but we don't really have the resources to track what's happening in real time, not just with cytology but with any of these other methods, and really gain an understanding of the basic biology underlying what we're seeing. So we are probably being misled, and we'll discuss this a little more in a couple of minutes, by the type of technology we have available today.

Sorry, the question was: what about the PacBio technology, or a lot of the nanopore sequencing methods coming down the pipeline, which are single-molecule methods? Yes, those can tell you what's happening for DNA sequence at any particular time. But it's not clear how one could do, for example, ChIP for histone modifications, where the limitations are currently technical: how do you get the antibody concentration high enough to actually do the ChIP? So just having the sequencing capacity doesn't answer that.

Yes? Sorry.

Sure. Yeah, well, there are methods for single-cell methylation, for instance. But the thing that bothers me is all the work on recycling at one site: over minutes you add a chromatin modifier, take it off, bring in a sequence-specific factor, take it off again. That on one hand; on the other hand, most of the literature suggests that synthesis is bursty, such that the RNA levels of housekeeping genes are not constant from time to time or from cell to cell. So I think what you'd have to do is not only get it in single cells, but get it in an array of single cells in order to get the picture. But I think single-cell methods are coming along for analysis.

Yeah, I'm not sure I have a lot to add, other than that there are groups working on this. I think it has more potential for histone modifications and such, where you probably have multiple events over a localized area, versus, say, transcription factors, which might be single monomers or a few multimers, and it's clear the efficiency of ChIP is quite low. I do think it is plausible; I know other groups are working on it, and our group is working on it, for getting this to work for histone modifications.

Just to add one thing: cross-linking is the other issue, and I think Gordon Hager has used laser-pulse cross-linking with isolated nuclei to get very instantaneous pictures.

So, Gary, how high-throughput are some of these microscopy approaches? You showed HP1; can you also look at H2Av and other histone variants that might be involved here?
Yeah, I mean, you can certainly do the microscopy. The problem is that you're still looking at the blob; you're not looking at a particular sequence. It's similar to the question I asked Susan about the array. At least from my perspective, you want to understand what's happening at the level of resolution we can get for ChIP-seq, let's say. I'm not saying the cytology and the imaging aren't useful; I think they're very useful and tell you a lot. But we seem to have this gap between what you can see in the cell and what you can see biochemically. So I think we should probably just move on, and save this for the end if we have time, as people get tired. Dave, want to go ahead?

So my name is Dave MacAlpine. I just want to talk to you briefly about defining occupancy. Occasionally I'll have a student come to me and say, "All right, Dave, this is my favorite gene, and obviously it's regulated by factor A." And I'll say, "Well, how do you know it's regulated by factor A?" They'll say, "Well, I looked in the modENCODE browser, and there's a black box for factor A over it." And you're like, okay, so you've got a black box, but what does that really mean? You can look at it in a little more depth and see the two peaks I've drawn here: a big peak and a little peak. But what does the size of a peak really mean? So this brings up another question.
All right, we've got a factor bound there, but we're talking about diploid genomes, so you've got two copies of the gene. Is the factor bound at both alleles or just one? Often, as in work from the Snyder group, you can use SNPs to detect differences between the alleles if you have a SNP right where your transcription factor is binding. But if you don't, you really still don't know whether you're occupied at one allele or the other. Back to occupancy: these are all population-based studies where we do an enrichment, and again, what does it mean? Do we have one factor bound at a handful of genes, or is it more commonly distributed, bound to many copies of the same gene? And finally, a lot of the talks today alluded to hot spots. This was a big topic that came up in modENCODE and in ENCODE as well: a very large fraction of the binding sites in the genome are occupied by multiple transcription factors, which gives you that view there. Or, when you expand this out, is the alternative true: that you have individual factors bound at many different loci? These are the things that chromatin immunoprecipitation, the ChIP-seq approaches, have not quite enabled us to resolve yet. There are ideas out there on improving this, and I just wanted to bring it up to the panel and to the audience.

I was just going to say, I'd also add that for all of these different binding events, only a tiny fraction can even be associated with a simple change in gene expression if you do something like knock down the transcription factor. So even if you set aside hot spots and other things, there are still a lot of mystery binding sites that seem to be true, seem to be reproducible, and yet don't directly lead to changes in gene expression. We have no idea what is going on with those either.

So I'm just thinking out loud here.
What about, and this might be totally naive and stupid, some kind of tagging of these different cells by mutagenesis? Suppose we bombarded a population of cells, leading to different mutations in each of them. Then, just like you can tell allele-specific activity by looking at SNPs between individuals, you could distinguish whether whatever ChIP signal you're getting is in fact coming from only one copy, from many copies, or from different variants, if they have a high enough mutation rate. What do you guys think? Is that possible?

I think that to get a high enough hit rate, to accumulate enough cell-specific SNPs in a population that affect enough binding sites to be meaningful, that would be a pretty messed-up cell, I imagine.

But it doesn't need to survive for that long.

Right, but how do you interpret it at that point?

I mean, you have the sequence once you do the ChIP-seq experiment, and then you can tell: where's the mutation? Was the binding site mutated? And so on. Anyway, just one idea to start up the conversation; kind of a Brainbow for ChIP.

Yeah, it would be a cool idea if one could pull it off.

I mean, I think you're raising good questions.
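Both Dave's question (is a factor bound at one allele or both?) and the SNP-tagging idea come down to counting ChIP-seq reads that carry each allele at a heterozygous SNP under a peak, and asking whether the split departs from 50:50. The following is a minimal sketch under stated assumptions, not any group's published pipeline: the function names are hypothetical, reads are assumed to be already assigned to the reference or alternate allele, an exact binomial test is used as one simple choice, and reference-mapping bias, copy-number differences, and multiple testing are ignored, although a real analysis would have to handle them.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every outcome
    that is no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance so floating-point ties count
    return min(1.0, sum(q for q in probs if q <= cutoff))


def allelic_imbalance(ref_reads: int, alt_reads: int):
    """Fraction of ChIP reads carrying the reference allele at one heterozygous
    SNP, and the p-value against an equal (50:50) split between the alleles."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads overlap the SNP")
    return ref_reads / n, binom_two_sided_p(ref_reads, n)


# Hypothetical read counts at SNPs under two peaks.
print(allelic_imbalance(10, 10))  # (0.5, 1.0): no evidence of imbalance
print(allelic_imbalance(18, 2))   # (0.9, ~0.0004): skewed toward the reference allele
```

Like every ChIP measure discussed here, this is a population average: a balanced split could mean both alleles are bound in every cell, or each allele is bound in half of the cells.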
I think a standard approach for distinguishing multiple proteins binding at once from individual proteins binding is to do sequential ChIP; that's one way of getting at this. On the issue of quantifying these signals and what they really mean, I think that's a really important problem, and I haven't seen it perfectly addressed in any system. Something we've considered but have not executed would be to compare occupancy with DNase footprints and such, where you try, in a quantitative fashion, to project how much is really occupied in vivo versus merely accessible. That's one way of doing it. I also think it's worth doing an experiment that we proposed in our grant and never got to (I should watch what I say, I guess; I'm fairly honest). It's not a simple experiment, but if we could somehow set up a truly open region, put in something like the lac repressor with known sites, get it bound, presumably by overexpressing it in sufficient amount, and then actually see what the ChIP signal looks like with one site, two sites, and so on, you would really be reconstructing the site in vivo. What makes it complicated is that you need to make sure the region is open and not all occupied by histones and things like that, so presumably you'd put bent DNA and things like that around it. But truly recreating a site that you can do quantitative experiments on would probably be a useful thing to do.

Yeah, I mean, I think we are getting close. With MNase and DNase footprinting and the ChIP-exo approaches, you can start to see these specific footprints and binding sites, and you can start to resolve hot spots: what's the seed factor that's binding?
We are getting there. And maybe also some of the work Jason Lieb is doing, looking at the dynamics and turnover of specific factors with these competition experiments.

Maybe one imperfect way to approach what Manolis was suggesting is to make use of natural diversity. For instance, in a population like B cells, look at the immunoglobulin genes: you've got a lot of variation there, and you know it differs by cell, so maybe that's one way you could do it.

Sorry, I'm not sure the microphone... did you hear? Let me try it again. Manolis was suggesting that it would be nice to have cells that you knew had the same promoter but where the alleles were different, so you could mark what was going on and try to disentangle what's happening at the single-cell level versus the population level. I'm suggesting you could use immunoglobulin gene promoters, because the variable region acts as a marker. It's your built-in variability, while in other senses the promoters ought to be, in inverted commas, the same.

But I think the issue there is that you can use that variability for that locus, not for the whole genome. So, yeah.

Thanks. So regarding the levels: what about systematically doing enhancer experiments for regions that are bound at different levels, and then trying to get a functional readout of the implications? We're observing these differences in binding intensity, but perhaps with some of these large-scale validation techniques we can now start testing all of these different signatures, all of which we're calling peaks, and see whether we should be thinking in terms of big peaks and small peaks, wide peaks and narrow peaks.

Are you talking about an in vitro or an in vivo titration experiment?
Oh, I actually wasn't thinking about a biochemical binding experiment. I was thinking of a reporter assay, where you ask whether the region is functioning as an enhancer; the reporter assay can be any of the above. So, beyond just validating the biochemical activity of what's happening there, you'd be asking whether whatever signal we're observing in fact has functional ramifications, because I don't think we even understand that. I mean, would you bet that a big peak is going to respond more than a small peak?

Right, but maybe you have a factor that is very good at turning on a specific enhancer or promoter but only binds a very small fraction of the time. So I don't know how you separate the downstream output from the input.

So what you're saying is that at different loci (I could be misinterpreting your point) a little bit of a factor will matter more in one context, while in another context you'll need a lot of the same factor, and therefore you need to do the functional experimentation with many different reporters to actually test the importance of the context. Is that what you were saying, or something else?

I think so.

So, to meet this goal: it's really lofty right now, and I don't see it as something achievable today, but it's important to know that parts of it are addressable. I like what Mike Snyder said: if you want to know about any two factors at a time on the same piece of DNA, you can do sequential ChIP.
It doesn't work well for all antibodies, and it's not easy, but you can get at one part of this problem. Similarly, you bring up the issue of the tall versus the short peak. The ENCODE project has done a significant amount of work asking whether those differences are significant, and it looks like they are. It doesn't tell you quantitatively whether the occupancy is 10% or 50%, but it does tell you that a big peak means more occupancy. Is it distributed among different cells? Is it uniformly distributed in a population? It doesn't speak to that, but it gets at it a little bit. And finally, I'm aware of one publication: Gary Felsenfeld, back in '95, attempted to come up with quantitative numbers for histone modifications using a titration method. Again, this was an ensemble measurement on populations of cells, but they could estimate that the average histone modification at a site in the population was, say, 80 percent or 20 percent. Again, it doesn't tell you whether it's 20 percent in 100 percent of the cells or 100 percent in 20 percent of the cells. But there are ways to approach this today.

I think the problem is that, in the end, you want to understand how it impacts the biology. So let's move on to Brent.

Okay, so I'm going to talk about some of the transcriptome features. The transcriptome projects in all the organisms that have been done, the fly, the worm, and the human, have all been really successful, and each has led to the discovery of thousands of new genes; Bob did a really good job highlighting all of that. What I'd like to do is go over some of the things we've learned that actually tell you about the things we don't know. For instance, this graph is all data from fly, but the same principles hold in all the species. It's a cumulative count of how many genes we see expressed over this developmental time course of 30 samples. If you look at this line, it's going up, and it seems to plateau a bit here, but it's actually still going up as we add samples. The graph down here indicates how many genes are expressed in each of the different experimental types we have: the developmental time course, and all the tissue culture cell lines, tissues, and treatment samples. In every single sample we look at, we discover new genes, or we see expression of genes that we haven't seen in other samples. So far we haven't saturated things; there are a lot of new genes out there yet to be discovered, even though I think the fly, worm, and human projects have all done a good job. Another point: in the fly project and the other projects, the vast majority of the RNA sequencing data has been from poly(A)-plus RNA, so there's all the poly(A)-minus RNA left to discover. In the fly project we've done a little bit: the difference between these two lines is how much additional discovery we can make in the poly(A)-minus fraction, and we've only done it on 12 samples. So there's a lot left.
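The cumulative-discovery curve Brent describes is a running union over samples: each point counts the genes seen expressed in at least one sample so far, so a curve that keeps rising means each new sample is still revealing genes not seen before. A minimal sketch, assuming each sample has already been reduced to a set of gene IDs called "expressed" at some threshold; the function names and toy data are hypothetical, not modENCODE's actual pipeline.

```python
def cumulative_discovery(samples):
    """Running count of distinct genes seen expressed in at least one sample,
    taking the samples in the order given (e.g. developmental time)."""
    seen = set()
    counts = []
    for expressed in samples:
        seen |= expressed  # union with this sample's expressed genes
        counts.append(len(seen))
    return counts


def new_genes_per_sample(counts):
    """How many genes were first seen in each sample (the curve's increments)."""
    return [b - a for a, b in zip([0] + counts[:-1], counts)]


# Hypothetical toy data: genes called "expressed" in four samples.
samples = [
    {"g1", "g2", "g3"},
    {"g2", "g3", "g4"},
    {"g1", "g5"},
    {"g6"},
]
counts = cumulative_discovery(samples)
print(counts)                        # [3, 4, 5, 6]
print(new_genes_per_sample(counts))  # [3, 1, 1, 1]
```

A saturated curve would show increments of zero as samples are added. Sample order changes the shape of the curve but not its final value, and the "expressed" call depends entirely on the chosen threshold.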
I think there's a lot still to discover in that aspect of the transcriptome. So that's just looking at discovery. Over here, this is looking at splicing, but the idea is to look at the dynamics of gene expression. If we look at tissues, we see a lot of splicing changes that differ dramatically between tissues, but these changes are diminished when you look at whole animals. That's because you have two tissues where the splicing is very different, but when you grind up a whole animal it looks like nothing's really happening. The same is true at the gene expression level. So as we get finer and finer in detail, down to single cells, we'll get more and more information about the dynamics. I think this pinpoints that we really need to get into single-cell expression analyses to figure this out. And the final one down here, which you probably can't see over the heads, is really getting at the issue of connectivity between transcripts. This is the Dscam gene in Drosophila, which can make 38,000 isoforms. The point is that, using the data we have, it's impossible to tell whether exons on this end of the transcript are on the same molecule as exons on the other end. So what we really need is super-long single-molecule sequencing technology. If, for instance, Oxford Nanopore actually produces something, and it does anywhere near what's advertised, things like that might really go a long way toward addressing these issues.

Manolis is just faster.

Is there really any bottom to the number of transcripts, with enhancer transcripts and sense and antisense short transcripts around promoters? But more specifically, is there anything you can say about these non-poly(A) RNAs that might be insightful? Are they long RNAs?
So a lot of the poly(A)-minus RNAs we did discover were previously unannotated snoRNAs. A lot of them were microRNA precursors, in some cases 10 to 20 kb long. And then there are a lot of noncoding RNA-type things where we don't have any idea what they are.

Right. I wonder whether you could unmask some groups of them by things like knocking out the stem-loop binding protein for histones, so that they go on to a poly(A) site.

Right. Yeah, so then there are the transcripts that lack poly(A) tails, like the histone transcripts and things like that.

Yeah, thank you. So I think what we need here is the same thing that was described at the other microphone: disrupting each of these sites and then asking what they actually do. I mean, biology is messy, and what makes it so wonderful is that it can cope with how messy it is; that's part of the design principle, in a way, that it can cope with stuff happening. And as you sequence deeper and deeper, you'll find things that happen for no good reason and might not have any good function.
I think what we're still lacking is the ability to knock out that weird, random (or not random, but rare) junction that happens in only one cell out of ten thousand and see what happens. Moving from discovery to validation, perhaps the next challenge is not to go any further down the rabbit hole in discovery, but instead to take the ones we already have and see how often they're actually made, or localize them extremely precisely, or disrupt them and see whether they have any consequence. So I'm wondering if you want to comment on technologies that can do that, and whether it's even feasible to start checking them off on the functionality side.

Yeah. Well, I think that's an area where the model organisms in particular are great: you can actually go in and disrupt all these different elements. In fly, recently, there's been the use of BAC recombineering, which Hugo Bellen's lab has pioneered, where you can now basically change anything to anything else and put it in a precise place in the genome. So it's actually feasible to do these things on a fairly large scale now in fly. I think it's more challenging with worms, but it's doable as well.

I wanted to take it in a little different direction. Your Venn diagram there is interesting. When you take cell lines or something, you get some transcripts. What happens if you sequence the same thing repeatedly? Take the second larval stage from fly and assay it ten times, as ten different samples?

Yeah, well, that's a good question, but we have not done that experiment.

We haven't either.
Yeah, I don't know. I don't know what these things that appear only in tissue culture cell lines are. I mean, we know what the list is.

No, we don't know why they would be there. Some of these are annotated genes, so they're real genes. But how much of this is biological fluctuation?

Yeah.

Or methodological?

Yeah, I don't know. I just wonder where these extra things are coming from: whether it's not really that they belong in cell lines, just that they happen to reach your threshold in that sample.

Yeah, that's absolutely true. But they were below any threshold where we would call them as a gene in the other samples.

But isn't the point, even with your analysis, to ask what the total capacity of the genome is? If you think in an evolutionary context, don't you think there must be genes sitting there that we don't discover as transcripts because we haven't figured out what conditions would induce them? So the more treatments you do, the more challenges you provide, the higher the probability that you'll find things like that.

Yeah, but evolution certainly helps: if the thing doesn't function well enough, or doesn't have enough of a role for evolution to exert selection to maintain the sequence, then presumably we're not going to see a function for it in the lab.

Well, actually, one of the features of the new genes we discovered in the project is that they're more poorly conserved than the annotated ones.

Those are actually positively selected, though. I mean, they're still under selection; they're being selected for change, right?
Yeah, and so that's also an evolutionary signature, because you compare it to neutral sites and show that it's higher: it's a Ka/Ks ratio. I mean, that's very good for this.

My question is: we know a lot about the transcriptome, but what is modENCODE's vision for the proteome? With additional layers of microRNA regulation and so on, the proteome becomes very complex. Is it mass spectrometry, or are there other approaches you're thinking about for the proteome? Because that's where the function lies.

Well, being an RNA person, I would say RNA does a lot of the work in the cell. But certainly the discoveries made in the transcriptome have significantly expanded the proteome. Within the modENCODE projects, though, I'm not aware of any efforts to do mass spec-type work.

The worm. Bob did it for the worm.

Okay, it just wasn't done for the fly.

Yeah, and funding for modENCODE is finished, so there won't be any proteome work from modENCODE.

Thanks. Partly the words were taken out of my mouth, but even given the known transcriptome, there's a lot to be discovered about the proteome: initiation at near-cognate start sites, the translation of 5' UTRs in higher cells, and how these change with physiology even with a constant transcriptome.

Yeah, I agree. For instance, I think ribosome profiling would be really great to use on the model organisms to look at this.

There's always the post-translatome.

Okay, and last but probably most important, Valerie is going to discuss what I think will ultimately be the most important question for the impact of modENCODE: can people actually get to the data and use it?
Yeah, so I was inspired to bring up this topic because I get emails like this one, which I have permission to put up here. I hope you can read it. It says: "I'm working in the lab of Barbara Conradt on the control of apoptosis. We're interested in finding out what factors bind to the egl-1 locus, and in particular whether CEH-30 is one of these factors. So I have a list of questions. Is egl-1 a target of CEH-30? What factors bind to egl-1? Which factors bind to the larger egl-1 locus? How can I get a list of all binding sites for CEH-30? How can I judge their relevance?" (I'd like to know the answer to that one.) "Do you also have genome-wide information on another factor? How about CES-1? Any information on that?" I get these on a fairly regular basis, and I try to help out and point people to the cool sites that Gos brought up. But this is the kind of question a lot of people out there in the community have, and I can't actually do all their analyses for them. So I think there are still some really key issues. Despite our best efforts, there are still problems with finding and accessing the right data for a lot of people, and with understanding what kind of analyses have been performed.
I think scientists like to know what it means when they mouse over a peak and see a statistical value of some sort. If they don't understand what that statistical test means, or what sort of test was even performed, it's tricky: they get suspicious, and then they don't know what to do with the data. Then there's interpreting the importance and reliability of individual data points. For instance, within the consortium we talk a lot about hotspots, and we understand that not every single binding event has an immediate impact on the expression of that gene. But people in the broader community aren't necessarily thinking about these questions in this way. And a lot of people would also like to do intermediate types of analyses: not downloading huge data sets and doing really complex things, which they may not be able to do, but mixing and matching, mid-level types of analyses. We either enable them to look at individual loci or to pull down large data sets, but a way to do medium-level analyses is still missing. So anyway, that was my point.
Can I rephrase your question as "we need Siri for modENCODE"? Actually, I think that's part of the problem. It's now like Siri for... So what I'm trying to say is that you have some level of interaction that can be automated, and some other interaction that would require somebody hiding under the table pretending to be a machine. So, artificial artificial intelligence.
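The statistical value shown on a peak usually answers one specific question: how likely is at least this many reads in the window if only background signal were present? MACS, for instance, models background with a local Poisson rate. Which test any particular modENCODE track used is not stated here; this minimal sketch (function names and numbers hypothetical) only illustrates what a Poisson upper-tail p-value of that kind means.

```python
import math


def _log_pmf(i: int, lam: float) -> float:
    """log P(X = i) for X ~ Poisson(lam)."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam): the chance of seeing at least k reads
    in a window when only background, at expected count lam, is present."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    if k <= lam:
        # The upper tail is not small here, so 1 - P(X <= k - 1) loses little precision.
        lower = math.fsum(math.exp(_log_pmf(i, lam)) for i in range(k))
        return max(0.0, 1.0 - lower)
    # k > lam: terms shrink geometrically, so sum the upper tail directly,
    # which keeps precision for very small p-values.
    term = math.exp(_log_pmf(k, lam))
    total, i = 0.0, k
    while term > 0.0 and term >= 1e-17 * total:
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical window: 25 ChIP reads where the local background predicts 5.
    p = poisson_sf(25, 5.0)
    print(p, -math.log10(p))  # p-value and its -log10 form, as used e.g. in narrowPeak files
```

A small p-value says the read pile-up is unlikely to be background; it says nothing about whether the binding event affects expression, which is the separate interpretive gap raised above.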
So my question is twofold. First, can we add some AI so that people can interact with it in a fuzzy way, ask human questions, and get reasonable answers? Perhaps we could even translate these human questions into lines of code that we give back to them, so that they can modify those lines to ask more precise questions. That would be one way of using AI to turn human questions into code that people can then run. And maybe show a set of examples: you could take all of the emails you've received, have Gos translate them into lines of code, and people could find the closest example and modify it. The second one is... I don't know if you want to go first, and then I can continue after this.
You're on a roll. You're doing good.
So the second one is...
Go ahead.
I can keep going. So actually, maybe... I don't know if there are comments on that.
I mean, yeah, essentially we've reached the point where we need the price of bioinformaticists to go down, or the money to pay for them to go up, because without that... I think that's a very good idea, but I don't know anything about the field of artificial intelligence and whether or not that's actually doable.
It is doable. I think Gos and collaborators have in fact made really great advances with the new formats that bring us closer to that. But it's a huge problem out there in the NIH universe: unless you have bioinformaticists, you're basically in deep trouble, because you can't interpret this stuff.
I mean, instead of having Valerie answer these questions, you'd have an army, kind of like a call center somewhere else.
Yeah. Well, only at certain times of day.
I would certainly support the AI and the call center and more money for bioinformaticists, and some more money for fly biologists as well would be great.
So I guess I just wanted to amplify the point you were making and are trying to solve. We've gone into modENCODE multiple times, and because I know people in the field who are in modENCODE, I've then called them up and said, "Could you please explain this simple question to me: I've got a promoter, what binds to it?" That sort of thing. And they've walked us through it, or they've just gone ahead and done it for us. So, in addition to amplifying that point: there are two basic things that people who don't work on transcription generally query. Either "I have a promoter, what binds to it?" or "I have a transcription factor, what does it bind?" If there were just a simple way of walking into those questions, you'd probably solve 70 to 80 percent of the queries into modENCODE. I understand there are lots of caveats, and you can find ways to put those in. But try plucking a random worm or fly biologist off the street and saying, "Okay, take your favorite gene. Here's modENCODE. I'm not going to tell you anything; go answer a question that you've made up." I think that would be a very helpful test, because it really is quite difficult, and it's intimidating to try to get at really simple questions like that.
So I get a lot of these kinds of emails also, but I seem to be getting fewer of them, and I was wondering if other people are seeing the same thing, because I think people are getting better at digging data out of modENCODE. So that's kind of an open question, I guess, for the panel.
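The two basic queries described here ("I have a promoter, what binds to it?" and "I have a transcription factor, what does it bind?") reduce to interval overlaps once "promoter" is given a concrete, strand-aware definition. A minimal sketch, assuming TSSs and peak calls are already loaded in 0-based coordinates and defining a promoter, hypothetically, as 1 kb upstream plus 200 bp downstream of the TSS (including the TSS) on the gene's own strand. All types, names, and example records are illustrative, not a modENCODE or modMine API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int      # 0-based coordinate of the transcription start site
    strand: str   # "+" or "-"


@dataclass(frozen=True)
class Peak:
    factor: str
    chrom: str
    start: int    # 0-based, half-open [start, end)
    end: int


def promoter(gene: Gene, upstream: int = 1000, downstream: int = 200) -> Tuple[int, int]:
    """Half-open promoter window; 'upstream' means lower coordinates on '+', higher on '-'."""
    if gene.strand not in ("+", "-"):
        raise ValueError(f"unknown strand {gene.strand!r}")
    if gene.strand == "+":
        return max(0, gene.tss - upstream), gene.tss + downstream
    return max(0, gene.tss - downstream + 1), gene.tss + upstream + 1


def _hits(peak: Peak, chrom: str, start: int, end: int) -> bool:
    """True if the peak overlaps [start, end) on chrom."""
    return peak.chrom == chrom and peak.start < end and start < peak.end


def factors_at_promoter(gene: Gene, peaks: Sequence[Peak]) -> List[str]:
    """'I have a promoter: what binds to it?'"""
    start, end = promoter(gene)
    return sorted({p.factor for p in peaks if _hits(p, gene.chrom, start, end)})


def targets_of_factor(factor: str, genes: Sequence[Gene], peaks: Sequence[Peak]) -> List[str]:
    """'I have a transcription factor: what does it bind?' (genes whose promoter it hits)."""
    mine = [p for p in peaks if p.factor == factor]
    return sorted(g.name for g in genes
                  if any(_hits(p, g.chrom, *promoter(g)) for p in mine))
```

The promoter definition is the main caveat the speaker alludes to: changing `upstream` or `downstream` changes the answer, so a real tool would need to expose and document it.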
Maybe... what kind of answers have you been giving them?
I usually say you should email Gary.
I think it's the opposite. I think more people are now accessing it, and certainly there are more problems, and I think also some people just give up.
Well, I think that for a lot of these kinds of questions, if people are really just interested in their particular gene, they need to go to the browser. That's probably the best way to look at that. And the kinds of things that are in InterMine for dealing with lists are just fantastic. But the problem we're facing now, with modENCODE winding down, is that all this is going to go static and go into clouds, and at least in some places that I know of, getting the IT people to let you actually access some of those sites is difficult. So I would really hope that as ENCODE goes forward (and this is directed, I guess, to NHGRI), until 2016 at least, they really make a strong effort to absorb the modENCODE data, to keep it alive, and to keep improving the tools for people to access and analyze it.
So I think modENCODE kind of started out producing data and building a very complicated data infrastructure at the same time. You don't normally build an airplane at 30,000 feet, which is kind of the way things happened. Now I think it's actually in pretty good shape, and people, in my opinion, are having an easier time getting the data out. So if we could just make sure that continues to improve a little, I think a lot of these kinds of difficulties will go away.
Yeah, I just want to say I've also had plenty of people come up to me and talk about how they have been able to use the data. It's not as if it never happens. So there are plenty of people who figure it out, and I think with increasing familiarity with the existing databases, and better incorporation into WormBase and FlyBase, people will figure out more and more how to use it, and at least get some utility out of it.
For us it was really opaque, and even looking at a browser was difficult. So if you want to hear what it's like to be a clueless wet-bench scientist trying to tackle this stuff, I'm happy to fill you in. One aspect is having... well, email is one way, but really being able to talk to someone. So I wonder if it would be possible to have, say, a workshop at the worm meeting or the fly meeting or various things like that. I think it would be packed, because I bet there are a lot of people who would love to be able to do this, and if you walked them through it and they could talk to whoever was explaining things, you could begin to disseminate that a little bit. I mean, of course that would be ideal, but it's probably not possible everywhere. But if you could even have some of these simple questions addressed at workshops at meetings or something like that, I think that might be really useful.
FlyBase does that, and I think the DCC was at the last fly meeting, at the modENCODE workshop. It is a very effective way. And as Brian pointed out, all of this is migrating to FlyBase, and people are going to end up using the browser there. So I think at least on the fly side that'll happen, and I assume WormBase, because it's really excellent, will follow suit. So, okay.
We have time for two more, and Manolis will have to do it over dinner. So, first one.
Sure, just really quickly. I think it's very valuable the way Valerie put this piece of data, this real question, up in front of people. I was wondering if there's any way people can think of collecting a lot of these questions and tabulating them together, to get some sense of what the really common questions about the data resource are, as a way of prioritizing, you know, building a tool or doing some middle-level analysis. People talk about this a lot, but it'd be nice to get some data on which mid-level analyses the majority of people really want. I'm just putting it out there to think about.
I think that's a great idea. I think part of the problem has been that some of it goes to the DCC and some goes to individual investigators, and there has yet to be a real way to conglomerate it.
Yeah, I'd like to address the issue of understanding what analyses have been performed on the collected data, and it's probably a question: would it be a solution to have centralized storage of methods and tools, where a user could log in through a web interface, check all the steps of an analysis, change some parameters if required, and rerun the tools?
So are you suggesting that the web databases have a way to iteratively run an analysis or a query over and over again?
Yes, where all the tools would be represented as boxes in which you can check all the parameters and algorithms that were used, and rerun them on your own data.
We're gonna let Gos handle this one.
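Tabulating the questions people send, as suggested here, could start with simple keyword rules and a counter. A minimal sketch; the categories and regular expressions are hypothetical and would need checking against a hand-labelled sample of real help-desk emails before the counts meant anything.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical categories and keyword patterns, for illustration only.
CATEGORIES = {
    "what binds my gene/promoter": re.compile(r"\b(what|which) factors? bind", re.I),
    "where does my factor bind": re.compile(r"\b(binding sites|targets?) (for|of)\b", re.I),
    "what analysis was run": re.compile(r"\b(statistic(al)?|p-?values?|methods?|pipelines?)\b", re.I),
}


def categorize(question: str) -> List[str]:
    """Every category whose pattern matches, or 'uncategorized' if none do."""
    hits = [name for name, pattern in CATEGORIES.items() if pattern.search(question)]
    return hits or ["uncategorized"]


def tabulate(questions: Iterable[str]) -> List[Tuple[str, int]]:
    """Count questions per category, most common first."""
    counts: Counter = Counter()
    for q in questions:
        counts.update(categorize(q))
    return counts.most_common()


if __name__ == "__main__":
    inbox = [
        "What factors bind to the egl-1 locus?",
        "How can I get a list of all binding sites for factor X?",
        "What statistical test gives this p-value?",
        "Can you design PCR primers for me?",
    ]
    for category, n in tabulate(inbox):
        print(f"{n:3d}  {category}")
```

A large "uncategorized" count is itself informative: it marks questions the current categories, and perhaps the current tools, do not cover.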
I wasn't going to handle that question, but you did mention something towards it: you can save at least certain configurations of tracks, and in modMine you can save lists and queries and things like that. I just felt I had to say something, since this is about data access, and Brian encouraged me to point out that we actually have these things called template searches. The idea of a template search is that we try to guess what people want to do and then write a little page that is really easy to fill out, where you can paste in your gene or your gene list. There's actually a template that answers several of those questions. And somehow, in spite of standing up and giving lots of talks and tutorials at fly meetings and worm meetings and so on, people don't listen. We haven't had as many questions come through the modENCODE help address as we expected. Interestingly, it's been picking up quite a lot in the last few months, and we do get some questions like that. We also get ones like, "I'm studying this gene and I'd like to design primers for PCRing it. Can you do it for me?"
It's just fantastic. So one practical suggestion, if we'd had this conversation a year earlier, would be: make sure you forward all your questions to the help address, because that helps us know what the common questions are, and then we can try to make the really simple questions easier to answer.
Can I also answer the other question that was asked? Right now there is an effort to put up an instance of Galaxy on what will be the permanent housing, on the Amazon cloud, so that you can write your own pipelines and run them over and over again. As for automatically populating the parameters of all the tools used in analyzing the existing data, those won't be pre-populated into the Galaxy instance. But we certainly try to collect as many of those parameters as possible, and they're annotated in the protocols on the wiki. So it's accessible, but there's a lot of it.
I just want to make a quick comment, and it contains the worst legacy of modENCODE in it.
What's that?
All right, a quick comment, and you don't have to answer it. I'm just thinking that maybe the data will become obsolete very soon, because everybody has, or will very soon have, the capability of generating modENCODE-scale data, which was amazing for 2010, 2011, and 2012, but perhaps not for 2013, 2014, and 2015. Perhaps one of the legacies of modENCODE should be to actually educate people to use this type of genomic data. I think the effort should now be placed not just in generating additional data of the same type, but also in educating everyone who generates this type of data to integrate it with the existing resources and to use these resources in any kind of project. Funding the kind of computational and educational efforts needed to use these genomic data sets could be what's unique to modENCODE, rather than just the data-generation aspect, which I think has been democratized.
Thanks to the entire panel, and especially to what's left of the audience. And we'll solve all these problems tomorrow.
Exactly. So I want to thank the panel members again, as well as all the speakers in the session. We're going to start tomorrow morning, probably at 8:30, and for anyone who's going to dinner, the PIs, please just meet at the front of the room. We want to talk about logistics very quickly. Thank you.