Okay, we're going to move on to our second presentation. This is, yeah, Jennifer, you want to come up? This is a workshop report on a meeting titled Capturing RNA Sequences and Transcript Diversity. There are going to be two presenters. The first is Jennifer Strasburger, Program Director in the Division of Genome Sciences, and then we'll bring in Brent Graveley, former council member, to do the other half of the presentation.
Good afternoon, everybody. To use a baseball analogy, we're now at the bottom of the ninth for today's presentations. So, Mike Smith, Katie Bardsley, and I, along with colleagues from NIEHS, were involved in organizing a scientific workshop that took place in late May of 2022, the title of which is on this slide. In starting off the meeting, both Dr. Green and Dr. Rick Woychik, who is the director of NIEHS, gave introductory remarks. And in Dr. Green's introductory remarks, he described NHGRI's 2020 strategic vision, which represents NHGRI's priorities and agenda for genomics research over the next decade. One of the building blocks of the strategic vision document is building a robust foundation for genomics research, under which falls the umbrella of genome structure and function. Under this genome structure and function umbrella falls research on genome-wide and genome-scale elements that relate to the diversity, fate, and function of RNA molecules, including splice and transcription variants, and modified bases. So Dr. Green stressed that NHGRI recognizes the importance of assessing the genome beyond specific DNA sequence information to also include a focus on RNA. So again, under the broad umbrella of genome structure and function, NHGRI's interests in RNA include, but are not limited to, the areas on this slide. The examples here are not comprehensive, but they reflect some topics where NHGRI has the greatest interest.
This is an evolving list that may grow as the recommendations from the workshop are considered. And many of these represent partnership opportunities. Less than a year after the strategic vision document was published, there was a call from the RNA community, articulated in this Nature Genetics commentary, to develop technologies to sequence full-length RNA directly with all of its modifications. This technological need is becoming more urgent because of the increasing number of chemical modifications to RNA bases, known as epitranscriptomic modifications, that have been discovered. The publication cites the number of modifications as over 140. Defects in the most widely studied modifications, such as N6-methyladenosine, or in the proteins that deposit, read, or erase these modifications, are thought to account for over 100 human diseases, including many cancers, neuropsychiatric disorders, and metabolic diseases. So then, based upon the strategic vision and the call from the community, we started working with NIEHS on this workshop, the goal of which was to determine the current capabilities, needs, and prospects for comprehensive characterization and understanding of the true diversity of all RNAs and their modifications at a chemical and structural level in relation to normal and disease states. As can be seen on the slide, there were many individuals involved in organizing the workshop. In particular, I would like to acknowledge our executive committee members Vivian Cheung, Peter Dedon, Brenton Graveley, and Blanton Tolbert, who provided key feedback on the scientific direction and organization of the workshop. And with that, I'd like to introduce Dr. Graveley from UConn Health, who was, again, a workshop executive committee member, and who will further discuss the workshop as well as its objectives. Okay. And here he is.
Okay, great. Thank you very much, Jennifer. So it's good to be back at council. It'd be nice to be there in person, but this works out as well.
So yeah, I will be happy to give a brief summary of some of the highlights from the conference that we had. There's a lot more information in the written material that's available on the NHGRI website for today's council meeting. But what we started out with was a keynote address from Anna Pyle from Yale, who gave an overview of the state of the field to set the stage for the rest of the meeting. So these are just a few of the things that she highlighted that we needed to keep in mind. One is that most of the RNAs that are expressed in the cell are very long, over a kb. They're fragile, in that they can be degraded. And a lot of them are low abundance. In fact, one of the things about RNA is you have a huge dynamic range in the abundance of RNA molecules, and getting access to the ones that are really low abundance is very challenging. There's a lot of functionally relevant information within RNAs that's not actually encoded by the sequence. What's primarily meant by that is the RNA structures that these molecules can form, but also, as Jennifer alluded to, all the RNA modifications, since those are not necessarily encoded in the genome. So there's a lot of information that you can't just predict based on the genomic sequence of these molecules. One of the really complicating factors in the field is that most of the sequencing is done with short reads, using bulk samples and inefficient enzymes. There are technologies that do long reads, and single cells as well, but those have a lot of issues. One of the biggest problems with long-read sequencing is getting sequencing deep enough that you can start detecting the low-abundance molecules. With the single-cell data, the vast majority of that, with few exceptions, is all three-prime-end sequencing. So you don't actually get the full-length sequence of the molecule, just the UTR, usually.
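The depth problem described above can be made concrete with a rough back-of-the-envelope sketch. This was not presented at the workshop; it assumes an idealized library in which each read is an independent random draw, so a transcript that makes up a fraction f of the molecules is seen at least once in N reads with probability 1 - (1 - f)^N. The function names and the 95% detection threshold are illustrative assumptions, and real libraries (capture bias, degradation, ligation efficiency) will do worse than this.

```python
import math


def detection_probability(fraction: float, n_reads: int) -> float:
    """P(at least one read) for a transcript making up `fraction` of the library.

    Assumes every read is an independent, unbiased draw (an idealization).
    """
    return 1.0 - (1.0 - fraction) ** n_reads


def reads_needed(fraction: float, detect_prob: float = 0.95) -> int:
    """Smallest read count giving at least `detect_prob` chance of one hit."""
    if not (0.0 < fraction < 1.0 and 0.0 < detect_prob < 1.0):
        raise ValueError("fraction and detect_prob must be strictly between 0 and 1")
    # Solve 1 - (1 - f)^N >= p for N; log1p keeps precision for tiny f.
    return math.ceil(math.log(1.0 - detect_prob) / math.log1p(-fraction))


if __name__ == "__main__":
    # Hypothetical abundances spanning a wide dynamic range.
    for fraction in (1e-3, 1e-5, 1e-7):
        print(f"{fraction:.0e} of molecules -> {reads_needed(fraction):,} reads")
```

Under these assumptions the required depth grows roughly in inverse proportion to abundance, which is why a wide dynamic range makes rare transcripts so expensive to see even once, let alone to quantify.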
And one of the things that really complicates all this is you have to make libraries for all of these. And the enzymes that are utilized to make these libraries are often very inefficient. So RNA ligases are not ultra efficient, and things like that. For instance, in single-cell data, actually capturing the RNA molecules by the poly(A) tail is also incredibly inefficient. So it's really hard to actually sequence the low-abundance molecules that way. And as I said, it's really difficult to detect the modifications transcriptome-wide, and certainly knowing the stoichiometry of these is really challenging. And then finally, we really need some machine learning approaches and methods to interpret and manage the data in this field. So if you could go to the next slide. So the objectives of the workshop were fourfold, really. One was to identify technologies needed to comprehensively characterize RNA in the cell. And when I say this, this does not mean we're supposed to pinpoint exactly which technologies, but more just in a very vague sense, what types of technologies would we like to see to be able to characterize these? We need to determine the types of informatics and bioinformatic resources that are needed, and consider the steps needed to facilitate the rapid adoption of these technologies. If we have new technologies, how do we get these out into the community so that people can start using them? And then, how do we best incorporate public outreach and workforce development? So next slide, please. Okay, so the process for this meeting. There were keynote lectures. Then we had six or seven different sessions after the keynote session. Each of those sessions had a lead speaker. And then there would be a long discussion. And then we would typically have a breakout session with two to three different breakout groups that would then talk about different aspects of what the main speaker talked about.
And then we would come back together and give reports on these breakout sessions. So at the end of the workshop, the key takeaways were synthesized by the planning team and reviewed by the executive committee. And these are all uploaded on genome.gov for today's council session. So there's a short two-page executive summary. And there's a very long 20-page document that you can read that highlights every single talk that was given. The link is below on this slide. Next slide, please. So there were six priority areas that we discussed throughout the meeting. These were diversity, inclusion, and training. And a lot of that was not necessarily specific to the RNA field. These are common themes in genomics and science in general. Another one was that technology development is going to be a key aspect of this. Developing biological standards and centralized resources is going to be very important for making sure that the quality of the science is uniform throughout the entire field. There was a lot of talk about bioinformatics, computation, and data science. And then some discussions also on clinical implementation, and then environmental exposure and stress. The next slide, please. OK, so one of the sessions really talked about diversity, inclusion, and training. Again, these are issues that are not restricted to just the RNA field, obviously. But one thing that we thought would be useful is just having additional funding across career stages to promote diversity. So this could be funding undergraduate students as well as graduate students and postdocs, or even late-stage career technicians, so to speak. We thought about having outreach components that are similar to those that are done in some of the large NIH-funded consortium projects. So for instance, the ENCODE project has user group meetings every year, at least pre-pandemic, where they would go to meetings and hold sessions for people to learn how to use ENCODE data.
So we imagine that if there's some sort of RNA group, they could do similar types of things. We think supporting undergraduate and graduate training is really important. One of the nice things about the RNA field, especially with, for instance, the nanopore sequencers, is that there's a very low cost of entry to start using these. So these are actually sequencers that could be utilized in undergraduate classrooms. And so there may be ways to support that. There could be an increase in public outreach, and then some interdisciplinary, virtual and in-person training could be done to teach both the wet lab and the dry lab components and try to make these things accessible to people at all levels of their careers. So next slide. Okay, so technology development. Again, these are sort of vague things. We're not supposed to invent new technology at this meeting, but just come up with the general framework. But the field really feels like there is a need for new technologies for full-length direct RNA analysis. There are some that exist now, but the vast majority of technologies that exist require you to first convert the RNA into a cDNA molecule and then sequence it. And by doing so, you lose almost all of the modification information. And then a lot of the reverse transcriptases can't actually make it to the end of the molecule. So you often don't actually get full-length libraries. So we need new technologies to directly sequence RNA without making a cDNA copy. We also felt that we need new mass spec capabilities for doing some of this. Another thing is having focused expert efforts to harness the full capacity of enzymes to be used for RNA analysis. As I said, for instance, RNA ligases are not terribly efficient. So you can imagine trying to engineer new RNA ligases that work much better, or mining them from the myriad of microbes that exist. And not just RNA ligases, but a whole slew of RNA modification enzymes that would be useful for the field.
In order to create standards in the field, it'd be great to have libraries of long synthetic RNAs with specific modifications. This is really important if we're going to be doing direct RNA sequencing because, for instance, the nanopore sequencers simultaneously sense multiple bases within the RNA sequence at a time when it's in the middle of a pore. And while your modification may be in the middle of that five or six nucleotide sequence, the sequences on either side of that influence the conductance of the ions through the channel. And so the sequence contexts of all of these things really matter. So you actually have to make synthetic RNAs for each modification with all possible sequence contexts around it. So this is actually a very complicated thing to create. And then we are also interested in detecting, locating, and analyzing interactions between DNA, RNA, proteins, and small molecules. So next slide, please. Yes, so the biological standards and centralized resources. We think we need gold standard references of synthetic and cellular isoforms with known chemical modifications. So you could have these synthetic sets of RNAs that could be shipped around to people so they would know what the answer should be, and they can calibrate sequencers and analysis tools to these. We thought there might be some value in having a centralized resource of banked tissues and cell lines, and viral strains as well, that could then be shipped around for people to calibrate things. And another thing we thought might be useful was to create a consortium or coordinating-center-directed effort to stimulate collaborations in the RNA field. The type of thing that we had in mind is that NHGRI funds these technology development grants, and there's a whole group of them and a technology development coordinating center. So there are monthly meetings with all the funded groups.
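The scale of the synthetic-standards problem mentioned above, making RNAs for each modification in every sequence context the pore can sense, can be sketched with a small enumeration. This is an illustrative sketch only: it assumes a sensing window of k bases (the talk says five or six), exactly one modified base per window, and canonical A/C/G/U everywhere else. The placeholder symbol "X" for the modified base and both function names are hypothetical.

```python
from itertools import product
from typing import Iterator

CANONICAL_BASES = "ACGU"


def enumerate_contexts(k: int = 5, mod_symbol: str = "X") -> Iterator[str]:
    """Yield every k-mer holding exactly one modified base, at every position.

    `mod_symbol` is a placeholder for one specific modification; the other
    k - 1 positions range over the four canonical bases.
    """
    for position in range(k):
        for flank in product(CANONICAL_BASES, repeat=k - 1):
            yield "".join(flank[:position]) + mod_symbol + "".join(flank[position:])


def count_contexts(k: int) -> int:
    """Closed form for the enumeration above: k positions x 4^(k-1) flanks."""
    return k * 4 ** (k - 1)


if __name__ == "__main__":
    for k in (5, 6):
        print(f"k={k}: {count_contexts(k):,} windows per modification")
```

Under these assumptions a single modification already needs on the order of a thousand distinct windows for a five-base pore, and that number is multiplied by every modification of interest, before considering windows that contain more than one modified base.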
And then there's actually a pool of money to fund collaborations between those investigators as well. So having some sort of coordinating center might facilitate those types of things. And then finally, we thought there could be utility in a centralized resource for direct RNA sequencing. Oftentimes people have valuable samples, but they don't have the tools or the know-how to actually do the wet lab component. So you could imagine being able to send these out to get sequenced at a centralized facility. So next slide. OK, and then with regard to the computational aspects, we felt that there are a plethora of RNA databases that exist in the world, but none of them really conglomerates all of the types of information. There are so many different types of RNAs. There's, for instance, a microRNA database and a circular RNA database. But there's nothing really that brings all of these together, and certainly nothing that does that and includes both normal and disease tissue. So one RNA database would be very useful. Having standardized nomenclature is really key for people that work on splicing. For instance, there's a type of splicing where you have an exon that's either included or skipped. And these are alternately called skipped exons, cassette exons, included exons. There are millions and millions of names for these things. So just having standardized nomenclature that people utilize, so we know what we're actually talking about, would be incredibly useful. Same for file formats, pipelines, and software. We definitely need more machine learning and AI-based tools and RNA training sets. So again, if you had a defined set of RNA sequence files that people could download, where they know what the results of the analysis should be, then they can use those to make sure that their data analysis pipelines are working properly.
And then finally, there's definitely a need for RNA secondary structure prediction that considers chemical modifications. There's a lot of really great secondary structure prediction software out there, but very few of them consider chemical modifications. And certainly none of them allow you to include the 140 or so modifications that exist in nature. So there's a lot of work to be done in that particular area. Next slide, please. And then with regard to the environmental aspects, we definitely need tools and reagents to study how the environment alters the modifications or expression or splicing of RNA in normal states and diseases, and to look at this in a temporal and spatial way. And then ways to interrogate the impact of stress on RNA relative to signaling, RNA-protein interactions, and trafficking. And this is something that would need to be very carefully considered, in the sense that there are millions of different types of environmental exposures that you can imagine. So this is a very open-ended area. I think there would have to be a lot of thought as far as exactly what specific areas would be worked on in this case. Next slide, please. And then as far as clinical, we think there's a need for having small-molecule libraries that target RNA structures that could be mined by researchers. There are a lot of RNA-modifying enzymes that could be potential drug targets that can be looked into. Delivery technologies to target specific tissues and cell types are definitely something that would need to be looked into. We can imagine developing new therapeutic pipelines, and then having high-grade oligonucleotide discovery and synthesis with GMP manufacturing for clinical studies. So next slide, please. So just to summarize, at the end of the meeting, we really thought that there were four main areas that there are lots of opportunities in.
So I think one of them is definitely direct RNA sequencing with modification identification, and doing this on full-length molecules, so you can look at which modifications exist at the five prime end of the message and which are at the three prime end, so you can get the whole connectivity of that. We want to look in the broader arena of structure, including RNA-binding proteins and biology. We think there's a lot of potential for RNA as a therapeutic and drug target. And one of the goals we can imagine is developing capabilities to obtain a truly comprehensive view of a transcriptome. One of the problems is, there are all these great projects out there, GTEx, ENCODE, things like that, that have done lots and lots of RNA sequencing, but they're not necessarily comprehensive due to the technologies that existed when those projects were carried out. So getting truly full-length transcripts that are comprehensive, to capture all of the diversity of RNA molecules, and looking at the modifications, splice isoforms, et cetera, I think would be really transformative in understanding the transcriptome. Next slide, I think that was the last one. Yes, so happy to take any questions. I think both Jennifer and I would be happy to answer them.
OK, thank you both for that presentation. Questions? Rich, start us off, and then Howard, you're next.
Yeah, Brent, that was a great overview. And probably more questions arose from your overview than I had anticipated, in a sense. Could you give a perspective on what are the priorities that the field needs? I mean, it sounds like there are so many things the field needs that it's almost impossible to get a feel for what comes first. And in many ways, it sounds like we really don't know much about the true structure of RNA, that it's almost like the tip of the iceberg that we see above the water. And there's a whole lot underneath, but I don't know exactly how much underneath there is.
Would you say that the evolving technologies are needed to really get a better understanding of RNA itself, as a key feature that needs to be done first, or something else?
Make sure. Oh, yeah, I'm still on. Yeah, so those are all really good questions. I mean, there is a lot to do. I'd say we do know a lot about RNA and RNA structure and things like that. But there's a huge amount that we don't know. A lot remains to be done. So I think having really good direct RNA sequencing would help considerably. I mean, there are some direct RNA sequencing platforms, like nanopore, that exist right now. But you can only really detect one or two modifications. There are 138 of them. We don't know what they look like in the sequencer. So those all have to be figured out. And getting really, really long ones is difficult. But one of the other things is the depth. You're going to sequence actin and tubulin and GAPDH and things like that all the time, and a lot of what you're interested in is less abundant than that. And then you have polyadenylated RNAs, and it's kind of a mess. So I think just having a lot of people working on developing new technologies would help. I don't know if that answered your question, probably not.
Well, I mean, it's helpful to know what the status of the field is and what the people in the field feel are the critical points that could be prioritized.
Yeah, the way I've been thinking about this, the goals would be kind of like going from HG37 to telomere-to-telomere. I think that's where we need to take transcriptome analysis, going from where it is today to something that is the transcriptome equivalent of the telomere-to-telomere assembly and the pangenome.
Howard, did you have a question? I may have misread your Stanford symbol as a hand up.
I do, in fact. Yeah, so if I may go, is that all right?
Yes, please go ahead.
Hey, Brent, thank you for that summary. And I appreciate the presentation.
I think this is a very timely and also very important topic, aligned with NHGRI's mission. I think there's really a confluence of the recognition of the need and also emerging technologies that make it possible to think about taking on this challenge. You mentioned the advent of nanopore sequencing, and also that you could directly read the RNA bases. And we know also there are two sex programs focused on RNA modification. So there are definitely additional chemical tools being developed. I have maybe two comments that I would like to hear your answer on. First is that it seems that maybe even more emphasis should be placed on RNA secondary structure, because this is something that we can already get at, at a genome scale, with some of the existing technologies. And also, maybe one of the simplest ways to think about the biological consequence of an RNA chemical modification is how it affects canonical base pairing, right? So even without knowing anything else, that is something pretty straightforward to define, and it can immediately be matched to impacts on tRNA base pairing and decoding, and impacts on RNA single-strandedness or double-strandedness and therefore RNA stability. The second comment is that I know there's a very strong human focus, but perhaps looking at model organisms, including microbial systems, would be helpful, because you have a lot more unusual bases, and at much higher concentration, in some of these other model organisms.
Yes, so yeah, thank you for those comments. I agree with both of those. I think that during the meeting, there was actually a lot of discussion of RNA secondary structure and prediction and technologies. So some of that is diluted in the five-minute distillation of the meeting.
But yes, I think that's a big priority, with both experimental and computational methods to look at that, and I think not just short range but long range. Some of the types of techniques that you've developed in your lab to look at that, I think, would be really useful. And I agree, I personally don't think that this should be restricted to humans. I think there's a lot of value in looking at other species, including microbes.
Yes, Nancy, go ahead please.
So RNA-seq yields such rich information. It's just been heartbreaking not to see it move faster into clinical use. But one of the key challenges is that, unlike with the genome sequence, where we could do research-level genome sequencing and then confirm individual variants with CLIA sequencing, there's not really such an obvious way to deal with that with RNA. So what do you think are the hardest challenges with moving RNA-seq into a more clinical environment, given how rich a biomarker it is?
Well, that's a very good question. Yeah, I mean, to my knowledge, there's no CLIA-certified RNA sequencing that goes on. I don't know why there couldn't be, though. I think the greatest variability is going to be in sample collection. I think you can pretty much standardize the rest of it. But I mean, certainly for a lot of diseases, having both the genome sequence and the RNA is just extremely valuable in a diagnosis. I think that would be a great area to push forward on as well.
How leads?
So thank you for that interesting presentation. This may seem trivial, but was there any discussion about the need for a single authoritative governing body regarding RNA nomenclature and coordinates? I can't even begin to express the amount of frustration when Invitae and gnomAD and ClinGen all use a different reference transcript, and how that leads to anxiety and, again, misinformation being shared with patients. There's really a desperate need for this. I'm wondering if that was considered at the meeting.
That's a good question. That was not discussed in any of the sessions that I was in, but that might have been in one of the breakout sessions that I didn't participate in. I don't know if Jennifer knows, but I totally agree with you. Similarly, for whatever gene you're interested in, if you say look at exon seven, Howard may think a different exon is exon seven, and it's very complicated to align these things. So I agree that we would need this. I don't know if Jennifer can remember if that was actually discussed or not.
It was discussed. It didn't become the main focus of any session; it was kind of a peripheral discussion, I think, and it was touched upon. It was touched upon as important. I mean, I think that the attendees and participants in the workshop certainly did realize that a common glossary to help understanding across the field would definitely be helpful.
Judy, go ahead.
Yeah, you alluded early on to some of the problems with the single-cell transcriptome: three prime bias, very sparse, very incomplete. But one of the advantages of single cell is that there's going to be an avalanche of nuclear transcriptomes. And, I'm sure you didn't mean to imply this at all, but it's very dynamic, and part of the problem with these references is that it's so dynamic and so context specific. And so capturing RNA velocity, the dynamics of RNA relative to protein expression, the advantage of this is that it scales, but the protein expression is the biology. So there's a lot of complexity. So developing a uniform reference, I think, is challenging, and it's just so dynamic in the biology.
Yeah, I would agree with that entirely too. I mean, you might be able to create a uniform reference for a particular cell line grown under these conditions. But yeah, certainly once you have cells grown in different labs, and certainly if you're using tissues and all that, they're very different.
But having a really good understanding of all the transcripts that can be made from the genome would be incredibly useful.
Go ahead.
I think this may be more relevant for Jennifer, who might be keeping tabs on this, or maybe others who were involved in this workshop. So we jointly held this workshop with NIEHS.
Correct.
Has there yet been other interest, either at the workshop or subsequently, from other institutes who might be interested in pursuing something with us? I mean, I know NIEHS is enthusiastic. I'm just curious if there's a coalition forming beyond the core two institutes that hosted this.
We really haven't pursued other institutes in any further activities going forward. However, there was a lot of interest, and many people from other institutes attended the workshop. So other extramural program directors did attend and seemed engaged and...
Absolutely.
Okay. And there were representatives from NSF, from CDC, from NIST there as well. So I think there was a lot of interest.
So maybe I'm the eternal optimist, but boy, you know, it's not inconceivable that we would conclude that there's so much trans-NIH interest that we could go talk to the Common Fund or think of other trans-NIH models, maybe even multi-agency models, because obviously the bigger the coalition, the more ambitious we can be. So that's gratifying to hear. However, it might take longer to launch. So if you try to get into, I mean, the Common Fund queue, it's probably 25 at the earliest. If you start getting in with other federal agencies, it's probably fiscal year 30. No, I'm just kidding. But it's just going to be harder, but it might be worth it. Other questions for Brent or Jennifer? Okay, Brent, it was good to see you again. Jennifer, thank you for a great presentation.
Yes, likewise. Good to see everyone.
All right.