All right, folks, it's good to see everyone. Thank you so much for coming today. My name is Steven Moss. I'm a senior program officer with the Board on Life Sciences at the National Academies, and I want to welcome you to the workshop we have here today. As you're aware, this is part of a larger study related to RNA modifications, and we're really excited to get the opportunity to learn a lot more about this area, building on some of the important work that has been happening at NIH, NSF, and other organizations. I'm going to start with some introductory remarks, and then I'll pretty immediately pass it off to Brenda Bass, one of the co-chairs of the study committee, who will give us some further opening remarks.

To start, we have a land acknowledgement. We acknowledge that the National Academies is physically housed on the traditional land of the Nacotchtank (Anacostan) and Piscataway peoples, past and present. We honor with gratitude the land itself and the people who have stewarded it through generations. We honor and respect the enduring relationship that exists between these peoples and nations and this land. We thank them for their resilience in protecting this land and aspire to uphold our responsibilities to their example.

I want to talk a little bit about the National Academies to start. We do several types of activities. One is consensus studies, which is what this larger activity is a part of: gathering groups of experts to put together reports. We also do a lot of workshops, forums, roundtables, and other types of convening activities. So today we're having a workshop to help the community get together, learn more about this area, and inform parts of the consensus study.
At this particular juncture, the two activities are connected; sometimes they are separate. It all depends on how we're operating, but we're excited to move forward in this area. As I've said about five times already, we are part of a larger consensus study. This is just a clip from the website itself. I want to thank the Warren Alpert Foundation and the National Institutes of Health, who are the sponsors of the study and of today's workshop.

We have a lot of background in previous reports and workshops with similar topic areas that have helped feed into the study we're doing here. The first thing worth mentioning is that the National Academies came out with a report on mapping and sequencing the human genome before the Human Genome Project started, and there has of course been a ton of work subsequent to that. I'm highlighting only four other reports here. Toward Precision Medicine was a big effort related to omics and understanding the basic science needed to move into a realm of precision medicine. A precision medicine roundtable was then formed that did a lot of workshops in other areas of interest. More recently, back in 2020, actually in this room right before the pandemic started, there was a workshop on next steps for functional genomics, funded by NSF, where we talked a lot about multi-omics and different emerging areas.

I want to quickly acknowledge the study committee that is putting together the consensus report and that also helped plan this workshop, co-chaired by Brenda Bass and TJ Ha. We're really glad to have these folks helping us out. I also want to thank and acknowledge the National Academies staff who have been helping out with this project.
Most of them are here today, and they've been really instrumental in making sure that everything runs smoothly as this project goes along.

A couple of housekeeping items. We encourage you to be an active participant in today's discussion: take advantage of the breaks, and take advantage of the question-and-answer periods. There are two ways to ask questions. In person, there is a microphone, so please, if you are interested in doing that, we encourage it. There's also Slido, which you can use on any of your devices; this is for both virtual and in-person folks. The way Slido works is that once you are logged on, you type in your question, and the folks moderating the Q&A panel will have the opportunity to see all the questions. While you're on Slido, you can also upvote questions so that popular questions come to the top and are easier to ask. I'm going to stay here for one more second while I see people using their phones with the QR code. The QR code isn't in the agenda, but you should pretty easily be able to access Slido by just going to the website and typing in the code, which is at the top of your agenda.

On to technical issues: my colleague Nam, who's up at the front helping with technical issues as we speak, will be available throughout, and for those who are online, please email or message the staff; there are also other folks who can help you online. Other housekeeping: comments and ideas made during the workshop should be attributed to individual speakers and not their organizations, unless otherwise stated.
This includes the National Academies, so thoughts shared during the course of the workshop should not be interpreted as the opinion of the National Academies of Sciences, Engineering, and Medicine, or, I'll specify, of the committee that is putting together this study. The workshop is being recorded, the recording will be posted after the workshop is complete, and a summary will be available. There is no graphic recording for this workshop; I forgot to edit that out of the slide. Harassment and bullying will not be tolerated. Please be respectful of all panelists, speakers, and fellow participants. And with that, I will pass it over to Brenda.

Thanks, Steven. Steven and all the staff here are just wonderful and have been very helpful to us. I'm really excited to see all of you in this room and the people who are attending virtually. Thank you; I know that many of you have a lot of expertise that is going to be helpful to us, so please don't hold back. Tell us what you're thinking and your opinions.

I actually changed a few words in the first sentence on this slide. The committee is charged with trying to map and sequence modifications, but I think that is only part of it. The ultimate task is to develop ways to evaluate whether this should be done: to start at one end of a single RNA transcript, sequence the whole thing, and get all of the modifications in that single transcript. We know now that we can map and sequence short regions; we can find, in an ensemble of RNA, where the modifications are, but that doesn't tell you who the partners are in one single transcript. So that is the ultimate goal, and we need to understand whether there is a scientific need for this. So where are we right now?
What are the current methods? What are their limitations? What is the state of the current RNA databases, and where are we going to store all of this information? What are the challenges related to using the outcome of such a study for scientific, clinical, and public health needs? What are the computational needs? What are the data ecosystems that we need to consider? What about policy related to this task? How will it affect the workforce? And how in the world would you set up the infrastructure to do such a huge project? You saw on that last slide that some people draw an analogy between this project and the Human Genome Project. When we started the Human Genome Project, we had none of the techniques that now allow us to sit down at our desks, pull up our favorite gene, and know its sequence. So someday, should we have this for RNA and its modifications? And of course we envision that such a task would involve new technologies. I am not going to go through this slide because I have essentially said all of this already.

So today, here is what you're going to hear. We're going to start off with a talk just to set the stage, an overview: what are modifications, where do we find them, what are some of the techniques, and what are their functions? Then during the day we'll have a panel on direct sequencing technologies. And then the development of nucleotide standards is a really, really important thing to consider. What we're talking about here are modified nucleotides in the context of a certain sequence, and maybe a repository where you might put these standards so everybody can use them and understand what the signal would be, for example, in a direct sequencing experiment. We're going to learn about the standards today.
We're also going to hear a little bit from people who have been involved in large-scale collaborations. There are a lot of challenges to organizing something like this and deciding how it's going to be done, and we're going to try to learn from some of the people who have done this already. So what should we be really concerned about? Again, should we do this, and what are going to be the pitfalls if we decide to embark on this project? Tomorrow we're going to continue with some talks on different RNA applications and their impact related to biotechnology and disease. We're also going to have some sessions on methodologies that currently are not really applied to finding and sequencing RNA and its modifications, but maybe they should be, so we'll consider that. And I think now I turn it over to my wonderful co-chair, TJ Ha. No? Say that again. Oh, okay. Lydia, please come up. Did we switch? That's okay; it's good to tell me. Oh, Lydia's virtual.

Good morning, Brenda. Good morning, everyone. I hope you can hear me okay. My name is Lydia Contreras. I'm a professor in chemical engineering here at the University of Texas at Austin. My lab works on regulatory RNAs, particularly in stress responses. We have been really interested in mechanisms of RNA oxidation, as well as in understanding RNA-protein recognition of modified RNAs. So today, it is my pleasure to introduce the first speaker for our workshop, Dr. Fred Tyson. Dr. Tyson has really been a true champion for this area in general. He is currently a program director in the Genes, Environment, and Health Branch at the National Institute of Environmental Health Sciences. His portfolio includes research on how the environment impacts the epigenome and, most recently, of course, the epitranscriptome, as well as lung cancer, tobacco exposures, and electronic nicotine delivery systems.
Dr. Tyson leads the NIEHS TaRGET epigenomics and epitranscriptomics programs and is the point of contact for the diversity supplement program. He has worked on several trans-NIH initiatives, including the Centers for Population Health and Health Disparities and the Roadmap Epigenome Mapping Consortium, and he currently serves on four NIH Common Fund working groups. In May of last year, Dr. Tyson and an executive committee, along with several lead faculty in this area, put together a workshop entitled Capturing RNA Sequence and Transcript Diversity from Technology Innovation to Clinical Application. It was led mostly by the National Human Genome Research Institute and, of course, the National Institute of Environmental Health Sciences. I was fortunate to be part of this workshop, and it covered a number of topics, some of which Brenda already alluded to. I remember there was a lot of discussion about even defining "RNomics" and what the term means. There were a lot of discussions of applications, direct sequencing technologies, limitations, and what is needed to move forward. The major themes that still stick in my head were the lack of standards, data management and sharing in the field, and the biological impact of RNA modifications: even if we get to the point where we can sequence and map them, what does it mean for signaling, for trafficking, for intermolecular interactions, particularly those associated with proteins, and so on? So, again, welcome, Dr. Tyson. Thank you for being here and for all that you are already contributing to supporting research in this area. We look forward to your talk.

Thank you for that introduction, Lydia. Let me see if I can move this along here. Good morning, all, and thank you for the opportunity to give an overview of the workshop preceding this one, which was convened by the National Human Genome Research Institute and the National Institute of Environmental Health Sciences in May of 2022.
Quite a few of you on this committee participated in that workshop, and we had active participation and representation from leading RNA scientists coming from multiple sectors, including academia, government, and industry.

The workshop had four primary objectives. They were to identify the technologies that we need to characterize RNAs; to determine infrastructural needs; to determine what steps will be needed to facilitate the rapid adoption of these technologies by the scientific community once they come online; and to identify how we might best incorporate a public outreach component, as well as workforce development, in this area. Next, please.

Several program and review staff members from both NHGRI and NIEHS participated in multiple aspects of convening this three-day workshop, and we were led by an executive committee. The members of the executive committee were Blanton Tolbert, Pete Dedon, and Brent Graveley, and I want to especially acknowledge Vivian Cheung, who has really been a mover and kind of a force of nature behind and in front of the scenes in trying to get this effort moving forward, here and in other venues. These committee members certainly have made, and continue to make, many important contributions to both RNA biology and chemistry.

I'd like to put into context why the leadership of both NHGRI and NIEHS supported our workshop, as well as this one and the larger consensus study that was mentioned earlier. The NHGRI director, Dr. Eric Green, addressed aspects of the NHGRI 2020 strategic vision and how they've gone beyond those broad brush strokes and begun laying out goals for genome structure and function science in the following areas. First, direct sequencing.
He identified the need and opportunity for direct sequencing technologies as an area NHGRI has actively encouraged for almost a decade through its long-standing nucleic acid sequencing technology development efforts. He also pointed out that we currently lack methods to routinely detect or quantify the vast majority of modifications in sequence context, or to simply determine each modification while directly sequencing. Moreover, we simply have an incomplete understanding of the diversity of RNA. Technology innovation and development are needed to enable full characterization and quantification of all RNAs transcribed from the genome. Genomics research also requires physical standards, including synthetic RNA with base modifications, and NHGRI has been encouraging, and will continue to encourage, research efforts and resource development in this area of strong need. This was the first mention of a theme that came up continuously throughout our workshop: standards.

Going forward, RNA analysis will be a component of NHGRI's multiomic approaches to disease and risk. At the cellular and molecular level, there's an enormous opportunity to study aspects of functional genomics, that is, how elements of the genome contribute to biological processes. The NHGRI vision fully embraces the importance of developing technologies and applying them to comprehensive RNA sequencing and characterization as part of what's needed to build a robust foundation for genomics. In short, it should be routine that we consider, on a genome-wide scale, the diversity, fate, and function of RNA molecules, including splice and transcription variants and modified bases.

Rick Woychik, the NIEHS director, followed up by describing why the capacity to directly sequence RNA in a base modification context is critical to the field of environmental health sciences.
Environmental health science investigators supported by NIEHS increasingly appreciate the potential impact of environmental toxicants and exposures that can compromise the deposition, recognition, and removal of RNA modifications, and how this may be mechanistically associated with adverse health outcomes. As a result, there's a growing number of grants in our portfolio that look at the impact of toxicants on epitranscriptomic processes. In 2018, five years ago, we were not supporting any grants in the area of epitranscriptomics, but currently we're up to almost 35 active grants in this area. We're supporting R01s, R21s, R03s, and F31s, as well as R35s, that are looking at environmental impacts on the epitranscriptome. The NIEHS portfolio covers modifications and the readers, writers, and erasers of epitranscriptomic marks. We're using a diversity of exposures and employing state-of-the-art technologies to address questions about where toxicants, the epitranscriptome, and human health intersect. Being able to sequence RNA in a modified-base context will certainly significantly advance the field.

Now I'd like to spend the rest of the time talking about the workshop itself. The workshop, as Lydia told us, was entitled Capturing RNA Sequence and Transcript Diversity from Technology Innovation to Clinical Application. We had 298 attendees over a three-day period, and it featured six scientific sessions. The keynote presentation came from Anna Pyle, and it really laid the foundation for the meeting in terms of delivering a state-of-the-science address. She stressed the importance of studying the diversity of RNA sequences and modifications to advance our understanding of the role of RNA in health and disease. She provided an overview of the biological and technical challenges in the field, new strategies being developed, important unmet needs, and potential future directions.
This was followed by five scientific sessions, each with a plenary talk followed by three concurrently held breakout sessions, and I'll spend the rest of the time summarizing the highlights of these sessions. Jensen Nam talked about structure and biological implications. Jeannie Lee talked about RNomics and applications with small molecules as therapeutics. Juanunu talked about technologies for direct sequencing. Chris Burge addressed infrastructural needs in his plenary talk. And Brent Graveley talked about how we might get the community to embrace these technologies once they're online and get them more widely disseminated. Next slide, please.

Technology development really underlies many biomedical discoveries and applications, and advances in RNA science will require technological innovation, development, and standardization. New methods and technologies are needed for full-length and direct RNA analysis to comprehensively characterize the diversity of transcripts; we need to look at isoforms as well as modifications in sequence context. So how can this be accomplished? Well, we talked about investment in mass spectrometry capabilities, through modification of existing instrumentation as well as the development of new devices designed around RNA, and this was identified as a real critical opportunity. We also discussed focused enzymatic research efforts to develop useful molecular tools that overcome rate-limiting analytical barriers to molecular analysis in RNA biology and would be helpful with sequencing approaches. Additionally, the group suggested that technology innovation and development efforts need to be enabled to generate long synthetic RNAs with specific modifications, which would allow multiple advances in the field. This was another instance of the standards discussion.
We also need high-throughput and subcellular methods for detecting, localizing, and analyzing interactions between RNAs, DNAs, and RNA-binding proteins, as well as small molecules. We also made efforts to identify which optimal biological standards and controlled, centralized resources are most needed to advance the field. Certainly, a gold standard of reference synthetic as well as cellular RNA, spanning the range of known modifications, would enable large-scale efforts to develop and advance technologies. This would include common sources of well-characterized RNAs, both synthetic and cellular, with specific exon chains and modifications, for technology development, along with spike-in controls that would enable comparisons across approaches. The field would also benefit from a centralized resource with banked tissues, cell lines, and, for example, virus samples for testing new modifications and methods. Participants also encouraged the use of a consortium or coordinating-center approach to stimulate collaborations for both wet and dry lab advances in direct RNA analysis. On the wish list of what we really need is a centralized mass spectrometry facility and resource for RNA sequencing and modifications; assembling and calibrating work on a pilot project using direct RNA sequencing methods would also benefit the field.

As for opportunities in computational resources, we need the development of a comprehensive, centralized, interoperable, standardized, and searchable RNA database that includes different RNA types from both typical and diseased tissue; that would really help support significant advances. There were calls for nomenclature, file formats, pipelines, and software to be standardized, updated, and developed. The field would benefit from more efficient data processing, the development of machine learning or AI-based tools, and streamlined workflows to enable RNA analysis efforts.
We determined that the field would benefit from the creation of publicly available and appropriate training sets, including large data sets and those generated from in vivo transcribed RNA. And, very importantly, we talked about developing RNA secondary structure prediction algorithms that consider chemical modifications.

We had robust discussions regarding how sequencing in a modification context can advance our understanding of how environmental stresses or exposures can compromise functional modified transcripts. The participants thought it important, in this context, to identify specific enzymes, reagents, or tools that can insert stress-induced modifications, in order to understand the biological impact of distinct environmental exposures. They also thought it important to develop tools to study specific RNA modifications in a variety of different cell types, to connect these changes to environmentally induced or associated pathologies. Development of the capacity to map the temporal and spatial dynamics of RNA modifications inside a cell and tissue, in relation to functional responses to stress or physiological conditions, would also advance the field. And finally, we also talked about developing methods to interrogate the impact of stress on RNA functions like signaling, RNA-protein interactions, and trafficking.

Outreach to the public is, we think, a needed component to increase understanding of the importance and impact of RNA research on public health and medicine, and I think the pandemic we're coming out of is a clear illustration of how public outreach could better inform folks about what we're dealing with.
Outreach components should be required of any large NIH-funded consortium and center projects addressing this area. Additional funding for diversity career awards and administrative supplements was recommended, as well as support for undergraduate- and graduate-level training, particularly at minority-serving institutions, to improve the diversity of the workforce and of the next generation of scientists who will be tackling these issues. Training must be interdisciplinary and incorporate both virtual and hands-on instruction in both wet and dry lab methods. Training opportunities should be easily accessible, high quality, and low- or no-cost, with online training videos on the use of the technology as well as on data analysis. These would be considerable drivers in getting the community to adopt or embrace these technologies. We know that such a resource would require a home, as well as a long-term commitment and probably resources to maintain it and keep the training modules going, but the training really should be available to all professional levels. Next slide, please.

There are several challenges and opportunities when discussing how direct sequencing can be tied to clinical applications. Many RNAs fold into structures that can be effectively targeted with small molecules; strategies are needed to identify, validate, and optimize small molecules that target functional structural units within the transcriptome. RNA sequence can predict small-molecule binding sites, allowing binding to those functional structural units. Existing RNA-targeted small molecules currently use a range of different mechanisms to exert their effects, including directing splicing with cellular proteins, inhibiting translation of undruggable proteins, and deactivating functional structures in noncoding RNAs.
I think I'm running out of time, so I'm going to go to this last slide, which highlights the several different opportunities we identified within that workshop that can be addressed as we move forward with this field. If you want to see the full meeting report, there's a website that has it; it includes an executive summary, a full agenda, speaker bios, and many other things, and I encourage you to go visit it. Again, I realize I've gone over my time; I apologize for that, and thank you.

Thank you, Dr. Tyson. I'm wondering if there are questions in the room. There are no questions in the room as of yet, but I'm monitoring that. How much more time do we have for questions? About 10 minutes. Okay. So maybe I can ask something to get us started, Dr. Tyson. When we had this workshop, I remember a big topic of conversation was this idea of standards. Should we be consolidating the number of systems, the cell types, the types of modifications? Should we be constraining this problem a little bit so that we can compare results across labs and across different efforts? I wonder if you can recap some of that, and maybe some of your own thoughts.

One of the things we talked about was that if we are going to support large-scale efforts, we want to have some centralized types of resources. One of the things that we know, certainly from doing the epigenome work, is that when you get different labs doing different things and people using different protocols, you get a lot of variability. Hopefully, if we had some centralized resources to pilot some of these activities, that would reduce that type of variability. I think we need standardized nomenclatures, as well as pipelines, again to reduce variability.
And those types of things, again, are going to require a coordinated approach. When you talk about standards, there are a number of different types. We talked about standards for the transcripts themselves, both long and short ones that have modifications on them, and about standardized technologies for doing things, in order to reduce variability and interpret results consistently. There were a lot of calls for standardization. I think we're going to need to define what the low-hanging fruit is: which standards we can more immediately implement over the next few years.

Lydia, we have a question in the room from Kate Meyer. I think this meeting did a great job of identifying some of the major needs in terms of standards, bioinformatics resources, training, and other infrastructure. I wonder if you can comment a little bit on some of the new initiatives that have maybe resulted from this meeting to help support some of those goals.

Could I comment on them? I can say something, but an actual comment for the record, no. There are some preliminary efforts to try to see if we might be able to develop something, but we can't really say more. If we were actually working on an initiative, I couldn't tell you that. But I can tell you we are in discussions, and that will have to suffice for now. Maybe turn off the Zoom and all the recording and then you can. No, no, no. I love how you tried, Kate. I do too, but you know.

Other questions in the room? There is one here. Do you hear me, Lydia? Yes. Okay. I guess I'm trying to envision, if something like this project goes forward, how you think about NIH's contribution and how it might intersect with contributions from private organizations.
I guess proceeding at the level we've been discussing almost certainly mandates a public-private partnership. I don't see how we can do it otherwise; I don't think we have the resources at NIH to solely support this. I've had some conversations about what we think the figure would be to support this as fully as we'd like to see it supported, and I can't even make my mouth form the number.

Yeah, good answer. And then, if you were going to think about private contributions, are there things that you think NIH would be much better suited for, with other things relying on private contributions?

I do think there are things that NIH can really foster and do, but there are some things that are going to require resources that we're not going to have access to. Take standards, for instance: when you think about making these standards with modified chemical bases, there are very few places in the world that can do this. In fact, I can think of two, and they're not here. So we're going to need to cast a broad net and identify what are the things that NIH can support and what are the things that private entities can support. And we can't be restricted to doing this work only here in the US, because, as I was just saying, in terms of these types of standards on nucleoside bases with modifications, we don't have that capability right here right now, but there are people who do. Thank you.

I also have questions in the chat. Fred, I'm going to read a couple of them to see what your thoughts are. One that I'm really curious about: reflecting on the workshop, do you think it achieved what you hoped for? And if not, how would you have restructured it, and what else would you have liked to learn?
I think the workshop really achieved what we wanted it to do. We wanted to bring together a good representation of some of the thought leaders in RNA biology and chemistry. We certainly didn't get everybody, but we got a lot of folks in the room. I think we were able to identify the priorities from the slides that Dr. Bass put up earlier. I think we covered many of those topics, though certainly they could be covered more exhaustively. Maybe one of the things that we didn't do was really identify the low-hanging fruit in terms of what we can do in the next five years. Certainly, again, I think we have some direction on some things. As Kate asked, are we thinking about things? Yes, we are; we've identified some areas that we think we can begin to approach right now. I think we did a good job of identifying priorities; I think we could do better in terms of narrowing those down to what we can attack right now. There's another question here that I'm also really curious about. Based on the workshop, do you envision NIH as the lead in an RNA modifications effort? If not, who should be? And if so, who can you work with; who are the lead partners that you envision in this effort? I speak without authority, but I believe NIH should lead this. NIH cannot do it alone, but I believe NIH should lead it. And I see members from NHGRI here, so I'm not going to put anybody on the spot, well, I am, but certainly they represent a natural leading place to get some of this started. They can't do this alone either.
As I indicated earlier, this type of work has great importance to environmental health sciences as well, so I think that NIEHS should also be a lead here. But I think all of NIH should be involved. I don't see people here from NIAID, or if they're here I don't know them, but I think this would be something of great importance to that institute as well. I think most NIH institutes, if not all, really should be interested in this work because, again, it will impact so much of human disease and health that all of the NIH institutes should be partnered in this effort. In terms of who else, I don't think it would be appropriate for me to identify other partners in the private sector that probably could be engaged in this, but they know who they are, and I suspect many of you know who these folks should be. And I don't think it would be appropriate for me to say these folks should tackle this and those folks should tackle that; other people should identify those components. So, a related question, Fred: if NIH is the lead, then should RNA be incorporated into NHGRI, or should a new institute be created for these efforts? That's a great question, Lydia, but if we look at the reality of where we are right now, I don't think a separate RNA institute would be feasible. Certainly, I think NHGRI, and Jennifer, you can certainly correct me if I'm wrong, has already integrated this into their program, and it has been there for the last 10 years or so, if not longer. Again, is it being done exhaustively? No, but nobody can do it exhaustively, and I think that's why it really requires a trans-NIH effort. I would think the Common Fund would be a great place to do this. Unfortunately, this year the Common Fund didn't agree with us. But that doesn't mean it's dead in the water.
There are certainly other places that could address this, though I don't know if it's appropriate for me to name them, but there are some other places that may engage with this as well. Do you want to say anything about that? Okay, we won't say anything about that. But there are other agencies that could take a leading role in this activity as well. Lydia: we have one more question in the room, and then we should probably keep going because time is tight. So, do you foresee that this and other initiatives like it could or should go beyond mapping of the modifications to something more like solving the RNA structures, which would have more therapeutic value? Oh, absolutely. That I think is a major component, because certainly the RNA structures dictate a lot of the function as well as identify the binding sites that you may be able to target with your small molecules for therapeutics or diagnostics. I certainly think that structure is huge, because that again may identify therapeutic targets. So, absolutely yes: I think any initiatives that come out of this will certainly address structural elements. Thank you. And maybe we have one last question so that we can end on this one, Fred. Since the last workshop, have there been any new developments in technology that show promise as disruptive toward sequencing? I know that one group had developed some things and published something that came out right around the same time as the workshop, where, I'm not sure how far they're able to get, but they have some new technologies that are available.
Jennifer Strasburg would probably address that better than I can, because within the program at NHGRI they have different funding initiatives and grants that are supporting this, as well as some work that they're doing with the NSF, and she may be able to talk about some of the really top technologies in development that are becoming available. So, yes, in the interest of time we should probably move on, but Tal is here and he could certainly address these things. Again, thank you all. Thank you so much, Fred, for starting us off, and thank you all for asking very good questions. Our next speaker is Dr. Kristin Koutmou. Please come forward, Kristin. We decided that Kristin should give a talk pretty early on in our workshop because she works on modifications and thinks about their functions, and she also thinks about techniques to characterize and map these modifications. Kristin is an assistant professor at the University of Michigan, and she is the Seyhan Ege, oh, I said it right, Assistant Professor, so she has a chair. So, Kristin, please come forward; looking forward to your talk. Thank you so much, Brenda, for the lovely introduction, and thank you everyone for introducing this topic so far. Going forward, let me set my timer so I don't ramble on too long. What's going on? Yeah, this screen is flashing; I thought that screen was the one. Well, thank you for the invitation to be here. I'm excited to help kick off this workshop on RNA modifications, one of my favorite topics. Cells face the daunting task of having to maintain the right number of proteins in the right place at the right time. One general strategy they use to do this is to modify all three major classes of biomolecules in the central dogma in order to control their structure, function, and stability. Beyond the people in this room, scientists as a whole tend to think and talk a lot about the sorts of modifications that get put onto DNA or onto proteins, things like methylation and phosphorylation.
However, it turns out that that central molecule in the central dogma, RNA, can also be highly modified, and there are over 150 different varieties of modifications that can be put onto these molecules. We've known about the existence of modifications in RNA really since the dawn of RNA research in the 1950s, when some of the most abundant modifications in RNAs were first discovered. And for a very long time, we thought that these modifications were primarily the purview of non-coding RNAs. Just as a reminder, non-coding RNA is a catch-all for the 98 to 99% of the RNAs that exist in a cell that simply do not act as templates for protein synthesis. In contrast, it was long thought that protein-coding mRNAs contain many fewer modifications, with the only one we really knew commonly existed being the m7G cap found on the 5' end of eukaryotic mRNAs. However, the last 15 years or so have really flipped this long-standing dogma on its head with the advent of deep sequencing technologies, allowing scientists to pick their favorite of those 150 modifications and develop tools to directly sequence and map where that modification sits at all sites across the transcriptome. These deep sequencing studies have given us maps of roughly 15 different types of RNA modifications and indicated that there are over 10,000 sites of RNA modification in both non-coding RNAs and protein-coding mRNAs. These modifications have really ignited the imagination of scientists worldwide because they have the potential to impact every step in the life cycle of an RNA after it's been made. They can change RNA structure and dynamics, splicing and maturation, RNA stability, RNA-protein interactions, and even how the RNA itself functions.
Consistent with this, perturbations in non-coding RNAs and redistribution of mRNA modifications are linked to a wide variety of diseases and negative outcomes for human health, including neurological diseases, cancers, and mitochondrial and vascular diseases. This shouldn't be surprising, given that these modifications are linked to really core biological processes including development, circadian rhythm maintenance, protein homeostasis, immune response, and so on. Together, 50 years of enzyme knockout work coupled with genetics, coupled with recent mRNA modification maps, suggests that RNA modifications are important in some way for biology, and this work has given us a really beautiful bird's-eye view of the RNA modification landscape: we know modifications exist, and we know that they can redistribute in response to cellular stress or other things that perturb cellular biology. However, we don't really know what these individual modified sites do. In terms of thinking about mRNA, this has led to two sorts of hypotheses in the RNA biology field about what mRNA modifications in particular might be doing, and I'm showing these hypotheses in their extremes here in order to make a point. The first hypothesis is that this newfound class of modifications in mRNAs doesn't do much, and that they're just biological noise resulting from the off-target action of non-coding RNA-modifying enzymes, without a whole lot of consequence. The second hypothesis is that they do everything: that they are a major regulatory mechanism for pretty much all genes and are at the center of gene control. In reality, I think the real answer is going to end up being somewhere in between, and part of what I would like to put forward for us to discuss in this workshop is how we're going to get between these two different models in order to figure out which of those many sites are actually contributing to biology.
And I like to think of this as akin to protein post-translational modifications, which we know are everywhere and are very important for controlling the action of some proteins, but also don't always matter; that doesn't mean we shouldn't look into them and wonder what they might be doing. So in order to do this, we need to move down from our bird's-eye view to what I like to refer to as the Google Street View of RNA modifications, where we zoom in on one modification at a time and ask what's there, how much is there, how it got there, and what it is actually doing in order to perhaps change interactions between a given RNA molecule and components of the cellular machinery. And I think that this really leaves us with four grand challenges for the RNA modification field. Our first challenge is to establish the chemical diversity of modifications in both non-coding RNAs and protein-coding mRNAs. Our second challenge is to figure out where those modifications reside and how they're added enzymatically, and also how those enzymes are regulated and controlled. Third, we need to know not only what's there and where it is, but how much of it is there, because when you're trying to read through 10,000-plus sites of RNA modification to figure out which ones are biologically doing something, it's really useful to know if that modification is incorporated 2% of the time or 90% of the time. This is a key piece of information not only for mRNA modifications but also for non-coding RNA modifications, and I'd just like to give a small shout-out to tRNAs, which we thought for decades were stoichiometrically modified but clearly are not. So this is something that we need to be looking at in general across RNA species.
And most near and dear to my heart, we need to figure out what the functions of individual sites actually are, because, to my mind, mapping is a wonderful, beautiful thing, but it is not an end in itself. The ultimate purpose of these mapping studies needs to be figuring out the actual function of individual sites so that we can know how RNA modifications contribute to biology. Achieving this is going to take, as has been discussed pretty extensively already, some large-scale collaborations between scientists from different fields in biology, chemists, and engineers, collaborating heavily with both academic and industrial partners and government agencies in order to achieve these lofty goals that I've set forward. This workshop on RNA modifications is going to address each one of these points; Brenda laid this out nicely, so I won't belabor it, but there will be sessions today where we are going to learn more about quantitatively mapping modifications, about how those modifications might be impacting biology, and about how to collaborate effectively. Okay, so I'm now going to switch from generally introducing the idea of RNA modifications to talking a little bit about the research in my lab. I'm using this just as an example of the sorts of things that I think are going on in the RNA community; it is certainly not everything that is happening out there, but I wanted to give you some perspective on the sorts of things that we're thinking about. The research in my lab is really set up to try to come up with a biochemical framework for quantitatively understanding the biological role of RNA modifications on the molecular level. In order to do this, we have four areas of interest. Our first area of interest is to develop and implement mass spectrometry tools to identify, quantify, and map modifications.
Secondly, we're interested in how those modifications get put in by RNA-modifying enzymes. Many of the enzymes that modify non-coding RNAs are now being revealed to also have mRNA targets, and I think that, although these enzymes have been studied for decades, it's important to revisit what we think we know about them, because they're clearly targeting a much larger and more diverse set of substrates than the ones initially identified. Third, I'm interested in trying to understand the consequences of both tRNA and mRNA modifications on protein synthesis. And lastly, RNA viruses: this is something that hasn't come up yet today, but RNA viruses, just like all of the other RNAs in cells, contain lots of RNA modifications, so I'm also interested in the molecular-level consequences of those modifications for viral gene expression and evolution. Today I'm going to touch on three of these areas of interest, starting off with mass spec. The reason I chose to start with mass spectrometry is that we're going to talk a lot today about direct sequencing by nanopore, but I noticed that mass spec was underrepresented, so I'm going to highlight some of the aspects of my program that focus on using and developing this sort of technology. The work I'm going to be talking about today has really been the brainchild of a single student in my lab, Joshua Jones, who is absolutely excellent. And we're working together with my analytical chemistry collaborator, Dr. Bob Kennedy, because I'm a mechanistic enzymologist by training who just had some crazy ideas about things I wanted to do with mass spec, and Josh was brave enough to make them happen. So before I go into what my lab has done, I just want to generally frame for the audience what mass spec is capable of doing.
Mass spectrometry can tell us a lot about RNA sequence and RNA modifications, and it can do so at two different levels. At the first level, you can chop down the RNAs in a sample and analyze the nucleosides that are present. This sort of approach has been used widely for a very long time to discover RNA modifications and to identify which modifications are in a particular sample, and it really is the gold-standard way of quantifying RNA modifications. However, mass spectrometry can also be used to directly sequence RNAs, which is actually a much more difficult thing to do. Right now, the state of the art in the field is that short RNAs, for example individual tRNAs, can be purified and robustly sequenced, and we're starting to be able to sequence semi-complex mixtures, for example mixtures of rRNA partial degradation products. I'm going to talk about work in both of these realms now. To start, to give everybody an idea of what the nucleoside workflow looks like and the sort of information we can get out of these types of studies: when we're trying to analyze which modifications are present in a sample and how much is there, we take an RNA sample, digest it down into nucleosides, separate those nucleosides out by liquid chromatography, and shoot each one onto a mass spec. This allows us to simultaneously monitor all of the signals arising from both modified and unmodified nucleosides in a single assay. With the technology that my lab has developed, which really is just a furthering of technology developed by the Limbach lab and others, we're standing on the shoulders of giants, we're now able to simultaneously quantify 51 ribonucleosides with high sensitivity, and this is done in the absence of mass-spec-contaminating ion-pairing reagents.
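To make the quantification step concrete, here is a minimal sketch of how an integrated LC-MS peak area for a given nucleoside might be converted back into an amount using an external calibration curve fit from synthetic standards. This is an illustrative outline only, not the lab's actual pipeline; the nucleoside names, slopes, and intercepts are placeholder values.

```python
# Hypothetical sketch: converting LC-MS peak areas to nucleoside amounts via a
# linear external calibration curve (area = slope * amount + intercept).
# All names and numbers below are illustrative, not real calibration data.

calibration = {
    # nucleoside: (slope, intercept) fit from a dilution series of standards
    "pseudouridine": (1.8e5, 250.0),
    "m6A": (2.2e5, 100.0),
}

def amount_from_area(nucleoside: str, peak_area: float) -> float:
    """Invert the calibration curve: amount (e.g. fmol) from integrated peak area."""
    slope, intercept = calibration[nucleoside]
    return (peak_area - intercept) / slope

# With these made-up parameters, a pseudouridine peak area of 36,250
# corresponds to (36250 - 250) / 1.8e5 = 0.2 fmol.
amt = amount_from_area("pseudouridine", 36_250.0)
```

The same inversion is applied per nucleoside channel, which is what makes monitoring dozens of modified and unmodified nucleosides in one assay tractable.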
And just in general, the pros of this sort of approach are that it is direct, you're directly looking at the RNA, it's high-throughput, it's extremely sensitive, and it's quantitative. There are also a number of internal sample quality controls, because by looking at 51 nucleosides in parallel, if I'm looking at an mRNA sample, for example, it's very obvious if there is tRNA contamination, because big funky tRNA modifications start to pop up in my sample. The major drawback to this particular technique is that there's no sequence context, and I'm not going to lie, that is a major drawback, but I want to emphasize that right now this is still the gold-standard method for quantifying modified nucleosides in a sample. My lab has also become interested in trying to directly sequence RNA by mass spectrometry, and I'm going to tell you two brief stories now about directly sequencing total tRNAs as well as viral RNAs. The limitation in sequencing RNAs is that it's hard to do top-down sequencing: you can't just shoot long RNAs onto a mass spec and expect them to fly and be sequenceable. At the same time, it's difficult to chop down RNAs into reasonable pieces that can be mapped back to the transcriptome, because there are no RNases that are akin to proteases. Proteomics researchers have these great tools for chopping up proteins into reasonably sized pieces that they can then use for analysis. The RNases that are commercially available right now don't do this; they are incredibly promiscuous, and they just chew things down into tiny bits. And so this makes it really difficult to chop up RNAs in a complex mixture and analyze them in a way that gives us any information. This has been done once by a previous group that developed orthogonal nucleases, but those nucleases are not commercially available, and they are really good nucleases, so they kill cells when you try to make them, because they chew up all of the RNA in the cell. Trust me, we've tried. And so we wanted to see if we could come up with a more robust way, using commercial enzymes, to directly sequence total tRNAs. We did this, simply by using our RNA knowledge, and it's an example of why having perspectives from multiple areas of science can be really useful for pushing things forward. Normally, when you take total tRNAs and digest them down, you get a number of short fragments, and if you go on and sequence those fragments by mass spec, you can get about 10 to 30% coverage on total tRNAs. However, my analytical graduate student Josh Jones had been sitting in on all of these RNA meetings for five years now, and he came into my office one day and said, you know, we've been trying to make all of these enzymes in order to get larger pieces of RNA, and it's just a pain in the butt. Why don't we just fold the RNAs before we chew them up with single-stranded nucleases? That should give us discrete sizes that we could then go on to sequence. This seemed like a really straightforward idea. We tried it, and it works quite robustly, so we now get much longer fragments simply by folding the RNAs before we degrade them; we can then subject these to mass spectrometry and sequence them directly using CID. If we compare the coverage we got from the fully digested tRNA to that from the digestion of the folded tRNA, you can see we've improved our sequence coverage significantly.
And this is true not only for the tRNA I have shown here but for tRNAs broadly across the transcriptome: using this method we can now directly sequence total yeast tRNAs with up to 98% sequencing coverage, with most tRNAs having 60 to 90% coverage. If you compare the standard RNase T1 digestion of unfolded tRNAs to our current method, shown as the bars in blue, you can see we've dramatically increased our ability to directly sequence tRNAs. Additionally, we've been interested in sequencing even more complex mixtures, and so we've moved up to directly sequencing viral RNAs. We partnered with Professor Blake Wiedenheft at Montana State University to obtain viral RNA samples, and we're applying these samples to direct sequencing not only by mass spectrometry but also by nanopore sequencing, so we can directly compare the two. Subjecting MS2 bacteriophage RNA first to nucleoside analysis and then to direct sequencing, by both mass spectrometry techniques we see the inclusion of a single pseudouridine modification. This is in great contrast to what our nanopore sequencing suggested, which was over 300 modification sites, many of which were modification types that our nucleoside analysis indicates don't even exist in our sample. This lesson has led us to propose that we should be thinking carefully about integrating the LC-MS and nanopore platforms in order to better develop algorithms for directly sequencing RNA modifications by nanopore. Specifically, I would suggest that we consider, at this juncture in the state of the technology, using mass spec to identify which nucleosides are in the sample we're analyzing, so that we're at least picking the right nanopore program to use.
Secondly, we should be critically thinking about the data that we get from nanopore, asking whether the number of sites we measure by nucleoside analysis corresponds to a reasonable number of sites called by these algorithms. We can also use nanopore to verify sequence context in well-expressed genes. One of the take-home messages I hope you get from me is that we should trust but verify with all of these RNA sequences: we should be looking at sequences by orthogonal methods and making sure that we really know what's there. And lastly, mass spec is still a great tool for providing foundational data sets on short, abundant RNAs like tRNAs that can then be used to enhance the algorithms in nanopore technologies, which I think are eventually going to be the way to sequence RNAs. Okay, now we're going to move on to talk a little bit about the consequences of the modifications we're discovering and studying. I wanted to focus on the consequences for translation, because translation is a cellular process near and dear to my heart, and it is also a process that uses a lot of modified RNAs, including modified tRNAs and, we now know, modified mRNAs. There have been a number of studies by many people, including Hani Zaher and Jody Puglisi and many others, that have established two separate classes of modifications in mRNAs and how the ribosome deals with those two classes. The first class I like to think of as modifications that basically cause the ribosome to come to a hard stop; when the ribosome encounters these modifications, it can't go on. Many of these modifications are incorporated by RNA damage processes. When the ribosome stops at one of these modifications, this causes ribosomes to collide, eventually signaling for the degradation of the nascent polypeptide chain and the aberrant mRNA.
The sorts of modifications that I want to focus on now are a second class of mRNA modifications, which have been shown not to cause the ribosome to come to a hard stop but instead to simply slow it down. These modifications don't result in ribosomes colliding, and the ribosome therefore makes it all the way through transcripts containing them in cells; these are the focus of what I'm going to talk about today. Something is wrong with this slide; there's supposed to be an image of inosine right there, which somehow didn't translate. So, we know that in addition to slowing the ribosome down, which could of course influence protein production, RNA modifications can also alter translational accuracy; there's a large body of literature demonstrating how this can happen with the RNA modification inosine as well as with damaged bases. I'm going to talk to you a little bit today about some of our work looking at uridine modifications, focusing for this crowd on the pseudouridine modification. Pseudouridine is of particular interest in terms of trying to figure out its biological function because it's one of the two most abundant modifications found in protein-coding mRNAs; it's in fact nearly as abundant as that N6-methyladenosine, or m6A, modification we tend to hear so much about. Pseudouridine is primarily found in coding regions, which suggests that it is a modification the ribosome is going to be seeing regularly in cells. It can also be incorporated with quite high occupancy: many of the pseudouridine sites in mRNAs are in fact conserved, and they can be incorporated at high levels, meaning 60 to 98%. Additionally, we know that the insertion of pseudouridine varies across the transcriptome with stress and development.
I'm really interested in pseudouridine because this particular modification has been studied for decades in the context of non-coding RNAs, where Paul Agris, Eric Westhof, and others have demonstrated that pseudouridine, in addition to base pairing with adenosine, can also base pair with other nucleosides, suggesting the possibility that this might allow the ribosome to sometimes take in and bind a non-cognate tRNA, leading to the insertion of alternative amino acids. To get at this question directly, we used a fully reconstituted in vitro translation system in which we have all of the components for translation purified, and we can add them back together in discrete amounts and really control our reaction, adding or subtracting one modification at a time. In this particular reaction we have an unmodified or modified phenylalanine codon in the empty ribosome A site; we added total charged tRNAs and looked in an unbiased manner to see if the ribosome only makes the expected product or also does some level of alternative amino acid selection. Of course it did do some low level of alternative amino acid selection, or I wouldn't be talking to you about this. We went on to characterize kinetically some of the alternative amino acids we saw get put in, doing transient kinetics to get an idea of how much of these amino acids could be inserted, and how robustly, on modified codons. In this particular scheme we're reacting ribosomes that contain an unmodified or modified phenylalanine codon with an isoleucine tRNA, so we're looking at misincorporation on the ribosome. When we do this, we see that isoleucine misincorporation can be either enhanced or limited in a very context-dependent manner, depending on where that pseudouridine is located within the mRNA.
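The context dependence just described can be expressed as a simple ratio. The sketch below, with purely illustrative counts rather than data from the talk, shows one way misincorporation fractions on unmodified versus pseudouridylated codons might be compared:

```python
# Hypothetical sketch: comparing misincorporation on unmodified vs.
# pseudouridine-containing codons from counts of products carrying the
# expected amino acid vs. an alternative one. Counts are made up.

def misincorporation_fraction(expected: int, alternative: int) -> float:
    """Fraction of products carrying the alternative (near-cognate) amino acid."""
    return alternative / (expected + alternative)

unmodified = misincorporation_fraction(expected=990, alternative=10)  # 1%
psi_pos1 = misincorporation_fraction(expected=950, alternative=50)    # 5%
psi_pos2 = misincorporation_fraction(expected=998, alternative=2)     # 0.2%

# Context dependence shows up as the fold change relative to the unmodified codon:
fold_pos1 = psi_pos1 / unmodified  # > 1: misincorporation enhanced at this position
fold_pos2 = psi_pos2 / unmodified  # < 1: misincorporation suppressed at this position
```

With these placeholder numbers, pseudouridine at one codon position enhances misincorporation fivefold while at another it suppresses it fivefold, which is the qualitative pattern the talk describes.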
As a mechanistic enzymologist, when you see funny things happening in your test tube, you always want to go on and make sure that you actually see the same thing happening in cells. So we worked with Dr. Bijoyita Roy, who was at NEB and is now at Moderna, in order to do this. Dr. Roy and her team generated luciferase mRNAs that contained either no modifications or were fully substituted with pseudouridine. They transfected these mRNAs into cells, purified out the resulting protein products, and did an unbiased mass spec analysis looking at which amino acid was present at every single position. The peptide analyses of the products of these two different mRNAs are shown here. Up here is the luciferase peptide that flew the best and for which they have the best sequencing depth. The amino acids generated from codons that do not contain a U are not bolded, those from codons that contain a U are bolded, and those where we actually observe substitutions are shown in red. The take-home message is that we do see substitutions at a higher level on the mRNAs that contain pseudouridine. However, we do not see them at all U sites, suggesting that there is context dependence, just as our in vitro assays demonstrated. Subsequent work has come out in the last year showing that this occurs not only in our system but in other people's hands as well, which has been very reassuring. As a chemist, now that we know there's some sort of context dependence, the next natural question is why. So we worked with Dr. Aaron Frank, who was at the University of Michigan and is now head of chemical biology at Arrakis Therapeutics, to do some MD simulations to try to understand this; we also concurrently did quite a bit of RNA melting studies.
Aaron and his team modeled an isoleucine tRNA bound to a phenylalanine codon in the context of a crystal structure we'd collected with Dr. Yury Polikanov containing a pseudouridine mRNA. Aaron's MD simulations, much to my astonishment, absolutely recapitulated what we had seen in vitro: he demonstrated that by inserting a pseudouridine at the first position of a phenylalanine codon, you get a stronger base pair between the phenylalanine codon and the isoleucine tRNA. He saw the opposite at the second position, where we also saw isoleucine being discriminated against, suggesting that it's really fundamental properties of these RNAs that are driving the phenomenon we're seeing. And this leaves us with an overall summary of what we think pseudouridine might be doing during translation elongation: without pseudouridine there, the ribosome makes a lot of protein, mostly the right thing. When pseudouridine is there, the ribosome can sometimes make a little bit less protein, and it can also make a low level of peptide products containing single mutations. We then went on and said, okay, now that we know pseudouridine might be doing something, how is it getting where it's supposed to be? To address this we're studying pseudouridine synthase (PUS) enzymes; these enzymes have been linked to intellectual disabilities and cancers. Our work, which I'm going to summarize here in one slide, basically suggests that these enzymes are much less specific than has long been thought, and it led us to a model for how PUS enzymes select their RNA targets: under unstressed growth, PUS enzymes reside mostly in the nucleus, where they have access to a limited number of substrates, and the substrates they do have access to are bound up with RNA-binding proteins or with ribosomes.
We know that PUS7 can relocalize under stress, at least under heat shock, out of the nucleus and into the cytoplasm, where it can now see a much wider variety of substrates, and it does in fact modify more substrates under these heat shock conditions. We've gone on to follow up on these studies, identifying a number of different conditions under which PUS7 relocalizes under stress. We've also tagged the PUS7 enzyme, forcing it to be either only in the nucleus or only in the cytoplasm with an NLS or NES tag, and looked to see how that impacts cell viability. And we do see that having either an NLS or an NES tag, under different stress conditions, changes cellular fitness and the response to stress. This is really in line with an emerging idea in RNA modifications, across both mRNAs and tRNAs, that many of these RNA modifying enzymes relocalize in response to stress and that this might be doing something important in terms of the stress response. So again, welcome to the RNA workshop meeting. Thank you for listening to all I've had to say; I'm excited to talk with everybody today about trying to figure out which sites do what, when, and where, and thank you so much for your time. That was great, Kristen. I'm going to try to start us off, and it relates to your point that we've got extremes of what modifications might do: they might be the key to life, or they might not do anything. And I agree, the answer might be intermediate. You know, if we're embarking on trying to sequence all RNAs with their modifications, we probably want to prioritize in some way, and there's no right answer to this, but I'm thinking about your assays of pseudouridine in messages and how it could trigger a different amino acid insertion.
Can you envision a screen, something to say, okay, these modifications are the most important, we should focus on these? You were putting them in in a non-natural context, but I would be very curious if you have any ideas on that. Yeah, I think this absolutely has to be done; this is the sort of thing we're thinking about how to do moving forward. It's actually pretty tricky with the proteomics to come up with how to approach such a problem. Our approach currently has been more to try to build up some tools. I didn't show it here, but we've done a bunch of different U modifications, where you basically walk the modification around the base and put it in different contexts, and we're trying to figure out which contexts cause the biggest perturbation in rate and the biggest perturbation in amino acid addition, and then go back into the transcriptome and say, okay, where are the sites in this context, and start with those, because I think we need a priority list. But yes, absolutely, some sort of large-scale proteomics study is absolutely warranted. Thank you. You're welcome. Kathy? Yeah. So I'm just wondering, quantitatively, how much of the pseudouridine in the coding sequence, in terms of site occupancy, will lead to a functional impact? And what's your plan to tease apart the PUS mRNA and tRNA targets in terms of function? Those are all fantastic questions. I think we only recently even have an idea of what the stoichiometry is: Chuan He had a BID-seq paper out earlier this year that measured stoichiometries transcriptome-wide, and in that paper they saw a lot. They saw translational read-through with pseudouridine at stop codons on mRNAs, and they were able to verify this by Western blotting.
And the stoichiometry of the pseudouridines at those sites did not necessarily directly correlate with how much read-through they saw, so I think it's going to be a tricky question. Unfortunately, none of this is easy; I wish I could say it was. And then in terms of teasing out the role of mRNA versus tRNA: I don't know. I think in vitro is really going to be the way that has to be done, because you really can't just knock out one of these enzymes, look in vivo, and say that's what's going on. So, I was thinking back to — you had the one set of modifications that you saw using mass spec versus the 300 positions by nanopore. So I'm assuming that was how it was determined; those are the 51 you could look at? Yeah. Okay, and those are the most common ones? I mean, even those include some weird ones. My question was: so there's 300 by nanopore. Do you believe that you had 299 just totally erroneous ones by nanopore, or do you think those are actually modifications that aren't in those 51? No. So you think it's 299 erroneous? I do, yeah. And we are working — so first of all, I want to be clear that I think nanopore is going to be the future of this field; long term, absolutely, that is where we have to go. But just as Fred was talking about this morning, standards are going to be absolutely necessary for the development of rigorous platforms, so that we really know what it is we're looking at. Nanopore is great for looking at RNAs where we already know what modifications are there; it can say yes or no, we see something there. But in terms of being able to call modifications de novo, it's not there yet, and that's going to need some more implementation. Great, great talk. So, toward the end of your talk you alluded to the contribution of intracellular transport dynamics to modifications. Yes, I think it's a big deal.
If I understood correctly, people usually address situations of stress. As you know, my lab has been pushing the idea that before we can understand stress, we need to actually reconcile with the fact that there is transport even at homeostasis. And it is those changes, where there are competing rates between the rates of transport and the rates of modification by particular modification enzymes, that we have something to say about. So my question is: among all the challenges that you have listed, can you comment on the challenge of even establishing what homeostasis looks like, with so many competing rates, with current or future technologies? It's a big challenge. My comment: I think we're going to need to do a lot of imaging, and I think it's also going to end up being pretty cell-type specific. So it's not a small question, in that we could establish this in HEK293 cells, but I imagine if you go into neuronal cells, something's going to look a little bit different, and it's going to be hard. Right, so this is not like sequencing DNA, where each cell contains the same DNA. This is a much more complex problem. Oh, sure. So we have some questions from online, from the virtual attendees. Do you see any favorite ones there? I'll just go through them from top to bottom. Lydia asked: pseudouridine has been studied for many years. How can we accelerate studies uncovering the biological functions of other mods that remain uncharacterized? What types of assays and technologies could be helpful? So I would say pseudouridine has been studied for many years, but we have not had the technology to study pseudouridine well until about, I don't know, the last six months or so. And so there's still definitely a lot left in that space to begin with. Types of assays or technologies: I think coupling what we're looking at in the RNAs with actual proteomics, and with whole-RNA direct sequencing by furthering nanopore studies, will be most useful.
John Moss is asking about, or suggesting, a fifth grand challenge, which is looking at the interaction of different types of modifications, or crosstalk. This is a great question. I have a story I didn't talk about today where we actually see crosstalk in the installation of tRNA modifications. This is for sure going to be a thing, and it's quite interesting. I'm going to answer one more question, and then Manny has a question. Jeff wants to know about stoichiometry. He's asking what percentage of an RNA species is modified at a given position. I think this is absolutely a key question; again, as we're trying to weed through thousands upon thousands of sites, knowing the stoichiometry at individual sites is absolutely essential. And so when we're thinking about mapping modifications, we don't just mean mapping; we really need to quantify those modifications as well. Yes, Jeff, I think it absolutely matters. Manny? Yeah, sorry. Great talk. I have two small questions. One is regarding that luciferase transcript: were those sites exclusive to particular locations, or are they everywhere? Yeah, they're everywhere, for that particular transcript. Okay. And then the second question is, I was just wondering about the significance of the substitutions that you got for the phenylalanine tRNA. Before, it was still like, what, 0.05% of proteins? 0.05, yeah, and that's totally normal; that's completely in the realm of when translation makes a mistake. And then after, it was like 1%? One, yeah. Okay, yeah, that's a significant difference. I mean, I'm not going to say it's a ton, but biology is fantastic at taking whatever is around and available and usurping it for its needs at specific times, so I imagine that there might be one or two sites at which this happens much more robustly and is important in a given setting, for example. Is there anything we have —
Did you go through these? No, I didn't. We're about to have a break, is that right? Yeah, I think we should go for a break. Thank you so much; that was a great start. And we're going to take a break until quarter of the hour, so that would be about — 10:45? Yeah, 10:45. And for those of you who are in person, there's coffee downstairs; there's no more left here, but there's coffee available if you need it. I'm an associate professor of bioinformatics and data science at the Luddy School of Informatics, Computing, and Engineering at Indiana University. We recently changed our university name from IUPUI — it's going to be changing very soon to IUI — and our school also added engineering. My lab is broadly interested in single-molecule methods for mapping RNA modifications, structures, and interactions; that's our major interest. And today I'm going to be moderating this session on direct sequencing technologies. Our first speaker today is Eva Novoa. She leads the Epitranscriptomics and RNA Dynamics lab at the Centre for Genomic Regulation in Barcelona, Spain. She works on the use and development of direct RNA sequencing to study RNA modifications. This session will have four speakers, each giving about 15 minutes of presentation, followed by 30 minutes of Q&A. Without further ado — Eva, she's online. Should we move to the next speaker while — okay, we are good. Perfect. Thank you. Hello, can you hear me? We can hear you. Okay, good. Sorry for the issues. Okay, let me now share my presentation. Do you see the full screen? Okay, perfect. So, thank you very much for having me here. My name is Eva, and I'm a group leader at the Centre for Genomic Regulation.
Our lab is called the Epitranscriptomics and RNA Dynamics lab because we're basically interested in what RNA modifications do. RNA modifications, as others here already know, but very briefly, are dynamic features that expand the lexicon of RNA. In our laboratory we have been using different technologies to study them. A lot of the effort is related to direct RNA sequencing, and that is what I will basically be talking about, but we also use other technologies to study them, like Illumina or mass spectrometry. And this is because we're interested in understanding their biological function, both at the molecular level and at the level of development, as well as their role in disease and in intergenerational inheritance. So briefly, the epitranscriptome comes in more than 170 different flavors. Here is a plot that we made in the lab some years ago now — there's an updated version that has just been published — but basically what you can see from this plot is that there's a huge variety of modifications. Many have been associated, based on the literature, with diverse human diseases: the ones labeled in red here. And back then, the ones circled in green were the ones that we could actually study in a transcriptome-wide fashion. This is basically because most methods relied either on antibodies or on chemicals that had to be selective for the modification of interest. So very briefly: you need an antibody or a chemical that will selectively react with the modification of interest. Then you can couple this to library preparation, and finally you can get peaks that will tell you where the modification is; you may have single-nucleotide resolution using certain techniques.
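The enrichment-and-peaks idea just described can be sketched in a few lines. This is a toy illustration, not any real peak caller: per-position coverage in a hypothetical enriched (IP) library is compared against an input library, and positions with strong enrichment are called as peaks. Real tools use statistical background models rather than a simple fold-change cutoff.

```python
# Toy sketch of antibody/chemical enrichment peak calling: compare
# per-position coverage in the enriched (IP) library against the input
# library and report positions with strong enrichment. All numbers are
# hypothetical; real peak callers model background statistically.

def call_peaks(ip_coverage, input_coverage, min_fold=2.0, min_input=1):
    """Return positions where IP coverage is >= min_fold x input coverage."""
    peaks = []
    for pos, (ip, inp) in enumerate(zip(ip_coverage, input_coverage)):
        if inp >= min_input and ip / inp >= min_fold:
            peaks.append(pos)
    return peaks

ip_cov = [5, 6, 40, 45, 7, 5]      # hypothetical enriched coverage
input_cov = [5, 5, 10, 10, 6, 5]   # hypothetical input coverage
enriched_positions = call_peaks(ip_cov, input_cov)
```

With these toy values, only the two positions with roughly four-fold enrichment are reported, mimicking a "peak" over the modified site.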
And then you can couple it to reverse transcription: if there's a bulky modification it will lead to a drop-off, or if the chemical affects the modification in a way that also causes a drop-off, you will see that kind of signature. So in this sense, some years ago now, direct RNA sequencing appeared as a promising alternative to study any modification in a transcriptome-wide fashion. I don't need to introduce the technology to this audience, but very briefly, it's a technology that allows you to sequence native RNA molecules. As the molecule goes through the pore, it causes a disruption in the current intensity, which can be converted into nucleotides. The major advantages are that it has no PCR bias and that it can in principle detect all modifications. You don't need to customize your protocol for each modification, which is especially important because then you don't need to pre-bias your decision about which modification you think is important for your biological system of interest. Because you actually get single-molecule resolution, you can in principle get quantitative results. You can also in principle get isoform-specific RNA modification information, as well as study co-dependencies, and so on and so forth. So the advantages in theory are clear, but I would say there are some major challenges that remain to actually make this a real solution. First, and most importantly, there is actually no basecaller for RNA modifications; Oxford Nanopore has released some basecalling models for DNA modifications, but this is not yet the case for RNA modifications. Also, in general there are large input requirements: the default library preparation requires 500 nanograms of poly(A)+ material, although some protocols have been released that in principle can start with 15 nanograms.
But there are other issues with that, which are beyond the scope of what I would like to talk about today; in any case, it is still a large amount of input. In principle, the library prep is also limited to poly(A)+ RNA, and of course you can consider polyadenylating in vitro, but that also has some consequences and issues. And it's in principle also limited to long RNAs: it's not that the technology in theory cannot sequence short molecules, but the truth is that it captures them poorly, and when you try to map them there are issues, there are basecalling issues, and so on and so forth. So this sets up a series of limitations that need to be overcome to actually make this a real alternative for studying modifications in a transcriptome-wide fashion. Today, given the scope of this workshop, I will basically just talk about challenge one. We've been working on the four different challenges, and I'm happy to answer questions regarding any of the four, but the presentation will be focused only on challenge one, which is detecting modifications. So briefly, the theory is that whenever you find a modification, you should see a shift in the current intensity at the position of the modification of interest. That's the theory, which is very nice. But in order to actually get this to work, you first need to align, or resquiggle, the signal. Basically, even though the nanopore signal is supposedly translocated at an average speed, like any biological system it's not perfect, so there's actually over-stretching in some regions and under-stretching in others, and you need to do this resquiggling. You can use dynamic time warping or different approaches, but this resquiggling process is also dependent on the basecalling accuracy.
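The resquiggling step just described — aligning an over- or under-stretched raw current trace to a sequence of expected per-k-mer current levels — can be sketched with a minimal dynamic time warping recurrence. This is only the alignment core under toy assumptions (one expected level per k-mer, absolute-difference cost); real resquiggling tools work with trained pore models and full read signals.

```python
# Minimal dynamic-time-warping sketch of "resquiggling": align raw
# current samples (which may dwell on one k-mer for several samples)
# to a list of expected per-k-mer current levels. Toy cost function;
# real tools use trained pore models.

def dtw_align(signal, expected_levels):
    """Return the minimal DTW cost aligning signal samples to k-mer levels."""
    n, m = len(signal), len(expected_levels)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(signal[i - 1] - expected_levels[j - 1])
            # stay on the same k-mer (stretch), advance, or catch up
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i - 1][j - 1],
                                 cost[i][j - 1])
    return cost[n][m]
```

For a trace that is simply a stretched copy of the expected levels, the cost is zero; mismatches between observed and expected levels raise the cost, which is exactly the slack the resquiggling step has to absorb.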
So when we were doing the very first attempts to use this resquiggling to detect modifications, we were using very old basecallers — Albacore, which is now deprecated — and it was basically a disaster back then; this was 2017. What we then realized is that modifications also cause basecalling errors, so we could somehow overcome the issue of detecting modifications through resquiggling by looking at these systematic basecalling errors. That's when we developed EpiNano, where we generated synthetic sequences that had unmodified nucleotides, and also ones that were completely modified. It is important to note that, because we initially wanted to try to train a basecaller, we actually generated the synthetic sequences covering all the possible 5-mers, with and without the modification. That proved not to work, and I'm happy to discuss why. But what we also found is that if we just extract the features, both the current information and the basecalling information, and do some machine learning — in this case, we finally ended up using SVMs — we can classify the k-mers as modified or unmodified. This was set up for m6A, but later on we showed that it also works for different modifications. So this basecalling-error signature is a simple proxy to detect modifications, in a simple manner, in a transcriptome-wide fashion, and it works for many diverse modifications. Here I'm just illustrating some additional examples, in this case 2′-O-methylation, and importantly, when you remove the snoRNA guiding that specific modification, the error signature is completely lost. This means it is not just some basecalling accuracy issue; the signal is actually driven by the modification, because when you remove the snoRNA guiding the modification, it is lost in a very selective manner.
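The basecalling-error signature just described can be reduced to a toy sketch. EpiNano itself trains SVMs on per-position error features; here, a simple fold-over-baseline threshold stands in for the classifier, with all feature values hypothetical, just to show the shape of the idea: positions whose combined mismatch, insertion, and deletion frequencies jump above background are flagged as putatively modified.

```python
# Toy version of modification detection from basecalling errors:
# each position carries (mismatch, insertion, deletion) frequencies,
# and positions whose combined error rate exceeds a fold-over-baseline
# cutoff are flagged. A stand-in for the trained SVM; all numbers are
# hypothetical.

def error_score(mismatch: float, insertion: float, deletion: float) -> float:
    """Combined per-position basecalling-error frequency."""
    return mismatch + insertion + deletion

def flag_modified(positions, baseline=0.05, fold=3.0):
    """Flag positions whose error score exceeds fold x baseline."""
    flagged = []
    for pos, feats in positions.items():
        if error_score(*feats) > fold * baseline:
            flagged.append(pos)
    return sorted(flagged)

sites = {
    100: (0.01, 0.005, 0.01),  # clean position
    101: (0.12, 0.03, 0.05),   # error burst -> candidate modification
}
candidates = flag_modified(sites)
```

In the knockout comparison described above, the error burst at a guided site would disappear, and the same threshold would no longer flag it.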
And the same happens with the other modification types. It's important to notice that sometimes these errors are not just at a single position, but are actually a signal spread across the 5-mer region that contains the modification, which makes it a bit more complex to detect certain modification types. Okay, so this is nice — we can detect them at the single-site level, but what about in single reads? Can we go quantitative? For this we had to revisit the issue of resquiggling; this problem needed to be solved, but thanks to improvements in basecalling algorithms — Guppy was doing a much better job — the problem became tractable. We can now resquiggle the reads and then see, for example, shifts in the current intensity at positions that carry the modification. And here there is just a control showing that in other cases you don't see this shift, even though the site also has the modification. So using this kind of approach, one can envision predicting the stoichiometry of modification. We proposed a software for this, and others have now developed their own, but the idea is that using specific features — you can use signal intensity, you can also use dwell time, and we actually proposed using a different feature called trace as well — you can predict the stoichiometry: how many reads are modified relative to those that are unmodified. Here are the final overall estimates: in the knockout, you can see that we basically don't predict the sites to be modified, and in the wild type they are actually highly modified, in agreement with mass spectrometry.
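The per-site stoichiometry idea above can be sketched as follows: each read at a site gets a per-read modification call (here reduced to a probability and a cutoff; the tools described combine current intensity, dwell time, and trace features), and the stoichiometry is the modified fraction. All probabilities below are hypothetical.

```python
# Sketch of per-site stoichiometry from per-read modification calls:
# classify each read at a site as modified via a probability cutoff,
# then report the modified fraction. Per-read probabilities are toy
# values; real per-read scores come from signal-level features.

def site_stoichiometry(read_probs, cutoff=0.5):
    """Fraction of reads at a site called modified."""
    if not read_probs:
        return 0.0
    modified = sum(1 for p in read_probs if p >= cutoff)
    return modified / len(read_probs)

wild_type = [0.9, 0.8, 0.95, 0.2, 0.85]  # hypothetical per-read scores
knockout = [0.1, 0.05, 0.2, 0.15]

wt_fraction = site_stoichiometry(wild_type)   # mostly modified
ko_fraction = site_stoichiometry(knockout)    # essentially unmodified
```

The wild-type versus knockout contrast in the talk corresponds to a high modified fraction collapsing to near zero when the writer (or guide) is removed.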
Something that is also important to mention here is that, as I was briefly saying before, the features you use do, in my opinion, matter for how well you predict the stoichiometry. Here we were doing some comparisons — signal intensity alone, or feeding in different combinations — and we finally ended up with the one that best predicted the stoichiometry, based on standards that we had built, which were validated by mass spectrometry to make sure they had the expected amounts of pseudouridine incorporation. Okay, so if we can detect modifications in individual reads, can we look into modification dynamics? We looked at this some time ago, and the long story and short answer is yes: you can use it to detect modification dynamics. You can see absent-to-present kinds of scenarios — for example, here under heat shock conditions you can detect the modification, and in the normal condition you can't — but you can also see scenarios in which you have a low stoichiometry of modification in the normal condition, and then upon heat shock the modification abundance increases. And this is just to show that this works not only in the snoRNAs we were looking at here, but also in mRNAs. So there are now many tools out there to predict modifications, and in general they rely either on basecalling errors, as I introduced, or on the current intensity, or raw signal, information. But this is really not what we have for DNA; it's still far from being the standard, because there's still no basecalling of RNA modifications, which I think is ultimately the goal we should aim for. So, just to recap where all this fits: here we have the sequencing of the RNA strand.
There's data acquisition using the MinKNOW software, which records the FAST5 files for each individual read that is sequenced. Then, in the basecalling process, these FAST5 files are converted into FASTQ — here using Guppy, though other software could have been used; in the old times, Albacore. We also considered using Bonito, although the RNA version is not available. The important thing here is that there is a basecalling model being fed to the basecalling algorithm, and finally you can map your reads. So, as I was saying, you can detect modifications in three different ways: using basecalling errors is one way; using alterations in current intensity is another way that has been proposed; and using a modification-aware basecalling model would be the third way. For that you need a somewhat different basecalling model that is not just for the canonical bases — it doesn't just predict four letters, but more letters. Then, for example, in the wild type you would predict in which individual reads you have the modification, and in the knockout you should see a loss, or practically a complete loss, of your modification. You can even get estimates of modification probability for each base, read, and position in your reads. So the question is: can we basecall modifications? As I was saying, for this you need a model that predicts not four letters but, for example in the case of m6A, at least five letters, and for this you need to train a modification-aware model. That's what we've been working on a lot, and we started again with m6A, precisely because we thought it would be an easy scenario for which we could have some training data. So here the key thing is that you need to train this basecalling model, and you need a training set.
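The expanded-alphabet idea just described can be shown in miniature: instead of a model emitting probabilities over four letters, it emits five (here A, C, G, U, and "Y" standing in for m6A — a purely illustrative encoding), and decoding becomes an argmax per position. A real basecaller decodes a neural network's output over the raw signal; this only illustrates the alphabet expansion.

```python
# Miniature of modification-aware basecalling: decode a per-position
# probability matrix over FIVE symbols instead of four. "Y" is a toy
# stand-in for m6A; the probability rows are hypothetical.

ALPHABET = ["A", "C", "G", "U", "Y"]  # "Y" = m6A in this toy encoding

def decode(prob_matrix):
    """Greedy per-position decode of a (positions x 5) probability matrix."""
    seq = []
    for probs in prob_matrix:
        best = max(range(len(ALPHABET)), key=lambda k: probs[k])
        seq.append(ALPHABET[best])
    return "".join(seq)

probs = [
    [0.90, 0.02, 0.03, 0.03, 0.02],  # called A
    [0.05, 0.05, 0.10, 0.20, 0.60],  # called Y: a modified A (m6A)
    [0.10, 0.70, 0.10, 0.05, 0.05],  # called C
]
called = decode(probs)
```

Per-read, per-position modification probabilities, as mentioned in the talk, are just the fifth column of such a matrix before the argmax.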
And this is a bit of what we have been trying to do. Here are just a couple of snapshots of real data where we have tried to train an m6A basecalling model. You can see that in individual reads, for this position that is a previously described m6A site, some reads are predicted to be modified and others are not. You can validate this in vitro on synthetic sequences where the modification is present or absent. And finally you can also go in vivo and see that in the knockout this position is less modified, and in the wild type it is more modified. What we're finding globally is that with this model we can see very strong differences in the distributions of wild type and knockout when we look with the basecalling model, so we think there is actually quite some hope for RNA modification basecalling models. But generating these training sets for other modifications is going to be a bit more challenging: it's not just about having ways to train the model, it's also about having the right data to train it on, and this is an important limitation that I wanted to highlight. And with this — oops, sorry — I want to thank the people who actually did the work I've been presenting here today, as well as the collaborators and funding, and thank you all for your attention. I hope I wasn't too fast or too slow and that everybody could follow. Thank you. Well, we'll take questions toward the end. Our next speaker is Ben Garcia. Ben Garcia is the head of the Department of Biochemistry and Molecular Biophysics at Washington University in St. Louis, where his lab develops mass spec technologies related to chemical modifications of proteins and RNA. Ben.
All right, thank you so much for the introduction and the opportunity to be here and share a little bit about how we're using mass spectrometry to sequence and characterize RNA. Let me go ahead and try to share this. Everybody see this? Okay, good. So today, what I'm hoping to do is give a crash-course update on some of the mass spec based approaches that we've been using to detect modifications, quantify modifications, and even sequence RNA by mass spectrometry. There are a lot of challenges in trying to analyze RNA by mass spectrometry based approaches, but there's a lot to gain here as well. Mass spectrometry is a fairly unbiased approach, and since we're measuring the mass of molecules, we have the opportunity to detect many different types of modifications, as well as novel modifications. Ben — yeah, you're not in slideshow mode, could you — oh, it's not in slideshow mode? Hold on a sec, sorry. Sometimes my computer does not like it when I'm hooked up to my dual screen. How does that look? That looks good. Okay, sorry. And there's my pointer. Okay. And so, while there are a lot of challenges in using mass spectrometry to sequence RNA, there is a lot to gain as well, because mass spectrometry is an unbiased approach that allows you to detect the masses of molecules, and therefore we can detect theoretically any modification that might be present, including multiple modifications. And we can also do this in a sequence-specific manner by sequencing the RNA as well. However, there are issues with this approach: RNA is difficult to separate by liquid chromatography, which is often used before coupling to the mass spectrometer, and RNA, compared to other molecules, is quite unstable in and out of the mass spectrometer.
Even seeing larger RNA transcripts is quite difficult, and most people in the mass spec field focus on metabolites and proteins, so there are just not as many resources, especially computational ones, for characterization of RNA by mass spectrometry. But my lab has been very interested in this problem for the last three or four years, and we've developed a lot of approaches to try to enhance the analysis of RNA by mass spectrometry. Starting by just looking at mononucleosides: we've developed a derivatization approach where you derivatize RNA that's been chewed down to mononucleoside units, and we derivatize it with iodomethane that is deuterated — I'll show you why this is important later. This is a very efficient strategy, a very clean reaction; it's permethylation. It's been used before on many other molecules, but what it does is really enhance the analysis of RNA mononucleosides. Typically, if you run RNA on a C18-based liquid chromatography column, the nucleosides all elute very close together, in a very small window up at the front of the gradient. However, with the derivatization, you can see that the nucleosides now spread out; they're more hydrophobic, so we get better separations, and that's better for quantification. But this also helps us identify different types of modifications on the RNA. When you derivatize with the iodomethane, it will react with nucleophilic sites such as hydroxyl groups or amine groups. So you can see here we've derivatized the three hydroxyl groups, and the two hydrogens on the amine group are replaced as well — this is the derivatization of adenosine. But the nice thing is what happens when we start derivatizing the different isomeric modifications, such as N6-methyl-A or 2′-O-methyl-A.
You can see that different positions are modified: here we have an endogenous methylation, and in this case we have the exogenous methylation down here at the 2′ position. So when we detect or fragment these molecules, they give us signatures that allow us to differentiate these isobaric modifications: at the MS1 level, the intact level, they look the same, but here they actually give us separation. With this derivatization approach we've been able to detect somewhere around 60 to 80 modifications for most cell types. This allows us to really analyze difficult-to-characterize modifications such as uridine and pseudouridine. So again, when we derivatize — here's uridine — you can see the different derivatizations, and when we fragment, we get this fragment with the derivatized ring. But when we look at pseudouridine, you can see that we have two different reactive sites on the ring that are now derivatized, which we didn't have before, so it has a different mass, and one of the fragments has a different mass as well. So it's very easy to distinguish these, even though by liquid chromatography they're difficult to separate. There are lots of different ways to look at RNA modifications by mass spectrometry. A lot of people use targeted approaches, where you know the molecule you want to study and you target it, usually at the MS2 level, that is, at the fragment level: the target is known, you typically do not take an MS1 scan, so you don't have the parent spectrum, but you quantify at the MS2 level by isolation of the fragment, then detection and quantification of that fragment. However, if you don't know what you're trying to detect, you can't detect unknown species.
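The reason the deuterated iodomethane matters can be put in numbers. A minimal sketch, assuming monoisotopic masses: a methyl added during derivatization (CD3) is about 3 Da heavier than an endogenous methyl (CH3), so the count of heavy versus light methyls on a derivatized nucleoside reveals how many methyls were already there before derivatization.

```python
# Back-of-envelope mass arithmetic for light (CH3) vs deuterated (CD3)
# methylation. Monoisotopic masses in daltons; replacing a hydrogen
# with a methyl group adds (C + 3H) - H for light, (C + 3D) - H heavy.

H = 1.007825   # 1H
D = 2.014102   # 2H (deuterium)
C = 12.000000  # 12C

def methyl_shift(deuterated: bool) -> float:
    """Mass added when one hydrogen is replaced by CH3 (or CD3)."""
    methyl = C + 3 * (D if deuterated else H)
    return methyl - H  # the replaced hydrogen leaves

light = methyl_shift(False)  # ~14.016 Da: endogenous methylation
heavy = methyl_shift(True)   # ~17.034 Da: derivatization methylation
delta = heavy - light        # ~3.019 Da per methyl group
```

That ~3 Da spacing per methyl is what lets the intact mass and the fragment masses distinguish, say, an N6-methyl already on the base from a methyl installed at the 2′-OH by the derivatization.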
There's another approach called data-dependent acquisition (DDA), where a detected species triggers an MS/MS, or fragmentation, spectrum. Typically you detect everything at the parent-ion level — these are all different mononucleosides — and then you select one of these, isolate it, and fragment it to get a fragmentation spectrum that identifies it. The problem here is that something low-level, maybe an unknown modification or species, may not be selected for that second MS/MS event, so it never gives you a fragmentation spectrum and goes undetected. Lastly, there's a newer approach in the mass spec field called data-independent acquisition (DIA) mass spectrometry. Here the idea is that we still have MS1 detection, so we see everything just like before, but instead of isolating one species to fragment, we isolate a region of the mass range and fragment everything in it together, so the MS/MS spectra are composites of multiple species. You can design these isolation windows so that everything found in a window is fragmented together, and we've done this for RNA, with windows that are specific to certain modifications. The advantage is that if something is abundant, you'll fragment it and be able to pull it out of the fragmentation spectra; but even the low-level species that normally would not be triggered for MS/MS in the DDA approach will still be present in these windows. They might be at very low levels, but we won't miss them — they'll be present. And with powerful algorithms, you can extract this data and detect and quantify these very low-level species.
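The DDA-versus-DIA selection logic just described can be sketched in a few lines. This is a toy model I'm adding for illustration; the m/z values and window scheme are invented, not from the talk.

```python
# Toy model of DDA vs DIA precursor selection. In DDA only the top-N most
# intense precursors trigger MS/MS; in DIA every precursor falling inside an
# isolation window is co-fragmented, so low-level species are never skipped.

def dda_selected(precursors, top_n=2):
    """Pick the top-N precursors by intensity (what DDA would fragment)."""
    ranked = sorted(precursors, key=lambda p: p["intensity"], reverse=True)
    return {p["mz"] for p in ranked[:top_n]}

def dia_windows(lo, hi, width):
    """Fixed-width isolation windows spanning the mass range."""
    edges, x = [], lo
    while x < hi:
        edges.append((x, min(x + width, hi)))
        x += width
    return edges

def dia_covered(precursors, windows):
    """Every precursor inside any window gets fragmented in DIA."""
    return {p["mz"] for p in precursors
            if any(lo <= p["mz"] < hi for lo, hi in windows)}

# Toy mononucleoside precursors: two abundant, one low-level modified species.
precursors = [
    {"mz": 268.1, "intensity": 9e6},   # abundant
    {"mz": 284.1, "intensity": 7e6},   # abundant
    {"mz": 282.1, "intensity": 3e3},   # low-level modified species
]
missed_by_dda = {p["mz"] for p in precursors} - dda_selected(precursors)
windows = dia_windows(240.0, 300.0, 20.0)
print("DDA misses:", missed_by_dda)
print("DIA covers:", dia_covered(precursors, windows))
```

The low-abundance ion at m/z 282.1 is skipped by the top-N DDA rule but still lands in a DIA window.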
So we've been able to take this type of approach, which has been used quite a bit for quantitative proteomics, and apply it to RNA analysis. Here's just an example: we have the mass of one species, and there are several MS/MS spectra here that confirm it's real. We can go back and re-mine the data for any potential modification or mass change that we would want to identify. In recent years there's been the discovery that RNA can be glycosylated, so we can go back to our data sets and mine for peaks that correspond to signature ions of different combinations of glycans on the RNA, and then go back and confirm these as well. We've actually been able to use this type of approach to detect novel modifications and different branched forms of glycan molecules on the RNA. Now, everything I showed you so far was just mononucleoside analysis. Mass spectrometers are very good at sequencing peptides and proteins, so can we sequence RNA by mass spectrometry as well? The answer is yes, but it is very difficult. Part of the difficulty is that RNA doesn't behave well in the mass spectrometer, so we've been working on that too. But the other real difficulty is that there are not many approaches for the computational analysis of this kind of shotgun RNA analysis. So we've developed the first infrastructure for this: we adapted an open-search algorithm into what we now call a nucleic acid search engine (NASE), which takes tandem mass spectra of oligonucleotides and matches them to a database. This is just a screenshot of what it looks like; here's a small RNA transcript with a modification.
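The "re-mine the data for mass shifts" idea can be sketched as a tiny open-search-style matcher. This is an illustration I'm adding; the delta table and the nucleoside mass are placeholders for the kind of values involved, not a curated modification database.

```python
# Sketch of re-mining data for candidate modification mass offsets, in the
# spirit of an open/mass-offset search. Deltas and tolerances are illustrative.

PPM = 10.0  # match tolerance in parts per million

# Candidate mass offsets relative to the unmodified nucleoside (Da).
CANDIDATE_DELTAS = {
    "methyl (+CH2)": 14.01565,
    "acetyl (+C2H2O)": 42.01057,
    "hexose (+C6H10O5)": 162.05282,  # e.g. one glycan unit
}

def mine_mass_offsets(base_mass, observed_masses,
                      deltas=CANDIDATE_DELTAS, ppm=PPM):
    """Return {observed_mass: modification_name} for masses explained by
    base_mass + a candidate delta, within a ppm tolerance."""
    hits = {}
    for m in observed_masses:
        for name, d in deltas.items():
            expected = base_mass + d
            if abs(m - expected) / expected * 1e6 <= ppm:
                hits[m] = name
    return hits

base = 267.09675  # ~monoisotopic mass of adenosine, for illustration
observed = [281.11240, 429.14957, 300.00000]
hits = mine_mass_offsets(base, observed)
print(hits)
```

Real workflows would also demand supporting MS/MS signature ions, as described above, before accepting a hit.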
Here's a very similar composition but a different oligonucleotide, with the modification at a different position, and the program can pick these up quite well. We've also been adding more and more technology to the RNA toolbox. Another area of interest in the lab is ion mobility mass spectrometry, which is a gas-phase separation: we have the liquid chromatography separation, but now we want to include even more separation in the gas phase, where molecules are separated based on their cross sections and shape. When we turn on this ion mobility, we get selectivity for RNAs that are then separated from one another, but this time inside the mass spectrometer itself. This is what it looks like when we turn on FAIMS — FAIMS stands for field asymmetric ion mobility spectrometry. We have different compensation voltages that we step through, and NF means no FAIMS; you can see that with no FAIMS everything elutes in one big peak, with maybe a tiny shoulder here. But when we step through different voltages, we get selectivity and start separating out different RNA species in the mass spectrometer, which more or less corresponds to their size and charge-state distribution, so we get both a charge and a size separation in the gas phase with this ion mobility. And because we're able to enrich for species that would be at really low levels if we just put everything in at the same time, what we see is an improvement in their fragmentation spectra. Here's a spectrum of two RNA oligonucleotides with FAIMS off versus FAIMS on: we get many more fragments with FAIMS on, so the sensitivity increases, and if we get more fragments in the mass spectra, we get more high-scoring species that we can identify. And here again is our computational program, the NASE nucleic acid search engine, scoring the data with no FAIMS.
Here's the data with FAIMS on: we see many more species, with much higher score distributions. This allows us to dig deeper into the data and detect many more modifications — this is just a tRNA digest — than we can with no FAIMS, and get much better sequence coverage and quantification of these species by mass spectrometry. In the last couple of minutes, I just wanted to share a little bit about some new chromatography we're developing as well. We use a lot of C18-based chromatography, but then we have to derivatize the oligonucleotides, and while that works, we've also been working in parallel on a different type of chromatography: porous graphitic carbon chromatography, which has been used for a lot of hydrophilic species. What's interesting about this chromatography is that you can apply a charge to it and create a kind of electrochemical field that retains molecules such as RNA, and then with a polarity switch you can elute the RNA off. This is actually the basis for a lot of RNA biosensors, which use these same principles of applying a voltage to the porous graphitic carbon material. We do this type of electrochemical elution, but in a liquid chromatography (LC) format. So if we have a trapping column containing the graphitic carbon, the idea is: can we load and trap RNA, and then elute it with a polarity switch, applying the voltage right before it gets into the mass spectrometer? In a proof of principle, we loaded RNA onto one of these porous graphitic carbon columns at a specific voltage; when we start the HPLC gradient, very few RNA species elute, but once we apply the polarity switch, with the same gradient, everything elutes out, and that's shown here in the heat map.
You do get some RNA species that elute, but once you have the polarity switch, you really elute them all, so you can trap and elute RNA species into the mass spectrometer. The last piece of data I'll show is a further characterization of this process. We have our electrochemical-elution RNA trapping, and then we increase the voltage: we've been playing with stepwise increases, and you can see that we trap and elute different species into the mass spectrometer, and it really does trap and elute based on RNA size. The lower voltages trap the smaller RNAs, and you can step through and get selectivity for the different RNAs that are eluted. When we combine the stepwise voltage with a continuous HPLC gradient, we get very nice separation of a lot of different RNAs, again more or less by size. It allows us to dig much deeper into the data and sequence many more oligonucleotides, including many more modifications, and we're hoping in the future to combine this with the ion mobility as kind of the ultimate platform for sequencing and characterization of RNA oligonucleotides. This just shows that we get very good sequencing through the mass spec data. So, really quickly, I just wanted to share some of the things we're working on. There are a lot of challenges for RNA analysis by mass spectrometry, but the upsides are enormous: being able to detect a lot of different modifications, including novel modifications, and do it quite quantitatively as well. But that's going to involve a lot of improvements to liquid chromatography and the fragmentation approaches.
It will also take some out-of-the-box thinking — potentially ion mobility spectrometry, and potentially new chromatography such as this electrochemical-elution chromatography — to help improve the dynamic range and the selectivity of the mass spectrometer for sequencing RNA species and detecting the different post-transcriptional modifications that are present. We're continuing to build on a lot of the platforms we've established, with the goal of sequencing longer and longer RNA transcripts. Right now we're topping out at probably a 30- or 40-mer oligonucleotide, but we'd like to go a lot longer and detect even more modifications, including lower-level ones. So with that, I just want to thank everyone for the opportunity to present at this workshop, and I'm looking forward to a good discussion. Thank you. Thank you, Ben, for an exciting talk. We'll move on to the next speaker. Our next speaker, Shuo Huang, is a professor at Nanjing University. His research interests are in single-molecule sensing applications using engineered biological nanopores. All right, can you see the slides? Okay, I'll just go ahead. Good morning, good afternoon, and good night, everyone. I'm Shuo Huang, a professor from Nanjing University. First, I'm really delighted to have this opportunity to talk with everyone about our recent progress in nanopore sensing of ribonucleotides and their modifications. My expertise is mainly nanopores. This is today's outline: I'll take a very short time to explain what a nanopore is and how it can sequence DNA and RNA, and then — particularly important — I'll talk about a concept that people are less familiar with, called nanopore single-molecule chemistry, and how it can help identify different RNA NMPs.
So, the nanopore concept is very straightforward: you have a pore and a pair of electrodes, you drive your analytes electrically, or even by diffusion, through the pore, and you get their characteristic events. This is the first data I produced as an independent PI: this is one molecule passing through the pore, and this is another. But normally a nanopore sensing scenario that produces this kind of signal doesn't give you much information, so people are more interested in developing nanopores further so they can sequence DNA or RNA and provide more information. The first nanopores were biological nanopores, and the first biological nanopore was the alpha-hemolysin pore. It's very stable and robust, and it can be easily prepared by prokaryotic expression; later people developed many, many more pores, which eventually formed a very big family. My expertise is with the pore called MspA. With MspA we can do many things — it has a very high dynamic range, from a small proton up to a tRNA — but today our focus is RNA. Before I talk about RNA sequencing: people have already talked about nanopore sequencing today. I engineered MspA and produced DNA sequencing signals early on, and this is what a DNA sequencing signal looks like from this nanopore; if you use an Oxford Nanopore device, you will get pretty much the same thing. You might be surprised: if there are only four DNA bases, why do you get so many different combinations of signals? The reason is that this nanopore doesn't have sufficient spatial resolution to resolve each individual DNA or RNA base. Later, Oxford Nanopore announced direct RNA sequencing technology, but it still experiences the same problem.
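The "characteristic events" readout described at the start of this section can be sketched as simple threshold-based event detection on a current trace. This is a toy illustration I'm adding; the currents, threshold, and sample interval are made-up numbers, not real instrument parameters.

```python
# Toy sketch of nanopore event detection: the open-pore current drops while
# a molecule occupies the pore, so events are excursions below a threshold,
# characterized by dwell time and mean blockade depth.

def detect_events(trace, open_pore=100.0, threshold=80.0, sample_dt=1e-3):
    """Return (start_time, dwell_time, mean_blockade) for each excursion
    below `threshold` in a current trace (pA). An excursion still open at
    the end of the trace is ignored in this toy version."""
    events, start, blocked = [], None, []
    for i, current in enumerate(trace):
        if current < threshold:
            if start is None:
                start = i
            blocked.append(current)
        elif start is not None:
            dwell = (i - start) * sample_dt
            depth = open_pore - sum(blocked) / len(blocked)
            events.append((start * sample_dt, dwell, depth))
            start, blocked = None, []
    return events

# Open pore at ~100 pA with two blockades of different depth and duration.
trace = [100] * 5 + [60] * 3 + [100] * 4 + [40] * 6 + [100] * 3
events = detect_events(trace)
for t0, dwell, depth in events:
    print(f"event at {t0*1e3:.0f} ms: dwell {dwell*1e3:.0f} ms, "
          f"blockade {depth:.0f} pA")
```

The (dwell, blockade) pairs are exactly the event features that the later slides plot as scatter clusters.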
You still can't deconvolute the RNA one base at a time, so the signal you get still looks like that, and if you zoom in to the detail, you'll find it's still very complicated and requires a lot of bioinformatics effort to deconvolute the original sequence. When you have epigenetic modifications, the situation gets even worse, because the modification contributes to your nanopore signal and produces unpredictable event features. That's why we need a new concept. Back in 1997, my postdoc advisor, Hagan Bayley, started an approach called nanopore single-molecule chemistry. Normally people just treat a nanopore as a pore, but he introduced the concept of engineering the pore so that it has a reactive site inside, in the middle. The demonstration there was chemical binding between zinc ions and amino acid side chains, which produces small, tiny-amplitude nanopore signals corresponding to the binding and dissociation of the metal ions. At that time, people believed that because the analyte is very small, the signal will always be very small, and they didn't proceed to develop this technology further. But we discovered, pretty much by accident, that the nanopore that can sequence DNA is at the same time a perfect nanopore reactor. This is our MspA pore: if you engineer this site to carry a single reactive amino acid side chain, you can observe chemical reactions in a very precise manner. For example, here the reactive residue is a methionine; you can add tetrachloroaurate(III) into the system, and it will bind to the methionine and eventually oxidize it to the sulfoxide. This reaction actually takes three steps, and you can observe each individual step, in an atomic-resolution manner, with no problem.
To my surprise, these events look so big — maybe a few tens of picoamps or even larger. Considering that the analyte is still very small, why are they so big? It's because the MspA nanopore is focusing a lot of ionic current into a tiny spot, so your signal is amplified and you get more information. When this paper was published, a lot of people asked me the same question: what is the significance of this work? The significance is not observing this particular reaction; it's that you now have a pore that can report with exceptional, atomic-level resolution. Later I started a different approach: I wanted to detect things that are very difficult to separate, like saccharides. You might be surprised — why am I talking about saccharides at this meeting? You'll figure out why later. These saccharides are very similar, or almost identical, in molecular weight: sorbose, fructose, galactose, and mannose have exactly the same molecular weight. So I had to use a special engineering technology to make this pore behave even better, and we developed a hetero-octameric MspA. Because it's a hetero-octamer, we can modify a single site with a thiol modification — this is a phenylboronic acid. The phenylboronic acid will bind with almost any kind of saccharide and produce a unique signal. This is sorbose — remember that — and then later you see fructose; they are quite different. And then galactose. Considering that it produces such distinct signals, you can put every different kind of monosaccharide through this pore and get extremely high-resolution data. This is from the first night we got success; my students and I were very happy, so I recorded the moment with my cell phone.
You can actually see these kinds of events on the screen; they are based on chemical reactions between your analyte and your pore. If you do proper treatment of your data, you can see that even monosaccharides with almost identical molecular weights can be almost 100% separated, no problem. Later, we realized that we can use this to identify different RNA NMPs. In 2022 you probably saw this journal cover from Nature Nanotechnology; it's from our work, and you also saw research highlights and news interviews discussing the same thing. In principle, we were using exactly the same technology that we used to detect different monosaccharides to do RNA modification detection. This is the same hetero-octameric MspA, modified with phenylboronic acid; it's very easy and very efficient. The phenylboronic acid reacts with the ribose part of the RNA nucleoside monophosphate in a highly reversible manner, so basically you are observing binding of a nucleoside monophosphate to the pore and dissociation from the pore. This is the raw data we produced: you have four different bases, G, C, A, and U, and they produce four distinct signals with no overlap at all. So basically, if your analyte is 100% pure, then by mixing these four together you get almost 100% accuracy without any problem. To demonstrate that we are definitely using exactly the same pore to do this discrimination job, we used the same pore and just kept adding different analytes to the system: the first one is C, then we put U in and you can see the separation, and then A and G. So there's no problem at all in discriminating these four RNA bases.
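As a minimal sketch of how four well-separated event clusters translate into per-event base calls, here is a nearest-centroid classifier. This is my illustration only: the centroid values (fractional blockade, dwell in ms) are invented, and the real analysis presumably uses distributions fitted from calibration data or a machine-learning model.

```python
# Minimal sketch: classify nanopore events for the four RNA NMPs by nearest
# cluster centroid in (blockade, dwell) space. Centroid values are invented.

CENTROIDS = {
    "AMP": (0.45, 2.0),
    "GMP": (0.60, 3.5),
    "CMP": (0.30, 1.2),
    "UMP": (0.38, 1.8),
}

def classify(blockade, dwell_ms, centroids=CENTROIDS):
    """Assign an event to the nearest centroid (squared Euclidean distance,
    with the dwell axis scaled so both features contribute comparably)."""
    def dist(c):
        b, d = c
        return (blockade - b) ** 2 + ((dwell_ms - d) / 10.0) ** 2
    return min(centroids, key=lambda k: dist(centroids[k]))

# Four simulated events, one near each cluster.
events = [(0.44, 2.1), (0.61, 3.4), (0.31, 1.1), (0.37, 1.9)]
calls = [classify(b, d) for b, d in events]
print(calls)
```

When clusters truly don't overlap, as claimed for the four canonical NMPs, even this trivial rule approaches 100% per-event accuracy; overlap between some modified NMPs is what drives the slight accuracy drop mentioned next.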
If we have this kind of resolution, why not go ahead and look at the modifications themselves? These are the epigenetic modifications that we can get easily, commercially or by synthesis. Altogether we have tried 14 NMPs, and the overall accuracy is about 99.6%. The more species you include, the more your accuracy drops slightly, because some of them might overlap slightly; and the speed of detection is in minutes, so you get this kind of accuracy and efficiency with no problem. Here I think it's very straightforward to just show you the data that excited us so much. This is the raw data without any treatment; with your naked eye you can actually see how different they are. In particular, m1A and m7G always behave as a very deep blockage current, but the other modifications, like m6A and m5C, can also be easily recognized, whether by the naked eye or by a machine-learning algorithm. To prove that these modifications can be discriminated from their canonical bases, you can do a side-by-side comparison using exactly the same pore: if we mix C and m5C, you can see the separation, and likewise G and m7G and many others. So I'm pretty sure you're convinced that you can use this to recognize different epigenetically modified NMPs. When we submitted this manuscript to Nature Nanotechnology, our referees were generally very friendly and very supportive, but they gave us a task we hadn't done before: they wanted us to show how we could deal with natural RNA — the whole sample had to be completely natural. So we tried something simple: a yeast tRNA-phenylalanine. According to this map, you can see there are many different types of modifications. We can use enzymes to degrade it into the different NMPs and put the whole mixture into our nanopore sensor.
This is the raw data we produced. The color-coded dots correspond to data that we know, that we had previously recorded. The black dots we marked as UT1, UT2, UT3 — altogether four clusters of events that we had never seen before — and eventually we realized that they correspond to m2G, m22G, T, and Y. There are two we can't detect, because those two carry the modification on the ribose hydroxyl group. It can also be quantitative: this is the comparison of measured versus true values, and you can see it reports the proportions of the modifications and the base composition quite accurately, in a highly quantitative manner, matching the true values very well. Recently we further developed this technology so that we can use exactly the same pore to measure different NMPs, NDPs, and NTPs, and even different combinations of modified bases in this kind of digest. This is the raw data showing the simultaneous discrimination, and the statistics in a scatter plot. To summarize, we are developing a technology that can identify different NMPs and their epigenetic modifications with almost 100% accuracy. We are not there yet with sequencing, but I think we are pretty close. Finally, I would like to thank my students and lab members, and also the funding bodies, for their support. And if you are interested, of course we can share this platform with you, almost for free. Thank you. Thank you, Shuo, for another exciting talk. We'll move forward with our next speaker, the last of this session. Marcus Stoiber is the principal machine learning researcher at Oxford Nanopore Technologies, where he has been leading a team developing tools and models to investigate and detect modified bases from raw nanopore signal.
He has also been helping researchers with using the technology for understanding disease and genetic disorders. Thank you so much. I'm really excited to talk to everybody, and I'm really excited about all the work I've been hearing about here, especially the emphasis on all these different tasks everybody's been talking about; I think my talk will hopefully synergize quite well with those topics. Cool. So: I've spent a lot of my six years at Nanopore working on DNA modifications, and we're going to go over where that's led us — the state of play of how we detect modified bases with nanopores — and then some of the other recent RNA upgrades that are soon to come, and where we're going with RNA modified-base detection. As people know — my slide got a little messed up here — you can directly detect modified bases from the ionic current. The current state of play is that we have really good direct detection of methylated bases, and really good models for the DNA bases 5mC and 5hmC, and a lot of people in the community are introducing models for lots of other things. Everything you get with nanopore sequencing comes along with this: unrestricted read lengths, flexible approaches, and real-time sequencing. One of the big pieces of work I've been on for the past two years or so is the Remora algorithm. It's named after the remora fish, the fish that attaches to a shark: the bigger fish is the base caller, and the modified-base model latches on. So the Remora algorithm latches onto the base caller and performs the modified-base task separately from base calling. That goes along with a big theme we've heard a lot about: it separates the samples you can use for training the base caller from those for training the modified-base models, which is really important. And it takes very little time on top of the regular base callers and feeds off their outputs.
As a high-level overview of how Remora works: at the top here is an example signal for a full read. For our CpG model, we go and find where we have a CpG call within the read and pull out that little bit of signal; importantly, what goes into the model is just that slice. On the bottom here is the sequence and on top is the signal, so it's just this little slice of the big read that goes into the model. That means that little bit of signal is the only thing we need to match in a training sample, which is really important for introducing really complex samples into modified-base training alongside the base caller, and we get high accuracies from this; we also employ scaling to gain a little more accuracy. One of the big themes we've heard so far — these examples are all for DNA, but I think a lot of this translates to RNA modified-base training — is that the data types are super important. These are the five main data types we use. We have the synthetically created oligonucleotides that we've talked about here, and we also have synthetic randomers — I'll go into this a bit more, but these have random stretches of bases on either side of a known modified base in a small context. Then we have the biologically derived types: native samples, enzymatically modified bases, and second-strand doping. They have different advantages and disadvantages for training and modified-base detection. The biologically derived native samples are what we want, right? We want to be able to detect all of this in native samples; that's the goal. But they contain mixed modified bases and no concrete ground truth.
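The "pull out that little bit of signal" step can be sketched as follows. This is an illustrative reimplementation I'm adding, not Oxford Nanopore's actual Remora API; the function `motif_signal_chunks` and the move-table-style `base_starts` array are hypothetical names for the base-to-signal alignment idea.

```python
# Toy sketch of the Remora-style idea: find each CG call in the basecalled
# sequence, then use a base-to-signal alignment to slice out just the signal
# chunk around it for the modified-base model.

def motif_signal_chunks(seq, base_starts, signal, motif="CG", flank=2):
    """Yield (position, signal_chunk) for each motif occurrence, taking
    `flank` bases of context on either side. `base_starts[i]` is the index
    in `signal` where base i begins; a sentinel end index is appended."""
    starts = list(base_starts) + [len(signal)]
    i = seq.find(motif)
    while i != -1:
        lo = max(i - flank, 0)
        hi = min(i + len(motif) + flank, len(seq))
        yield i, signal[starts[lo]:starts[hi]]
        i = seq.find(motif, i + 1)

seq = "TACGGTTACGA"
base_starts = [0, 4, 9, 13, 18, 22, 27, 31, 36, 40, 45]  # one entry per base
signal = list(range(50))  # stand-in for raw current samples
chunks = list(motif_signal_chunks(seq, base_starts, signal))
for pos, chunk in chunks:
    print(pos, len(chunk))
```

The key property, as described above, is that only these short chunks need a ground truth, which is what makes complex training samples tractable.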
I would even challenge the idea that in DNA we know the modified bases of a cell: even in a cell line, you can get a single site in the genome that's 50% modified in a supposedly homogeneous pool of cells. Why is that? Even the simplest cell lines are still complex, even in DNA. Second-strand doping is again a little complex: we have modified bases in all contexts, but we don't actually know where the individual modified bases are, so it's useful for some training tasks and not for others. Finally, enzymatic samples: we've used these a lot for 5mC and 5hmC detection, where you just get every C in a CpG converted to 5mC or 5hmC, and our analytics group has done a lot of work to get this reaction really highly accurate. The synthetically created samples have been really important for us. The synthetically modified oligonucleotides are really important for showing what our accuracy is, because even the ground truths from alternative assays are not accurate enough for us to know our accuracy; we had to use samples like these to show that we have higher accuracy than bisulfite sequencing. Finally, our holy grail of training samples is the randomer. The problem with a randomer is that you don't actually know the canonical sequence, so we've used our duplex sequencing, where you sequence both strands of the molecule — you can see that up at the top — to get the random canonical sequence, and now you know where your modified bases are within it. So this is a really important sample type for our training data sets, and it matters for Remora that we can take a tiny slice out of it: our base callers are trained on much larger chunks of 300 to 500 bases, and you just can't have good, solid ground truths for 300-to-500-base-pair chunks, but we can get 50-to-60-base-pair chunks with these randomers. All right, I'm going to go through the DNA models relatively quickly.
I think these are of less interest to this group, but hopefully of some interest. The big one we released in December was a 5mC plus 5hmC model — two modified bases at once. This is where we're moving: detecting multiple modified bases, which is obviously incredibly important in RNA. But the problem becomes harder — it's easier to tell just two things apart — so we do take a small hit in accuracy; but if you collapse the calls back down and look at them the way bisulfite would, we still get the same accuracy out. Again, you can see this in our printed oligonucleotides — this is how we measure all the accuracies we quote; they're all from printed oligonucleotides, not from an enzymatic sample, not from our training data, and these are completely held out of training to get these accuracy numbers. We get slightly lower accuracy, but we do really well when sites are highly modified; at the top is the ground-truth printed sequence, because we wanted to show how good we are on a more real-life sample. I'm in the weeds a bit here, but in the readout M is 5mC and H is 5hmC, and you can see that even when they're densely clustered next to one another, and the signal is obviously affected across the region, as we've heard from several people, you can still detect them with really high accuracy — over 90% accuracy at the single-read, single-base level. Not stacking up over reads: at the single-read level we're still over 90% accuracy in these really complex samples. And even beyond that: the initial model we released a year ago, on the R9.4 chemistry with Kit 10, was slightly behind bisulfite. Now look at the R10 chemistry that we're really moving towards for DNA.
We're a good chunk better than the accuracy we see from bisulfite: we took the same ground truths and sent them off for bisulfite sequencing, and we showed that we're better than bisulfite, so we're really excited about that. This is where we loop in a little more toward the RNA modified bases: getting these all-context models. The enzymatic samples we used for training let us modify every C in a CpG to 5mC; much harder is getting 5mC with a ground truth in every single context. Obviously it won't literally be every single context — the combinatorics of every 50-mer gets too big — but you can get a sampling of it from these randomers, where you print with random bases and then derive the sequence and train models. We've seen really good results from that; we haven't released these models yet, but the all-context 5mC is coming very soon, and the accuracies are looking very good. Same idea for 6mA — sorry, that last one was 5mC; for 6mA we're again seeing really good modified-base detection on the ground-truth printed samples. Also, a shout-out to the developers at IGV for these visualizations of modified bases, and to everybody working on the file format behind this; that's been a huge effort over the last five or six years. And finally, we're looking at new modified bases that we can't get models for as easily: this shows an 8-oxo-G at each of these sites, and you can see shifts in the signal — the red is modified and the black is canonical.
When you start getting into more complex modified bases you can run into issues. We actually end up with mis-base-pairing opposite an 8-oxoG, so we were wondering why we were getting the wrong base in the middle here, and it was actually because the wrong base had been incorporated. The basecaller was calling the right base, and we had to fix our training data to have the correct base there in the randomers; it does get quite complex. We also had some upgrades to the Remora software. These are pretty boring for the most part, but they make Remora work a bit better; this was released in December. Importantly, we have simplified training: the signal comes from one file, the base calls come from another, and that's how you create training data; it's really two files. The signal files plus the basecalls mapped to the reference produce your training datasets, and we've tried to simplify this down as much as we could. Also, and this is just a little bit, I love little bits of nanopore: we can actually sequence both strands of a molecule and get duplex sequences, so you can get the status of modified bases on both strands and really detect single-molecule hemi-methylation. I just think that's very cool; less applicable to RNA, but very cool. I was hoping to get this next upgrade released before coming here; this meeting was a good kick in the butt for me to get working on RNA again. We now have per-read signal visualization, so you can get these plots from the command line with one command: you pass in your POD5 and your mapped BAM and you get these signal visualizations, and you can highlight different motifs of interest. You can also pull out every single read in here, and the levels you get from it, all from this new API.
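The simplified two-file training setup described above (signal in one file, reference-mapped basecalls in another) boils down to pairing signal chunks with reference-anchored labels. A minimal, purely illustrative sketch; the data structures and the `make_training_examples` helper are invented for this example and are not Remora's actual file formats or API:

```python
# Toy version of building (signal_chunk, label) training examples.
# Real pipelines read signal from POD5 and alignments from a mapped BAM;
# here both are faked with in-memory Python objects.

CHUNK = 5  # signal samples kept on each side of the target base (toy value)

def make_training_examples(signal_by_read, anchors):
    """signal_by_read: read_id -> list of signal samples (one per base, toy).
    anchors: (read_id, base_index, label) tuples derived from the mapped
             reference; label is True if the reference says the base is
             modified. Returns a list of (signal_chunk, label) pairs."""
    examples = []
    for read_id, idx, label in anchors:
        sig = signal_by_read[read_id]
        lo, hi = max(0, idx - CHUNK), idx + CHUNK + 1
        examples.append((sig[lo:hi], label))
    return examples

signal_by_read = {"read1": [float(i) for i in range(20)]}
anchors = [("read1", 10, True), ("read1", 3, False)]
examples = make_training_examples(signal_by_read, anchors)
print(len(examples), len(examples[0][0]))  # prints "2 11"
```

The point of the two-file split is exactly this separation of concerns: signal on one side, reference-anchored labels on the other, joined only at dataset-preparation time.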
This also allows users to quickly produce tests. This is just a simple t-test on the level means from each read, and you can quickly see that the modified bases at the CG sites pop right out of a simple t-test. For RNA we're probably going to need a little more than that, but this interface will make it much easier for users to work on it. So, direct RNA sequencing, what everybody here is really interested in; let me jump to the good parts. I don't think I need to go into too much detail: in direct RNA sequencing we do second-strand synthesis and attach adapters, and we sequence just the RNA strand from the 3' end to the 5' end, and all the big advantages of nanopore sequencing come right along with it. So again, the big one coming, which I've been working on a lot recently and am trying to iron out the bugs in, and which should be released really soon, is visualization of RNA signal. From Remora, from the API and from the command line, you can get these plots straight from the simple POD5 signal file and the BAM; really simple commands give you really powerful results, and people can build off of this. Here's an example: this is actually the Nanocompore data from Adrien Leger at EBI, in Ewan Birney's lab. This is work they did that is publicly available, and we reanalyzed it recently. Again, red is the modified sample at each of these sites, and you can see the red and the black starting to separate at these particular positions, but there's much more signal in there than just the shifts you're seeing here, and that's what we're looking to pull out. So this is really just the beginning of RNA modified base detection. And with that: this was presented in December, so nothing particularly new here, but we're really excited about this new RNA kit that's coming out.
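Coming back to the simple t-test on per-read level means mentioned above: it can be sketched in a few lines. This is a standard-library Welch's t-test with a normal approximation for the p-value (fine for well-separated sites), and the per-read signal values below are made up for illustration:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def approx_p(t):
    """Two-sided p-value under a normal approximation to the t statistic."""
    return math.erfc(abs(t) / math.sqrt(2))

# Made-up per-read mean signal levels at one candidate site:
canonical = [100.2, 99.8, 100.5, 99.9, 100.1, 100.3, 99.7, 100.0]
modified  = [103.1, 102.7, 103.4, 102.9, 103.0, 103.2, 102.8, 103.3]

t = welch_t(modified, canonical)
p = approx_p(t)
print(f"t = {t:.1f}, p = {p:.2g}")  # a clearly separated site pops out
```

As the talk notes, a mean-shift test like this only sees part of the signal; dwell time and higher moments carry information too, which is why trained models beat simple per-site tests.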
I'm not in production, I'm in research, so I don't know the exact release date or when this is going to get into users' hands, but really soon we're going to be getting much more data with much less input, much faster, and the signal is much cleaner. Compared to the currently released kit, we're also getting much higher accuracies. So, really excited: a 3x improvement in output quantity with 10x reduced input, and much more accurate basecalls. We're really excited for this reboot of RNA, and especially excited to hear what this crowd, what everybody, is going to do with it. A quick summary of the models we have built into Dorado, the new basecaller: all-context 5mC and 6mA, and I think those carry a lot of lessons that we can apply to RNA modified bases, along with the upgrades we're looking at for RNA. So, real quick, my RNA wrap-up. My mom is a real estate agent, and the first rule of real estate is location, location, location. The three rules of training modified base models are samples, samples, samples. The algorithms are great, but it's really about robust samples, and I'm really happy that that's a point of emphasis here: robust ground-truth samples, and modified base content matching the target. At Nanopore we're going to release models that are really robust; they have to be, because they have to work in a ton of different situations. But we also, I think really importantly, want to enable the community to train these models from other samples. If you have a really specific motif, it doesn't make sense for us to release that model because it's so specific; it's hard to communicate that we have 20 models for 20 different specific motifs. But we want to make it easier for the community to develop those models and have them at the same accuracy as the ones we release.
So, look at things like modification diversity and proximity: calling three mods right next to each other is different from calling one modification in isolation, and you have to be aware of that. Given the diversity of modified bases, tRNA is probably going to need a dedicated solution, where you have five mods right next to each other, all different; a model that detects 6mA by itself won't work for that. Single versus multi-read: do you want to look at a particular site, do you need per-read calls, what accuracy is needed to answer your biological question? These are all things that I think are really important to flesh out with this group. And then, yeah, obviously there are potential algorithm improvements, but samples, samples, samples first. Thank you, everybody. Thank you, Marcus, and thank you for the seat here. We now have time for questions. I'll start with the questions online, but feel free to come up. The first question is from a committee member: how can the field be expanded so that there is competition for direct sequencing technology? The question is being asked broadly, for all the panel members; anybody can choose to answer. Does somebody online want to jump in first? Please, if anybody has an answer. Sorry, can you repeat it? It actually cut out in the middle. Can you repeat the question? Sure; the question is gone now online, so I'll have to try to reformulate it. I believe the question was: what kind of competition could be brought about among other companies, so that there is competition for nanopore sequencing and more development in this broader area? I've probably butchered the question a little bit, but that's the general context. I guess, briefly, I would suggest that we are competing; at Nanopore we're obviously competing internally, in that we want to improve things, and there is a new dedicated focus on RNA that we are very excited about.
Coming from my background, I worked on modENCODE and on a lot of RNA studies, and I'm very excited that we now have a dedicated push on RNA. That doesn't necessarily answer the question, but I think there is some internal competition from us inside Nanopore to improve the technology, and I'm excited that we're doing that now; the timing worked out well for that. Do any other panel members want to add to this? I don't hear any, so let's move forward. The next question, or, yeah, go ahead, please. Hi, I have a question for Eva and another question for Marcus. Eva, I think your slides show very good performance for detecting m6A; we've seen a lot of slides showing that. How about the other modifications? We've seen a few selected examples showing the signal change or the error-profile change, and the knockout results look good, but how about transcriptome-wide evaluation? What would be the second-best one, or the best suspects? That's my question for Eva. And the question for Marcus: if we want to detect modifications from the very raw data, that means we have to keep the raw data on our hard drives, and that's a huge amount of raw data, a few hundred gigabytes per day. Is there any possible solution your company is thinking about for this problem? Okay, thank you very much for the question. Yes, we did test transcriptome-wide, and the IGV snapshots are of course always of selected examples. Actually, the way we selected them is that we chose examples with high stoichiometry so they are visible, because most m6A sites are actually low stoichiometry and would be hard to even see. That's why I added the densities, which show that we're finding a median of something like 10% modification stoichiometry across all sites.
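Per-site stoichiometry like the ~10% median just mentioned is simply the fraction of reads called modified at a site, and a knockout sample gives an empirical false-positive floor to compare against. A toy sketch with invented per-read calls and an invented, deliberately naive background correction:

```python
def site_stoichiometry(calls):
    """Fraction of reads at a site called modified (True)."""
    return sum(calls) / len(calls)

# Invented per-read modification calls (True = read called modified):
wildtype_site = [True] * 12 + [False] * 88   # ~12% of reads called modified
knockout_site = [True] * 1 + [False] * 99    # residual calls in the knockout

wt = site_stoichiometry(wildtype_site)
ko = site_stoichiometry(knockout_site)       # empirical false-positive floor
corrected = max(0.0, (wt - ko) / (1 - ko))   # naive background subtraction
print(f"WT {wt:.0%}, KO floor {ko:.0%}, corrected {corrected:.1%}")
```

Whether residual knockout calls reflect model error or true background modification is exactly the ambiguity discussed in the answer, so the correction here is a sketch, not a recommended procedure.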
So some sites are highly modified and some are lowly modified, but that was the median of a distribution. In the knockout we were seeing something like one percent, which is probably our median false-positive floor, or reflects the fact that there's some background in the knockouts, which is also a discussion that Samie Jaffrey has raised in parallel in a different publication. Some of these knockouts are actually publicly available data; I didn't generate the knockout data I showed myself, I used public data to illustrate the usage of the basecaller. So that's with respect to transcriptome-wide applicability: I think it is applicable transcriptome-wide. That's one point. The other is the applicability to other modifications, and that depends a bit on the training sets you have. We've been testing other things, for example pseudouridine as an alternative, but there are other modifications where it's trickier to generate the ground truth, or even to generate them in any kind of synthetic context. So I think the limitation at the moment is in the generation of those datasets; what we have could work for other modifications as long as you had decent training sets. And real quick, for the Nanopore storage question: the data is in the raw signal, so if you want to take advantage of improved algorithms,
there's not a whole lot we can do, because there is extra data in the raw signal itself. Even storing just the signal intensity levels for each base loses information, because then you can only apply the HMM-type approaches; and if your reference improves and you want to map signal levels onto a new reference, you can't do that if you've thrown away the data. So I think the main thing is: for the really important samples that you can't reproduce, hold on to the raw signal. That's one of our practices even internally; we struggle with data storage on our servers too. We keep the important samples around and we re-run the ones that are easier to reproduce. Thank you. There are a lot of questions online as well, so we're going to alternate between online and in-person. The next question is from a committee member: the goal to sequence single-molecule transcripts and map modifications will take time and resources; what are the short-term opportunities, and what milestones can be achieved in a year or two? Is there a market there to reach these milestones? I don't know if I can answer that one, but to expand on it: what are the highest-impact things? The easiest answer is m6A and everything else, everywhere, in every context, and those are hard. So what are the defining next steps? I think that's really important here: defining what the most useful small step would be, so that we don't have to hold on and wait for the perfect thing that works everywhere; we can work on the next small steps. I don't know what those are; I work on the algorithms for the most part, though I have a past working in the field. But knowing what would be most high-impact for the most people, I think, is really important. Does anybody else on the panel want to answer that? No, so we'll move forward.
So I'm going to go with the same philosophy and ask one question of Marcus and one of an academic lab. The question for Marcus: there's obviously a high number of RNA mods, and the combinatorics of detecting them, in the context not only of one particular sequence but of all the other sequences, gets enormously challenging. So it would be nice to know what the most important signal features are that actually contribute to the various modifications; within the machine learning you can extract those features, and it would be nice to discuss that more. And then I have a question for Shuo. Hi, this is Meni. My question is regarding the RNA modifications that you've detected; I think this is a very clever system with the boronic acid. But this lends itself to thinking about exonuclease sequencing, where we basically digest off each base and then read it using the nanopore. The question then becomes, first of all, how efficient is the capture of a mononucleotide into a pore? And the second question, which you kind of introduced here and I don't know the answer to, which is why I wanted to ask you: what is the efficiency of the chemistry between the boronic acid and the diol? Suppose you had one molecule that definitely went into the pore; what is the chance that it would actually interact with that boronic acid? I'm not saying you have the answer, or that you should, but it's a good thing to consider, because if it is very high efficiency, then one can consider exo-type sequencing, where we basically digest and detect, kind of like mass spectrometry with the nanopore. Long question. That's fine; I can answer briefly.
I would say we're actually going in the opposite direction: instead of understanding the features, it's about matching the samples to what we want to detect, matching the all-context models to what we actually want to detect in the sample. It's not so much about knowing what the features are; we just need to know whether a base is modified or not. So we let the machine learning algorithm extract all the information that it possibly can rather than trying to interpret it, and we match the samples, samples, samples to what we want to detect. To me that's far more important than understanding it; we don't need to understand the molecular dynamics of exactly what's happening, we just need the most accurate modified base detection algorithms possible. That's my take on it, whether that's helpful or not. Okay, so I can answer the question from Meni; your voice is so unique I can recognize it even if I don't see you. To answer your question: I think the big advantage of our strategy is that you can recognize different RNA modifications almost without producing any error. I don't know how many NMPs are passing through the pore, because if I don't detect one, I don't know whether it passed without being captured or not. So I don't know that answer, but I do know you can tune the capture rate by changing the pH; in the SI of our paper we actually have a pH titration. If you move to pH 9 or even further, your capture rate will be crazy high, but your signal will be noisier. That's why we chose a more neutral pH in this demonstration. But if you're worried about capture rate, we can always modulate the capture efficiency by balancing signal-to-noise ratio against capture rate. And regarding the exo-sequencing question,
I don't have an answer at this moment, but I think if we do have a solid setup and a strategy to match the exonuclease activity to this kind of capture rate, then at the least it would be much more accurate and much simpler for detecting different modifications than the current strand sequencing approach. As for the read length or the sequencing speed, I'm not sure, but we will see. Another advantage is that it can basically do sequencing in a de novo manner: even if you encounter a modification that you have never dealt with and don't have in a database, you can detect a bizarre signal that you have never seen before. So it actually produces much more information than the current strand sequencing strategy, I think. I hope this answers your question. Thank you. Last questions. One question from a committee member; I'll try to make it two-part, because I am also interested. Are all modifications that have been identified in studies truly accurate, and how can we reach better accuracy in our field? And I'll make it much broader: as the number of modifications in a single basecaller increases, the accuracy of the basecaller drops. So what is a reasonable number of modifications that you think could be captured in a single basecaller, and what kind of resources would you need to still be at 98 or 99% accuracy? I think that answer is probably different for everybody in this room. Everybody has something they want to use their RNA modified basecaller for, and I think it really is about the biological question, and answering your biological question with the right model. Knowing your biological question is part of the problem, especially with our DNA modified basecaller, where people just get the calls, and it's free.
We're getting to the point where that works with DNA: you can get the calls for free, go back later, and say, oh, now I want to do a modified base study, and you can do that. For RNA modified bases there's such a wide diversity that it's a much harder task, and there do need to be specific models for specific tasks: if you have tRNA, you use a tRNA-specific model; if you want m6A in all contexts, that's the first one we'd target. That would be my take. Do any other panel members want to take it? Okay, the next question. Hi, this question is for Ben, because I thought someone should ask a question about mass spec; this is Chris, by the way. That was an absolutely beautiful talk, and I was really interested in your IM-MS work. At the beginning of this meeting someone, actually I think it was you, asked a question about RNA shape. Mass spec has traditionally been used to look at the 3D shape or overall fold of proteins, giving you information like SAXS does. Is there any thought about trying to use this sort of technology to do that with modified RNAs? Yeah, thanks for the question. Definitely, ion mobility has been used on the protein side and with other molecules to look at conformations and shape and to get very low-resolution, rough structures of molecules, so I think that could definitely be a natural next step. The type of ion mobility we're using, FAIMS, is not the highest resolution for that type of work, so we probably couldn't use the type we use, because that's really more for separation by charge state and size; but there are other kinds of ion mobility with different gases and different ways you can do it.
I think you could definitely start picking away at rough structures and shapes, and if the modification is in the right place and affects the shape or structure, then yes, I think you can definitely do some work on the mass spec side to see the effect of post-transcriptional modifications on RNA. Is it like an IMS system? No, no, FAIMS is just a front-end electrical separation system; it's a different type of ion mobility, and that's why it's lower resolution. Cyclic ion mobility allows you to do multiple passes, so you can separate for a long time, and the longer you separate, the higher the resolution. That's actually where you'd want to go if you want to use ion mobility to look at shape, structure, and small changes; I think that's the best bet. Thank you. Well, we're well past our 12 p.m. stop, so let's thank all the speakers. 12:15? Oh, sorry, I didn't realize we had more time; my bad. Yes, please go ahead. Thanks. Another mass spec question for Dr. Garcia. I'm also very interested in the work from Ryan Flynn on the possible discovery of glycoRNA, and I wonder if you can elaborate on the source of the material you showed there, the source of the glycoRNA, and any information you might have on those linkages. How was the detection done? I'd like to hear about your confidence in the results: are you just detecting oxonium ions, or how is that done? Thank you. Yeah, so most of the work that I showed here, most of the proof of principle, is from either HeLa or 293 cells; I don't know exactly. The data I showed is probably from 293 cells. And the linkage is a big question, because we haven't solved that one yet.
As I showed in the spectrum, when we go in, we can tell it's an oligonucleotide, but it has a fairly large linkage on it, not something small but probably a couple hundred daltons or so. What that linkage is we haven't quite narrowed in on; we're actually working with Ryan right now on some things, so we're hoping to be able to solve that, and he's come up with some nice ways to enrich these. That data I did not show; this is just data from cells. But yes, the linkage is a really big question, and one where we're not quite there yet. In terms of confidence, we've done a lot of control experiments that I didn't show because we only had a few minutes, but we've done a lot to convince ourselves that this isn't some kind of contaminant coming from free sugars or, say, a glycoprotein. The postdoc I have working on this comes out of a big glycoprotein lab in Italy, so he really went through the steps to make sure that there is absolutely no protein contamination in these RNA preps, and no other types of contamination, so we're pretty confident. From our end, with what we've done and what I showed, what we're seeing is some type of glycan-modified RNA. And in separate work with Ryan and his lab, we see a lot of the same patterns as well, and he uses a completely different approach for enriching those modified oligonucleotides, so we're pretty confident that we are seeing glycan-modified RNAs. Thank you, Ben. The next question is from a committee member, for Marcus: what's driving Oxford Nanopore's interest in sequencing RNA modifications? What's driving it is this room; people being interested in it, obviously.
And what's especially driving the renewed interest is that we're discontinuing the R9 pore. The previous RNA technology was built off of the R9 flow cell, so since we're discontinuing that, it was a natural time to switch over and really dedicate the effort to an RNA pore. We had developed an RNA kit on what was obviously the DNA flow cell, using the same pore, which has a lot of advantages, but the highest-accuracy RNA isn't one of them. The impetus for doing it now is the move to Kit 14 and the R10 flow cell for DNA, which doesn't perform as well for RNA. So we re-engineered, and the advanced research team has done amazing work to get that accuracy bar up; it took a lot of different experiments internally to find the right pore for RNA, and we're really excited for all the algorithms and what we can do with this new, lower-noise system for RNA. If I understood correctly, Marcus, you said the speed of sequencing has increased from 70 to 140 bases per second, with a 10-times decrease in input sample and a 3-times increase in throughput. What about the cost? I'm unfortunately in research, so I can't answer that. Thank you. Please go ahead. I have a question for the nanopore folks on the panel: it seems like a major challenge going forward for de novo basecalling of modifications is going to be training datasets. What do you think is the best way to address that? I'm also thinking of our post-lunch talks on things like oligonucleotide synthesis: what would be the ideal way to address this, and maybe also what is the most practical way? Maybe I'll reply, even though the question is addressed to the Nanopore people.
I think this actually should be a joint effort between the nanopore people, the mass spec people, and the solid-phase synthesis people. Because, for example, when we get an oligo, even a synthetic oligo, you make the assumption that the oligo is 100% modified, and I personally think that's a wrong assumption. There was a paper by Schraga Schwartz on ac4C where he and collaborators actually took the time to look into the ac4C modification levels, and what they found was 50%. And when we're training our models we actually don't see 100%. So is it because our models are inaccurate, or is it because the oligos are actually not 100% modified? I'm not saying the models are perfect, but I actually think the modified sites are not 100% modified. This is a very important limitation; that's why, in my opinion, the training sets that we use need to be properly validated by mass spectrometry, because they're not as accurate as we think in terms of stoichiometry. And because we get single-molecule resolution, if you label a read as modified when it's actually unmodified, that really messes up the predictions, because you're driving the algorithm crazy with incorrectly labeled data. So I think this is an important thing worth discussing, and the solution needs to be a joint effort.
That's something we're also looking at internally: dealing with samples of lower quality, dynamically removing examples during training that might be incorrectly labeled, so as not to steer the model towards them or have it train on them; building things like that into training. But no, it really is about the training samples, and that's why we've been pushing so hard on the DNA side for these synthetics, the randomers, to produce the training sets for modified bases. There are ways we can shoehorn that in for RNA; it is much more complicated, and RNA duplexes are a much harder endeavor, but not impossible. So we're looking into ways to adapt the randomer approach that is working for DNA now to work for RNA, but I definitely agree that orthogonal validation is absolutely key. And, like I said, native samples are generally what we want these things to work on, but the ground truths there are tricky, especially with RNA, where everything is transient. That makes it much harder to get really solid ground truth for RNA; that is the challenge and the excitement. And the last question, from a committee member: what are the modifications that we can feasibly and reliably sequence five years out? Twenty years out? I mean, all of them; it's just a matter of the effort put into them. All of them are feasible. For nanopore, it's in the signal, and the same for Shuo's approach: it's in the signal. It's a matter of what effort you want to put into each one, and that's why we're pushing to make it easier for the community to do it. We're going to push to get the samples that you need for the way we do it, and then for extending it there are different types of samples, and you need to know what each is applicable to; but the signal is in there, and we just want to make it as easy as possible for the community to try to extend the work that we're doing.
Maybe just to elaborate a bit on synthetic standards: you need synthetic standards to develop the basecallers, but not all modified ribonucleotides are even available at this point as synthetic RNA molecules. So could you elaborate a bit more on your randomer approach? What's the length of your randomers, how many modifications are embedded, and how do you justify that these are the right molecules to capture the context? They're definitely not perfect molecules. We're working very closely with synthetic oligo printing companies, and even in DNA, formyl-C, for example, is very hard to print with phosphoramidite chemistry, so even DNA has lots of limitations. It's not a perfect solution; it's the best worst solution we have so far. It's not perfect, but it's something that applies to a lot of things. We got kind of lucky that enzymatically modified CpG is the thing most people care about in humans, and we had a nice system to springboard off of. But yes, there are definitely challenges on the synthesis side, and that's why we're very excited to work with all the synthesis companies, including the ones that don't have public products yet, because it is really important. Right, let's thank all the speakers. Thank you to all the speakers. I'm going to quickly say that we're about to break for lunch. There has been a small issue with the lunch order; the company we ordered from messed up pretty badly. We think we have enough food, but we're not 100% sure. If it turns out we don't have enough, which is possible, we'll do a little rearranging of the schedule and try to get some more food in here after the next session; we'll do some checking first. We really apologize; there was some miscommunication, but please grab food if you can, and we'll come back to start our next session. Thank you. Welcome back.
I hope everybody was able to get some food, and now we can get started again. My name is Keith Nykamp; I'm a senior director of genetics and data science with Invitae, based in San Francisco. Invitae, for those of you who don't know, is a large-scale provider of genomic information, which is used by our customers to improve health care decisions. I'll be moderating this next session on research and development around standards for sequencing and mapping RNA modifications. As we've heard multiple times today, when thinking about how we progress towards a complete mapping and sequencing of all possible RNA modifications, the community will need a robust gold-standard reference set of modifications, with clear and concise quality metrics for confirming the presence of new and existing chemical modifications in these experiments. With this in mind, we'll first hear from Sean Feng, the SVP of R&D with TriLink BioTechnologies, who will be discussing successful approaches to, and limitations of, oligonucleotide synthesis with chemical modifications. Hello, can anyone hear me? We can hear you, and we can see you; welcome. Hello? Hello, we can hear you. Hello, can you hear us? Are you able to hear us? Why don't we start with Jia while we work on getting Sean up and running. Okay, sounds good. Should I start? Okay. We'll move on, then, to Jia Sheng, associate professor at The RNA Institute at the University at Albany. Jia will be sharing his work synthesizing N4-methylated cytidines and using these modified RNA oligos for structure-function studies. Is it going okay? Yes, we can see it. Okay, I'll start now. Thank you very much for the invitation and the opportunity to speak here; it's a great honor to be here and share with you some of our work on N4-methylcytidine (m4C). I'll also start from the central dogma of molecular biology.
To this audience, I don't need to emphasize the key roles that RNA can play in gene regulation, environmental interactions, and human diseases, basically everywhere, and the new view of RNA as an essential actor in the cell has also led to great interest in RNA-targeted drug discovery. Nature uses two general strategies to diversify RNA structures and functions. One is chemical modification: based on MODOMICS and the RNA Modification Database located at our institute, there are over 170, maybe more, chemical modifications decorating RNAs in all domains of life. Many of them play essential roles in almost all biological processes, and it is believed that these modifications are among the most evolutionarily conserved properties, relics from the RNA world, where they may have enhanced the chemical diversity of RNA before proteins. The other strategy is that RNAs can fold into well-defined structures, stabilized mainly by Watson-Crick and non-canonical pairs as well as other tertiary interactions. Therefore, studying chemical modifications and base-pairing interactions in RNA is important for further study of their biological functions, for the development of new therapeutics, and for research into the origin of life. With this goal, my lab is working on a series of modifications from a chemistry point of view, since I was trained as an organic chemist. These are a few examples of our recent work; we are particularly interested in the synthesis, base-pairing patterns, structures, and functions of these natural modifications, and we're also interested in developing new molecular tools based on this chemistry. Today I'm going to focus on this N4-methylation work and use it to lay out our workflow.
As you know, methylation is the most abundant modification in RNA; shown here are a variety of methylated residues found in all regions of tRNAs and ribosomal RNAs, as well as messenger RNAs, and it's not surprising that many methylation marks are closely associated with human diseases. For example, m6A, m5C, and 2′-O-methyl residues have been identified in SARS-CoV-2, affecting its overall virulence, and their writer proteins, such as RBM15, have been proposed as COVID drug targets. Some modifications have been used in RNA therapeutics, such as N1-methylpseudouridine in the mRNA COVID vaccines. Although several methylated residues have been widely studied in terms of detection, sequencing, profiling, structure, and biological function, many of them remain elusive. We are interested in m4C. It's common in natural DNA, where it plays key roles in gene regulation; in RNA, m4C mainly exists in ribosomal RNA, where it functions to stabilize folding and protein interactions. The writer enzyme methylates C to m4C and further to the dimethylated m4,4C. Recently, METTL15 was found to introduce m4C into the human mitochondrial 12S ribosomal RNA; it is involved in mitochondrial protein synthesis, providing a potential new drug target for the treatment of mitochondrial disorders. In addition, m4,4C was uniquely detected in viral RNA from Zika and HSV and their infected cells. So we would like to study the base-pairing and structural features of these residues in RNA. Since m4C directly participates in Watson-Crick pairing, as shown here, one direct consequence of this methylated nucleobase is its effect on base-pairing stability and specificity: the single methylation might either retain or disrupt the hydrogen bonding between C and G depending on the conformation of the methyl group.
The dimethylated m4,4C, in contrast, should disrupt the C:G pair and form a different base-pairing pattern, and other mismatched pairs, like a C:T pair, could also take place. In addition, the methyl groups could affect enzyme interaction and recognition, since they sit in the major groove. So we started the work by synthesizing the two building blocks. Here is the synthesis of the m4C phosphoramidite from the silylated uridine: activation of the C4 position, followed by treatment with methylamine; isolation provided the key intermediate S3, which was selectively desilylated, tritylated, and finally converted to the amidite building block for solid-phase oligo synthesis. Similarly, we started the synthesis of the m4,4C amidite from dimethylation of the silylated cytidine; the compound was selectively desilylated and converted to the final amidite for solid-phase synthesis. Both building blocks are fully compatible with a regular solid-phase DNA synthesizer and standard purification conditions. Here is a set of RNA sequences containing these two modifications, confirmed by MS, with HPLC profiles before purification showing coupling yields very similar to those of the native strands. Then we did base-pairing stability and specificity studies. The Tm data show that m4C retains regular C:G pairing and has a relatively small effect on pairing stability in an RNA duplex, while m4,4C disrupts the C:G pairing and significantly decreases duplex stability. This also results in a loss of base-pairing discrimination of C:G against C:A, C:T, and C:C mismatches, as shown by the small differences in the Tm curves and the numbers here. We were very lucky to get some good crystals and solve the structures for both modifications; here is an overall comparison of the backbone conformation and the molecular packing of each duplex.
I'm not going to go through the details of the data collection and refinement statistics; I'll just show you the density maps and base-pairing patterns of these two residues. All the methyl groups are placed in the base plane and, as expected, the single methylation has a very minor effect on the geometry of the normal C:G pair, although the presence of the methyl group prevents the N4 amine from forming another hydrogen bond on the major-groove side, which could be important for RNA-protein recognition. Introducing a second methyl group causes more severe perturbation: as shown here, to accommodate the two methyl groups and avoid steric clash, the C:G pair is shifted to a wobble-like pattern with only two hydrogen bonds. The shift also leads to a big change in the base-pair angles and a slight decrease of the C1′-C1′ distance, which changes the overall stacking interactions between the base steps. This pattern indicates that the dimethylated amino group in this structure may actually be present as an iminium cation at the C4 position, as shown here; of course, it's also possible that the uncharged tautomeric form switches to another pairing pattern with only one hydrogen bond under neutral or basic conditions. We then conducted MD simulations to study the dynamic properties of these hydrogen-bonding patterns. This figure shows the distribution of the number of hydrogen bonds between the base pairs, indicating that both C and m4C have a normal average of three hydrogen bonds, but the dimethylated pair has on average 1.5 to 2 hydrogen bonds, meaning m4,4C exists as a mixture of forms in this duplex. This figure shows the average number of hydrogen bonds for every base pair in the duplex, indicating that the structural perturbation caused by the dimethylation is mainly local to the modified bases, which is also consistent with our crystal structure studies.
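A minimal sketch of the kind of hydrogen-bond counting such an MD analysis performs, assuming per-frame donor-acceptor distances and a plain distance cutoff. The frames and cutoff below are invented for illustration; real analyses typically also apply an angle criterion and run on full trajectories with tools such as MDAnalysis.

```python
def hbond_count(donor_acceptor_distances, cutoff=3.5):
    """Number of candidate donor-acceptor pairs within the cutoff (angstroms)."""
    return sum(1 for d in donor_acceptor_distances if d <= cutoff)

def average_hbonds(frames, cutoff=3.5):
    """Mean H-bond count over a list of per-frame distance lists."""
    return sum(hbond_count(f, cutoff) for f in frames) / len(frames)

# Toy frames for one C:G pair, three candidate H-bonds per frame:
frames = [[2.9, 3.0, 3.1], [2.9, 3.0, 3.8], [2.9, 3.9, 4.0]]
print(average_hbonds(frames))  # (3 + 2 + 1) / 3 = 2.0
```

An average near 3 would correspond to the canonical C:G pair, while values between 1.5 and 2, as reported for the dimethylated pair, suggest a mixture of bonding patterns.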
Next we studied the impact of these two methylated residues on reverse transcription, using this model with both native and modified RNA templates read out on fluorescence gels. When NV-RT, which has relatively higher fidelity, was used in the system, the reverse transcription reaction completed in the presence of all the natural dNTPs with both the native and the m4C RNA templates, forming the same fluorescent product, while m4,4C totally inhibited the NV-RT activity and shut down DNA synthesis. When we used HIV reverse transcriptase, which has relatively lower fidelity, both the m4C and m4,4C templates gave the fluorescent product in the presence of all natural dNTPs, like the native template, indicating that the modifications do not inhibit HIV-RT activity. Interestingly, both methylated residues decrease dGTP incorporation but increase dTTP incorporation, especially in the m4,4C case, so they can induce a potential G-to-T mutation during reverse transcription, and this could also provide a potential way to sequence these residues. Although more in-depth sequencing, profiling, and in vivo studies are still needed, we can draw some preliminary conclusions. With our two synthesized building blocks, we showed that m4C retains the regular C:G pairing pattern, but m4,4C disrupts both stability and specificity. Both residues could increase C:T pairing and induce a potential G-to-T mutation during reverse transcription with HIV-RT; with NV-RT, the methylation either retains or completely shuts down DNA synthesis. These results indicate that methylation at the N4 position of cytidine could be a molecular mechanism to fine-tune base-pairing specificity, affect coding fidelity, and increase the mutation rate during replication. We have also done similar studies for a few other modifications, like m3C, 5-hydroxymethyl-C, 5-formyl-C, 5-cyano-C, and some tRNA modifications.
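As a toy illustration of using such a mutational signature as a sequencing readout, one could tally reference-to-read substitutions from aligned reads and look for an elevated G-to-T rate at candidate sites. The function and data below are hypothetical; a real pipeline would work from BAM alignments, not toy strings.

```python
from collections import Counter

def substitution_spectrum(ref: str, reads: list) -> Counter:
    """Count ref_base->read_base substitutions across reads aligned to ref.

    An elevated G>T count (as described for HIV-RT on m4C/m4,4C templates)
    would flag candidate modified positions.
    """
    counts = Counter()
    for read in reads:
        for r, q in zip(ref, read):
            if r != q:
                counts[f"{r}>{q}"] += 1
    return counts

reads = ["AGTTA", "AGGTA", "AGTTA"]          # toy reads over reference AGGTA
print(substitution_spectrum("AGGTA", reads)) # G>T seen in 2 of 3 reads
```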
As a long-term goal, I hope we can make some contributions to this RNA modification and structure research area by uncovering RNA functions through synthesis and structural studies of these natural modifications. Currently, one of the major challenges is that many of these building blocks are not commercially, or even synthetically, available, so a lot of work, including nanopore sequencing, has been restricted by these materials, from the nucleoside to the nucleotide to the oligonucleotide level. But hopefully in the near future, with more synthetic effort, the complicated landscape of these residues in life processes will become much clearer. In the end, I'd just like to thank my students and my collaborators, the beamline scientists from HCS, and the funding agencies. Thank you very much for your attention.

Thank you very much, Jia. Just as a reminder, we're going to hold questions until the end, and if questions come up during the talks, feel free to post them in Slido so we can prioritize them at the end. Let's see if we have Shanfeng now. Hello? Yes, we can hear you. Welcome. Can you see the screen? Yes, we can. In the top display settings, you should be able to switch from outline mode to presentation mode. There we go. Perfect.

Great, thank you, and sorry for the technical difficulty. Thank you, everybody, and good afternoon, or good morning, wherever you are. First of all, I'd like to thank the workshop organizers for the invitation. My talk today will focus on oligonucleotide synthesis.
I'll start by briefly talking about the many applications of oligonucleotides, then give a brief overview of the different oligonucleotide synthesis methods. Then I'll focus on the solid-phase synthesis approach and give some examples of new building blocks developed in the field. Finally, I'll touch on the future of oligonucleotide synthesis and on oligonucleotide standards. Based on application, we can divide oligonucleotides into two categories. There are analytical applications, where oligos are used for sequencing and proteomics, or as primers or probes for PCR, as most of you probably know. And there are many therapeutic applications as well: for example, antisense drugs, oligonucleotide anti-cancer drugs, and guide RNAs for gene therapy. In terms of synthesis methods, there are three. Solid-phase chemical synthesis is still the method of choice today: you synthesize the oligo on a solid support using the classical phosphoramidite approach in organic solvent. There is also liquid-phase chemical synthesis, where instead of attaching the oligo to a solid support, you attach it to, for example, a PEG linker; it also uses phosphoramidites and organic solvent. And then there is a newer method developed in recent years, the enzymatic method. It is non-template-based: engineered enzymes add one nucleotide at a time, much like sequencing by synthesis in next-generation sequencing, where you read one base at a time; here you add one base at a time, usually using 3′-blocked reversible terminators, and the reaction is done in aqueous media. The major drawback of this approach is that the enzymes cannot accept all of the modified nucleotides that are available today.
Liquid-phase and enzymatic synthesis have their own challenges, but that's a different presentation for a different day; the overall goal of these approaches is a more environmentally friendly process to make pure, cheap oligonucleotides fast. I want to briefly describe the oligonucleotide synthesis process on solid phase, using a common RNA oligo synthesis scheme with the 2′-TBDMS approach. As the screen shows, we start with a nucleoside on the solid support. The first step is deprotection of the DMT group to expose the 5′-hydroxyl. Then you couple one nucleotide at a time, oxidize P(III) to P(V), and cap the unreacted hydroxyl groups. That is one cycle of reactions; when you are making a 100-mer, you basically repeat this cycle 100 times. After the synthesis is done, you cleave the oligo from the solid support, then go through deprotection and purification. As you can imagine, the oligo synthesis yield depends very much on the efficiency of each cycle. The graph on the left was published by my colleague at TriLink: the x-axis shows the oligo length, the numbers on the lines show the per-cycle efficiency, and the y-axis is the full-length oligo yield. So imagine you make a 100-mer and your coupling efficiency per cycle is only 97%: you pretty much end up with very little full-length oligo at the end of the synthesis. As I mentioned, because each cycle has four steps, you really need high efficiency for each reaction step. For example, if the DMT is not 100% deprotected, you end up with an n-1 product; if the oxidation is not 100%, you get truncated species. I hope I haven't depressed you too much about the synthesis yield; I'll spend a little more time on this slide.
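The yield relationship described here compounds quickly, and can be sketched numerically. This is an illustrative calculation only (generic numbers, not TriLink's actual data), assuming a 100-mer requires 99 couplings:

```python
# Full-length oligo yield as a function of per-cycle coupling efficiency.
def full_length_yield(length: int, efficiency: float) -> float:
    """Fraction of strands that are full length after (length - 1) couplings."""
    return efficiency ** (length - 1)

for eff in (0.97, 0.99, 0.995):
    y = full_length_yield(100, eff)
    print(f"100-mer at {eff:.1%} per cycle -> {y:.1%} full length")
```

At 97% per cycle only about 5% of strands are full length, which is why pushing per-cycle efficiency toward 99.5% matters so much for long oligos.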
The reason I went through this is that I hope it gave you a good sense of what kinds of impurities we are dealing with in oligo synthesis. The quality of all the reagents has an impact on the final yield and quality. Take phosphoramidites: if you have unmodified or wrongly modified base impurities in your phosphoramidite, you are likely going to introduce those impurities into your oligos. Phosphoramidites are also moisture-sensitive and not very stable long-term, so there are issues there. Then there are technical aspects of the synthesizer: the machine needs to be very reliable, able to pump reagents in and out very efficiently, and it needs to maintain a dry environment. Then there is the solid phase; whether it's CPG or polystyrene, both are good for short oligos, but if you're thinking about making long oligos, you really need to balance the loading against the oligo quality. There are also a lot of side reactions during oligo synthesis. For RNA, for example, you have 2′-3′ isomerization, depurination, and unwanted base modifications arising during the synthesis; all of these affect the quality of the oligos. Then there's the availability of modified phosphoramidites, as Jia, the previous speaker, showed you; because of the nature of the 2′-hydroxyl function of RNA, not all amidite chemistry can be scaled up, and there is cost associated with that. And once you are done with the synthesis, you have to deal with the purification, whether DMT-on HPLC or even gel purification. With today's techniques it is still very difficult to separate the n-1 and n+1 or other byproducts, especially for long oligos. And then there's characterization.
You can use HPLC for purity and LC-MS for identity or sequence, but characterization still remains a challenge for longer oligos. In general, synthetic RNA oligonucleotides have lower yield and lower quality than DNA oligonucleotides. Despite all these challenges, a lot of progress has been made in the last 30 years or so, with many scientists and engineers contributing to that success. For example, today we are pushing toward 200-mer oligonucleotides for DNA and 100-mers for RNA, even 150-mers. There are also many opportunities to introduce specific modifications into oligos: for example, phosphorothioate backbones for antisense or anti-cancer drugs, labels such as biotin or fluorophores for better detection, or application-specific reagents such as CleanCap, which I'll touch on later. There are industrialized, high-speed, high-throughput oligo synthesis platforms available commercially, and there are many GMP oligo manufacturers around the world that can make kilogram-scale oligos to support pharmaceutical applications. I want to show some of our work. These three phosphoramidites were introduced by our sister company, Glen Research; I hope you all know Glen Research, I think they are absolutely the best in the field for oligo amidites. The first one here is a lipid phosphoramidite, which, as it says, lets you make lipid-modified oligonucleotides. There is an amino-modified phosphoramidite, for introducing amino modifications into oligonucleotides. And the example on the right is an RNA amidite carrying a fluorophore, so you can make fluorescent oligonucleotides. At TriLink we are always working to improve the chemical synthesis and QC methods for oligonucleotides. Here I want to share a short model oligonucleotide with you.
We are using this as a model system to evaluate the efficiency of 5′-triphosphorylation. The left graph shows the HPLC results for the oligos made under different synthesis conditions: the green peak is the monophosphate, the blue is the diphosphate, and the purple one, our desired product, is the triphosphate. Because these are short oligos, you get nice HPLC separation. The table on the right also shows that we can detect the mono-, di-, and triphosphate oligos by mass spec. The graph in the middle shows the triphosphate oligo yield under different synthesis conditions. Based on these optimized conditions, we then used the method to make 100-mer oligos with a 5′-triphosphate. You're probably curious which key components were used in mRNA production for the COVID vaccine, and I'm very proud to tell you that CleanCap is part of the first mRNA vaccine, developed by Pfizer-BioNTech. To date, nearly 2 billion people around the world have received an mRNA vaccine for COVID-19. The graph on the left is our CleanCap trimer structure, and the structure on the right is N1-methylpseudouridine triphosphate; TriLink also makes this nucleotide product. As for the future, despite all the challenges, I'm very optimistic about the oligonucleotide field. With further development of chemical or enzymatic methods, or a combination of both, I think one day we can push synthetic oligos, DNA or RNA, up to 1,000-mers. Scientists around the world, as Jia showed you earlier, are working tirelessly to find new and better modified nucleotides. And I also think there is a great need to build and establish accurate RNA oligo standards; I'm glad we have this workshop today. Even today, it is still a challenge to really identify and characterize long oligos, especially with modifications.
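The mono-, di-, and triphosphate species differ by one HPO3 group each (about 79.97 Da, monoisotopic), which is what makes them distinguishable by mass spec. A minimal sketch of that arithmetic, using a made-up base mass for the monophosphate oligo:

```python
HPO3 = 79.966  # Da, monoisotopic mass of one additional phosphate (HPO3)

def species_masses(monophosphate_mass: float) -> dict:
    """Expected masses for the 5'-p, 5'-pp, and 5'-ppp forms of one oligo."""
    return {
        "monophosphate": monophosphate_mass,
        "diphosphate": monophosphate_mass + HPO3,
        "triphosphate": monophosphate_mass + 2 * HPO3,
    }

# 3100.0 Da is an invented example mass, not a real oligo.
print(species_masses(3100.0))
```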
In addition to mass spec, I think sequencing can be a great tool as well. Here are my colleagues at TriLink who do the oligonucleotide synthesis and QC method development, and I'd like to thank the organizers again for allowing us to share TriLink's work. Thank you.

Thank you very much, Shanfeng. Next up, we have Gene Yeo, professor of cellular and molecular medicine at the University of California San Diego. He'll be discussing hurdles for implementing large-scale sequencing and mapping studies in the context of data and computing resources.

Hi, can you hear me? Yes, we can hear you and see your slides. Thank you. Thanks for the invitation; I'm delighted to be at this workshop. I'm going to share some of our experiences in thinking about and developing RNA standards for what we call RNA interactomics. Here are some disclosures of financial interest, and acknowledgment of the funding; a lot of the work I'll talk about today is NIH-funded, ENCODE-related research. In the field, there has been a great resurgence of interest in human RNA binding proteins. A fair number of these have been discovered by many different methods over the last decade, including mass spec and computational methods, and a large fraction of the genome turns out to encode these RNA-interacting or RNA binding proteins. RNA binding proteins modulate every single aspect of RNA regulation, from splicing to localization, decay, degradation, and translation, including RNA granule formation and small RNA biology, processing, and target recognition. Specifically for this workshop, I also want to remind everybody that the writers, erasers, and readers of all RNA modifications are basically RNA binding proteins, and these modulators influence RNA in many different ways, including, as I mentioned, decay, stability, and translation.
One other slide as a reminder: there are many methods to identify and detect RNA modifications, and many of them are sequencing-based methods that leverage antibodies, misincorporation, or chemical probes. Many of these are, in our minds, very similar to the RNA maps we've already evaluated in terms of data quality, saturation, and the features used to identify these mods computationally. I'm not going to speak about the non-sequencing-based methods to quantify these modifications; of course, there are chromatography and mass spec approaches to quantify the different mods on a single-RNA basis, less so transcriptome-wide. Today I'm going to focus primarily on illustrations and examples of transcriptome-wide endogenous RNA substrates that are recognized, bound, and modified by RNA binding proteins broadly, but, as I reminded everybody, all the modulators, writers, and erasers of RNA mods are essentially RNA binding proteins. In the lab and in our consortium work, we've been very interested in these protein-RNA maps, or RNA interactomes. These maps are terribly helpful not only in telling us what happens to RNAs when and as they are bound and modulated, but also in the design of sequence-specific modulators like ASOs or siRNAs. In fact, in our more recent papers they help us think about the mechanism of action of small molecules that interrupt RNA binding proteins, readers, and so on, and help us predict the function of these RNA binding proteins as novel engineered effectors.
In my group we've spent quite a bit of effort developing technologies that enable systematic studies of RNA binding proteins and RNA, and similarly of readers, writers, and erasers. We've improved on crosslinking-IP methods that allow us to IP a given RNA binding protein, digest away the RNA that is not protected, and then sequence the RNAs bound to the protein. These methods have enabled us, just within my group, to publish over 200 papers describing the functions of many different RNA binding proteins. We've also spent quite a bit of effort developing other technologies to tell us where RNA binding proteins are bound and interact with RNA, and these are now getting to single-cell, isoform-specific levels, providing a new layer of regulation and resolution, and also scalability, in identifying these protein-RNA interaction maps. Today I want to share the lessons we've learned about standardization of these approaches; quite a bit of it is encapsulated in our initial capstone paper here. This was a large collaboration with multiple labs, including Brenton Graveley's lab at UConn, Chris Burge's group at MIT, Eric Lécuyer in Montreal, and Xiang-Dong Fu at UCSD. The paper represents an integration of multiple technologies that allows us to integrate the different data sets and then share the interpretation of the data and the biological insights with the community. Of course, these larger projects require a fair bit of communication and standardization, in reagents for example, and we can spend a little time on how we developed these frameworks. We had to develop consistent biological and experimental pipelines that were then ported to different groups around the world, in fact even to companies that we have started.
We developed fairly consistent data processing pipelines, shared data quality standards with the community, and spent a fair bit of time internally worrying about and trying to address batch effects; I think some of these lessons may be helpful in many aspects of the RNA modification space. To illustrate from the research paper we published: we focused our attention mainly on two cell lines, although within the lab we now work on many different cell lines. One lesson to share is that when these lines were generated, we spent a fair amount of time karyotyping them and making sure we banked them. For every single experiment we publish, we keep track of the exact batch number, who grew the lines, who split the lines, and the freezing conditions; this information helps us control for erroneous data and ensure data quality. These are the lines we focus most of our assays on. In this specific publication we had multiple assays. You can see the eCLIP assay, which allows us to identify protein-RNA interactions; in vitro recombinant protein interaction assays like RBNS, which Chris Burge's lab developed; a lot of our antibodies were subjected to a fair amount of validation before they were leveraged by the Lécuyer lab in Montreal to do immunofluorescence for about 300 to 400 RNA binding proteins; Brenton Graveley's lab validated a lot of the shRNAs that we had sent to him to knock down RBPs and performed large-scale RNA-seq experiments; and some of these antibodies were also leveraged for chromatin IP followed by sequencing, which the Fu lab performed. This large-scale integration again required a fair amount of standardization and quality control before releasing data sets to the public.
To date, at least in the publication, we have generated over a thousand replicated data sets, representing some 350 RNA binding proteins. This shows a bit of the diversity of the RBPs, their function and localization, what domains they have, and which experiments they were subjected to. Then just a quick slide on how we integrate data: once data sets are released, with the specific release criteria I'll go over, they are put together across multiple modalities. Here we have an example with the RNA binding proteins PTBP1 and TIA1 in the two different cell lines. We integrate this with knockdown data from the same cell lines, where we knock down the RNA binding protein, PTBP1 and then TIA1, and subject the RNA to RNA-seq analysis. You can see below that we have binding sites for these RBPs flanking exons that are alternatively spliced, and we can overlap that with the IDR peaks, which I'll mention on the next slide, extracted from the eCLIP data sets, and also with motifs from the in vitro recombinant RBNS data sets; there's another example on the bottom right. So, getting back to standardization: we had acquired, at this point, more than 1,500 antibodies for these RNA binding proteins, and a fair number of shRNAs, and every single reagent, prior to its use in generating data, was subjected to quite rigorous Western analyses, in many cases also mass spec analyses, to show that you have single bands at the correct molecular weight and that the antibody recognizes its primary target as the most abundant substrate.
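The peak-versus-regulated-exon integration described here boils down to interval overlap. A toy sketch with invented coordinates and window size, not the actual eCLIP pipeline:

```python
def peaks_near_exon(peaks, exon, window=250):
    """Return peaks (start, end) overlapping the exon extended by `window` nt.

    Toy stand-in for overlapping binding peaks with knockdown-responsive
    exons; real analyses use genome-aware interval tools (e.g. bedtools).
    """
    lo, hi = exon[0] - window, exon[1] + window
    return [p for p in peaks if p[0] <= hi and p[1] >= lo]

peaks = [(100, 150), (900, 950), (5000, 5040)]   # hypothetical peak coords
exon = (300, 400)   # exon whose inclusion changed on RBP knockdown
print(peaks_near_exon(peaks, exon))  # [(100, 150)]
```

Peaks that land within the flanking window of a knockdown-responsive exon are the candidates for direct splicing regulation by that RBP.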
You can obviously also knock down the proteins, and the loss of the bands on the Western blots tells us the specificity of both the shRNAs and the antibodies. These criteria for antibody validation, and the socialization of them, were rigorously discussed in several working groups in the ENCODE Consortium, and the working groups together developed white papers defining what a fully validated, releasable antibody is for this specific project. All of these release criteria were published in a Molecular Cell paper, I think in 2016, and have been used as models by labs validating new antibodies for RNA binding proteins we had not validated, using all the same criteria. You can see we have primary validation, where the specific band had to be some fraction of the total band signal, and secondary validation, knocking down the RBP and showing that the antibody is validated by the knockdown, and so on. These were part of our release criteria for the antibodies. For the eCLIP protocol we had several different standards, including analysis standards, using a pipeline we had developed in the lab; the entire pipeline was then ported to the data coordination committee group at Stanford as part of the ENCODE Consortium, where it was hardened, in some sense, for release. There was an entire workflow-reproducibility step there that enabled us to be pretty sure the pipeline actually works at scale. I won't go into details, but for each part of the pipeline there were specific outputs; here, for example, the bound regions in green, the number of usable reads, the clusters, and the input-normalized clusters.
And then there were very specific thresholds that data sets were required to meet: for example, minimum read cutoffs that we rationalized with a fair amount of preliminary data from our group and other groups; what we would consider an informative peak, which we computed with information-theoretic metrics; and the IDR (irreproducible discovery rate) cutoffs that we developed and then released. All of these were released as part of the eCLIP methods paper, as well as the Nature paper describing the implementation of this data set, so we feel that a lot of these methods were discussed heavily with the community, in a community-informed way. We had many of these criteria for release, but also internal release criteria. This slide is about the RNA-seq data set releases; I won't go over all the different modalities, but these were data sets released with spike-in information, with read-length criteria and a minimum number of aligned reads, and the data sets were only released when they had a correlation between replicates of at least 0.9. These were initially developed by the groups, but then subjected to a fair amount of discussion by working groups in the ENCODE Consortium and in the community with key opinion leaders, and we released all of these as soon as they were generated, not just in publication form. In general, what was helpful for us was to develop our own lab QC metrics and share those metrics with the community, where they were then evaluated by the Data Coordination Center.
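Release criteria like these lend themselves to automated checks. A minimal sketch, with placeholder thresholds rather than the consortium's actual cutoffs, and toy data in place of real read counts:

```python
# Sketch of automated release-criteria checks of the kind described:
# minimum usable reads, a minimum number of reproducible (IDR) peaks,
# and a minimum correlation between replicates. All thresholds and
# numbers here are illustrative placeholders.

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def passes_release(dataset, min_reads=1_000_000, min_idr_peaks=100,
                   min_replicate_r=0.9):
    """Return (pass/fail, per-criterion report) for one data set."""
    checks = {
        "usable_reads": dataset["usable_reads"] >= min_reads,
        "idr_peaks": dataset["idr_peaks"] >= min_idr_peaks,
        "replicate_corr": pearson(dataset["rep1"], dataset["rep2"])
                          >= min_replicate_r,
    }
    return all(checks.values()), checks

ds = {"usable_reads": 2_500_000, "idr_peaks": 450,
      "rep1": [10, 20, 30, 40], "rep2": [11, 19, 33, 41]}
ok, report = passes_release(ds)
print(ok, report)
```

The point of encoding criteria this way is exactly what the speaker describes: predetermined, shared, and versionable thresholds that a data coordination group can run identically on every submission.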
There were expert committee reviews for many of our metrics and results, and an iterative cycle for whether or not something would pass QC before it was released to the public; this release of data sets happened even prior to our publication of them. For the last couple of slides: this has been going on now for maybe eight years, and we continue to generate data sets in these different cell lines. Now, as more and more antibodies become available and we start to saturate the antibody space, we're starting to move into tagging all the RBPs, and we've generated open reading frames for a couple of thousand RNA-binding proteins. This now includes a majority of the readers of RNA modifications, and hopefully more writers and erasers as well. All the data sets are available on the ENCODE portal, and individual labs, like ours, are starting to develop new, more interactive data portals so people can use the data sets in an integrative manner, not just one data type at a time but integrating different modalities. And then, of course, there's a fair bit of effort in storing all the imaging data, not just the sequencing data; this has been done by Eric Lécuyer's lab, and these images are now being heavily annotated, by his lab but also by the community, for which RBPs are in the mitochondria and so on. So this effort is iterative, and it also requires a lot of community insight. The grand goal we're trying to get to is, for every RBP, to know where these things are localized and what they might do, developing a really comprehensive map of these protein-RNA interactions.
And just to summarize: in our experience, I think data standards for both the experimental and computational workflows are really, really important. They have now been adopted not only by academic labs but also by commercial companies that provide some of these assays for the community. When biotech and pharma companies use these data sets, whether generated by us or by companies that provide them as services, they basically copy and paste a lot of our standards, and these release standards are then used as criteria for what gets published in fairly strong journals. These are all predetermined release criteria, but they're iterative as well: as technologies improve, the community works with us to tell us which criteria can be improved or changed, and many of these are shared, as I mentioned, in publications and white papers. These criteria have now been adopted in at least hundreds of follow-up papers in the community. So with that I'm going to stop, and thank you for your time. Thank you. Okay, and finally, we will hear from Mark Lowenthal from the National Institute of Standards and Technology. He is in person, and he will be discussing quality and measurement considerations for developing RNA gold standards. Okay, thank you. First, thank you to the organizers for the opportunity to speak here today; I'm very grateful for that. This is going to be a bit of a different talk, less data-intensive, about what a metrology institute thinks of when we use the terms standards, reference materials, or reference standards. The National Institute of Standards and Technology is just up the road in Gaithersburg, Maryland.
We are actually a federal institute in the Department of Commerce, so our mission is to support industry. The division I work in primarily supports biomanufacturing initiatives and clinical chemistry applications in health and disease. Of course, the pandemic sparked a big interest in RNA standards. Now, the word "standard" can mean a lot of different things, and we think of it in different buckets. First, a quantitative standard, which is how most people tend to think of a reference standard: a specific, identified measurand with an exact mass fraction or molar concentration, an expressed uncertainty budget, and a description of the purity and impurities, what else might be in that reference standard. Second, an identity standard. This is more what we mean when we talk about reference materials rather than Standard Reference Materials at NIST: materials where we have great confidence in the identity of what's in there, but don't necessarily know the degree of what else is in there or have an exact quantitative budget for the measurand of interest. We also develop data standards. Here we think of things like the NIST mass spectral data libraries of tandem mass spectral data: whether you're doing proteomics or metabolomics or whatever omics you're interested in, there's a library available with MS/MS fragmentation data, and you can match your fragmentation data against it to get a confident identification of what you're measuring. Another major bucket is activity standards. We don't typically work on activity standards; these are typically produced by organizations like the WHO.
Activity standards are not traceable to the International System of Units, so there are some limitations there. When we think about attributes of reference standards, the most important are that the material be homogeneous and stable, so we spend a lot of time demonstrating homogeneity and stability. For homogeneity, there has to be something you're determining homogeneity of; so to some degree there is a quantitative aspect even to identity standards, for demonstrating homogeneity. Stability needs to be fit for purpose: if we're going to ship you a reference standard and it's going to sit on your loading dock over the weekend on dry ice, we need to demonstrate stability at those conditions; if you need something that's stable for ten years in your minus-80 freezer, we need to demonstrate that as well. So all reference materials need to be demonstrated to be homogeneous and stable. Commutability is another mandatory aspect of a reference material. This group seems to be heavily focused on mass spectrometry and nanopore approaches, and of course there are many other sequencing approaches, but commutability is a property of the material, and we need to demonstrate that our materials are fit for purpose, that they're commutable across the different metrological and analytical approaches being used. And of course traceability: for a reference material to be useful throughout the community, we need to know its linkage back to an international unit, for example the kilogram. That traceability chain needs to be demonstrated for a reference standard to be useful. We also need to consider what kind of reference standard it is. Is it going to be a primary standard, just in an aqueous solution of some sort, like a synthetic standard, or are we talking about a reference standard in a matrix? If you're talking about a matrix, that gets very tricky very fast.
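The homogeneity requirement mentioned above is typically assessed statistically, for example with a one-way ANOVA comparing between-vial to within-vial variance, in the spirit of ISO Guide 35. A minimal sketch with hypothetical measurements (real certification campaigns use more units, more replicates, and a formal uncertainty budget):

```python
# Minimal homogeneity check: measure several units (vials) in replicate
# and decompose variance into between-unit and within-unit components.
# If MS_between is not much larger than MS_within, the batch looks
# homogeneous at the method's repeatability. Data are hypothetical.

def anova_homogeneity(units):
    """units: list of lists, replicate measurements per vial (balanced).
    Returns (MS_between, MS_within)."""
    k = len(units)                    # number of vials
    n = len(units[0])                 # replicates per vial
    grand = sum(sum(u) for u in units) / (k * n)
    means = [sum(u) / n for u in units]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2
                    for u, m in zip(units, means) for x in u)
    return ss_between / (k - 1), ss_within / (k * (n - 1))

vials = [[10.1, 10.2], [10.0, 10.1], [10.2, 10.1]]   # e.g. µg/mL
ms_b, ms_w = anova_homogeneity(vials)
print(ms_b, ms_w)
```

This is only the screening statistic; a certification study would also propagate any detected between-unit variance into the certified value's uncertainty.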
Pools versus individual specimens is more on the clinical end of things, but when we're talking about cell lines, as people have been discussing earlier, that also needs to be considered. I'm having trouble moving forward; there we go, thank you. So the first thing we need to consider when developing a reference material is the application of that standard: what is its fit for purpose? Is it an identity standard, for instance to identify m6A, and in what matrix, in what material? Are we developing an absolute quantity standard, or is this going to be used, as it might be in biomanufacturing, to assess batch-to-batch comparability as a quality control? Making a reference material as a quality-control material is difficult, because highly characterized reference materials cost more than what you might typically use for quality control. Reference materials are also very useful for method transfer. So being able to define the application of a reference standard is an essential first step. This is an example of what a traceability chain might look like. We typically start with a definition from the International System of Units, for example the kilogram, and the traceability chain runs down to the bottom level, which in a clinical sense may be a patient sample. We can define that traceability chain back through primary and secondary reference materials and primary and secondary reference measurement procedures. We need to be able to clearly define this chain in order to demonstrate the utility of the reference material. So there are a number of challenges, which people have already brought up, in establishing what an RNA reference standard might look like. It's a big community with lots of different applications.
So how do we determine where to start, when everybody needs something different for their unique work? Some of the challenges are cost and time. The first step is always going to be defining the measurand. A measurand is not an analyte; it is an analyte in its matrix, with a defined measurable quantity. Whether you're talking about RNA in a cell line or RNA in a buffer, every different RNA is going to be different, so you first need to be able to define what that measurand is. You also need to establish when characterization is complete: these are large, complex biomolecules in very complex matrices, and you can go on forever characterizing these materials, so you need to understand up front what you need out of the material, what is fit for purpose, and when it will be sufficiently characterized. Stability I already mentioned, and then impurity analysis: we need to be able to estimate shortmers and longmers for some of these therapeutic products. We need to know how much residual protein, DNA, or other nucleic acid is in these materials, and how much single- or double-stranded RNA or double-stranded DNA and other products. Those things can be part of a reference material, but we need to know what they are and how much is there in order for the material to be useful. And of course cost is going to be a big concern. The reference standards that we produce at NIST, for example, are not going to be anywhere near as cheap as what you could buy from Sigma-Aldrich or somewhere to that effect, because of all the characterization involved in their manufacture. So is the consumer going to be able to spend what it costs to use these reference materials for what they need? And where do we get the materials? Can we get enough of the material to make enough reference material to last?
We typically aim for five to ten years at a minimum of what we need to manufacture to have a useful reference material. If you make something and it's gone in three months, it's not much use, because these biomanufacturing companies typically need to use the same standard for decades. For example, we have a BSA reference material that's been used for decades by many biomanufacturers, and we're on, I think, SRM 927g at this point, so we have a traceability chain from g to f to e all the way back to the original material. And time, especially in this community, is a big concern, because the science moves so fast; I'm sure some people here are also government employees, and they know how slowly things work in the government. By the time we come up with, say, an mRNA reference material, has the field already moved on to circular mRNA, or to some other new technology? We need to be forward-thinking in what we make so that it will be applicable for a longer period of time. We've already talked about this quite a bit, but in addition to modifications of the bases, when we're talking about therapeutic products we also need to think about modifications of the phosphodiester backbone, and be able to characterize and quantify those modifications as well. So I'll just quickly talk about some of the work we're doing in this area. We are in the process of developing methods for oligonucleotide quantification and qualitative analysis. Some of the approaches I'm using for quantitative analysis mirror what we've done in the past for proteins and metabolites. For absolute quantitative approaches, we can start with an RNA or DNA sample, use stable isotope-labeled internal standards of the nucleobases, and hydrolyze our sample into individual nucleobases.
This works for any covalent modification of a nucleobase as well. We can then do liquid chromatography-mass spectrometry to separate and detect the analytes, which are the nucleobases themselves, and calibrate the measurement against highly characterized, higher-order pure nucleobase standards. So we have an internal control and an external control. Then, from the nucleobase concentrations we determine, we can infer the concentration of the intact oligomer, knowing the sequence of the oligomer we started with. This approach has been used for RNA and for DNA, and for mRNA encapsulated in lipid nanoparticles. It's a robust approach; a lot of people use enzymatic approaches for this, but we found that a chemical approach works very well and very cheaply. We're also doing qualitative analysis of RNA, not just mRNA but some of the smaller therapeutics, the ASOs and siRNAs and so on. We can approach this as a bottom-up sequencing problem, using ion-pairing reversed-phase LC-MS, but for some of the smaller molecules, even up to guide RNA, we can do an intact analysis using high-resolution mass spectrometry. Some of the things we consider are critical quality attributes of these RNA therapeutics. For example, we're interested in using mass spectrometry to develop assays for determining the identity and occupancy of the 5′ cap, and the length and distribution of the poly(A) tail. LC-UV and LC-MS are great techniques for stability and degradation assays. Mass spec gives you a direct measurement of these modified bases, where other approaches may not, and LC-MS is going to be great for product-related impurities. And just real quickly, I'll show you that we can sequence mRNA.
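The inference step just described, going from measured nucleobase concentrations back to the intact oligo using the known sequence, can be sketched as follows; the sequence and concentrations are hypothetical:

```python
# Sketch: after hydrolysis and isotope-dilution LC-MS/MS, each nucleobase
# has a molar concentration. Knowing the oligo's sequence, the intact-oligo
# concentration is each base's concentration divided by its count in the
# sequence; agreement across the four bases is a built-in consistency
# check. Sequence and numbers are hypothetical.

from collections import Counter

def oligo_conc_from_bases(sequence, base_conc_uM):
    """Infer intact-oligo concentration (µM) from per-base concentrations."""
    counts = Counter(sequence)
    estimates = {b: base_conc_uM[b] / counts[b]
                 for b in base_conc_uM if counts[b] > 0}
    mean = sum(estimates.values()) / len(estimates)
    return mean, estimates

seq = "AUGGCAU"                                   # 2×A, 2×U, 2×G, 1×C
measured = {"A": 4.0, "U": 4.1, "G": 3.9, "C": 2.0}  # µM, hypothetical
mean, per_base = oligo_conc_from_bases(seq, measured)
print(round(mean, 3), per_base)
```

In practice the per-base estimates will scatter slightly, as in this toy data, and that scatter feeds into the measurement's uncertainty budget.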
With enzymes like RNase T1, these need to be timed digestion reactions, to produce larger oligonucleotide fragments that are amenable to sequencing and have some unique identity, rather than chewing the material down to 3-mers and 4-mers that have no diagnostic utility. So what does a reference material in the RNA world look like? We're still trying to figure that out. One thing we're proposing is a platform mRNA drug substance, and some colleagues are working on mRNA drug products, which would be lipid nanoparticle-encapsulated mRNA. Those have modifications; of course this group is looking at other things, but there are modifications to these drug substances, like the caps and the tails and the phosphorothioates and N1-methylpseudouridine, or whatever your base modification is. I basically wanted to share this slide to show that when we're making these large reference materials, a number of different things go into characterizing them, and it's levels of confidence that we're concerned with. And that's basically it. I just wanted to give some background on how a metrology institute approaches this, and hopefully I can answer any questions. Thanks. Thank you. Okay, so it looks like we have about 15 minutes for questions. If you're in person, please get behind the microphone and I'll call on people. We also have a number of questions on Slido, so I'll work through those as well; I'll use the previous approach of alternating as we go. So maybe we can start in person. Yeah, I was really interested in the allusion you made to the mRNA vaccine world; this is for Mark, by the way. I wonder if you, or anybody, knows what QC people actually do on the RNA that goes into an mRNA vaccine.
And what would be the impact of actually making people do everything on your last slide before a vaccine was allowed to be put in people? Yeah, great question. So that slide is mostly what I borrowed from a USP monograph; there were a couple of lines in there that I found in other prominent publications. At this point, as you know, there's no FDA guidance specifically on all the things that need to be done. We are not a regulatory agency, so we work with the FDA and with our stakeholders, the biomanufacturing community as well as the FDA and all the other stakeholders, and we try to work together to establish what that list should look like. Then we try to make materials or reference procedures and encourage someone to use them when they go through the FDA approval process, and once they are used in that manner, the FDA has some foothold to say: they did it this way, let's encourage other people to do it that way. But we don't regulate. Moving on, we'll take a question from Slido. This one is from Robert Ross and is for Sean Feng from TriLink; I'll just read it. The Helm group has recently shown the applicability of ligating oligonucleotides into full-length mRNA. Can you comment on this approach for the creation of longer oligonucleotides and therapeutics? Yes, there are publications out there; you can ligate DNA oligos and RNA oligos together. The key issues are the efficiency, and how you purify afterwards. At TriLink we also look at those approaches. It can be done, but we need more work. Okay, and maybe an extension: that is obviously going to allow for faster extension of these; do you think that is going to be the case? So it's two pieces, right? You have one piece of oligo, another piece, and you link them together.
So many things are involved: the ligation enzyme, if you do it enzymatically, or click chemistry; people have also published that approach. In terms of chemistry, method-wise you need to figure out what's efficient. But you also need to make sure, if it's mRNA, for example, if people are thinking about clicking the oligo tail onto the mRNA, that the mRNA still performs in terms of translation. A lot of research still needs to be done there, but it's possible, I think; it just needs more work, and quite a few groups are working on it. Okay, thank you. Okay, we'll take the next question in house. It's actually a question for Chen Feng as well. My question is regarding the cost of synthesized oligos: RNA oligos are substantially more expensive than DNA oligos. What does it take to bring down that cost? Is it a question of a combinatorial approach, making multiple oligos at once? How can we get that cost down to more or less DNA oligo costs? Right, it's a very good question. RNA oligos, unfortunately, by the nature of the chemistry, are less stable, and the yield and quality are generally lower than for DNA oligos. In terms of amidites, there are enough standard amidites for RNA, and at least as of today they're about a comparable price to DNA amidites. But I think it's the purity, and it depends on the application. Especially for RNA oligos, for example guide RNAs, you need really full-length purity: people are pushing for 100-base guide RNAs, and instead of 50-60% purity, people are asking for 90% purity, because the 5′ end and the full length are very important.
So it's really application-dependent, and higher-throughput methods could push down the cost of making many RNAs. I think the purity, the purification, and the characterization are the more expensive parts of oligo synthesis. I hope that answered your question. Yes, thank you. Okay, so next online. This is probably a question for any of the panelists. In the context of RNA standards, it's been reported that only 10 or so modifications can be synthesized. What are the barriers, and maybe possible solutions, to synthesizing the rest of the 150-plus modifications into phosphoramidites, or onto oligonucleotides by enzymes? Okay, a question for me. Yes. So, think about the sequencing side: for Illumina sequencing, they have an enzyme that can accept nucleotides carrying a 3′ blocking group. Unfortunately, the enzyme engineering process is very involved; the key is that the enzyme needs to be very specific and accurate, and also fast. So I think it requires more enzyme engineering; you would probably need to start with high-throughput screening, because the modified nucleotides are available, but the enzymes can only take a few of them. Especially for oligo synthesis in synthetic genomics, the nucleotide is 3′-blocked, because you introduce one base at a time, and the engineered enzymes have issues in terms of speed and specificity. So it's a good direction the field is moving in, but it will take time to get good enzymes; maybe multiple enzymes, maybe some that can take standard nucleotides and more engineered enzymes for the modified ones. It's an evolving process.
And yes, that's the direction the field is going, but unfortunately we're not there yet. Okay, thank you. So: more improved enzymes, and the ability to generate more enzymes. Anybody else? If I can add one point. For the phosphoramidite building blocks, the rest of the 150, it's actually not that difficult to synthesize them from the organic chemistry point of view; the problem is that we don't have a big driving force, the biological significance, to make those building blocks. A lot of the time, those modifications have just been detected and are sitting there, and nobody knows what's going on. If we just synthesize them, we don't have a good story to report, and when we write the grant there's no biological significance. So I think we need more functional studies, and then this can be driven forward on a large scale. That's my view. Yeah, great point. For phosphoramidites, I can also comment: it's the scale-up; it's a balance. You want low cost, but you also want availability. With some of this chemistry, if you make one gram it's going to be very costly; if you make a kilogram, the cost can go down, but if you make a kilogram that then sits for ten years, no company is going to make that. It's really a balance. Yeah, exactly. For example, for a lot of nucleosides you can request custom synthesis, and five milligrams can easily go up to $2,000. That might be fine for mass spec, but for oligo synthesis five milligrams is nothing. So those costs are also a huge issue here. Hi, thank you. I guess we can move on to the next question; we have someone in house. Yeah, I have a question for Eugene and a more general one for the audience. We have the ENCODE data, the database of RNA-binding proteins, searchable by interactome search engines; I'm thinking of Cytoscape or something similar.
And for the general audience: do we have a searchable database of hotspots of modifications? I'm mentioning m6A, but there are many others that I'm sure the community is interested in. Do we have a central repository where we can go in and look for the high-frequency spots where m6A happens? Yeah, so the ENCODE data are all available on the ENCODE portal, but I think we are developing more interactive, searchable portals; I just showed one from my lab that came online, actually just went live yesterday. We're trying to get better and better at letting people search for a specific RNA or gene and then find all the RBPs that bind there, including readers and writers of different mods. We have not integrated RNA modification data there yet; that's not hard for us to do once we know which data sets are actually good and useful for integration. Hopefully that's helpful. And I think the goal is also to build cross-platform interoperability, so, for example, we can start putting things up on other platforms like Cytoscape and come up with different metrics for how you compare distances between interactome profiles. So I think there's a big push, and it's needed, to generate these one-stop-shop portals for RNA modification and interactome information. Yeah. And m6A is only one that I mentioned. Gene, this is a question for you; this is Brenda Bass. How did you choose your cell lines? Hey Brenda, good to see you.
These cell lines were chosen as part of the ENCODE Consortium; Stephanie is there and can maybe address this, but practically speaking, these are the lines where a lot of previous ENCODE data had already been generated from decades of work in the epigenomics and DNA-binding field. So we were really asked to populate the same lines with our RBP data sets. Of course, these are not the only lines that are interesting; there are many, many lines that are much more interesting, I think. So part of the effort that we and the community need to make is to port assays and protocols and generate more data in other lines. Yeah. Brenda, is that the question? Yes, that's exactly the question, and then one more. You did a great job of talking about quality control and the points at which you needed multiple validations, and I'm just wondering if you have any take-home lessons. There are certainly steps that have to be really carefully controlled, but you also have to think about time and cost. Do you have a take-home for things that require a lot of attention and those that don't? I think one of the big take-home messages, as you really pointed out, is the balance between generating data in a manner that's helpful for the community now, versus the best and highest-quality data sets. We had to compromise on the number of replicates: for the CLIP experiments in the ENCODE data releases, we did two replicates of IP and one size-matched input. In my own lab, for our own papers where we have time, we generate three replicates or more, each with its own input control. But for the ENCODE data we couldn't do that and also meet the timelines for generating large data sets for release.
So part of this discussion is that an iterative approach has to be there. I think it's important to do both: enough data at some level of quality that people become interested and use it, and when they want more of the same data sets, we can go back and generate more, maybe with more replicates and more controls, across more cell types and cell lines. So that was a big take-home: we had to compromise on some things, keep the quality there, but not do as many replicates or cell lines as we would have wanted. Yep, thank you. Thank you. I think we can take one last question, Steven. Sure. Hi. So I have a question. Both Gia and Chen Feng mentioned that it is in fact possible to synthesize all of these oligonucleotide standards, but that the lack of drive is related to disease relevance, which I think is a really interesting point, and I want to push on it a little. I'm wondering if there are other reasons why you would want to synthesize these, and what those are; and then, if there is a way you could convince a group, whether federal or otherwise, to actually go through the process of making these standards so that they're available to the public. Is there a mechanism you could think of that would be worthwhile? Is there a group you could think of that would be worth implementing that? That's a great point; I was thinking of that, actually. If a funding agency like NIH could fund a platform with a lot of participating labs, then when someone has a need, we could take the task, identify the synthetic route, make the oligo, and deliver it. In that case everyone benefits. I think this is definitely a doable platform; I really think it's possible.
Yes, I can also speak for us. We are unique: we make nucleotides, we make oligos, RNA oligos, and we also make mRNAs. We have a strong analytical services group in house as well. So yes, we would definitely like to contribute, in whichever ways; we'd definitely be happy to help. Okay, well, I guess we can thank all the participants for their talks and for answering the questions. Thank you very much. I think we'll wrap up this session and move on to the next. Okay, thank you. Good afternoon, everyone. I'm Mary Majumder from the Center for Medical Ethics and Health Policy at Baylor College of Medicine, and I'm delighted to be here to moderate a panel session on lessons from large-scale collaborative research endeavors. In the course of this session, we will have opportunities to discuss major scientific initiatives in the US and abroad, both past and current, to understand the successes and also the major challenges of research initiatives, and to apply lessons learned to the possibility of a large-scale effort related to RNA. We have two panelists; each will give approximately 20 minutes of opening remarks, and then we'll have the remaining time for joint Q&A. I will begin by introducing our first panelist. Bob Cook-Deegan is a professor in the School for the Future of Innovation in Society and with the Consortium for Science, Policy and Outcomes at Arizona State University. He previously directed the Duke Center for Genome Ethics, Law and Policy, and he is the author of The Gene Wars: Science, Politics, and the Human Genome, and over 350 other publications. Dr. Cook-Deegan. Thank you, Mary. It's like coming back home here; I spent two years working in this building, up on the second floor. So thank you for having us. I'll dig right into what the two of us have been asked to do, which is to think about analogies and disanalogies to a proposed major initiative. So we're going to be going over some history.
Part of it is political history and part of it is technical history. So I'm going to start with the history of the Human Genome Project, and then I'll mention some other projects. This slide shows the three origins of the Human Genome Project, the three people who had the idea. Robert Sinsheimer is the guy in the middle with the blue shirt. Up to his left is Nobel laureate Renato Dulbecco, and just below Bob Sinsheimer and below Jim Watson is Charles DeLisi. Charles DeLisi is actually the guy who started the Human Genome Project. He was reading a report from the Office of Technology Assessment about techniques for measuring heritable mutations in human beings. The technical question was: is the technology up to doing that? Can you tell whether people have inherited mutations from their parents? The real focus here was actually the studies of Japanese citizens who had been exposed to the bombs in Hiroshima and Nagasaki. The Office of Technology Assessment, which we'll loop back to in a minute, had written a report on the technical prospects for measuring heritable mutations in human beings. Charles DeLisi was at the Department of Energy, but his background was really interesting. He was a math wizard. He had worked at the National Cancer Institute, and he had worked at Los Alamos National Laboratory out in New Mexico, the home base for the scientists who were involved in the Manhattan Project, and for the physicists and mathematicians who stayed on after the war doing high-tech big science. Los Alamos eventually became the home of the first US-based nucleotide sequence database, GenBank, which was later moved to NIH. So DeLisi had this background in what would now be called computational biology. And he thought it would be a good idea to take the new technologies of DNA sequencing and create a reference sequence as a tool for doing all sorts of wonderful things for human health, and for all sorts of other applications.
More importantly, he was working in Germantown, at the headquarters of the Department of Energy's science offices, so he had a budget; he could deploy some money to get this thing started. And so the origin of the Human Genome Project as we came to know it actually came out of the Department of Energy, not the National Institutes of Health. That kicked off a bit of a struggle that I'll allude to later. The other two origins: Renato Dulbecco, after doing his pioneering work in cell biology, had become the head of the Salk Institute out in California, but he was doing cancer research. The argument was that we need a reference sequence of the human so that we can detect what mutations are being passed through cells that are causing cancer. He wanted a reference sequence against which cancer genomes could be compared, because you can't do breeding experiments in humans. He actually proposed that at a talk here in Washington, at the Italian Embassy, in 1985. That went nowhere, but he published an article in Science in March of 1986 proposing the idea. The third origin is completely different. Bob Sinsheimer had been a biologist at Caltech and had been working on phiX174, one of the early workhorses of molecular biology and one of the first organisms ever sequenced, at Cambridge, in Fred Sanger and Bart Barrell's group. He had realized the incredible power of having a sequence: it became an incredibly powerful tool for understanding the biology of this virus. And he had had the incredibly painful experience of having gotten a $36 million check to build a telescope in Hawaii that he had to give back, because the Keck Foundation stepped forward and paid for the telescope, and he had to return the money to the donors. And he decided: well, you know what, we're used to asking for big money for telescopes. Why don't we do that in biology?
So he went back to the family, having given their check back, and said: hey, would you be interested in sponsoring an institute here at the University of California, Santa Cruz, that would create a reference sequence of the human genome? That went nowhere. He wrote some letters to NIH, and they said: yeah, put it through peer review. And you can imagine how a $36 million proposal in 1985 for a human genome institute would fly at NIH. It's kind of a big R01, right? So that went nowhere, but the seeds had been planted. And when DeLisi came up with the idea, he could actually do something about it. NIH got involved by a kind of circuitous route. Jim Wyngaarden, who was then the director of the National Institutes of Health, was at a cocktail party in London, and somebody came up to him and said: what do you think about this DOE idea of a human genome project, doing a reference sequence for the human genome? And Wyngaarden said, basically: I thought that was like asking the National Bureau of Standards, now NIST, to build the B-1 bomber. And I'll go into that in a minute. What this set up, though, was a kind of healthy competition between the Department of Energy and the National Institutes of Health for leading this sexy, high-tech project in human biology. The other folks on this slide are folks who came in a little bit later. You see Jim Watson in the upper right; he was the original director of an office, then a center, and then, finally, after he left, it became an institute at the National Institutes of Health, one of the sponsors of this project: the National Human Genome Research Institute. At the bottom left is the guy to whom we all bow in obeisance: Fred Sanger, who got two Nobel Prizes, one for sequencing proteins and one for sequencing DNA, an amazingly humble and sweet guy who completely transformed the field and believed fervently in open science.
And that spilled over into the Human Genome Project in a way that I'll come back to in a minute. Finally, at the bottom right, you see one of the culmination moments. There have probably been three dozen of these; the Human Genome Project has come to an end several times. The very first was the June 2000 announcement at the White House, with Bill Clinton in the middle there, standing safely between Francis Collins and Craig Venter, who had called a temporary truce between Celera Genomics and the publicly funded Human Genome Project. So, what can we summarize from the history? Well, the original impetus for the Human Genome Project grew out of what was happening in two miraculous technologies, and it was the convergence of those two technologies. One was DNA sequencing, along with recombinant DNA, physical mapping, and genetic linkage mapping, which were progressing in tandem: the mapping of DNA. The other essential technology was information technology and computation, because these sequences are completely useless unless you have a computer helping you think through what you're actually looking at. So the Human Genome Project was the convergence of the two. Remember, the Macintosh came out in 1984; the idea for the Human Genome Project came out in 1985; and DNA sequencing had been invented between 1975 and 1977 by Fred Sanger at Cambridge and by Allan Maxam and Wally Gilbert at the other Cambridge, in the United States. What happened when this idea was in the air was that NIH began to maneuver to get itself in position to support the science. They did an analysis of what grants they were supporting, and it turned out they were already supporting about $300 million in grants related to human genome mapping and sequencing, compared to the DOE budget of about $5 million that year.
And it started a series of meetings. For two and a half years we were going to at least one meeting a month about whether there should be a human genome project. I've picked the coverage from Science, which started with Roger Lewin and then shifted over to Leslie Roberts, because they did a marvelous job of capturing it. The point of showing it, though, is that you can see this was not a conflict-free idea, first of all within the technical community. There was a lot of disagreement about whether the technology was up to the task and whether it made any sense to spend all this money on a reference sequence anyway. And, by the way, if you're going to do it that way, that's not how we do biology. We do biology through R01 investigator-initiated grants; let a thousand flowers bloom, and they will eventually produce a reference genome. The argument against that was: okay, right, show us when NIH, or anybody else in biology, has ever done anything that systematic. So this was going back and forth. And of course the Department of Energy had a bit of a head start in that argument, because it had been home to the Manhattan Project, the largest sociotechnical effort that had ever been mounted, culminating in the atomic bombs. Really interesting history, right? We start with the atomic bomb, we loop back to the Human Genome Project, and it comes back to Los Alamos, Livermore, and the other national labs, as well as the National Institutes of Health. By 1990 it finally got launched. But there was a process in that window between 1985, when Charles DeLisi and David Smith at DOE had the initial idea, and 1990, when it officially started: a five-year interlude with a lot of meetings and a lot of procedures. These two reports came out in February and then April of 1988. The one on the left is, I think, probably the reason for the name of this committee, which is Mapping and Sequencing the Human Genome.
And the history there is that Bruce Alberts had written an editorial in Cell saying: we don't want big biology; biology is inherently small science. And he was picked to be the chair of the committee that was supposed to decide whether there should be a human genome project, a very sagacious political move by whoever made that decision in this institution: basically, appointing an opponent. Jim Watson was on that committee, David Botstein was on that committee, Charles Cantor, a whole bunch of high-flying molecular biologists, mappers, and sequencers. That committee is mainly responsible for why the Human Genome Project turned from a point of vigorous contention within biology into a point of consensus: you know, maybe this is worth doing, as long as we don't just focus on humans but include other model organisms and construct their genomes; and as long as we don't put all of the weight on a reference sequence, but in fact think about building tools along the way, like a genetic linkage map, like the sequencing technologies, like the databases needed to support the infrastructure for making sense of this stuff, and the physical maps laying out the segments of DNA to be sequenced. So the National Academy of Sciences, I think, can take the lion's share of credit for turning this idea, which was inchoate and rather nebulous in 1985, into a program that could actually be implemented by scientific agencies in Washington, DC. That said, there are two issues that the National Academy didn't do a great job of handling. One of those was the debate between NIH and DOE: you have these two agencies, both of which have legitimate claims on the prize, the Human Genome Project, and no real way of resolving the differences.
So there was actually a bill, passed by a vote of, I think, 88 to 3 in the US Senate, that would have formalized a commitment to have the Human Genome Project co-administered by the Department of Energy and the National Institutes of Health. When it came to the House, fortunately, Representative John Dingell called up NIH and DOE and said: you know what, we could pass this law; they passed it in the Senate, and we could pass it in the House, but I'm not sure that's a good idea. And his staffer, Leslie Russell, called up the heads of the agencies and said: couldn't you just come to an agreement yourselves? Because if we put this in statute, it's going to be inflexible, it's going to be there for all time, and it's going to be hard to change. Why don't you just get together? That resulted in a memorandum of understanding between NIH and DOE, which was a very logical conclusion and probably better than either agency running it alone. Money flowed from both agencies, about two-thirds NIH and one-third DOE at the beginning; since then the DOE component has drifted down, and NIH now has its own institute. The other issue I'm alluding to, which may be more important now than it was then, but was pretty controversial even then, was intellectual property and patenting. And I will say that in both of these reports we fumbled our handling of that issue. I won't say a whole lot more about it unless it comes up in the discussion, because I don't know whether it's going to be an issue for the area you're talking about; I don't know the area well enough. But intellectual property and patenting: the Bayh-Dole Act was new, it was being implemented, and the Department of Energy and NIH had very different attitudes toward intellectual property and how it should be handled in public-private partnerships between academia and industry.
Here are some features. Clearly, this was an idea that could be sold: it was really sexy, it was obviously related to human health, and it could be taken out to the general public, and people would get excited about it. The public includes the US Congress and the agencies. And so it became the most prominent science-policy decision under discussion for a good two years, culminating in budgets flowing to both agencies. A couple of other things to observe about it. Unlike, I think, most of the science we've heard about here today, there was actually one central goal, one sequence of the human genome, and then there were subordinate goals: to sequence other organisms and to develop the physical maps, genetic linkage maps, and databases needed to get there. But it was a relatively focused objective. Moreover, there was only one technology really available to do the job. Remember, I said the idea happened in 1985. The way you did DNA sequencing in the United States then was generally 32P slab gels that you'd put in a freezer for three days, and you would be able to read out, if you were lucky, 100 base pairs. Was that going to generate a genome of three billion base pairs? No way. But by 1987 the Caltech sequenator, which was then picked up by Applied Biosystems, had become the dominant instrument, and it began to seem like, well, maybe if we automate this stuff, we actually can generate enough sequence to create a reference sequence of the human being. So that began in 1987. Also, the mapping and sequencing technologies, the ways of handling large fragments through cloning in bacteria and in yeast, handling hundreds of thousands of bases instead of the lambda-sized inserts we could handle with the cloning technologies up until then, were all developing in parallel, and that was really important. And the United States and Europe had already reached an agreement on how to store DNA sequence in databases.
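The throughput gap described above can be made concrete with a quick back-of-envelope calculation. The numbers here are illustrative only; the 6x coverage figure is an assumption (a common rule of thumb for shotgun assembly), not a figure from the talk.

```python
# Back-of-envelope: manual slab-gel sequencing vs. a 3 Gb genome.
# Illustrative numbers; ~6x redundancy is an assumed rule of thumb
# for reliable assembly, not a value quoted by the speaker.
genome_bp = 3_000_000_000   # ~3 billion base pairs in the human genome
read_bp = 100               # optimistic yield of one 1985-era gel read
coverage = 6                # assumed redundancy needed for assembly

gel_reads_needed = genome_bp * coverage // read_bp
print(f"{gel_reads_needed:,} gel reads")  # 180,000,000 gel reads
```

At three days of autoradiography per gel, the arithmetic alone explains why automation, not more hands, was the only plausible path to a reference sequence.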
EMBL actually started it, and then NIH picked up on it very quickly in the late 1970s. They already had an agreement, and later the DNA Data Bank of Japan joined what is still the International Nucleotide Sequence Database Collaboration, where the data are shared across the three major databases (they aren't exactly mirrored, but they are shared for the whole group). That agreement for storing the data was already in place when the Human Genome Project started. During the genome project, the transition was made from Los Alamos National Lab to NIH, basically to the National Center for Biotechnology Information at the National Library of Medicine. What I think is probably different today is that there were really only five countries involved in the Human Genome Project at the transition to full-scale sequencing: the US and the UK, which paid for 91% of the sequencing, plus Germany, Japan, and France; then China joined in 1999 and did about 1% of the final sequence. The drivers were the National Institutes of Health and the Wellcome Trust in the UK. And DOE created a new Joint Genome Institute in Walnut Creek, California, that harnessed the talents of Livermore, Los Alamos, and the other national labs. Another feature that I hope you will be thinking about in connection with RNA work is the open-science ethos that pervaded the Human Genome Project. Just to notice, though: if this project had emerged from human genetics, which was actually my field when I was in medical school and internship and residency, everybody hoarded their data. You would create a pedigree, you would work with it, but you would hoard your data; you would not share it, and you would mine it for your whole career. That was almost completely the opposite of what was happening in nematode genetics. And it happened that these two guys, whom you see lounging at Marco Island in Florida at one of the genome meetings years later...
It's Bob Waterston on the left and John Sulston on the right. As the Human Genome Project was making its transition to full-scale sequencing in early 1996, the notices of award had gone out, but the money hadn't started to flow. And there was a meeting held in Bermuda, precisely because it was halfway between Europe and the United States; it was in February, so the weather was miserable and the hotels were cheap. They got representatives from all of the major labs that were going to be doing the high-throughput sequencing for the Human Genome Project into one room in one hotel, and hashed out a whole bunch of policies. One of those was: if you're getting the money from the government or from the Wellcome Trust, you're going to share your DNA sequence data at the end of every day. It was a really radical pre-publication data-release proposal, and in fact it was vigorously opposed by several of the people involved in the project, including Maynard Olson and Craig Venter. But they managed to get it through, and what you see on the right there is John Sulston scribbling on the whiteboard; it's a picture taken at the 1996 Bermuda meeting, and it basically said: you're going to release your sequence every day. Why did they do that? One reason was the ethos of open science. Another was the politics of the Human Genome Project: this was big money going to a few institutions, and those institutions didn't want pushback. They wanted to make their data available to many, many other labs, so that those labs wouldn't see the money going to the centers as direct competition with small-lab work. So it was partly a political decision, but it was also a very practical one. All these groups were saying: I'm going to do this part of chromosome 7, or chromosome 9, or chromosome 19. How do you know they're doing it, and how do you know whether the quality of the data is any good?
Are you going to wait for a publication five years later, ten years later? No way. They needed a way to get feedback for their own management decisions about who was going to sequence what, and at what quality. So: if you're going to make promises, we're going to be able to hold you accountable for those promises by looking at your data. But it turned into, as Ari Patrinos from DOE phrased it, a kind of spiritual commitment to open science in this hub-and-spoke mechanism for doing science. There was also a patent provision in there that got dropped as it made its way through NIH and the other institutions. One thing to note, however, is that this was a commitment to sharing data every day, and NIH, DOE, and the Wellcome Trust all agreed to it relatively quickly, but it took two years, and it required changes in policy in Germany, Japan, and France to abide by these rules. In fact, they had to send some nasty letters: Francis Collins, Ari Patrinos, and Michael Morgan sent letters to Japan and to Germany saying, if you want to say you're part of the Human Genome Project, you've got to play by our rules, and that means depositing your data every day. What was the sticking point? All three of those countries had agreements with industrial partners in their own country to give privileged early access to the data, and daily deposit of the data was going to violate those agreements. So they had to get those agreements tweaked and changed, to create an exception for Germany, Japan, and France; that took two years. And that may or may not be relevant to some of the considerations you're thinking of. So the Bermuda Principles were in place by 1998. But another thing happened in 1998: Applied Biosystems developed a new instrument that was no longer dependent on slab gels but switched to capillary gel electrophoresis, which was much faster, delivering higher-throughput sequencing that was just as accurate.
And as they were floating that machine, they said: hey, why don't we do our own human genome project? They went to Craig Venter, who was one of the pioneers of using the ABI machines at a time when they were not so popular. He had left NIH and gone to The Institute for Genomic Research, up here in Maryland. They brought him on to head up a company, named Celera a few months later, that then made the Human Genome Project once again the most prominent science-policy topic of discussion for years, and for the same reason: intense rivalry. In fact, this time it was really nasty, between Celera and the publicly funded genome projects. But it turned out there was a hearing in the House of Representatives, and the open-science ethos of the Bermuda Principles was absolutely essential to keeping US federal funding for DOE and NIH going, because they could say: hey, we've got these open principles. Celera is saying they're going to make their data public, but will they really? It's a company, and they're going to behave according to the best interests of their stockholders, not necessarily the best interests of the country. That was really crucial: the Bermuda Principles, which were adopted for practical reasons and then became spiritual ones, became very practical again in 1998 with the emergence of Celera. The techniques were very different, and we don't need to get into that, but basically the Human Genome Project broke the genome down into clone-sized, sequenceable segments of DNA, whereas the Celera approach was whole-genome shotgun sequencing: shear the whole genome, sequence the pieces, and assemble them in the computer. Ever since the Human Genome Project, there have been any number of proposals to pursue ideas in biology that try to capture the excitement and the political will of the Human Genome Project. A couple of them have worked really well, and a couple have worked not so well.
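For readers unfamiliar with the contrast just described, the core idea of shotgun assembly, stitching randomly sheared reads back together by their overlaps, can be sketched in a few lines. This is a toy greedy overlap-merge, purely illustrative; the real Celera assembler was vastly more sophisticated (paired ends, repeat resolution, consensus calling), and the function names, the example sequence, and the min_len threshold here are all invented for the sketch.

```python
# Toy whole-genome-shotgun sketch: greedy overlap-merge assembly.
# Illustrative only; not the actual Celera algorithm.

def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:          # no overlaps left to merge
            break
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)]
        reads.append(merged)
    return "".join(reads)

# Overlapping fragments of a made-up sequence, ATGCGTACGTTAG:
print(greedy_assemble(["ATGCGTA", "CGTACGT", "ACGTTAG"]))  # ATGCGTACGTTAG
```

Clone-by-clone sequencing, by contrast, pinned each read to a mapped clone up front, so assembly was a local problem rather than this genome-wide overlap search.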
The Human Genome Diversity Project was cooked up by Luca Cavalli-Sforza; he wanted to sample human diversity all over the planet. He and Allan Wilson didn't agree on how to go about that. Moreover, they really didn't anticipate how the world would react to the high-throughput sequencing being done in centers in Europe and North America when the samples were going to be coming from all over; it became known in some circles as "the vampire project." It has continued at one level or another, but never under the flagship of the Human Genome Diversity Project. We can get into that. The Decade of the Brain came; the BRAIN Initiative came. Some other current projects, the Human Pangenome Reference Consortium and the Earth BioGenome Project, are trying to produce much better reference sequences. In the case of the HPRC, the aim is a much more thorough human reference sequence, very much in the lineage of the Human Genome Project; the Earth BioGenome Project is trying to do for all nucleated organisms what has been done for the human genome, which is to have a very high-quality reference genome for all the critters that have nuclei in their cells. Then, at the political level, we had the Precision Medicine Initiative, which President Obama announced, and the Cancer Moonshot, which got re-infused during the Biden presidency. And now, I guess, this project. Here are some analogies. Everybody wants money, resources, infrastructure, and coordination. And usually, if you're flying the banner of the Human Genome Project, you want the excitement and the political will to sustain your project for a long period of time. But times have changed. Genomics is no longer the new kid on the block; it's now everywhere. And there is no one dominant technology; I think I've counted at least a dozen technologies that are essential to what you've been talking about today. You don't have one company making one instrument that everybody is using.
I think the world of government, industry, and academic relationships has gotten a whole lot more complicated, and I suspect that will influence the conduct of this project. Moreover, I mentioned that the US and UK accounted for about 90% of the original reference sequence; I don't think that's likely to happen in any broader biological field these days. And, oh my gosh, think of all the stuff you're going to do with RNA. It's a much, much broader set of goals than one reference sequence sitting in a database. So I wanted to end with some questions that those of you on the committee, especially, may want to address in your writing of a policy report. Why do we want to do this now? You want to lay out not only the technical prospects, which are clearly going to be really important, but also, Jane, thinking of your grandmother at Thanksgiving, why she should care about what you're doing. So: why is it important, and why are you planning to do it now? Is it going to be global or national? Say what you're promising to do with as much precision as you can, and identify the communities from which the effort is emerging and the other communities that are going to be affected by it; those are equally important. There is a natural tendency in thinking about science policy to focus on the emerging technology and the stakeholders who generated it without thinking too carefully about the other constituencies that are going to be affected by it. That's a thorny political problem. And then: who's going to care? What's going to be different if this happens versus if it doesn't? And how are you going to know? So I'll end there. Thank you. Thank you, that was fabulous. Our second panelist is Mark Helm, who will be with us by video. Can we go ahead and bring him up? And I'll go ahead and introduce him.
Mark Helm is an associate professor of pharmaceutical chemistry at Johannes Gutenberg University in Mainz, Germany, and leads the Helm group at the Institute of Molecular Biology. The Helm group integrates disciplines from chemistry to biology, physics, bioinformatics, and pharmacy to advance research on nucleic acids, in particular on RNA. Thank you. Am I on? Can you hear me? We can hear you. All right. So I have been asked to report on German research consortia. You will find they are several orders of magnitude smaller, of course, than the ones Bob was referring to; I think that's a given. The slide shows the icons of the two consortia, and along the way I'll show you some of the struggle, what went forward and what went backward. We're looking at two fundamentally different funding schemes. The first one is SPP 1784, entitled "Chemical Biology of Native Nucleic Acid Modifications." It ran for two funding periods, from 2015 to 2018 and then from 2018 to 2021, where it ended, smack in the middle of the corona pandemic. Because of that and the lockdowns, the DFG, our principal funding body for such things in Germany, actually gave us an extension of a couple of months. Then, at about the same time, RMaP started, which is by its structure completely different; it is now in its first funding period of four years, but RMaP can run as long as 12 years, pending, of course, positive evaluation for subsequent funding periods. These two programs are, by intention, fundamentally different. The SPP could be roughly translated as a "priority program," and its goal, at least for the DFG, is to support emerging fields, something like hot topics, something not yet established. In contrast, the CRCs, the Collaborative Research Centers, in Germany also abbreviated SFB, are meant to induce a local focus at certain universities and in their scientific environment.
With the priority program, the hope, or the idea, is that you entice PIs who maybe do not work exactly in the field, since it is emerging, to get interested in joining and perhaps dedicate some of their own house resources. The funding system in Germany is a bit different: there is a little more "house money" available, if you will, which is sometimes seen as an endowment. It's not quite the same, but most PIs have a little bit of that available and can choose to invest it in a given research direction. And that's the intention here: obviously, to foster cooperations and maybe follow-up consortia. In a bit of contrast, the CRCs want local support, not nationwide, because the SPPs are nationwide. The rest is much the same, but you want to really induce a local focus. The SPPs have a separate pot; there are two rounds each year in which an idea is written up to apply for such a pot, and if you win the pot, the money is dedicated to your particular topic. One such pot is 10 to 11 million euros, so, as I said, it's not outrageously much, but to start up a topic it is exceedingly helpful. As you saw before, the program ran a total of six years, divided into two funding periods. Once the pot was dedicated, and this is a crucial difference from the CRCs, there was a nationwide call for single proposals: any PI in Germany could submit a project as long as it was roughly geared toward the title. Then there was something like a player-versus-player decision, where everything was evaluated, everybody against everybody else, and the top projects were funded. So it was a player-versus-player competition, if you will. In the SFB, there is also an extra pot, but with different funding periods: as I said, possibly up to 12 years, with an overall budget of 30 to 50 million euros, and with one or two, maybe three, cities as a focus.
And here, the slogan that I think of is: you live together or you die together. That is, you apply as a group, and the entire group succeeds or nobody does. So this is fundamentally different from the player-versus-player competition in the priority program. At a closer look at the SPP: we turned in the principal application, that was the quest to dedicate a pot of money to nucleic acid modifications, and that was mostly RNA, in 2013. And that was under the impression of two defining papers mapping m6A by what we now know as MeRIP, about a year earlier. The application was approved, and then there was a call for single projects in 2014. Subsequent to this ran those two funding periods that I already mentioned, and then there was a corona-based extension until about a year ago. The lead questions that we asked were with respect to nucleic acid modifications seen in atomic detail. That was meant to get everybody used to the idea that these things we look at aren't letters or nucleobases, but very specific chemical alterations, which was sometimes overlooked by part of the community at the time. And along these lines, the principal questions were: where do we find modifications (so there was a definite identification and mapping component in there), how do they get there (that is, which enzymes deposit the modifications), and then why, which is obviously the question of biological function. The outcome was that over the six years there were 38 PIs who received funding, typically a PhD student and consumables for three years or something like this. The focus, as was defined in the title, was on chemical biology. Hence, in the notion of atomic detail, what was specifically requested, or encouraged, let's say, was the application of mass spec, of sequencing, but also of chemical organic synthesis, say phosphoramidites or some such, and structural biology, because it would provide insight at atomic-detail resolution.
And it also had a significant component of biochemistry and molecular biology, typically enzymology, and then moving all the way into cell biology. In the latter case, we encouraged PIs to apply in tandem projects, where one PI would cover the atomic-detail aspect and the other would go into biology, say knockouts or imaging or some such. Altogether, what came out were 250 papers, which I think is quite honorable. A hundred of them were in journals with an impact factor better than 10, and five in what you would, I guess, call the big five, so Nature, Science, Cell, and so on. So, from that point of view, I think the turnout is terrific, but it was also very much based on what I mentioned before: people were investing their in-house resources into the topic, in what you would probably call synergy. Fourteen job offers resulted for people who had received funding, and I flatter myself by thinking that it was also in large part because of this funding; that included five junior PIs. And then there were follow-up SFBs, or CRCs, where the notion is that they cannot be on the same topic. But the field had by then, six years later, advanced quite a bit, so the details of those new research initiatives were not really on the same topic and had moved with the field. So there is RMaP, TRR 319, which is two-centered: half of it is in Mainz and the other half is in Heidelberg. I will give some more details in a minute or so. And then there is an SFB in Munich, led by Thomas Carell, on the chemical biology of epigenetic modifications, where maybe a third or so is on RNA modifications and the remainder on protein and DNA modifications. So here are the people who participated. I'm obviously not going to cite all the names, but people in the field probably know at least half of these people, maybe more. And like I said, all of these people are still up and running in the field and in science. RMaP then started officially in July 2021, and we're now somewhere in the middle of the first funding period.
The process involved a pre-proposal, a couple of pages, submitted in fall of 2019, which was then evaluated in Bonn at the DFG. Five or six of us went there to answer questions. We were then encouraged to submit a full proposal in November 2020 and were evaluated by video conference. Normally it would have been on site and in person, but due to corona this was all done, well, like you see me now, by video conference, and then the funding started in July. In between, we had a larger meeting: the German Society for Biochemistry and Molecular Biology has its flagship meeting in Mosbach, and at about this time last year was the handover. So the SPP kind of left and RMaP kicked off. You can see we had quite a few speakers from the field, here to the left on the list. And these are now the people that are part of RMaP; again, people in the field know many of them. A lot of us will be meeting in Ventura in a couple of days, I guess. These are on two sites: in Mainz at Johannes Gutenberg University, and then in Heidelberg at, obviously, Heidelberg University; and then there's the German Cancer Research Center, which also hosts some of the participants. We have a steering advisory board; Eva Novoa is, I think, also engaged in this workshop. The technologies you will find represented here in the board reflect, to a degree, the intentions and philosophy of RMaP. So we put a lot of stock in technology, and also in quality control, to establish foundations, which I think are very important in terms of analytics to support biological research. So here is the structure. What you see in green are projects of people who know their modification and think these modifications will, to some level, influence RNA, excuse me, processing. That's why the area is called "modifications looking for processing." And then the B area, here in blue, is the other way around.
In the middle, you find some projects which are already homing in on specific details of how modifications and processing influence each other. And in the end, this is what defines the epitranscriptome that we're all looking at, right? It's not the modification alone, and it's not the transcriptome and the length of the RNA alone; it's the combination of both. And then on the bottom are our technical, or technology, foundations, if you will, with three pillars. I would say there is a substantial portion dedicated to mass spec, that is, to the right, CO3: mass spec in proteomics, in nucleosides, and in oligonucleotides. In the middle we have a data management and data science project, and we have significant efforts in developing new sequencing methods. There is abundant experience in all three of these fields, and that is why we are very conscious of the topic under discussion here: how can we get a handle on, well, storing it all, supervising quality, and making it accessible to everybody at the same level? I apologize for this, but I just cannot help myself. This is a result from an early paper from my lab mapping the modification m1A. You can see some quality results to the left, sensitivity, specificity, and so on and so forth, which were all at around 95%. Under normal circumstances, 95% is really good. At the time, we said, well, that's okay if you want to look in tRNAs and ribosomal RNAs, but no further. Actually, one of the reviewers said: so now you take this mapping method and you apply it to a transcriptome, at full scale. And we said, absolutely not. And this is something that I've been showing,
yeah, for eight years now, at several conferences, and I'm still not sure it has struck home with everybody: if you have 95% perfection in your mapping parameters, and you're looking at a transcriptome of maybe 10 to the power of 7 nucleotides, you expect 10 to the power of 4 to 10 to the power of 5 false positives. But if you look at the various mapping papers, strangely, that is just about the number that many of them come up with. I do make an exception for m6A, because we know there is a lot of m6A in messenger RNA. The colored stuff on the bottom right is where we need to get if we are serious about this, and that will take a lot more effort than single groups are currently prepared to invest, to make a mapping method really, really solid and reproducible for everybody. And that is something that we try to address in RMaP as well. Here's something from the mass spec department. These are structures of modifications that have not been published yet; there's one exception, but these are not in your book, at least probably not. They occur in everyday RNA, mostly in E. coli, but they will mess up your mapping method if you don't get hold of them. Some of these will give signals in nanopores; some of them will give signals in Illumina sequencing or in reverse transcription. Until we have found all of them and know at least how much of each there is, there will still be quality issues. Fortunately, we can see them by mass spec, at least we can find them, but it is hard to position them; that's really a challenge. And then finally, RMaP is also heavily interested in nanopores. We are just issuing a challenge, and we hope we can collect a large part of the community, because there are really many people who are trying to make nanopore sequencing work for mapping RNA modifications.
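[Editor's note: Helm's back-of-the-envelope false-positive arithmetic can be written out explicitly. This is a minimal sketch using the illustrative numbers from the talk (a per-site specificity around 95% applied across roughly 10^7 nucleotides); the function name is invented for illustration, and the real count depends on how many sites a method actually treats as candidates.]

```python
# Back-of-the-envelope estimate from the talk: with imperfect per-site
# specificity, transcriptome-wide modification calling produces large
# absolute numbers of false positives even when the accuracy sounds high.

def expected_false_positives(n_sites: int, specificity: float) -> float:
    """Expected number of unmodified sites wrongly called as modified,
    assuming every site is a candidate and errors are independent."""
    return n_sites * (1.0 - specificity)

transcriptome_size = 10**7  # ~10^7 nucleotides, the talk's illustrative figure
for spec in (0.95, 0.99, 0.999):
    fp = expected_false_positives(transcriptome_size, spec)
    print(f"specificity {spec:.3f} -> ~{fp:,.0f} expected false positives")
```

Even at 99.9% specificity, the naive expectation is on the order of 10^4 false calls over 10^7 sites, which is the scale of many published modification maps and the reason the talk argues for much tighter quality standards.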
But if the previous history of what I just told you is any indication, it will be some time before this is so solid and standardized that we can all use it without being afraid of artifacts, if I may say so. And with that, I'm at the end, and thank you for your attention. Thank you. That was terrific. This was a big topic, so, unsurprisingly, our speakers took some time, and I'm glad they did. Can we borrow a little bit from the break? Yeah, so we have about 10 minutes for questions. Please feel free to come up to the microphone or use Slido, and while that's happening, I will read the first question from Slido. So, this is about four questions, and three of them are directed to Bob, but the last one I'd like to direct to you both. I'll read them all. First, a statement that getting industry engaged in sequencing RNA modifications is key. What do you think motivated Craig Venter when he created Celera: profits from patenting sequences (excuse me, that's a tongue twister) or something else? And then the final question that I'd like to direct to you both: what can we do to get industry engaged in RNA beyond, I guess, the current level of involvement? So, do I need to press something here? Yes. Okay. So, in answer to the question about what motivated Craig Venter, the first answer that comes to mind is ego, but the second answer that comes to mind is the incredible possibility of taking the technology that he favored, which was shotgun sequencing, and demonstrating that it would work. And actually, the idea for Celera didn't come from him; it came from Mike Hunkapiller and the folks at ABI, who then went out and got Craig Venter interested in doing it. But it was a great big adventure. And actually, you know, he kind of shook the cage at DOE and NIH and the Wellcome Trust, and inadvertently probably got the fractious forces of the public genome project to actually start cooperating and really accelerating their efforts.
So, competition with NIH was really important, and the competition between Celera and the public genome project was also really important, and I think probably healthy for all of us in the long run. The question about what ABI was expecting: they were going to sell their instruments to the public genome project, but they were also going to be able to demonstrate the incredible power of their instrument to all sorts of users. And this was at a moment when it was becoming obvious that DNA sequencing was the new technology, the hot kid on the block, and it was going to be directly relevant for biotechnology and drug discovery at the very least, and all sorts of other applications. So I think they wanted to expand their market, and they were thinking that this was a good way to do it. And, you know what, it was pretty high profile. I mean, one of the things that motivates the CEOs of companies is to see their name in the New York Times and the Washington Post and the public media, and this was a really, really good way to get out there. And then the final piece is considering whether this possible RNA modifications focus project holds any lessons in terms of getting industry engaged. You know, I didn't even mention it, so this gives me an opportunity to mention that I think you need to have a strategy. And I don't know enough about the field to say anything really very concrete about it, except the following: the open-science ethos has always operated in parallel with proprietary science. That was true in the Human Genome Project, and it's going to be true for almost every area, particularly in biotechnology, and they are not incompatible.
But one thing that you might look at, those of you on staff who are thinking about writing this stuff up: one of the models for doing really fundamental work that really matters for industry is the Structural Genomics Consortium, which maybe you have already talked to. It has a completely open-science ethos, but it is also very, very tightly associated with industry users of its products. So their rules are: no patents on the stuff that goes into the reference sequence and the reference resources that we create, and we don't want patents, because we start squabbling with each other as soon as we have them. But, you know what, we are really happy if you take our stuff and turn it into something useful. And the vetting process for deciding what targets to do, what probes to make, and stuff like that is heavily influenced by industry: going to the Structural Genomics Consortium and saying, hey, this is what would be really, really helpful for us. So all of the pharmaceutical and biotech companies can put their lists together, but that doesn't become public. And then they assemble the list and say, here's how we're going to allocate the science; but if you do that science, it's going to be published and completely openly available to everybody. So that's one way that you can have open science and push advanced science and new technologies, but at the same time really enable all sorts of industrial applications. It's a really sophisticated model. Thank you. And Mark, any comments on the role of industry in Germany? I would say that as we see an increasing number of RNA medicines being approved by the EMA and FDA, we will see the standards for analytics come up. They're not there yet, but they will be; that is the normal way that things go. And at that point, the demand for these analytics will increase, and the companies will cater to it.
And, you know, it's not just messenger RNA; there are all the new siRNAs that have come up lately, and there will be an incredible market, I'm sure. So from that direction there will be a lot, and then obviously, if the companies perceive that the life sciences research sector has enough purchasing power for this, you know, what you're about to negotiate, they will also cater to that. I'm quite convinced. Thank you. Brenda, would you like to ask a question? So, first, just thank you both, to Bob and Mark, for great talks. And this question is actually to you, Mark, because Bob introduced this concept of sharing data freely, and I'm wondering, in your groups and consortia, do you have agreements, you know, and can you tell us anything about databases that have resulted or are ongoing with your groups? Yes, the DFG stipulates a code of conduct for research that involves data management: you have to keep the data and make it accessible, in a similar way to what many journals nowadays require, in open repositories. And because of that, we actually have one of our central projects, which is a database, and that will make the data available to, like, everybody. Thank you. I have a final question from Slido, from a committee member: how often do, or did, unexpected challenges arise? Give examples, and how were the challenges resolved? In the interest of time, maybe just one example, either of you. Mark, go ahead. I want to think on that one for a minute. Unexpected challenges, well, I mean, in research, hopefully everything is unexpected; if you could plan the experiments, it would be development. I think, when we started, say, with the priority program, I had actually hoped that the mapping technology would be a lot less noisy, or would have been developed into something a lot less noisy, than it is these days. And I think that's really a problem for us and for the field in general.
And in the case of the genome project, to think about that specifically, I think the biggest shock was the emergence of Celera, about six years into the project. I don't think anybody at the beginning was thinking there was going to be a privately funded genome project. I'll just flag it a little bit: there was actually an editorial, run as soon as Celera was created, by William Haseltine in the New York Times, saying, why do we need a public genome project when Celera is going to go ahead and do it? And that actually became quite a political, technical, and sociological challenge for the folks engaged in the public genome project, both to keep it alive but also to accelerate it and reconfigure how they were going to go about it. And that led to the earlier public release and assembly of the sequence that was published back to back in February of 2001 by both Celera and the public genome project. So I think that was probably the biggest thing out of the blue. Thank you. One more round of applause, and then you're due for a break. So, that's nice. Thanks. Yes, there's coffee and snacks outside; we'll still try to reconvene on time. All right, welcome back, everyone. I apologize for the exceptionally short break; we have folks who are joining us virtually who are on schedules. I'm going to pass it over to Juan to get us started on the last session of the day. Well, the last session of the day is on the major concerns and possible pitfalls related to sequencing and mapping of RNA modifications, or what I call the friendly critics, and I underline friendly. So, my name is Juan Alfonzo. I'm a microbiology professor and director of the Center for RNA Biology at the Ohio State University. And the goal of this session, and I'll be brief, is to really raise any concerns. They gave me a lot of possible goals to pose; I summarize them as: what does success look like, what does failure look like, and why are we so worried? Is this going to be impactful?
A very simple thing to address, I think. I'll introduce the panelists, but before I do that, I remind the panelists that they will have about five minutes to make their remarks, and we will try our best to keep to five minutes, Todd Lowe, so pay attention. The panelists are: Todd Lowe, who is a professor in the Department of Biomolecular Engineering at the University of California Santa Cruz; he is the genomics guy there. Then there is, of course, Shraga Schwartz, who is a principal investigator in the Department of Molecular Genetics at the Weizmann Institute; his lab works on mRNA modification and how it regulates gene expression. Then there is Wendy Gilbert, who is an associate professor of molecular biophysics and biochemistry at Yale University; again, she's on the mRNA side, also addressing the issue of gene expression and regulation via modifications. And last, but not least, there is Rachel Green, who is a Bloomberg Distinguished Professor of Molecular Biology and Genetics at the Johns Hopkins University School of Medicine. And of course, as everybody knows, Rachel is translation, translation, translation. And with that, Todd, are you on? I'm on. Can you hear me? Can you hear me? No. Yes. Yeah, great. Well, thank you for the invitation. I have to say I am a friendly critic. I'm quite heartened by all the expertise that we've heard today; I am a lot less worried than I was before today, hearing all the different aspects that are really being carefully considered here. So, I just have a few bullet points that I want to go over. I think one of the things, of course, that has been seen with other functional characterization programs is that you can't do everything everywhere all at once. And so I think the sample selection is going to be really critical.
And I think, as the German group, RMaP, showed, you know, doing multiple rounds, where you can see where you're filling in the holes and where things have to be done in the future. So I think we won't be able to get all the samples right, but there are going to be some conditions and tissue systems that are more similar than others. And so I think there should be multiple stages at which there's a full reevaluation; again, it sounds like the German program did a really good job of having multiple rounds of that. But the multiomics approach that Jean talked about, and others talked about, is really critical, and exactly how all the data is going to be integrated should be framed at the beginning, not in the middle or at the end. And so I'm also glad to see all the different technologies that are being considered. I haven't heard a lot about the Illumina sort of primer-extension-based technologies, but those have the advantages of being very fast and being very good at finding certain modifications fairly cheaply, so I hope that there's a really integrated approach between mass spec, nanopore, and then primer-extension, you know, Illumina-based approaches. I also second the need to do model organisms, I think, looking at the evolution of modifications and the evolution of the regulation of modifications. In the end, we don't want just, you know, a set of modifications; we want a model of where and when modifications are occurring. I think, again, that should be part of the framework at the beginning, and so should computational models that include multiomics and expression levels of the different modification enzymes. And then I think the goal should be, you know, it's a big goal, to be able to predict; just like we annotate genomes that no one ever touches, we annotate all the genes and we predict functions.
I think we should be trying to collect enough data that we have models for all the modification enzymes and when they're on, and that of course uses epigenomic data as well as mRNA data and all the other different omics data. And I think, you know, it'd be cool to have a kind of CASP competition at the end, where you have groups who are already, you know, thinking about integrating all the different kinds of data from all these different technologies and being able to predict, you know, modifications for well-known RNAs and less well-known RNAs, based on the motifs, the state of all the other molecules that you can measure, and chromatin states. So I guess my take-home is, I really hope that we start from the beginning by thinking we want to train a model so that we can understand when and where modifications are being made, and if we're good at that, we can predict a lot of them. Even though we won't be 100% yes-or-no, I think we can put a probability on how sure we are. And since I work on tRNAs, I think, you know, we're working on trying to annotate all the modifications in all the tRNAs based on the enzymes that are available in each genome. And if you do massive comparative genomics, as well as train it with all the data that's going to be collected in this project, I think it's a real goal that should be attainable. So I'll leave it at that and pass it on. Our next speaker is Shraga Schwartz. Are you there? Yes, can you hear me? Mm hmm. I feel I was given a somewhat adversarial kind of role here. Is it concerns? You're breaking down a bit. Yeah. Or breaking up, sorry. Breaking down. I'm breaking. Is this better? You can switch the video off, and then maybe we'll hear you. Is this better? No. No, you're still breaking up. So, you know what, why don't we have somebody else talk and I'll join later. Okay, so we'll move to Wendy then, if Wendy's ready. I'm here. Do I sound clear? Perfect.
Great. So I'm pleased and a little surprised to be included among the friendly skeptics. I don't think of myself as an RNA modification skeptic; I'm an enthusiast. I want to make two points about the program. The first is the title, which emphasizes mapping, where are the mods, and says nothing about their function. I think this focus is representative of the mRNA modifications field over the last decade, and I think that we need to reconsider the emphasis. From my perspective, there are a small number of compelling examples where a single modified nucleoside in a messenger RNA has an ascribable molecular function for the activity of that mRNA, and it's those examples that convince me that it's worth mapping the mods. But I think if our efforts are too focused on cataloging, and we don't do enough rigorous functional study, the value of the catalog is not clear. The second point I want to make is about why we're cataloging and what is most important to find. For me, one of the really exciting aspects of studying RNA-modifying enzymes comes from genetic evidence that they are critical for human health. And that means, if you have an RNA-modifying enzyme that is important for healthy development, you want to know which RNA targets are important for healthy development. And there is really no guarantee what class of RNA will be the relevant target, or how highly expressed the relevant target will be. And so one of my concerns about the roadmap for how we go about identifying all the mods is to make sure that we're thinking about lowly expressed transcripts potentially being the most biologically significant ones. Those are the only two points I wanted to make right now. Okay, next we go to Rachel, since Shraga, I guess, is not ready. Okay, so I would just echo a number of things Wendy said. I don't know why I've been invited as a critic.
I'm really happy to be here, learning about an exciting project and how you're thinking about it, and I guess I'd make a few points. I should say I'm not a particular expert in the area, and, like Wendy... no, I know Wendy does study modifications, whereas I don't, per se. I also don't tend to be a big-science person, so I tend to think on a smaller scale, and my joke would be that I didn't even think modENCODE was a very good idea. So, you know, you should take everything I say with a grain of salt. As a biologist, I'm really interested, for sure, in what all the RNAs look like. I'm interested in what they look like from head to toe, and that would include modifications. But I guess I would again echo Wendy on the title of the project, though my focus is a little different. For me, it's a title that's focused on RNA modifications. "Head to toe" would include the 5' UTR, the 3' UTR, and all the splicing isoforms, where we know for sure there's huge biology. And so for me, the title, by limiting it to the epitranscriptome, is a focus on modifications rather than those other features, which are the really safe place to be, from my perspective as an outsider. I guess that would be my thought on the title. I thought there were all sorts of great comments today from Bob and from Jean, and obviously the issue here is that these projects take money, and if money goes into this, it comes away from something else, and that's why we're all being thoughtful about how to focus this. If Elon Musk were paying, we should definitely look for all of this; we should look for all the modifications. So that's the balance, and certainly, on the one side, all these projects lead to advances in technology and discoveries that we don't anticipate, and obviously that's a huge positive.
And on the negative side, it takes away from other things, I guess the flowers; you referred to the thousand flowers, and that's the challenge. I would close with this: if I were a skeptic, I would say, I loved Kristen's talk. I loved the way she focused on quantifying with mass spec and then going to look at it through sequencing. I liked the questions that she posed about what one would want to know if one were going to map a particular modification. I guess, if I were to give a little bit of pushback, I would say: she has these beautiful data that she showed us today, where you have 100% modification at a given site, and the ribosome makes an error of about 1%. And I think many of the modifications of interest here are nowhere near 100%. And if the consequence of 100% modification by pseudouridine is a 1% fidelity phenotype, possibly in an inconsistent way, one should then step back and say, how much biology are we chasing there? If it's easy to map the pseudouridines, that's great; then we should do it. If it's super hard, then, you know, that to me is the struggle. And so I think it was a beautiful example of, you know, we don't know everything it might result in; that's one assay, an in vitro assay, and it's not the whole story. But I guess that's where I would pause and think about which of the modifications likely have biological function, and that's, I think, where Wendy's coming from as well. So those are my main points, I guess. Yes, is this better now? Perfect. Yes. Okay, great. There's a storm here in California, so I hope the Wi-Fi holds up enough; I'm going to do my best. So, first of all, I feel I was given somewhat of an adversarial role here, joining a panel, you know, that has to do with skepticism and criticism. And I also want to point out that there's actually a lot of enthusiasm on my end for this, and I think this is an opportunity both for community building and for establishing standards.
Both on the experimental side and on the computational side, I think there are a lot of things we need to work through, and it's an opportunity to form consensus in a field that really needs a lot more of that. Now, putting this aside, I was thinking of three main directions in which I had thoughts. And I think the first thing is, I'm seeing dozens of people here, and I think a fundamental question here is: when we're talking about the epitranscriptome, what are we even talking about? And I think if we were to ask everyone here on the panel, we would get a very different set of answers, and I think the reason is that the epitranscriptome was never a scientifically, rigorously defined term. It was, I think, more of a buzzword, something that aimed to pull a community together around a new concept, but I don't think any of us really knows what it means. So are we talking about the epitranscriptome of mRNA, or are we talking also about tRNAs, and are we talking about ribosomal RNAs? Are we talking only about internal modifications, or are we also talking about modifications at the extremities? Are we talking about cap modifications? Are we talking about poly(A) tailing? Are we talking about modifications introduced by enzymes, or perhaps also ones introduced by damage? I think, to me at least, it's not a very well defined term, and depending on how we define it, there would be very different approaches for how to pursue a project in this context. I think another major question is: how abundant must a modification be to even be included here? Many of the modifications that people often like to include when discussing the epitranscriptome in exciting reviews are barely there, if at all. So again, I feel there's a real question there about what we actually want to include.
I want to highlight, and it's been highlighted already, but just to make the point clearly: in contrast to studies like ENCODE, where one could apply a single technique widely, for example profiling different histone marks essentially all via ChIP-seq, each modification is a world of its own, with a set of techniques of its own, and to a large extent those are still works in progress, at least for some modifications. So talking about all the modifications as if they were a single batch we can interrogate in a single way is just not the way to think about it. For some modifications we're at a very mature state; for others, from a methodological standpoint, we're at a very immature state. Part of what makes this difficult is that we even lack gold standards: in many cases we develop a technique and use it to map modifications for the first time, but we have nothing to calibrate it against. So, depending on the modification, we're in very different places methodologically as well. And finally, the heterogeneity of modifications also raises the question of which dimensions are most interesting to profile, if we're talking about sequencing the epitranscriptome. Again, I would argue that for different modifications the answers will be completely different. For some modifications, the key question is whether they even exist; there's no point in profiling a modification widely if there are severe doubts about its existence. In other contexts it might be differences between individuals, if we have reason to suspect that might be of interest. In other cases it could be cellular dynamics, or changes under stress responses, and so forth.
But in other cases it might be subcellular dynamics, how a modification transitions across different compartments within the cell. In other contexts it might be reversibility, if we have reason to suspect a modification might be reversible. So I think we really need to consider that while we're trying to lump a lot of things into this notion of an epitranscriptome, it's highly diverse and non-uniform, and depending on how we define it, it will dictate a completely different outline for how we go about dissecting it and what questions we choose to ask. I'll end here, and I'm happy to discuss further. Okay, with that we'll open it up for questions. We have several questions from committee members. While you all warm up your engines, one of the questions is: what can we do with current technologies, what are the hurdles, and what will hold us back? What are the workforce challenges, specifically in training? That's a big question for all of you; it's several questions, not one. I'll answer the last question, about workforce training. Can you hear me? Yes. We need training in computational biology to be a core part of undergraduate training in modern biology, so that when students arrive to start a PhD, they are not already excluded from the cutting edge of many fields, including RNA modifications. Thanks, Kate. So, since Elon Musk has his money tied up in other things, I get the sense from what everyone on the panel just said that we're going to need to somehow prioritize what we want to go after. So I'm curious what your thoughts are on how to do that: how do we choose which modifications, or which species of RNA? I imagine I would get four different answers if you all chimed in, but I'm curious about your thoughts on how we prioritize the first important things we should be focusing on.
I'm happy to take a first shot. That was the other thing I was mulling over: clearly some things we can do immediately. You're ready to go on the splicing isoforms and the five-prime and three-prime ends; that's ready to go. I don't have the answer, but the question is how many other ones you need to figure out, how many you want to wait for before you launch, which I think is what Jean was getting at: how many controls do you do, and when do you decide you're good enough to go? m6A is the most abundant, probably, almost certainly, and there are dedicated enzymes that don't also modify tRNA, so that one seems very safe, and you're getting good at it. Then, to me, all the other ones look hard from the outside, and there's less strong evidence for function. So that's my take; then again, I didn't believe in Moderna. I'm also happy to respond, and I actually agree quite a bit with what Rachel was just saying. There are aspects that are relatively easy to prioritize on, which have to do with the abundance of the modification and with an established function, and I really think that at the intersection of these two there's only a single modification, which is basically m6A. If it comes to pure abundance, then pseudouridine is another abundant one, and it has been tied to some additional aspects; if I had to prioritize a second one, it would probably be that. But as soon as you go down from there, you go down massively in terms of abundance.
And you're also dealing with a lot of enzymes whose prime targets are tRNA modifications and so forth, so you always have to worry about whether what we're seeing is simply a spillover effect from modifications that initially evolved to modify completely different substrates. So, depending on how much money there is: if we have a finite amount and want to prioritize one modification, I'd go for m6A; if it's two, I'd add pseudouridine; and depending on the money, one can use the same criteria to prioritize additional modifications. Yeah, I agree with those as well. From the non-mRNA perspective, I think modifications that interrupt base pairing, that alter structure, are among the easier ones; they happen to be easy in part because they interfere with reverse transcriptase. And considering the speed with which we can now profile a lot of samples, even for tRNAs, with these new enzymes that are highly processive, tRNAs are a bonanza for learning what the signal looks like when you pass things through nanopores. They're a little more difficult because of the window over which you have to train for all the different kinds of modifications that can be close together. But I'd say start with the ones you can get easily, and I think people are going to be pretty shocked at how much dynamics there is in tRNAs themselves. That's really altering the picture, because people haven't been defining the mix of tRNAs and the different isodecoder families. So from our perspective, I would say each sub-community is going to have its own priorities and the easiest targets it can go after; let those subgroups define them, and then take a first pass, because just like all these other projects, the first-pass data is always going to be crappy compared to the second and third passes.
That doesn't mean you shouldn't try to get at least a first pass at the ones you can get. So I think getting a draft, at least a low-resolution draft, of lots of things, and then figuring out where you want to put your money after that. In answering this question I can fully inhabit my role as a skeptic: I am a skeptic of central planning, and at some level that makes me a skeptic of this project. I'm not disparaging the value of data generated by coordinated large-scale projects; the completion of the human genome sequence is probably the best example. But I think you would get different answers from each of us about what would be the most interesting project related to RNA modifications, and I think that's the strength of the community. For example, to me a compelling motivation is disease connections of RNA-modifying enzymes where there's not an obvious molecular explanation. There's a pseudouridine synthase, not known to have tRNA targets, that appears to be a host factor for replication of an RNA virus. I find that interesting, and if I were asking NIAID for funding for the project, the motivation would be the importance of that virus for human health and the evidence that this host factor is important for its life cycle. But I would certainly not make the case to this large community that it should make the top-ten list of priorities. I'll just throw in that it really is useful, though, to have at least some basic set of samples that everybody looks at, because then you really get a deep, complex look at those. I really like the Roadmap Epigenomics data, which I've been using for years and years afterwards, because they didn't discriminate against non-coding RNAs; it's really deep data, with lots of different kinds of data in there.
And they have all these mature tissues as well as cell lines, and that's something else I think we can get a nice pass at from lots of different perspectives. I think that's the one advantage, but absolutely there are going to be subgroups and subclasses where you don't need everybody to analyze the same samples, and it won't be practical. That point is a perfect segue to one of the committee members' questions: what is the cost of doing nothing? Any of you. Sorry, go ahead. I mean, I don't have a... Go on, Wendy, please. I think there is a missed opportunity for synergy. I agree that there are good scientists excited about relevant RNA-modifying enzymes who will keep studying them. But we would miss this moment of herding cats like me to get in line and say: okay, even if my pet project is not on the priority list, I recognize that if I design my experiments, communicate my results, and relate them to standards agreed upon throughout the field, the opportunities for springboarding the whole field are great, and those would be lost. Maybe I'll just add the other thing, which is that you have technology that's raring to go, and with any big project will come huge improvements in that, which will drive new things. That's an obvious benefit. So with that, we move to the committee members' questions, and I'd ask you to please address each one to someone in particular on the panel if possible. Okay, I'll pose the question and then try to find a member who would be able to answer. The idea is that if a modification is abundant, it's more valuable, but most of our knowledge of abundant modifications is from model systems, and there is this discussion of what would happen in non-model or other species.
What if, hypothetically speaking, there are modifications that are abundant in systems we have not explored, and we go down the road of cherry-picking a few of these modifications and start focusing on them? That was my question. And a segue comment to that: some of these are beautiful technologies. I'm a supporter of nanopore and similar technologies, which enable generating data once and reusing it multiple times. For instance, you could generate data and build models for modification A today, then build models for another modification ten years down the road, and still use the same data. So there are possibilities for generating data end to end and then worrying about building models and modification-specific standards at a later point. That was more of a comment, but I'm interested first in an answer to the question. This question is for, say, everyone. Nice. Well, I think Todd might be the person to speak to the distribution of RNA-modifying enzymes in biology, because of his interest in evolution and conservation. It's very compelling that dihydrouridine synthases are conserved across all domains of life; clearly biology thought they were important, even though it seems you can delete all of them from the model eukaryote budding yeast and the cells are a little bit sick but very much alive. I think a careful look at the protein families and the evolution of the RNA-modifying enzymes, in a theoretical evolutionary framework, should guide where we do our sample collection. We don't want to do a whole bunch of gammaproteobacteria, for example; we want to spread it out. I think there should be one fund for the orphan cases: hey, this organism is really weird, but boy, it has some odd biology. I mean, trypanosomes are pretty weird too, right? And look at all the neat biology we've gotten from them.
So I would say, absolutely, the planning shouldn't just be: okay, we're going to do the standard tissues and the five model organisms and so on. I think there should be a competition, or even a meeting, about how to get the best diversity. Jonathan Eisen did a fantastic job of sampling bacterial genomes and getting DOE to sequence across lots of different clades that had never been sampled. It would be fantastic if we could do that, because if you match up the targets that vary, for example with tRNAs, it's easy to see the tRNA ones: you can see the tRNA sequences change, and then you can see a slightly different version of the enzyme. You have a lot of information already right there, and that will help prioritize the newest things you want to get at, which could then be used as tools as well as for understanding biology. Thank you, Julius. A question from a committee member now. Thanks to the panel; this is a fascinating conversation. My question is about defining success for this project. We heard about the human genome earlier, where the project was almost defined by putting in order the 3 billion nucleotides of the genome, which is, relatively speaking, a static piece of information. The modifications are very dynamic. So how do we define success in that regard? I think Todd had a really good comment on this, essentially defining success as being able to predict, but does the committee have other concepts of success in this regard? Yeah, I'd like to point out that in order to define success, it needs to be clear what the goals are.
And there are so many different directions in which we could take this that, to me, at the moment it's quite a vague and unclear concept, which is what I'm hoping to address here. Going after cross-evolutionary profiling is fascinating, but it's one way of doing it; looking within compartments and across cells is a different way; and going after disease-related enzymes, being able to figure out where a disease is coming from and what the precise residues are, is a third. For each of them I think we can clearly define success criteria: we found the residue responsible for the disease and we can complement it, for example. But that would be completely distinct from surveying the epitranscriptomes across a hundred different species and exploring the variability. So honestly, I think there needs to be a lot of discussion about the goals here. For me, a main goal, as I said up front, would be just getting a clear picture even of what's happening in human, for example. Going back to the question of the cost of not having this: I've been in so many conversations with people who are trying to get an overview of the field and getting lost in a literature that points in so many different directions. So one goal would simply be having an agreed-upon consensus on what is actually out there and what the rules are that govern it. That would also be a goal, but a very distinct one. So before success, we need to think about goals. And, as everybody has alluded, it shouldn't just be getting the map; once we have the map, can we do something with it?
Do we have the RNA-modifying enzymes, the readers, writers, and erasers, to then go change the things that aren't right, for biotechnology or for health purposes? I think that's got to be up there too, and the plan has to change depending on what the end goal is. I have a technical answer for what success would look like to me: to make quality examination of RNA modifications as routine as gene expression analysis by RNA sequencing. Then the diversity of the research community could be studying this question as it relates to their own problems. Currently, the investigation of RNA modifications is a fairly specialized undertaking, done in a relatively small number of expert labs, and that limits the biological significance of the discoveries we could make. The next question is Marcus, right? Yeah, Marcus, from Nanopore. One of the things I was harkening to is what Wendy was talking about with how important the computational methods are, and I would add statistical methods, and to harken back to the last session: Mark was talking about how a 95%-accurate model was super amazing, but why would we apply it to the whole transcriptome when we would need, I don't know how many nines, something like a Q60 model, for that to be useful? To those points, one of the main goals could be education: there are useful biological things you can do with a 60%-accurate model, but that doesn't mean you can apply a 60%-accurate model to the whole transcriptome and get something out of it.
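[Editor's aside: the point about why a per-site accuracy that sounds high still fails transcriptome-wide is a base-rate effect, and can be sketched with Bayes' rule. The prevalence and specificity numbers below are illustrative assumptions, not figures from the talk.]

```python
# When truly modified sites are rare, the fraction of calls that are real
# (positive predictive value) collapses unless the false-positive rate is
# tiny. This is why a "95% accurate" caller can be fine on candidate sites
# but useless applied to every site in the transcriptome, and why something
# like a Q60 error rate (~1e-6) was invoked.

def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of called sites that are truly modified, via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

prevalence = 0.001  # assume 1 in 1,000 sites is truly modified (illustrative)

ppv_95 = positive_predictive_value(0.95, 0.95, prevalence)
ppv_q60 = positive_predictive_value(0.95, 1.0 - 1e-6, prevalence)

print(f"PPV at 95% specificity:      {ppv_95:.3f}")   # most calls are false positives
print(f"PPV at Q60-like specificity: {ppv_q60:.3f}")  # calls are trustworthy
```

Under these toy numbers, the 95%-specific caller yields a PPV of about 2%, while the Q60-like caller yields a PPV near 1, which is the gap Marcus was pointing at.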
So one of the goals could be, as you're saying, and I think as part of what Wendy was just saying, a training in what these methods are: comparing a nanopore modification model against an enzymatic method and having a way for those to be comparable in some meaningful way. I think that would be super useful. I guess this is directed to Wendy, since she seems to be on this train of thought: what are your thoughts on having training be part of something really important that could come out of this, in a systematic, broad way? Sorry, you pitched me a question, but I don't know what the question is. It's more of a: how can we make the training systematic? What makes a model 60% accurate versus 99% accurate, and how can we formalize that as a goal? Yeah, I have a side comment on that. Nanopores are fantastic for training and for getting undergrads excited about sequencing, especially now with nanopore detection of DNA modifications as well as RNA modifications. The process of running through those experiments with nanopores, which are not perfect, because nothing is perfect, can be a really good training module. At UC Santa Cruz there are a ton of undergrads doing nanopore work, and it has really driven a lot of learning in terms of how to handle that kind of complex data. As for your question, I think it will depend on the situation, and maybe on defining these standards, but having something you can plug into your laptop and then do a sequencing run democratizes things, and that generates a lot of excitement about learning the math and the software necessary to get real information out of it.
I just want to go back to that previous point, which is that success has to, at some point, connect to function, echoing what Wendy said at the beginning, and I don't have any idea how to set that up. You get huge, deep datasets, and then you have to find something in there that you can poke at. But that should somehow be part of it. You get the whole thing, and then it's: now we have to do a time course, and now a different organism, and now something else. When do you stop and assess whether we're learning enough? I have a question for Todd, and also for Wendy, because you both touched on this in talking about viral sequences. A lot of this is framed as: we're going to look at the human epitranscriptome, not to overuse that term. Or, Todd, you talked about comparative genomics. With the human genome, we started the genomic projects by sequencing viruses, then bacteria, and then we went big. Is this really at the point where we're going to look transcriptome-wide in human, or should we ratchet down and focus on smaller organisms and viruses? RNA viruses are way, way complex; the way they fold is crazy, with all the little bits of regulation, and they evolve very rapidly. I think they are hard. But for microbes, absolutely. Looking at the evolution of these enzymes, you learn so much about specificity. And then, when you see a new enzyme pop up, you ask: why is this tied to placental development? We have a lot of new tRNA genes that popped up in vertebrates, and with them new modification patterns. Why do we have two different versions of this isoleucine tRNA?
And so that's where you want to look: where there are big jumps, and you can say, well, there's been some change in their biology or their environments or their needs. So I think connecting it constantly to the biology is a requirement, and that assessment should be made after every survey of raw data: what does this tell us, is this now a regulated modification, or is it not really regulated, does it just come and go? So I would say that's the job from the beginning: to connect it as much as possible. There's not a lot of time, but we can have one last question. Okay, just to tie things up with what Wendy said: Wendy one said something, and then Wendy two said something else; I'm referring to the same Wendy. The first point you made, Wendy, was about linking the RNA mods to function, and I think that is a much bigger problem than your second point, which was to define a very technical milestone of success: can we actually map these damn things, can we get all RNA modifications mapped and have it be as easy as a sequencing experiment? And I like that definition, because it's a lot more manageable to have that technical milestone, that technical goal. When you talk about function, you're also talking about the proteome, about modifications on the protein, about the chemical environment and metabolites, about so many other things; it's an endless pit, I think. So that also touches on your point; that was my question slash comment. And with that, thank you so much to everyone for participating and staying awake and all that jazz. Thank you, everybody. Yeah, good luck. Bye-bye, thank you.
I guess I can do the honors of passing it off to Nicole for some final remarks before we end the day. Right, so now I have the very difficult task of somehow synthesizing all of this information and providing a bit of a summary of what we discussed today. We started with a review of the NIH workshop that was interested in some of the same questions we discussed today: the technological challenges we need to overcome; the samples and standards we would need for a project to sequence RNA isoforms and RNA transcripts end to end with modifications; and the workforce and infrastructure needs, such as computational pipelines, experimental pipelines, standardized procedures, and quality control. One theme I saw emerging is the question of which aspects of modifications we should focus on. We heard from Kristen about why we should care, why we are interested in RNA modifications. One reason, which I think is why a lot of us are interested, is that these modifications have the potential to impact any stage of the mRNA lifecycle and to impact function in cells. But we have to think about how we are going to convince people outside this room, like the public and funding agencies, of the significance and of what we might learn from this project, and that includes connecting these writers, readers, and erasers to human disease. We also talked about different organisms and what we might learn from modifications in different contexts, about the regulation of these modifications in different conditions, and about how that might control gene regulation; these are all possible focuses of a project on mapping RNA modifications. We talked about technologies, and two we heard about today were mass spec and nanopore sequencing, both of which can measure RNA modifications directly.
We heard how mass spec can tell you about quantification, stoichiometry, composition, or even the existence of RNA modifications, while nanopore methods can tell you about the locations of RNA modifications. A theme we saw is that the use of these orthogonal approaches is the likely path forward, where you can get both validation by different methods and complementary information from the different approaches. And there's hope for further development, for example getting mass spec to the point where we can sequence RNAs by mass spec. We heard, in terms of nanopore technology, how engineering the nanopores could lead to better detection of modified bases, and how development of the computational and analysis pipelines, such as machine learning algorithms to interpret the data and call modified bases, will be both a challenge and a path forward. To do that, we really need standards we can use to validate these different modifications and to establish benchmarks for all of the methods we've been hearing about today. So how do we make standards with known amounts of RNA modifications in a variety of sequence contexts, and how do we distribute them so the community can use them to benchmark and validate different approaches? We heard about some of the challenges in making those standards, including the cost, the motivation to make new modified nucleotides and whether there is biological interest in pushing that forward, and the cost of characterizing such standards. We learned about the need to standardize approaches and to standardize experimental and computational pipelines, drawing lessons from the ENCODE project, as well as about quality control for the data generated by a project that involves mapping the full spectrum of RNA modifications in full-length RNAs.
And this is a project that will likely involve partnership between academia and industry to help develop the technology, as well as funding and governmental agencies to support the development of these projects. We can learn lessons from the Human Genome Project, which had many parallels to this project and many successes, but we have to recognize that this is a more complex problem, in the sense that it won't be one sequence; it will be many sequences in many different contexts, and we will likely need a variety of approaches to achieve a project like this. Finally, I'll touch on a few last points that were brought up. One is establishing short-term versus long-term goals. You're going to have to pick what to prioritize: which modifications should we focus on, ones that are more abundant, ones with more likely functional impact in biology, or ones we have better tools to understand now? We're also thinking about biology and function in terms of this project: we can think about regulation, about different model organisms, about a whole range of contexts that would be interesting to study, but we need to pick one to make the most progress on moving forward, and perhaps think about ways of expanding to others. And the last thing is to define what success will look like, and to decide on goals and milestones we can use to judge whether the project is being successful. Okay, with that, I think I've touched on what I found to be the major themes, and we're really excited to continue the discussion on what the future looks like for this project.