Thank you. Thank you very much. What I wanted to do is take a step up from the very fine-grained information that we've all just heard about cell lines and talk about how they're referenced and how the literature is using them. As Amos very nicely said (thank you), I'm at the Department of Neuroscience, research faculty there, and along with some of my colleagues I also founded a company called SciCrunch Inc. I will not talk a lot about it, but we try to work with publishers to improve rigour and reproducibility in the published literature. So many things are actually conflicting, and the lawyers at our university have a very good time with me. Let's move along.
The big problem that an RRID attempts to solve can be summarized by just looking at this paper. This is a random paper, and it's a few years old now, but I think we've all seen papers like this before. This is an author who is citing this mouse, and they're trying to tell us that they are using this particular mouse, this NOD PK scid IL-2 receptor gamma chain null mouse, from the Jackson Laboratory in Bar Harbor, Maine. Well, if I go to Bar Harbor, I might be able to find this mouse. But if I go to the website of the Jackson Laboratory, where I think all of us would probably rather go than Maine, this mouse does not exist. If I search one way, maybe I get 15 mice; if I search a different way, I get no mice. But nowhere in my searching of this set of words can I get just one mouse. So this is a real problem with the scientific literature, but it's a problem that's not that hard to solve.
In fact, before one of my talks I pulled out this paper, and it was funny, because I wrote to the author and said, "Hey, I'm trying to track down your mouse and I can't seem to find it. Do you have any idea what mouse this is?" Within about an hour I got an email back, so this was still before my talk, and I put it into the talk. I got this really nice response from the author, who said, "Oh, yeah, here's the stock number for the mouse." And I said, "Huh. You know, it would be really nice if you had put that information into the paper to begin with, so I wouldn't have to bother you." And of course we all know this problem: after a few years the person leaves the lab, and it's not so easy to retrieve this information. If I did this with this paper again, I doubt the person would know immediately which mouse this was.
The fun part of all of this is that just this year we got the Jackson Laboratory to change their website to include the RRID right there on every single mouse, including of course this one. The thing I would also like to highlight for everyone in the room, and this was mentioned in one of the earlier talks, is that this "NOD PK scid IL-2" is not the name of the mouse. It is a nickname of this mouse. There are "common names" and there is "also known as", and nowhere is this specific string available at the Jackson Lab. Let me just get you to the real name of this mouse: it's NOD.Cg-Prkdc<scid> Il2rg<tm1Wjl>/SzJ. Of course, if I'm in the lab and I want to refer to this guy, it's going to be "Bob", but when I publish I'm going to have to use this name.
So how common is it to actually find these kinds of problems?
We're talking about cell lines here, not so much organisms. The organisms are actually pretty good, but cell lines and antibodies are really bad: about 50% of the time we get one of these nicknames, and the nicknames are bad, because we cannot find these reagents. And again, as we saw from this wonderful author who wrote back to us, most of the time if you ask, people will come back and say, "Oh, it's that one, I just have to look up this record." It's not a big deal.
Looking at this problem over many years, we tried very hard to get different people, including our home society, the Society for Neuroscience, to actually solve it. I presented this problem in front of the entire editorial board of the Journal of Neuroscience, which is about 30 people, all in full agreement that this is a real problem. Everyone went around the room: "Oh, yes, it's a terrible problem, terrible problem, terrible problem." We came back a year later and there had been no change. So we said, okay, how do we solve the problem for real?
So at the next Society for Neuroscience meeting we brought together people from 25 journals. They started talking amongst themselves and said, "Okay, maybe there's a problem, but it's almost unsolvable." Then we brought everyone back for a two-day workshop at the National Institutes of Health, and all of these journal editors said, "Okay, there is a problem, and here is a potential way to solve it. But we cannot be responsible for carrying a stick and hitting the authors with it, because we don't see our role that way. What we can do is ask for an RRID," a term that was coined at that time between the NIH and these journal editors. And they said, "To make this easy for the journals, what you really need to do is take all of these important identifiers and put them into one website, so that the journal can just send the authors there and have the authors do the job of finding all of their reagents in one place." That seemed like a relatively easy thing to do.
So the bioinformaticians got involved, and we brought together a lot of different resources, including Cellosaurus, right here, for cell lines. We took all the data and all the metadata from all of these different sources, including Addgene, of course the Jackson Laboratory (where is that? right up here), MMRRC, a lot of the other organism banks, and NCBI BioSample, and we put it all under one unified portal. There is other information there too, including information from ICLAC, which of course comes through the Cellosaurus and is then nicely reflected on this portal. There's also information from projects like ENCODE, which verifies and validates certain reagents; a lot of core facilities have donated information to us; and of course the authors themselves are donating information to us all the time when they cite RRIDs.
So what does one of these pages look like? I was going to tell you about the German Collection of Microorganisms and Cell Cultures; actually, that's going to be later, but this is a web page about this particular RRID, this particular project. What you see on every single RRID page is of course the name; the identifier itself, so how do I cite this thing; and a bunch of basic information. There are cross-references, and we know that this particular collection works with Cellosaurus, so Cellosaurus also has another page, wonderful. Then there are citation metrics, and there are ratings and alerts, which are brought out into their own separate section. In this case, the information that's coming from the literature comes in when someone cites this particular resource by RRID.
So this is an author in 2021 who cited this resource by RRID. Alternatively, there's a text-mining algorithm that pulls things like the URL of the resource, which we have stored here, from the open-access literature, and so those papers are part of this as well.
Of course, cell lines have their own web pages. Every single RRID has an identifier and a web page, and the page has the same basic format. The cell line information here is part of what comes from Cellosaurus, and there is a link to Cellosaurus directly. What we have that's a little bit different is this: Cellosaurus has information about the papers that originated this particular cell line, which is very important, and we have information about the people who are using the cell line. All of these are papers that you can then look at, and in this case you can see that the most recent in the list is somebody named Pemberton, who used this particular cell line. Unsurprisingly, HEK 293 is a very commonly used cell line. At the very bottom of this page is something called "ratings and alerts", and this is where we bring in the ICLAC information, trying to make it as visible as possible. It's not actually that big on the real website, it's smaller, but I wanted to highlight it here.
Then you might ask the question: if no one knows about RRIDs, how do they find out about them?
Well, the fine journals, represented here by one of our journal editors, will usually have instructions to authors, and the author will be instructed to go to the RRID site. The author does not have to go to every one of the individual resources, because the information has been brought in, even though we are not responsible for the individual information; that is the work of all of the great resources like the Jackson Laboratory and of course Cellosaurus. So the author goes here, copies this text from the site (or they can cite it this way), and pastes that snippet of text into their paper. The paper is published, and then we can do various things to mine that information back out.
Last year we published a paper about having 500,000 of these things already in the scientific literature, which is great. But the reason we have so many is the hard work of the people on the journal end, including the American Association for Cancer Research, eLife, Nature, Science, the Endocrine Society, and many others that are very interested in improving reproducibility. They've written lots of things, including their instructions to authors, and what I can say is that we have over a thousand journals that ask for RRIDs in their instructions to authors. If you have not read the instructions to authors, well, you wouldn't be alone. In fact, most journals know that no one reads the instructions to authors, but the information does have to be there, just in case somebody bugs somebody. And just to put this in context, there are currently over 5,000 journals indexed in PubMed, so while a thousand looks like a very big number, it really is not. We have support from some of the big journals, which is great, but we continue to need more.
The ARRIVE guidelines were updated last year to include RRIDs for various biological reagents. The MDAR checklist is currently used by Science, so every single Science paper has this MDAR checklist as the reproducibility checklist that needs to be filled out; it was created by a group of reproducibility researchers and journals to apply to all papers. JATS is another: it's the Journal Article Tag Suite, version 1.2, a guideline that tells the publishers how to deal with RRIDs. So we're getting there, but we still have a very, very long way to go. I am continuing to give these talks and to ask authors to put RRIDs into their papers.
So what does it look like? You saw one of these papers already. This is a key resources table from eLife; Cell Press does this as a matter of course for every single paper. These are wonderful. They show you the designation of the thing, and here you're getting more and more of these designations being the full name, so the full name of the mouse, of the cell line, of the resource itself starts to go in here. We've measured this, and it is actually getting much, much better. Then there is the source of the resource: did it come from Abcam? Did it come from another lab?
That starts to come in here as well, and then of course there are the identifiers. The identifiers are linked in many of the journals, and when you click on the link you end up back on the web page about, in this case, this particular cell line. If you squint really hard, you will see that the most recent article is by Pemberton in 2023, which strangely enough is exactly this paper right here.
So I'm going to ask you to think about a paper not just as a collection of ideas, which I know many of you in this room have already done, but also as a collection of methods. I would like you to consider that your paper is going to be searched for the methods, the reagents, and the resources that you're using. The cool thing is that we will definitely put your paper in here if it has an RRID. But what does that really do for your colleagues? Well, if they're trying to optimize a protocol, if they're trying to figure out how you used that antibody, how someone used that cell line, what culture conditions you were using, a really good way to make that findable is to put the RRIDs in, which gives your colleagues the ability to easily find your paper based on the identifier of that research resource.
But a couple of years ago we asked, and this was with Amanda and with Amos: do RRIDs really help science? This is really a deep question. We know that there are these problematic cell lines.
First of all, let me just give you the throwaway here. This is the original paper showing that if you ask people for persistent identifiers in their papers, the identifiability of research resources like antibodies, organisms, and tools goes way up. If you ask people to provide identifiers, as I did with the author in that original example, they are able to provide them about 90% of the time. That is great; it brings additional rigor to the scientific literature. But does it help?
The question here is this. If we look for a contaminated or problematic cell line, we can find it, and in fact we can find it fairly easily. We see the comments, and these comments are now also in red (I haven't updated this slide). So when people see this, or when people see the same information in the Cellosaurus, where it has always been in red (thank you very much for that), does it change whether or not they publish with this cell line? The answer seems to be yes, it does.
In this paper my student Zeljana Babic used a text-mining technique to look at every single paper that was available for text mining at that time. She found 150,000 papers that had used at least one cell line. She connected that data to the problematic list, meaning any problem reported; we didn't look deeper to see which ones were really bad and which were not. Using a fairly strict criterion, we found that 16% of papers were using at least one problematic cell line. When we looked at the list of RRID papers, which at that time was 634, only about 5% of them were using a cell line that was on this problematic list.
Then we dove deeper into this, because this is only 50 papers that were actually using one of these problematic cell lines, and we really only found one example where an author saw something like this and still used the cell line in the paper. Maybe at that time the warning wasn't in red, and maybe it's better that it is red now, but it was really only one author. The rest of the cell lines that were being used, things like Hep G2, have a note on them: it's fine to use them as long as you know what the origin of that cell line is. So this huge decrease, and of course this is one of the most significant findings that I've ever had published, this decrease of over 66%, is what happened.
In that paper we also looked a little bit at, and this is here in the blue and the purple, a looser criterion and a stricter criterion for defining what a cell line is.
This is looking at that set of 150,000, and what you see is that in each case the problematic cell lines are growing, maybe on average from about five to about seven or eight percent of all of the cell lines used, which then again affects 16% of papers. Here we only pulled out one time point. Of course there are many time points where people are talking about problematic cell lines and how much of an issue they are; we pulled out 2013, when the ICLAC paper was published, and then redid the analysis for the period afterwards. One thing you can see is that although these numbers are higher in terms of absolute percentage, they're starting to go down. So I think this is again the work of the ICLAC consortium, thanks to Amanda and others, who are actually driving the literature to a better state.
One thing that I just had to put in very quickly, because there was a lot of talk about how many people are authenticating and all of this fun stuff: we do a lot of text mining in the lab, so this is our latest paper with text mining. This is just a bunch of data about journals in 2020, and I sorted it by whether or not the cell lines were actually findable. You might not be so surprised to learn that the International Journal of Cancer has the highest percentage of cell lines that are findable.
Thank you very much for that. In total, out of this data set of two million papers, about 300,000 papers used cell lines, and out of those we found authentication statements in 100,000. So papers with authentication statements are about a third of the whole, and that surprised me; it's actually much better than I would have expected.
Okay: do RRIDs really drive better citations? This is a study that we are about to put out on bioRxiv, and it looks at all references to one of the mouse repositories. In the nickname category, these orange lines, what you're seeing at the very beginning, in 2011, is that about half of the papers are talking about these mice using a nickname. The full name, this very long B6 SJL et cetera, is being provided in about 25% of cases, and in about 25% of cases we get the catalog number. But as you look at the start of the RRID project, which is here in 2014, and at when this repository actually joins the RRID project and very transparently puts the RRID for each mouse on its website, you start to really drive an increase in the number of papers that are using the RRID. That also replaces and drives out the bad practice of using nicknames, and not so much the good practices, here in blue. That is really telling: most people are able to find this information, most people are able to provide it, and that makes the literature far, far better.
So I just wanted to thank everyone here for listening to me drone on and on about why you should use RRIDs in your next paper. Of course I will just have to say: please use RRIDs in your next paper. My information is listed down below, and this is the lab that contributed to all of this work, the FAIR Data Informatics Lab, and we have funny hats. All right, thank you very much.
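Editor's illustration: the talk describes authors pasting an RRID snippet into their paper and the identifiers later being mined back out of the published text. A minimal sketch of that mining step is given here, assuming the usual citation syntax of `RRID:` followed by a registry prefix, an underscore, and an accession. The function name, the pattern, and the sample accessions are illustrative assumptions and not the speaker's actual text-mining pipeline.

```python
import re

# Assumed citation syntax: "RRID:" + registry prefix + "_" + accession,
# for example RRID:CVCL_0045. The accession may contain a colon, so the
# pattern allows one but insists that the match ends on a letter or digit.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_:-]*[A-Za-z0-9])")


def extract_rrids(text: str) -> list[str]:
    """Return the distinct RRID accessions cited in text, in order of first use."""
    seen = []
    for match in RRID_PATTERN.finditer(text):
        rrid = match.group(1)
        if rrid not in seen:  # a paper often cites the same resource repeatedly
            seen.append(rrid)
    return seen


# Hypothetical methods paragraph; the accessions are placeholders.
methods = (
    "HEK 293 cells (RRID:CVCL_0045) were stained with a primary antibody "
    "(RRID:AB_123456). HEK 293 (RRID:CVCL_0045) was authenticated."
)
print(extract_rrids(methods))  # ['CVCL_0045', 'AB_123456']
```

A list of accessions like this is what would let a paper be indexed under each resource it used, which is the search-by-method idea raised in the talk.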
Thank you very much. Questions?
I missed this: how do I get an RRID?
It's really easy. You type it into Google and you can find the RRID website.
I mean, if I create a new resource.
Fantastic, fantastic question. So if you create a new resource and that resource is a cell line, then the RRID portal has a little button at the bottom that says "create new resource". It will effectively send an email to this wonderful man over here, and he will help you register that cell line. If it is a software tool, there is another option at the bottom to register it with the tool database. If it's a mouse, we ask you to deposit the mouse in your favorite repository, or, if you've created a mouse and do not wish to deposit it, we send you to the Mouse Genome Informatics database. And if it's an antibody, of course, the Antibody Registry. Because the RRID is not one thing, we have to send the user somewhere: we try to capture all of the places where the registration has to happen, and we send you there.
Maybe I got it wrong, but, again, working in a facility: sometimes people take a cell line that has an RRID, like MCF-7, and do a lot of modification. Is that a new cell line? Do they have to have a new RRID?
Okay, that's a good one. Now you have to ask this man, because I know nothing.
Yes, even if it comes from a parent cell line and then you have modified it.
Okay. It's just so that we can give guidelines to our users on how to use these tools.
Yep, exactly.
And by the way, if it's a new antibody, we take that to the Antibody Registry. Again, there are rules set up for each different type of resource that make sense for that resource. The governance of cell lines is happening right here; the governance of antibodies is happening at a different database; and they all have rules that make sense for that resource.
I had a question concerning the uniqueness of the concept that is being identified by your RRID. Is that being handled by the registry systems?
Yes, absolutely. The registry systems are the ones that govern the RRIDs, and the RRID is just this kind of umbrella organization that says: as far as we know, the experts for this type of resource are telling us that this is how they would like to identify that set of resources, and that is what we will then reflect. This is how you avoid having duplicates; I do not wish to have any duplicates. For that reason, all of the RRIDs for cell lines start with CVCL, and all of the ones for antibodies start with AB_, for Antibody Registry. So those registries are the drivers of that, both for new resources and for everything that we know already.
One last question: what about viruses?
Oh, good question.
There's a repository that we've had problems interacting with, but they're working on a reference material that we will be characterizing, so it's something where we can try and push to maybe begin the RRID conversation with them.
So we do increase the types of RRIDs that we handle, but there are certain rules that we have to follow in order not to make this helter-skelter. If there's only one tiny repository that has something completely different, we will not usually make a type for it without some kind of community effort. Actually, cell lines were not an original RRID type. The original RRIDs were the model organisms, software tools, and antibodies, and then we added on to that.
We added Cellosaurus and we added plasmids. We can add more types of things, but there are rules that we have to follow for that to be done in a concerted fashion. We want to make sure that there's a community behind the particular repository, and all of the stakeholders in that field really need to be involved in the process; otherwise it doesn't work very well. That's the main thing. Technically there's no problem; technically, I would love to do viruses. Maybe we can get somebody like Addgene to do them. They're super responsive, much like, you know...
So, okay. Thank you all very much, and use RRIDs.
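Editor's illustration: the Q&A explains that each resource type is governed by its own registry and that the prefix of the identifier reflects this, with cell line RRIDs starting with CVCL and antibody RRIDs starting with AB_. The sketch below shows that prefix-to-registry routing in miniature. Only the two prefixes named in the discussion are included; the function name and the table are hypothetical, and the real portal handles more resource types than this.

```python
from typing import Optional

# Only the two prefixes mentioned in the Q&A; other resource types
# (organisms, software tools, plasmids) are deliberately left out.
REGISTRY_BY_PREFIX = {
    "CVCL_": "Cellosaurus",
    "AB_": "Antibody Registry",
}


def governing_registry(rrid: str) -> Optional[str]:
    """Return the registry that governs this identifier, or None if unknown."""
    accession = rrid.strip()
    # Accept the identifier with or without the leading "RRID:" label.
    if accession.upper().startswith("RRID:"):
        accession = accession[len("RRID:"):].strip()
    for prefix, registry in REGISTRY_BY_PREFIX.items():
        if accession.startswith(prefix):
            return registry
    return None


print(governing_registry("RRID:CVCL_0045"))  # Cellosaurus
print(governing_registry("AB_123456"))       # Antibody Registry
```

Keeping one prefix per governing registry is what lets each registry apply its own rules for new entries, such as modified cell lines, without creating duplicates elsewhere.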