Thank you, everyone, for coming here to Lausanne. This is the first talk of this symposium on cell lines. I'm not going to describe what a cell line is, because most of you know, and you will hear about them in all of the coming talks. What I will concentrate on is the Cellosaurus, which you could call a kind of cell line encyclopedia.

Maybe the first thing, and this is probably the longest slide in terms of text, is: why the Cellosaurus? It's more a historical perspective. I was not in the world of cell lines. I didn't know anything about cell lines except what was done in labs, but I was working on proteins with Swiss-Prot and neXtProt, which is a database on human proteins. Of course, if you're looking at proteins, human or from any species, you want to see where experiments are done on these proteins. A lot of them are done on cell lines, so you want to annotate where the experiments were done. We naively thought that we could use an existing database of cell lines and just say: we're using accession number X or Y, which is the cell line HeLa or whatever it is. The problem was that at that time, at the end of 2010, there was no resource which was comprehensive and had all of the cell lines we needed, even though at that point we needed only 100 cell lines to annotate. It then went to 500, but that is still a tiny number of cell lines compared to what exists. So we created what was a small cell line thesaurus, just a list of cell lines. That's why it's called Cellosaurus: cell line thesaurus. Now it's no longer a thesaurus, because people became interested in it and said: can you add the cell lines from ATCC, the cell lines from DSMZ, the cell lines from this paper, and I published this cell line. So it grew and grew, and it's no longer a thesaurus; it's an encyclopedia of cell lines.
So it started in 2012 and it evolved into what is now a knowledge resource, a biocurated resource, as Christoph explained. Just in a visual way, and you will not be able to read anything on the next two slides: here is an entry for a mouse cell line in the first release, and you see it's only eight or ten lines, and here is the same entry in the current version. So at one point you have a thesaurus, and at the other you have more and more encyclopedic knowledge of the cell line.

So in one slide, the Cellosaurus, what is it? It's a knowledge resource on cell lines, on 150,000 of them. A new version is coming up in a few weeks, and it will have 151,000 cell lines. So we went far from the 100 cell lines we needed to annotate to 150,000. The scope is all immortalized or naturally immortal cell lines, and finite cell lines which are well characterized, but not primary cells, and it covers all vertebrate and invertebrate cell lines. There are now about 50 different fields. I'm not going to bore you with all the different fields of information. I will just show an entry at one point and maybe point out some, but it's basically 50 different types of information items and a lot of references. It's important, of course, to say where the data comes from. And there are cross-references to a huge number of external resources which either use cell lines, provide information on cell lines or, very importantly, distribute cell lines: the cell line collections. We'll have a talk during this symposium by a representative of a cell line collection, so you will hear about it.

Now, availability of the Cellosaurus: you can just Google it and find it, but it's www.cellosaurus.org, and it's downloadable in different formats. There is also, for programmers, an API. Most of you are probably not really interested in the programmatic side, so I'm not going to describe the API. So basically it's a resource available on the web, and you can download it if you want to have it in-house.
Now, it's part of the Resource Identification Initiative, and again there will be a talk on this by Anita later today, and we'll come back to that. It's also part of the International Cell Line Authentication Committee, which you will hear about from Amanda today, and we collaborate with the Human Pluripotent Stem Cell Registry, with Andreas Kurtz, who is here. So this is not only a presentation of the Cellosaurus; it also presents who is going to speak and what the subjects of some of those talks will be. And there are many other cell line resources and collections.

It's being recognized in the field of databases and bioresources. There are different organizations trying to see which resources seem to be the most useful and important ones. ELIXIR, in Europe, is one of them, and the Cellosaurus is an ELIXIR Core Data Resource. And just last month the Global Core Biodata Resources, a global effort to find which resources are of interest worldwide, put the Cellosaurus on its list. It's also an IRDiRC Recognized Resource for rare disease research.

Anyway, you have to enter data into the Cellosaurus. Again, I'm not going to describe all the different fields, but where does this data come from, and how does it get into the Cellosaurus? There are four channels. The first is extracting data from cell line collections, which sell or distribute cell lines and so generally provide a relatively good amount of information on each cell line they distribute. Unfortunately, this is not easily automated. None of the cell line collections really has a way of downloading the data so that it can be programmatically extracted. To my knowledge, I've never seen a cell line collection which allows this, so you have different strategies, but it's slow and tedious.
Then you also want to extract data from other resources, bioinformatic resources, which have specific data fields which could be interesting because experiments were done on the cell line and captured in a bioinformatic resource. That's much easier, because those resources, being in the field of bioinformatics, generally know already about standardization and about protocols, so there are generally APIs and different ways you can download data, and you can get information like sequence variations, HLA typing and so on.

But the biggest effort is basically taking papers, up to 27,000 of them now, and reading them. Of course not all of the paper, because if you're only interested in cell lines and the paper covers other things it will be only part of it, but sometimes it will be the whole paper. And then extracting what's useful for Cellosaurus users. So that's the biggest task and the most time-consuming by definition.

Submission and personal communication is also a way to get information, and this is something which is getting bigger and bigger. That's a take-home message for the audience working with cell lines: don't hesitate to send information on a new cell line, or on a cell line which you think is not well characterized and where information is missing. Send information; it's always going to be useful and you'll be thanked for it. Of course your reference, your paper, will be included, but if it's a personal communication we will also cite you as a personal communication.

Now, I was saying the database is linked to a lot of resources. You see here a big set of logos, but it's not all of them. Here is only, I think, 60 of the 107 which are linked, and you will recognize cell line collections, ATCC, DSMZ, but also bioinformatic resources, companies and so on.
Availability: I already said it. It's on the website and there is an API, but just to recapitulate here, you can download it in three formats for the moment, and that's where you can get it by FTP. There are four releases per year, not on very precise dates in the year, but since it started it's been four per year. It is distributed under a Creative Commons CC BY 4.0 licence, meaning basically that you can use it and redistribute it, and everyone can do so, whether in academia or, of course, in industry.

That's just the website homepage. When you go to an entry you have a lot of information. I'm not going to go through all of this, but you can recognize sets of information: comments, like sequence variations, where you see names of genes and variations and links to databases like ClinVar, which is a database of variants in human. You see information on cancer, which is linked to controlled vocabularies of diseases, Orphanet and the NCI Thesaurus, and species linked to the NCBI taxonomy. I'm emphasizing these links because in this interconnected world what's important is to link with different resources, to link to controlled vocabularies or to ontologies: ontologies of diseases, controlled vocabularies of species and so on. There is a lot of table-type information like HLA typing, genome ancestry for some cell lines and, very importantly, and I will come back to it, what we call in the world of cell lines the STR profile. This will be covered a lot today, because in fact that's one of the big issues in cell line reproducibility. So I will speak about it. Amanda will speak about it. Jamie will speak about it.
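As a rough illustration of what an STR profile looks like as data, and of how two profiles can be compared, here is a minimal Python sketch. It assumes a profile is a mapping from marker name to a set of allele calls, compares only markers present in both profiles, and uses the Tanabe score (twice the number of shared alleles divided by the total number of alleles in both profiles), which is one commonly used scoring scheme. The talk does not say how the Cellosaurus stores or scores profiles, and the function name and allele values below are made up for the example.

```python
def tanabe_score(query, reference):
    """Percent similarity between two STR profiles using the Tanabe formula.

    Each profile is a dict: marker name -> set of allele calls (strings).
    Assumption for this sketch: only markers present in both profiles count.
    """
    common_markers = query.keys() & reference.keys()
    # Alleles found in both profiles, summed over the common markers.
    shared = sum(len(query[m] & reference[m]) for m in common_markers)
    # All alleles reported by either profile at the common markers.
    total = sum(len(query[m]) + len(reference[m]) for m in common_markers)
    return 100.0 * 2 * shared / total if total else 0.0


# Hypothetical profiles (not taken from any real cell line).
sample = {"TH01": {"6", "9"}, "D5S818": {"11", "12"}, "TPOX": {"8"}}
entry = {"TH01": {"6", "9"}, "D5S818": {"11"}, "TPOX": {"8", "11"}}

print(tanabe_score(sample, entry))  # 80.0
```

A score close to 100% against an entry that is not the expected cell line is the situation the authentication discussion in this talk is about.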
I'm sure all of the talks will have a little bit on this, but it's important to get this message across about authentication. Then there are the references, only part of them shown here, and the cross-references: where you can get the cell line, the providers, which is important, but also resources which have information on those cell lines, whether it's proteomics data sets, sequence databases and so on.

Okay, now let's get to problems with cell lines and some of the solutions which will be described. I'm not speaking here of problems in terms of growing them and so on, which you will also be hearing about, but more in terms of the reproducibility of science. One of them is the naming issue, because people often give short names, and this is a disaster. Here are 10 names for 37 different cell lines. Of course you can realize that if you call something C2, it's not going to be very precise. I'm not even talking about the whole world of gene names and so on. Even among cell lines it's already going to be four or five, and of course it will also be a gene name. So short names are a problem. Of course you can try to make longer names, but people are very good at abbreviating them, so when somebody calls a cell line Hep-G2/C3A, someone will say: oh, I'm going to call it C3A.

Now, only two nomenclatures have been proposed. One is on insect cell lines, from the 70s, which is not really used a lot, and one, which is quite important and is used quite a lot, is for stem cells, pluripotent stem cells, and Andreas will probably speak about that. And of course there is a huge number of misspellings. For stem cells it's quite important that there is this nomenclature, but for all of the rest, cancer cells and everything else, there is no nomenclature whatsoever.
So of course, instead of using the name you could use an identifier, and that's what Anita will describe. I'm not going to describe this initiative, but just in two words, because it's important to say it: you want to have a persistent, unique identifier in the literature for each resource that you have used in an experiment, whether it's an antibody, a cell line, software or a strain. What's important here in the context of the Cellosaurus is that the Cellosaurus is the cell line resource for this initiative, and the RRIDs for cell lines are the accessions of the cell line entries in the Cellosaurus. So you don't need to have two different identifiers; you have one identifier.

Here is what it looks like in a paper, in the methods part of a paper. Somebody has used RAW 264.7, which is the cell line I showed which went from a few lines to that, and they cite the RRID with its accession. You see here a paper citing four different cell lines, usefully also with a catalogue number or where they got the line from, like ATCC. Well, they didn't put the catalogue number here, but they say where it came from. At least, wherever it came from, you can find what that cell line is by going back to the database. So there are more and more papers which cite this. It started, not the initiative itself but the Cellosaurus joining it, in late 2016, beginning of 2017, and you see that it's increasing, which is quite useful because you have papers where you know which cell lines were used.
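To make the idea of a machine-readable cell line citation concrete, here is a minimal Python sketch that pulls cell line RRIDs out of a methods paragraph. It assumes the citation is written as `RRID:` followed by a Cellosaurus accession of the form `CVCL_` plus four letters or digits. The function name, the sample sentence and the antibody identifier are invented for the example, and the two cell line accessions are given for illustration only.

```python
import re

# Assumed citation pattern: "RRID:" followed by a Cellosaurus accession
# ("CVCL_" plus four uppercase letters or digits).
CELL_LINE_RRID = re.compile(r"RRID:\s*(CVCL_[A-Z0-9]{4})\b")


def extract_cell_line_rrids(text):
    """Return distinct cell line accessions cited as RRIDs, in order of first use."""
    found = []
    for accession in CELL_LINE_RRID.findall(text):
        if accession not in found:
            found.append(accession)
    return found


# Made-up methods text; the antibody RRID is ignored by the pattern.
methods = (
    "RAW 264.7 cells (RRID:CVCL_0493) and HeLa cells (ATCC, RRID:CVCL_0030) "
    "were stained with an antibody (RRID:AB_0000000). "
    "RAW 264.7 (RRID:CVCL_0493) was used for all further experiments."
)

print(extract_cell_line_rrids(methods))  # ['CVCL_0493', 'CVCL_0030']
```

Because the identifier is the database accession itself, each extracted value can be looked up directly in the resource, whatever name or abbreviation the authors used in the text.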
Now the other big issue, which will be described quite a lot during this symposium, is cell line contamination. It's a huge problem, and here are just titles of papers: it's "a dirty little secret" of cancer research, wasting a lot of research funding, oops, and so on, with estimates that the extent of contamination for cancer cell lines goes from 20 to 36%. That means that one needs to check cell line integrity.

Here I have a few definitions. This will be redundant with other talks, but I think we need to get those points across over and over. Contaminated cell lines can arise through the accidental introduction of a foreign cell line, which is sometimes called cross-contamination, or of course of microorganisms. The latter is less of an issue, because people recognize the signs that a cell line is contaminated with mycoplasma. What we are really talking about, what is really the problem, is cross-contamination, where the cell line you think you're using is not the one you're really using.

You can also have misidentified cell lines, which are the result of an error in species or gender: people telling you this is a cell line from a male organism, but no, it was taken from a female organism; or this is a bovine cell line, no, sorry, it's a pig cell line. And this happens. You would think people know where they got the cell lines from and where they originated, but errors happen. In the fish world this has also happened a few times in terms of taxonomy, people not putting the right name for the fish or just saying "salmon" without saying which salmon it is. And of course you have misclassified cell lines, where the tissue type is not correct, or the cell type, or even the disease: people thought it was from a given disease and no, it was not that disease.

Anyway, here are three papers with three different types of those examples. One is a classical cross-contamination, where a bladder cancer
cell line is not a bladder cancer cell line; it's HeLa. Another is a case where a monkey cell line is not really from the monkey people thought it was: BS-C-1 is not derived from a Cercopithecus, it's from another monkey. And another is a case where one thought this was a seminoma cell line, but it's from a different cancer type. So these are all problematic cases, because if you publish thinking of the model you're using and it's not the right model, it can completely invalidate what you're doing.

Fortunately, work on this started a number of years ago with a number of scientists, all of them doing this pro bono. None of them is paid to do this. Those are scientists who are passionate about getting those cell lines correct, and so they created a committee, the International Cell Line Authentication Committee. We have here both the first chair and the current chair, Amanda and Anita. The committee provides a register of misidentified cell lines, and you will hear about that. Of course the Cellosaurus uses this information and tells people that this cell line is in the register, has this problem, and so on.

So we annotate this. I'm not going to go into the detail of how we do it, but we have a field, "problematic cell line". I give examples here, but let's not go into details. For me, what's important to remember is that if in the Cellosaurus a line is known to have a problem, you will see it: it will be in big red letters and it will be shown that it's problematic. We also do it for a cell line which is a descendant of a cell line which is itself contaminated, because that's also a bit of a hidden problem which is sometimes overlooked. People know that this line is contaminated, but some people have created new cell lines from this cell line before it was known to be contaminated. So you have a whole family, a hierarchy of cell lines which are wrong because all are descendants of one
which is not correct.

So for that, people in the cell line world use, and most of you know it, but I will do one slide knowing that this will probably be covered later, what are called short tandem repeats: loci in the genome which are polymorphic. It started with the human genome, and we'll see it's not only human now. Those were also used for forensics, for paternity testing and so on, and they're also used to ensure the quality and integrity of human cell lines. There was a standard published in 2011, which was updated 10 years later, in 2021, with the loci people should use to report the integrity of a cell line. It's very useful, but it's not everything, and people will speak today about things that could look good from the point of view of the STR profile but where you can still have problems. So you have to use STR profiles, but it doesn't mean you have a clean bill of health and that you're safe; you can sometimes still have surprises. And of course primers have now been developed for other species, for mouse and dog, and for mouse we have here the person who was responsible for the mouse STR marker primers.

Now, we have those profiles in the Cellosaurus, so that's an important part of what we store. Out of 150,000 we have, you could say, only 1,600 (I said it's already 1,800; 600, sorry). You have to see that this problem is really prevalent for cancer cell lines, and the Cellosaurus has a lot of other cell lines where the problem of contamination is a little bit less prevalent. We see this as increasing, and we tell people: if you authenticate your cell line, because the journal asks for it or because you rightly think that you should do it, don't hesitate to send us the STR profile to integrate into the Cellosaurus. We give all of the different sources of where an STR profile comes from, with the conflicts if different sources give different results, which can be due to errors or to genetic drift, the loss or gain of an allele, and
again, we'll hear about that today.

And we have a tool for that. The Cellosaurus is the database, but it has a number of tools. One of them is the API, which is for programmers, but one is for you as a life scientist using cell lines. If you have had an STR profile done on a cell line, you can search it against all of those in the Cellosaurus, with either human, mouse or dog markers. I'm not going to go into all the details. There are three different algorithms you can use; I would say, if you're not a specialist, just use the default algorithm. And if you have many samples, which in industry is relatively often the case, you can just input all your samples in one batch. There is an API and it's very fast. It has an interface where you can either load a file or type in your profile, and it will report to you.

If you have a score and you see that it's 100% identical to something in red, danger: it means that you are hitting a cell line which is contaminated, which is in fact a cell line already known to be contaminated. But if you have something which is 100%, or 95 or 99%, identical to things which are not annotated as contaminated, but it's not your cell line, it means you have a problem, and you may be the first one to find out that your cell line is not what you thought but is the cell line which is the best hit. So always try to authenticate your cell line. Of course you can click and see what the matching cell line is, and get back to the Cellosaurus about it.

Now, these will be my last three slides. We want to add things to the Cellosaurus constantly. I'm not going to give a list of all the things. We had two days of scientific advisory meetings where we discussed this, and most of the people giving talks at this meeting were in the scientific advisory meeting we had for those two days here in Lausanne. One of the things we want to do in terms of tools is that we will have a new full-text search which will include in
it what is in the API, so that people can do field searches: you can not only type something and do a full-text search, but you can also restrict the search to different fields.

And, I would say mostly for people in industry who are really using it a lot, not really for bench scientists, we want to have an RDF version, to make the Cellosaurus compatible with what is called the semantic web and to integrate resources in a seamless way using what are called triple stores. I'm not going to describe this. People in industry have probably heard so much about it that they're fed up with hearing about RDF and SPARQL, and people in lab science don't really need to know about it. At some point we will have a SPARQL query system, but we will try to hide it behind something where you can do queries which are much friendlier than a SPARQL query, for which you more or less need real knowledge of the resource. Another thing is a system for newly discovered contaminations, with certification.

In terms of content, one thing which we felt, and which people really told us, was important is information on the susceptibility of cell lines to viruses. Think of the COVID pandemic. At the beginning it was really a mess, well, it was a mess in general, but in terms of research, to know which cell lines could be used to grow SARS-CoV-2 and so on. People tried immediately, and immediately there were papers and preprints. For that we didn't even wait four months to do a release; we put up a page, which is still maintained, to tell people: you can use this cell line, you can use this cell line, and here is the preprint. But the thing is, yes, SARS-CoV-2 was an emergency and people had to act, but now we have to be prepared for all of the different types of emerging viruses. Then it's useful to know which cell models you can use that already exist, or that maybe no cell model has been developed for this particular virus, and that's also important to know, and
we're also maybe going to annotate distributed organoids, but we have to wait to see where this field is going.

And this is more a structural point, more for people linking the Cellosaurus to other resources. As Christoph showed, you can see the age of the person from whom the cell line was obtained; for HeLa it was 30 years and six months. Yes, that's useful for scientists: you read "30 years and six months" and you know what it is. A computer, okay, well, now with AI it can translate this, but people want more, something which is linked to a developmental ontology which tells you that 30 years is this stage, and then you can do computation on it. You can then say: give me also cell lines from teenagers, give me cell lines from people who are between 20 and 40, and so on, and the same in different species. So we're going to add this.

With these last two slides I want to thank a number of people, who are listed here. Pierre-André Michel, who is in fact in the audience here. He's at the back because he's going to help people asking questions by giving you a mic. That's an explanation for my talk and also for the other talks: don't shout questions, because we're taping these talks to put them online at a later date, if everything goes well with the recording; you never know in advance. So as the talks are taped, the questions are too, but only if you speak into a mic. So raise your hand for questions and the mic will come. This is for every talk you're going to have today. Then Elisabeth Gasteiger, who implemented the Cellosaurus on ExPASy; Kami, who created a number of educational videos, and who organized all of this symposium; and people who are here as speakers. You see a number of other names, those who did the CLASTR tool, Alain Gatot, Lydia, and some people who are here in this audience: Amanda of ICLAC, who has given so much advice on how to get the Cellosaurus to evolve. I think she's the external user with the most input on what the Cellosaurus became. So a very big thank you, Amanda, and
all of the individual scientists.

And the last slide. You can email; it's just a simple address, but you don't need to copy it, because on any page, at the top, it says "Contact". Click on it, there is a contact form, and type whatever you want: new data, things you think are not okay, whatever you want. But do contact us. For news on the Cellosaurus we have a Twitter, or X, account and a Mastodon account, so you can also follow us to see what is new. You can go to the website to see if there is a new version, but some people like to follow it on Twitter or Mastodon. Okay, with this I finish my talk, and I think we have time for a few questions.

Thank you, Amos, for a very instructive and enlightening talk. I have just one quick question, on cell strains which are used in vaccine production, like WI-38 and MRC-5. They're not transformed and not primary cells in a narrow sense, but they are widely used in industry. Are they inside the system?

That's what I meant when I said it's immortalized cells, or finite cell lines when they're well characterized. Those are a very good example of well-characterized finite cell lines: MRC-5, WI-38 and all of the fibroblasts from Coriell for genetic diseases. So we have a lot of finite cell lines, as long as they are characterized, sometimes with and sometimes without an STR profile, but at least there are publications, and they are maintained by an organization and distributed. In the case of MRC-5 and WI-38, there are at least 10 cell line collections which distribute them, even though they are finite.

Maybe just another comment. I was taught by Len Hayflick to be very clear about terminology: the term "line" is for immortalized cells and "strain" is for those which are not. He was very adamant in saying that WI-38, which was the one he created, was not a cell line but a cell strain.

So I completely agree from a historical
perspective, but if you look in Medline, since the 1970s and 80s there have been only three or four papers where people call this a cell strain. So trying to force a nomenclature which has not been used in the last 40 or 50 years seems like a battle you cannot win. In every cell collection they're called cell lines, and in every paper they're called cell lines. Yes, you're right, they should have been called cell strains, but this didn't catch on. So let's be realistic: these are finite cell lines, a.k.a. cell strains.

I'm Constantine of Alida and I'm a scientific editor at the International Journal of Cancer. We are using the Cellosaurus every day and I'm very grateful that it exists. I would like to know why you have decided not to include primary cells. Is this something that you would consider in the future?

Well, no. There's an FAQ about that on the web page, so you can look at it, but I'll explain what it says in the FAQ. The problem is that primary cells, which are sold by different places, are not uniquely defined. They replenish them often, with different individuals being the donor. Sometimes they try to keep the same sex and, of course, the same tissue or things like that, but it's not a product for which you can say that an experiment done eight years ago with a primary cell from XYZ is the same as an experiment done now. So it's not a defined cellular entity. It has not been cloned. It's finite, which is not a problem, to go back to the last question, but it's not well defined. If somebody made a primary cell and then stopped the catalogue item when the stock was over, you could say yes, this becomes like a kind of cell line which was distributed for a while and is now no longer distributed. But most of the primary cells you can get now from different companies change lots, and from one lot to another you would have to define it, so
we would have to capture the lot number, and it's a mess. If you go to those different companies, they give you a lot number, but they don't tell you if this new lot number is really from the same person or not. This information is rarely present.