to attend this webinar about ViralZone. So I'm going to share my screen and start. Okay, so I'm going to show you how ViralZone was created, what is inside its different parts, when it can be used, and how you can use the site to access knowledge and other data, okay? So the first thing I want to talk about is what we mean by data, information and knowledge. Basically, data is what is produced by experiments: raw data, for example the raw output from sequencers. These are transformed into information by analysis tools, so from the raw data you get the nucleotide sequence. This is information, because it can sometimes be wrong compared to the data. Then we can have some hypotheses to understand something, which is a kind of belief, and when every scientist agrees on something we get knowledge. This is scientific knowledge, which is much smaller than the data and the information, and it's more designed for the human brain, whereas data is more designed for computers. So ViralZone tries to put links between these things, with a special focus on knowledge. So, knowledge in virology. Just so you know, I am a virologist, a molecular virologist by training, and I spent about 10 years in laboratories before getting into bioinformatics and databases. Most of the knowledge resides in books and in publications. There are a lot of books, and these books are not often updated. Often when a book is printed it's already somewhat out of date, because it takes about a year to print. The access is kind of restricted: it costs a lot, and you need libraries to get access to it. It's kind of the same problem with the scientific literature: many articles are behind a paywall and are not easy for everybody to access.
So the ViralZone project aims at digitizing the virology knowledge that we find in books, in publications and everywhere, and putting it together in a free-access website which is easy to update. Contrary to books, we can update the website constantly. It also complies with the FAIR principles. The FAIR principles are made for digital data, which should be findable; accessible, so ViralZone is on the web and free to access; interoperable, which means ViralZone gives links to many other data sets like GenBank, [unclear], ICTV, whatever; and reusable. I put that last one in parentheses because much of the knowledge in ViralZone is still in HTML, so not everything is reusable so far. So the talk will be in two parts. In the first part we describe virus classification and how it is put in ViralZone, then the ViralZone fact sheets for each genus and family, which are the core data of the resource, and the molecular mechanism pages. We'll make a pause after that, in which we can have questions and discussions. In the second part we'll talk about the challenge of pandemics and some specific resources, like those for SARS-CoV-2 or monkeypox. So ViralZone is about making a bridge between textbook knowledge and sequence databases. You have ViralZone with a kind of taxonomy context, with different ways to access the data. There are fact sheets for each kind of virus, which we'll describe later in detail, and these give access quite easily to UniProt, because I am working in the Swiss-Prot group, which belongs to the UniProt consortium, so I'm also doing UniProt annotation and it's easy to annotate both resources. You can access analysis tools, NCBI, UniProt and so on, so we make a kind of link between all this data. The ViralZone homepage is like this: you have taxonomy access, which is the Baltimore classification, meaning the kind of nucleic acid present in the virion; you have some molecular processes; you have replication cycles; and a few other data in there. So, about classification: virus
classification is quite a complex thing, because unlike cellular organisms, viruses don't have ribosomes we could use to classify them all. For cellular organisms you can take ribosomal sequences that can be aligned, and you can make a very good classification, and they all have one origin in life. But for viruses it's not the same. For example, if you look at the tree of life there, it is a dichotomic tree except for two parts, which are the mitochondria and the chloroplast. It is quite dichotomic and simple in the way it works. But for viruses it's completely mixed up, more like [unclear]. There is not necessarily a single origin for viruses; they can mix with each other, make recombination and exchange genes, so it's quite a mess and difficult to classify. Moreover, the classification changes as we learn. There is a huge number of viruses; this is the most common genetic entity on the planet. There are something like 10 viruses for every cell on the planet, and the more we sequence and discover them, the more we add to the phylogenetic trees, so the classification is changing and will continue to change for a long time. For example, here you see we classify the hepaciviruses and the pestiviruses according to this phylogenetic tree, and the whole rest is called flavivirus, which I presume, when we have more sequences, means that maybe this big part will be split into other genera. So anyway, the most used classification in viruses is the Baltimore one. This is David Baltimore, a Nobel Prize winner, who proposed this classification. It's about the genetic material present in the virion, the virus particle. All cells are double-strand DNA organisms, but viruses are genetic entities that can be single-strand DNA, double-strand RNA, or single-strand positive or negative RNA. They can do some kind of reverse transcription, going from RNA to DNA, and there can be a mix of DNA and RNA as well. So all these Baltimore groups allow for a functional understanding of what the molecular biology of the virus would be. For example, the double-strand DNA virus could go
into the nucleus and pretty much work like a genetic entity of the cell, but an RNA virus would be completely different. It cannot be replicated or transcribed by cellular proteins, so it will have a lot of different needs and means just to make the messenger RNA and have the proteins. So the Baltimore classification is to me a good one, because you already get an understanding of how the virus works. ViralZone is following that for the main part. I added a new group called circular RNA, which is for viruses like hepatitis delta. These are small viruses. They are classified as negative-strand, but they are so different that I added this group to Baltimore to be consistent in ViralZone. So you see, in ViralZone we follow the main official taxonomy, but we add some things, some layers of information. So this is a view of it; you have these buttons in ViralZone. You can see here we have the negative-strand RNA viruses, in which you have all the classification made by ICTV, the International Committee on Taxonomy of Viruses. This is an official committee which gives a classification of viruses and makes an update every year. So everything at the level of order, family and genus is from ICTV, and we try to keep that updated; it's not yet finished for this year. But what is in red is more like a functional classification we made ourselves, which I will tell you more about later. So ICTV is now pretty much classifying viruses, especially RNA viruses, by the polymerase. This is a phylogenetic tree, and you see that according to the tree of the polymerase you get different families and groups. It's very convenient because many viruses have a polymerase, though not all, and you can still classify the sequences for GenBank. GenBank needs to classify very quickly when there are new kinds of viruses, and to know where these viruses fit in the taxonomy. GenBank is the head of taxonomy for databases, and every sequence needs to be classified, so having the polymerase to be able to make phylogenetic trees simplifies the work for
that, and it's very convenient. But there are some viruses that do not encode a polymerase, and they cannot really be classified this way. In the early days we were classifying viruses according to the capsid, but some also don't have a capsid. So as you see, viruses are so diverse that it's difficult to find just one perfect way to classify them. Still, this polymerase-based classification is pretty good for ICTV and GenBank and works pretty well. This is a work by Andrew First, who made some HMM models for the different polymerases, and you see that there are several different classes of RNA polymerases that are really distinct from each other. We use that to add an extra layer of data. For example, for the capping of eukaryotic viruses, there are seven ways to cap the messenger RNA. In all cases the cap has to be the same molecule as the one made by the cell, but viruses use completely different pathways and enzymatic activities to do so. And if you cross that with the polymerase types, you see that they match. So it looks like the polymerases of eukaryotic viruses have evolved together with the way they cap the messenger RNA, and that allows me to make a kind of extra classification. For example, these are the positive-strand RNA viruses. We have now released a kind of classification based on the capping system. I can show you in ViralZone live, which is better. So for example, the coronaviruses have a nidovirus-type polymerase, and a capping of their own type, with their own kind of enzymes, and this capping system is pretty much shared by all these viruses, although some may have some differences. We have also added a new kind of feature, inspired by Wikipedia, in which just by hovering over a link you get some information, a summary and a picture. It's not yet available for every virus, just a few of them, but we will add many more later. So for this, you can see here you have the Baltimore classification; you can click and you access the
classification of the different viruses, reverse-transcribing, the RNA viruses, so these ones have a capping system. Of course the DNA viruses don't necessarily have a special capping system compared to the RNA viruses. Okay, so we go back to the fact sheets. Fact sheets are really the core part of ViralZone: for every genus and family we have a fact sheet that describes the knowledge you need to understand a little bit more about it. The idea is that for people having the basic knowledge of the molecular biology of cells that you learn in school, in a non-specialized molecular biology course, like what is the ribosome, transcription, replication, blah blah, this is what you need to know to understand a bit about the virus we are talking about. Okay, so again, if we go back to the ViralZone website, let's take Alphainfluenzavirus. You see you have a virion description. I will tell you a bit more about these pictures later, but the pictures in ViralZone are completely free to be used by everybody, without even having to ask for permission. The license is Creative Commons CC BY, so we just ask that if you use a picture, for whatever you want, you cite the source, and that's it. It's completely free. So you have the virion, then some information on the molecular biology with a picture of the genome. It's a segmented virus: you have eight segments of RNA and the proteins encoded by each segment. Then some information on gene expression, with for example cap snatching there, which is a way of capping the RNA; a little description of how it works; replication, whether it's in the nucleus or the cytoplasm of host cells. And links: this is where ViralZone is interoperable. There are links to NCBI, to specialized resources, even to ICTV, and here we have Nextstrain for phylogeny. We have a bit more taxonomy and some reference strains, so we give a way to get to the most annotated reference strain, both in NCBI, where the annotated genome is in RefSeq, so these are the different RefSeq entries of influenza
in NCBI. RefSeq means reference sequence: NCBI owns this data and they keep updating it, so this is the best standard of annotation. And also in UniProt: this is the same virus, Puerto Rico, in UniProt, and these entries also have the best annotation, because we pretty much add most papers into these entries, in order to keep a few entries for reference. There are so many virus sequences that if we annotated all of them and put papers everywhere, it would be difficult to get the big picture. And there is some information on the host cell receptor, when described, and on ecology and disease, for example antiviral drugs which are used or experimentally developed against it, and the kind of transmission and disease. But it's very brief, just to indicate what this virus does. So this is a virus fact sheet. Now the ViralZone pictures. There are hundreds of pictures of virions and genomes, as I told you, available not only for academic activity but for any activity. Even if you want to make money with them, there's no problem: just cite the source and you can use them. As for the way I do the virion pictures, I've developed that over the years. I'm mostly using cryo-EM when it's available, so I can draw the virion with the right size, with the membrane pretty much where it is, and the capsid. Of course the nucleic acid is completely schematic, because it would not be visible at the size of these particles. And I try to put the proteins in the shape we have from the 3D structures and to fit that into the image. I'm using Illustrator, so I can show you there. For example, this is the flavivirus. These are vector pictures, and you see I'm drawing with some real cryo-EM pictures in the background. I'm trying to make the virions as realistic as possible, so these are schemes, but trying to be realistic in terms of proteins, membranes and capsids. Okay, so over the years we have drawn a lot of different virions, from the typical bacteriophage T4 to adenovirus, baculovirus, papillomavirus, herpesvirus and tobacco mosaic virus, so there are a lot
of different virions to browse in ViralZone if you like. These are the main human viruses, drawn to scale, so you can see you have some very small viruses, around 20 nanometers, like parvovirus. The hepatitis viruses are pretty small, presumably to reach the hepatocytes. Then papillomavirus, and you see the biggest are the poxviruses, with variola but also monkeypox right now. And ebolavirus is extremely huge, but actually it can hold several genomes, for example. This picture is available directly on the ViralZone website. To draw all these capsids we had to cope with the special assembly of icosahedral capsids, and we put special pages in ViralZone about it. So again, I will show you on the website. If you go to the home page you can access the virion section, and here we go with all the kinds of icosahedral capsids. Here the hover tooltip is implemented, so you see you get one. The capsids are labelled by what we call the Caspar and Klug system, named after the two authors who presented the system a long time ago. Basically there is a calculation based on how you go from one pentamer to another pentamer, because icosahedra are made of 12 pentamers, and you can add a lot of hexamers to make bigger capsids. Here for example you have hexamers added, and the more hexamers you add, the bigger the capsid. In this one you barely see the pentamers. So the Caspar and Klug system is based on counting how many steps you take in one direction and then in the other, and here you end up with 169, which tells you how many units there are in each part of the icosahedron. This is described and displayed for every kind of capsid, and it's also linked: for example, this is the adenovirus fact sheet, and it tells you that the capsid has this symmetry, and you get the pop-up and you can go back to the capsid page. So the whole structural virology of capsids is described in some detail in ViralZone and made as easy as possible to access. There are viral genomes as well. As a standard, we keep a color code in the system; for example the capsid is always
green in ViralZone, the [unclear] is green as well, the polymerase is gray, the surface proteins are yellow or orange, and so on. Proteins we don't know are often pink. In this way the genome and the virion have pretty much the same colors for each protein, so you can see which proteins are encoded in which part of the genome, at least for simple genomes like those of many RNA viruses, for example. Okay. And then, besides all these entries, which are based on the classification of genera and describe each type of virus, we have the molecular biology pages, which are more about general knowledge in biology. For example, we have entry processes. Here you see described every kind of process that viruses use to enter a eukaryotic cell. After attachment, there can be fusion at the plasma membrane; the virus can be endocytosed, and you see different kinds of endocytosis, and macropinocytosis; the virus can make a pore directly at the surface; or it can just enter through connections between cells without really going out of the cells, like syncytia, nanotubules or fingers, for example. Then once the virus is inside, it can go into the nucleus, with different ways to do that, or make its replication in the cytoplasm or the nucleus. All of these things are clickable and give you access to a page describing a little bit more of the process and the viruses which are involved. Here we also have a description of replication, transcription and translation processes. You see that, for example, cellular organisms are just using bidirectional replication, whereas viruses have a whole diversity of ways of replicating their nucleic acids, which are also diverse, because you get reverse transcription, you get RNA first, and so on. Then there are all the translation systems used by viruses to modify and to encode many more proteins than they could by the classical one messenger, one protein way. Everything is described there. Here we have virus exit as well, so it's the same system: after making transcription and replication, how the nucleic acid will exit. So for example,
getting out of the nucleus, you have three ways, and all these ways are described there. In the virus pages I showed you before, these are indicated when we know them, so you know that the virus will follow this path or this other path in its exit, in its entry and in its molecular biology. We also have pages about host-virus interactions. These are extremely important, because that's where cells are fighting viruses, and viruses are making their way through these systems to infect us. For eukaryotes you have different kinds of pattern recognition receptors, and of course adaptive immunity, but you also get apoptosis and autophagy. A lot of things can happen, and viruses have different means to stop them, or of course to modulate them to their own advantage. So, the molecular process pages. The picture I showed you before is a kind of menu of the molecular processes of different types: entry, exit, replication and so on. Once you click on it, in a process page you often see some pictures describing what it is about. For example, this is budding by exocytosis. There are different ways of budding, in the Golgi, in other membranes, in the reticulum, and ways for the particle to exit. For example, you see the bunyaviruses exit through the Golgi, [unclear] actually travel through there, and arteri-, flavi- and coronaviruses bud in the reticulum and go on to the Golgi. In a process page you get a description, and you get links to all the viruses that are known to use this process. We often give papers which show you where the information is taken from, because most of this information, for each kind of virus, is taken from the literature. When we annotate a virus we ask ourselves which pathways it uses, and this information is available mainly in publications of experiments. So we try to find these publications and show at least a few of them. We cannot show all of them, but this way people get links to the real data the information came from. This is another molecular
process: penetration into the host nucleus. You have different ways to penetrate, and different kinds of viruses do this in several ways. Again you get papers, a description, and the viruses, and in the viruses' own pages you get the link back to this page. For example, in the influenza virus fact sheet it's mentioned that the cellular mRNAs are cleaved by cap snatching for the virus to make its own cap, and this is a link: you can go to the cap snatching page and see what it is about, and in the cap snatching page there is a link to the Orthomyxoviridae, to which Alphainfluenzavirus belongs. Okay, so the data is kind of circular. We also have some replication cycles, and I can show you that now, it will be more interesting. A replication cycle is about putting all these processes together into one picture. For example, take Ebola virus. Ebola virus is a big particle, and it mainly enters by macropinocytosis. You see the replication cycle is a little bit interactive, so you can click on it and see a more detailed explanation of what macropinocytosis is for viruses. There are also some links to databases: if there is a keyword in UniProt about this process, you can access all the protein entries related to it, so all the viral proteins that would be involved in this process. There are a lot of links to go back and forth. So you see in the replication cycle how the virus enters; how it ends up fusing with the membrane, and the genome makes its way towards transcription and replication; some special things like editing; how it blocks some general sensors; and how, after replication in viral factories, it is released out of the cell by actually using cellular actin to push it outside, because again it's big, so it's not diffusing out. Everything is explained there. You get a link to the fact sheet if you like, in which you can see the picture of the genome and everything, so all the data is kind of integrated. So,
replication cycles, we add them as we can, so not every virus has one, but the main ones do. Of course you see the coronavirus, how it enters, and in red this is how it fights the host defences. So this is available, and also for prokaryotic viruses. Things are kind of different for prokaryotes, so for bacterial viruses we have a special menu that displays how they work, because these are completely different cells with completely different antiviral systems. We detail that as well, because we handle pretty much all kinds of viruses, of bacteria, eukaryotes and so on. So, going back to the presentation. Okay, so ViralZone is under constant development. We have about 500 fact sheets now, and we are adding more as new viruses are discovered. Actually I didn't tell you, but many viruses are now discovered by metagenomics, and it's a very nice thing, because it's a way to discover a lot of diversity we were not aware of before. For example, you see the Filoviridae, the family to which Ebola belongs: we have a lot of genera which have been identified by metagenomics, and these are put in parentheses. Why is that? Because with metagenomics we have little real data. We have the genome, we have the sequence, but we are not sure of the host, because it comes from environmental samples. We are not sure about what kind of cells it infects or about the biology of the virus. We can infer the biology from the genes, but we don't have real data on the virus, because many of these viruses have never been cultured, actually. So we have little knowledge to put in, but we can infer a lot: the way it caps the messenger will presumably be the same as in all Mononegavirales. So we can add a lot of layers of information, but it's kind of unsure, so we put that in parentheses. And you see here that many new virus families have been discovered by metagenomics in the Mononegavirales, and some families have been expanded a lot, like the rhabdoviruses, mainly by metagenomics. Okay, so we have all these fact sheets, and we have about 200 pages of specific viral
molecular process knowledge, about the molecular biology of viruses, and the website is visited by more than 30,000 unique visitors every month from all over the world, so it's kind of a success. We made it because when I was a molecular biologist at the bench, I wanted to access a lot of information that was difficult to get, so I wanted to make this site to deal with that need. It seems a lot of people in the world had that need, and the site has been used a lot for that. So now we are going to make a pause for questions and discussions before getting to the second part, which will be more about the challenge new epidemics pose and the specific resources that were created. To finish up with the fact sheet description, I also wanted to show you the little tabs in each virus description. Here we are in Lyssavirus, to which rabies virus belongs. We have proteins by strain, proteins by name, and host-virus interactions, so let's see what they are. Proteins by strain gives you links to UniProt data. We are annotating UniProt, so it's easy for us to do this. By strain means that for this strain, an Australian bat lyssavirus isolate, you get all the proteins, and you can link to them, you can retrieve them, you can go to the nucleic acid; there are ways to interact with this data. These are protein sequences in UniProt, and you see there are five genes in these viruses. So for each type of virus you get the series of genes belonging to each specific virus. But you also have proteins by name, which means that this time proteins are not classified by taxonomy but by type. For example, all the glycoproteins are put together, so these are all the glycoproteins of lyssaviruses in UniProt, or the polymerases of the lyssaviruses in UniProt, and so on. You also get some tools. A new tool is Align: because these are similar proteins, it's possible to align them. You see it's indicated whether these proteins are from reference strains and whether there is a PDB entry, a 3D structure, so you can see in the family that
there are several 3D structures of glycoproteins. If you want to get access to this data, you see that there are four 3D structures for glycoproteins, just one for the polymerase, one for the matrix and so on. So it allows you to get access to this data in one place. You can align these, and then you go to the UniProt website: the data are injected into the UniProt alignment system, and you can align 24 sequences. It takes time to run, so I made it already to save time. When the UniProt alignment is done, you get an output like this on the UniProt website, in which you see all the sequences aligned. The color is based on the conservation of the amino acids, and in red you have the signal peptide there. You get a lot of tools you can use. For example, you can have a simple tree made from these alignments, so you can see the relationship between the different sequences. You can have some outputs. You can also see the annotation which is made in UniProt. For example, if you want to look at glycosylation, you choose glycosylation and you see that this asparagine is glycosylated in these glycoproteins, but you see the others are not, so it's a kind of niche glycosylation, whereas other sites, you can see, are present in pretty much all the viruses. So you can have access to alignments of sequences and a few basic tools to play with them, okay. And then the interactions: some of the main interactions have been described for this virus, but this is not done for all the viruses. It gives an insight into a few interactions that the virus is making, described with publications, links to PubMed and the molecular biology pages about them. Okay, so that's what I wanted to add.
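As a side note on the Caspar and Klug numbering mentioned earlier for icosahedral capsids, here is a minimal sketch of the arithmetic, assuming the standard formula T = h² + hk + k², where h and k count the steps taken in two directions to go from one pentamer to the next. The function names `t_number` and `capsid_composition` are hypothetical and are not part of ViralZone. The subunit and hexamer counts assume an ideal Caspar and Klug capsid, which real viruses do not always follow exactly.

```python
def t_number(h: int, k: int) -> int:
    """Return the Caspar-Klug triangulation number T = h^2 + h*k + k^2.

    h and k are the numbers of steps along the two lattice directions
    needed to go from one pentamer to the neighbouring pentamer.
    """
    if h < 0 or k < 0 or (h == 0 and k == 0):
        raise ValueError("h and k must be non-negative and not both zero")
    return h * h + h * k + k * k


def capsid_composition(h: int, k: int) -> dict:
    """Describe an ideal icosahedral capsid for the given (h, k).

    Assumes the idealized model: always 12 pentamers, 10 * (T - 1)
    hexamers, and 60 * T subunits in total.
    """
    t = t_number(h, k)
    return {
        "T": t,
        "subunits": 60 * t,
        "pentamers": 12,
        "hexamers": 10 * (t - 1),
    }
```

With these definitions, `t_number(7, 8)` gives 169, the number quoted in the talk, and `capsid_composition(1, 1)` describes a T=3 capsid with 180 subunits, 12 pentamers and 20 hexamers.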