Now we are going to talk about the second part, which is COVID and monkeypox: the challenge these epidemics posed to databases like ViralZone, and what we did about it. What we did, actually, was build very specific resources for SARS-CoV-2 and monkeypox, like those that were issued for HIV, HBV and some of the most important viruses in terms of human impact.

So the SARS-CoV-2 virus is an RNA virus with a 30-kilobase genome, about the biggest genome you get among RNA viruses, encoding about 28 proteins. One of the first challenges, when the sequence came out in early January 2020, was to annotate the proteome. The sequence was available in GenBank pretty quickly; they were able to update and make it available very fast. Getting it into ViralZone could also be pretty fast, but in UniProt it is kind of a problem, because there are a lot of programs and a lot of complexity in the annotation of proteins, which is not the case for nucleic acids. For a nucleic acid you pretty much find the open reading frames and it is annotated, whereas for proteins there are thousands of things you could say about a single protein when you know how to analyze it.

So the first problem was actually to name the open reading frames. In the coronavirus you have one polyprotein, which is made in two parts, and also proteins which are made from transcribed subgenomic RNAs. In the literature, some were called NSP, for non-structural protein, so NSP6, NSP7, NSP8, but the others were also called NSP because they were non-structural too, so you got an NSP6 which was this one but also this one. So we agreed very quickly with the scientific community that these should be called open reading frames, ORF6, ORF9, ORF10, and that those would be called NSP, so there has been no problem since.
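The remark above, that annotating a nucleic acid mostly comes down to finding the open reading frames, can be illustrated with a minimal sketch. This is a naive forward-strand scan written for illustration only; it is not the pipeline used by GenBank, ViralZone or UniProt, and the function name `find_orfs` and the toy sequence are assumptions made up for this example. Real coronavirus annotation also has to handle ribosomal frameshifting and subgenomic RNAs, which this sketch ignores.

```python
# Illustrative sketch only: a naive forward-strand ORF scan, not a real
# annotation pipeline. Names and the toy sequence are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs, 0-based and end-exclusive,
    for every ATG...stop stretch of at least `min_codons` codons
    (start and stop codons included)."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


toy_genome = "CCATGAAATTTGGGTAACCATGCCCTGA"  # made-up sequence
print(find_orfs(toy_genome))
```

Finding the coordinates is the easy part; naming the resulting ORFs consistently and describing what each protein does is where the curation effort described in the talk goes.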
So this was the first thing we had to do. Then we had to be sure of the gene model of the virus. You see, for example, that in GenBank they put the open reading frame 10, but we found no correlation, nothing similar in other coronaviruses; it is very small and there is no subgenomic RNA for it, so we did not annotate it in ViralZone or in UniProt. But we annotated, for example, ORF9, which was presumed to exist and turned out to exist. ORF10 turned out to be not transcribed, not expressed; some people expressed it and found some would-be functions, but actually it is not made at all by the virus, whereas these other open reading frames do have functions. So we did this work before putting it in ViralZone.

Then the virus mutated a lot; you know all about the variants. Mutations occur by replication errors but also through cellular editing enzymes. The cell has a system to edit RNA in the cytoplasm, which is a kind of antiviral system, because if you change the sequence of a viral genome it will end up being non-viable. But what does not kill you makes you stronger, as Friedrich Nietzsche said, and that is what happens with COVID, because we see that many mutations, for example in Omicron or Delta, came from this system. So the virus actually exploits the defense of the host cell to get more flexibility in its sequence, and the advantageous mutations lead it to spread more and escape immunity. I will skip that one.

So the SARS-CoV-2 variants are nicely described, for example, in Nextstrain, which is also related to the Swiss Institute of Bioinformatics and which makes a very clean and nice phylogenetic analysis of all the sequences, which is millions of sequences now. You see the different kinds of lineages; the last ones, BQ.1 and XBB, are circulating a lot in Europe and in the US, with a few others in East Asia, whereas Delta and Beta, for example, have stopped circulating.
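The point above about cellular editing enzymes can be made concrete with a small sketch that counts which base substitutions separate a variant from a reference. The talk does not name the enzymes; the usual assumption, labeled as such here, is that an excess of C>U changes is the signature associated with APOBEC-like deamination and A>G with ADAR-like editing. The function name `substitution_spectrum` and the two toy sequences are hypothetical.

```python
from collections import Counter


def substitution_spectrum(reference, variant):
    """Count each 'X>Y' substitution between two aligned RNA sequences
    of equal length. Gaps and ambiguous bases are skipped."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref_base, var_base in zip(reference.upper(), variant.upper()):
        if ref_base not in "ACGU" or var_base not in "ACGU":
            continue  # ignore gaps ('-') and ambiguous calls ('N')
        if ref_base != var_base:
            counts[f"{ref_base}>{var_base}"] += 1
    return counts


# Made-up aligned sequences, for illustration only.
toy_reference = "ACGUCCAUCA"
toy_variant = "AUGUUCGUCA"

spectrum = substitution_spectrum(toy_reference, toy_variant)
print(spectrum.most_common())
```

A skew toward one substitution class in real data is only suggestive; separating editing from ordinary polymerase errors needs sequence context and proper statistics, which this sketch does not attempt.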
The big issue for building a viral resource is, for example, the naming of all the variants. Nextstrain uses this naming system, which is very convenient: you get the two digits of the year, 20, 21, 22, and then a letter, A, B, C and so on. So, for example, what we call Delta is 21A, 21I, 21J, and Omicron is called 21 or 22 followed by a letter, depending on the year it appeared. So this is one nomenclature of lineages.

What you have to know is that the International Committee on Taxonomy of Viruses deals with general taxonomy, but not below the species level. So every naming system for lineages, genotypes, serotypes, whatever you want, anything named under the species, is not covered by any official body. So everybody can do what they want, and that is what they do.

This is Pango, which is a very important resource for variants in coronaviruses. This is the way they name the different lineages within Omicron; this is BA.4 and BA.5, so don't read everything, it is quite a mess, and this is BA.2. You see it is completely different from what is used in Nextstrain, and it is much more detailed, but it is a bit confusing if you look at it all at once. Then you have GISAID, which also names lineages differently, with a different system. So at the end of the day we have the Pango, GISAID and Nextstrain ways of naming. Then came the WHO, which gave names like Alpha, Beta, Gamma, Delta, Omicron, which is a pretty good thing: it is easier to understand and makes a kind of official way to name the variants, because all of the others are not official, they are resources doing what they want, whereas the WHO, as an official body, can really establish the more common way of naming.
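As a minimal sketch of the naming problem just described, the snippet below splits a Nextstrain year-letter clade name into its parts and looks up synonyms across naming systems. The three-row table is hand-written for illustration and is far from exhaustive; the function names are hypothetical and nothing here queries Nextstrain, Pango or the WHO.

```python
import re


def parse_nextstrain_clade(name):
    """Split a year-letter clade name such as '21K' into (year, letter)."""
    match = re.fullmatch(r"(\d{2})([A-Z])", name.strip().upper())
    if match is None:
        raise ValueError(f"not a year-letter clade name: {name!r}")
    return 2000 + int(match.group(1)), match.group(2)


# Illustrative synonym table, one row per variant; not exhaustive.
SYNONYMS = [
    {"who": "Alpha", "nextstrain": "20I", "pango": "B.1.1.7"},
    {"who": "Delta", "nextstrain": "21A", "pango": "B.1.617.2"},
    {"who": "Omicron", "nextstrain": "21K", "pango": "BA.1"},
]


def find_synonyms(name):
    """Return the first row in which `name` appears under any system."""
    for row in SYNONYMS:
        if name in row.values():
            return row
    return None


print(parse_nextstrain_clade("21K"))
print(find_synonyms("BA.1"))
```

Because a WHO label such as Omicron covers many Pango lineages and several Nextstrain clades, a real table needs one-to-many mappings; the flat rows here only show the idea of a synonym lookup.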
Then, to put the variants into ViralZone, we need to provide a sequence, a reference sequence, and that is also a big issue, because the WHO and other people give names to lineages but they don't give sequences. And what people need very quickly, if you say there is a new Omicron virus, is at least one reference sequence, whether they want to study it or make a PCR to identify it in people; anything you want to do now is linked to the sequence. So you need to have a reference sequence. But in a lineage many mutations occur, and not all of them are present all the time, so we chose to take the mutations that are observed in more than 80 percent of sequences. There is a resource called outbreak.info, which is linked to GISAID data; they analyze the sequences and even give you the rate at which each mutation is found in the lineage. For example, in pink it is like 70 percent of sequences that have this variant, and here 100 percent of the sequences have this variant in Omicron. So this is a way to get access to the mutation landscape of any kind of variant.

It has been quite difficult to find reference sequences, so at the end of the day we made an agreement between the BV-BRC, the US ViPR people, who also provide reference sequences, Nextstrain, and other people in Los Angeles, to agree on the same references. So every resource displays the same reference sequence for Omicron, for Delta and so on, to make things more comprehensible for users.

So we have made a resource, which I will show you in ViralZone, about coronavirus, and then a variants page with the described variations, links to Pango, CoVariants, outbreak.info and Nextstrain, and links to reference genomes and proteins, so that people can really get their hands on the sequence and not just on any random mutations in the literature. Okay, so let me just show you that live. So, the coronavirus resource we made: we have the basic
fact sheet, which is the knowledge about the virus, with the genome. We have a little bit more data on the proteins: how the polyprotein is made and the interaction with the membrane. This is kind of simple, but a lot of people working in biology jumped onto working on coronavirus, and these people needed a lot more information to work with it, so we expanded the usual fact sheet for this virus. We have a more in-depth description of the proteins and how they are made, of transcription in the virus making subgenomic RNAs, with a special sequence, and how it works, and a link to the paper describing it if you really want to know. We made a coronavirus life cycle, of course; I showed that before. We have links to the proteins in UniProt and the COVID resource, which I will tell you a bit more about, and to PDB as well, because pretty much all resources reacted to give more information.

We have a kind of simple, I would say, interactome there, in which we have interactions shown for SARS-CoV-2; others were first shown for SARS, we would say SARS-1, from 2003. These are the different interactions for which we really have a good function; okay, so these are by similarity. But you also have some large-scale interactomes available, or some resources like perosource net, which give the interactome with thousands of proteins that could represent real interactions, but we are not sure yet.

Okay, we have made a summary of COVID, a very simple way to show how it works. Then we analyzed the vaccines a bit: what are the types of vaccines for this virus? In the future we will expand this to many other viruses, or maybe make pages on the types of vaccines. So we have how the new mRNA vaccines are made: there are some mutations inserted into the protein, so the protein is modified, and how it is expressed or presented, with pseudouridine, for example, in the mRNA. You get the dosage of the vaccination and global knowledge about it. Another example is how the adenovirus vector vaccine is made, where they put the gene of SARS-CoV-2 and how it is modified; you see it's more
evidently modified in Janssen than it is in AstraZeneca, for example. Then inactivated vaccines, mostly used in China, and subunit vaccines, such as Novavax. So pretty much all kinds of vaccines have been tried for coronavirus, and it allows us to give a little bit of description; for example, Novavax uses a lot more mutations, so you get access to this information, which is not always easy to get.

Now about antiviral drugs. On the replication cycle you can find ways to block the virus at some points, and if you block the virus at one part of its cycle it will be dead, and there are different ways to do so. There are spike maturation inhibitors that have been tried, or endosomal fusion inhibitors, but it turned out that the COVID virus can fuse at the plasma membrane in some cells and bypass this. Also protease inhibitors: there is a polyprotein, and if you prevent the virus from cleaving it, then you are okay. Polymerase inhibitors: the virus uses a special polymerase to replicate its RNA. Then you have different kinds of anti-inflammatory drugs and antibodies, different ways to catch the virus and stop it. Here I put a lot of description of the main ones that have been tried, because actually there are thousands of molecules that have been tried. Some of them are approved, like this one. Well, they are more or less effective, most of them; it is not perfect yet, but it gives an insight.

This page is hard to keep updated because the field moves very fast, and I should timestamp it, actually. You will see that some pages in ViralZone, like the variants page, carry a timestamp, so you know the last time the page was updated. When the information can change a lot over time, we do not necessarily have the means to keep up the pace of updating every time, so this way you know whether the page is a bit outdated or not. You will see that the last update is 20 January, so it is pretty close to now.

The variants page has several things. You have a 3D structure view directly in the page
of the trimer of the spike protein, and you can look directly on it; you can see that here you have asparagine 540. So this view gives you access to all the amino acids you want. You see the famous mutation, the first one that appeared in COVID, D614G; you can see where it is. This is the part of the protein which is on the membrane, and this is the part exposed to the receptor, which will bind the receptor and which will also interact most with antibodies. You can see that Alpha gained a lot of mutations all over the place, some in the receptor-binding part. Going up to Omicron: if you look at Delta, there are a lot of mutations as well in this part, many for antibody escape; this one is antibody escape on the receptor-binding part. But when you go to Omicron you see there are a lot of mutations going on, in many parts of the protein. And the last one is XBB.1.5, where you see all these mutations. So it is kind of nice to have a view on the 3D structure, even though what you really need, if you want to work, is the sequences.

So here is a picture in which we describe in a linear way the different mutations that are present in more than 80 percent of sequences, as I said before. Because it is a lineage, there are many variants within the lineage, so we present the major mutations that represent the lineage, or actually define the lineage. For Omicron I just put the mutations defining the basic Omicron strain, and then you have the specific mutations for every kind of Omicron variant, so you have to add up these two to really have the full Omicron variant. Then Alpha, Delta, Gamma. And then you have a list in a smart table in which you can access the Omicrons that are now circulating but also the old lineages, Alpha, Beta, Delta and so on. In this table you have a link to Pango, that is cov-lineages, so you have the lineage in Pango, which describes the localization in the world and everything. These links are convenient. You have CoVariants, which is linked to Nextstrain and which also describes the variants in
other ways, using the same reference strain that we use. Then you have BV-BRC. BV-BRC is a US public website made for the study of pathogens, with both viruses and bacteria inside, and on the virus part they give you a lot of data, not only on the viruses and their genomes but also on variants of COVID. So you see you have a representative strain link, which is the same as in ViralZone, and the defining mutations. Here in ViralZone we also show them; I made a drop-down menu because there are a lot of mutations, so it is easy to scroll down. Then the name and synonyms: we name it the way Pango does, but it is 21K in Nextstrain, or it was previously named like this, which was a bit complicated; now it is BA.1.

You have access to the genome here, the kind of reference genome I told you we struggled to agree on and then display in the different resources; so this is the GenBank entry for the genome. So now, if people need to, they can really access the sequence directly, one of the best references. We chose the reference at the base of the outbreak, so at the base of the tree, so that it is one of the first ones and should better cover the whole diversity that will happen later, because if we took a later, more diverged sequence it might not cover all of the lineage. You also get access to the spike protein, because it is the most studied protein in COVID, also from GenBank, because these are not always present in UniProt. As I will tell you, UniProt takes a long time to update, whereas GenBank is very quick. So at the time we made, for example, XBB.1, which is there, the spike protein was not yet in UniProt, so we will interact with them. Okay, so this is it for the variants.

Let's go back. We also put annotation of the variants into Swiss-Prot; we pushed it in. So in the Swiss-Prot entry of spike we have lines of variants with the name and all the code, so it appears in Swiss-Prot, in UniProt, as well, and people can really parse this data directly from the
database if they want to, because it is there. But again, it is a challenge to release quickly in UniProt. When the COVID sequence came out in January 2020, we made the annotation very quickly; we analyzed the proteome and in about two weeks we had everything ready. But UniProt would not be released before April 2020, which is a pretty long wait when a new virus is eating the world. So UniProt decided to make a special part of the website, an early-access website to see the entries. UniProt is a very big boat with a lot of annotations and a lot of programs running at every release, so we could not really steer the boat toward COVID, but we could make a sister website just for COVID in which we release just these entries, and that is what they did. It is the COVID-19 UniProt website, in which you have the latest annotations, and at every release of UniProt they are released right away. So they are updated about every two months, which is pretty much the best we can do with all the complexity of UniProt. As you see, the new pandemic posed a new question to big databases: how do you update, how can you provide the right data that people need at the right moment? I think maybe we could do it better if there is a new one, but we are learning.

Now, there were 11 million SARS-CoV-2 genomes deposited in GISAID when this was made; now it might be like 15 million, I don't know. It is also a bit complex, because people deposit in GISAID, which is not a public database, it is owned by a foundation, and then some are pushed into the INSDC databases, which are the public ones, GenBank, ENA and DDBJ, shared across the world. There, there were about five million, which is already a lot, actually. You see the number of genome submissions per country; it is not the same, some countries are sequencing and submitting more than others, in GenBank I mean, and in GISAID there are also more submissions all over the world.
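With millions of genomes per lineage, the 80 percent rule described earlier for choosing lineage-defining mutations is easy to state in code. The sketch below is an assumption-laden illustration, not the ViralZone or outbreak.info implementation: each genome is represented simply as a set of mutation labels, the toy data are made up, and the function names are hypothetical. The second function mirrors the remark that the base Omicron mutations and the sublineage-specific ones have to be added up.

```python
from collections import Counter


def lineage_defining_mutations(genomes, threshold=0.8):
    """Return, sorted, the mutations found in more than `threshold`
    of the genomes. Each genome is an iterable of mutation labels."""
    if not genomes:
        return []
    counts = Counter()
    for mutations in genomes:
        counts.update(set(mutations))  # count each mutation once per genome
    total = len(genomes)
    return sorted(m for m, n in counts.items() if n / total > threshold)


def full_profile(base_mutations, sublineage_mutations):
    """Combine the mutations of the parent lineage with those specific
    to a sublineage."""
    return sorted(set(base_mutations) | set(sublineage_mutations))


# Made-up genomes of one hypothetical lineage, for illustration only.
toy_genomes = [
    {"S:D614G", "S:N501Y", "S:P681H"},
    {"S:D614G", "S:N501Y", "S:P681H"},
    {"S:D614G", "S:N501Y", "S:P681H"},
    {"S:D614G", "S:N501Y"},
    {"S:D614G", "S:N501Y", "S:A222V"},
]

defining = lineage_defining_mutations(toy_genomes)
print(defining)
print(full_profile(defining, ["S:F486P"]))
```

In this toy set S:P681H is present in 60 percent of genomes and is therefore left out, which is the intended effect: mutations that come and go within a lineage do not end up in its reference description.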
Okay, Switzerland actually is submitting to both databases, because there is a database made by the Swiss Institute of Bioinformatics that gathers every sequence from around Switzerland and pushes it into GenBank and GISAID, so everybody is happy about it.

Okay, now about the monkeypox virus. Again we have a new emerging virus, monkeypox, a poxvirus; we had not had a new one in humans since variola was eradicated. It is a much more complex virus. You see, it is a DNA virus, one of the biggest virions we know; you see the size, by comparison the monkeypox particle is much, much bigger. It has a 200 kb genome and about 200 proteins encoded by it, in a complex way. We had done a lot of previous work with experts on poxvirus biology, which we leveraged to also create, very quickly, a special monkeypox resource. For example, the virion of monkeypox, or of poxviruses, is pretty complex: two layers, two membranes, a lot of proteins. For SARS, for example, you get one protein at the surface of the virus, just one, the spike, and just another one or two inside, but here you have more than 20 proteins in the mature virion and even more in the enveloped one. So it is much more complicated. The receptor is bound by this complex, which is made of maybe 10 proteins; we had established that in 2016, so we put it into the resource so that people can understand at least the complexity of the virion and of receptor binding, and see that it is quite a different virus than, for example, coronavirus. If you want to study it, it is definitely not the same.

So we have made a poxvirus resource that I can show you; it is there. Again there is the classical fact sheet, which we already had before, and then, for example, we drew a genome, and you see the genome is much more complex. There was also a lot of trouble naming the genes, because in virology there is no standard for naming genes and proteins at all, there is no consensus; people name the viruses and the sequences the way they want. So we have a huge problem of diversity
which happened especially in poxviruses. Every kind of vaccinia had a different way of naming its genes. The names were based on the different regions which are cleaved by restriction enzymes, and then a number, but as these viruses have genetic plasticity, they recombine and change the way the genes are organized, so we ended up with many genes that were similar but had completely different names, even within vaccinia. Now it was recently agreed to put in place an official nomenclature of genes, so we use that, and this picture shows you the old nomenclature and the new one, so it is easier to link with the old literature and know what we are talking about.

You see the genome is also coloured by the main functions. Actually pretty much all genes in the virus can be classified: some are made for genome replication or transcription, in blue, and you see that in poxviruses they are pretty much in the center of the genome; some for assembly and budding, in green, so they are disseminated around the center; also surface proteins, in yellow; and you also have host modulation proteins, which are presented in pink. You see that the extremities of the poxvirus genome are full of host modulation proteins, and this is really where the genetic plasticity comes from: they can exchange genes, get new genes, delete genes, modify them very quickly in these extremities, so as to change easily between hosts and to adapt to a host as well. So this is the genome.

We also have some interactions of the poxvirus; you see the main interactions described, again for clear functions that we identified. You have the name of the gene, old nomenclature and new nomenclature, OPG, and where they are acting and what functions they perform. We also have a list in which you can access the proteins; there are pop-ups, and also a summary of what is in the database. You have vaccinia because vaccinia is pretty well annotated; a lot of work has been done in the last 30 years on the vaccinia proteins. Monkeypox is kind of new on the block, so we
find less data on function in the monkeypox entries than in the vaccinia ones, so we put the two together because they are homologues. You also have the host proteins they interact with, here for example EGFR, and this one is a viral EGF. There are also some models made as GO-CAMs: the Gene Ontology has made a model that describes fairly large pathways of interactions between different things. It is more of a bioinformatics thing, but it can be pretty easy to read and can be reused by computers and artificial intelligence. So this is the way we go from the literature on interactions up to the bioinformatics.

So that was the interactions. Of course there is a life cycle, which was made with experts a few years ago; it is pretty much the same for monkeypox and vaccinia. The structural proteins I showed you. The antiviral drugs: there are already existing antiviral drugs, you see the three most used and known, and you have two kinds of drugs. Again, when you stop the cycle somewhere you kill the virus; often you need to stop the cycle at different points, because otherwise the virus could evade just one drug, like for HIV. There is one link to Nextstrain as well, in which you can see the monkeypox circulating; this is the new lineage that came out in the world. So far monkeypox has been going down after infecting a lot of people around the world. It is still present, but in many countries the new infections are going down. Fortunately it did not mutate much, so we hope it will remain so. Okay, so this is what I wanted to tell you about this resource.

You can see that we made other resources too; the first one we made was for hepatitis B. I told you the hepatitis viruses are pretty small, but they have to go through the liver sinusoidal endothelial cells, which have very small holes, so a big virus like adenovirus or Ebola could not really reach the hepatocytes; it is a kind of filter, and maybe it is not a surprise that hepatitis viruses are among the smallest, so they can really reach the target cell. So again, some
replication cycles in which you have links to, for example, the HBV surface protein, so more details on these viruses. Again, for herpes, for example, the virion is pretty complex; like we did for pox, we made one of the best descriptions of the virion. It is very complex and we are not really sure how the proteins interact, but we are trying to make sense of the literature here. And there is the HIV resource, which we made in collaboration within a Swiss and South African joint program; Tulio de Oliveira, for example, in South Africa, we worked with him on that because he has worked a lot on HIV. So again, the usual aspects: the drugs, some interactions. We have a lot of interactions that have been shown, with a rating of how sure we are that they really are functional. Sometimes it is just one paper, but some sites can give you thousands of interactions, even like 10% of the human proteome interacting with HIV, which I don't believe is really true, but it is what large-scale interactomics gives you as a result, so it is raw data.

Okay, I think that is it for this presentation, so now it is time to thank the people working in the team. Edouard de Castro is a programmer doing very nice work to create new things: the pop-ups, everything in JavaScript, everything is made by him. Chantal Hulo is the first person with whom we founded, we created ViralZone in the first place, and Patrick Masson is also an editor who puts a lot into it. We are working in the Swiss-Prot group, under the direction of Alan Bridge. And I want to thank a lot of collaborators over the years who helped us to develop the resource; there are more than I can show here. And I thank you very much for your