Welcome to this webinar on SARS Coronavirus 2 glycobiology. We have the pleasure of having Professor Frédérique Lisacek, who will be telling us from Geneva the latest data on the glycobiology of this virus. Just two points of organization. This course is being taped, so please mute your microphone. There will also be a question and answer session following the presentation, so if you have a question, please use the chat and we'll take them in order after the presentation. And with that said, Frédérique, the floor is all yours.
My virtual floor, yes. So I thought that I would start with a short presentation on the topic. I know it was recommended to watch the video that we made with Philippe Le Mercier on the glycobiology, but I thought I would take a few minutes to repeat a little of that video and also give you more details on what we're going to talk about. So I've borrowed the pretty pictures of ViralZone by Philippe Le Mercier to explain, at the beginning, what the glycan shield is about.
If you consider the hemagglutinin at the surface of the influenza virus, for instance, it's a multimeric protein that is usually shown rather bare. But if you take a close look at the glycosylated version of it, it's covered by glycan molecules. There are four to six potential sites, but they change. And this is the interest of studying these sugars: if you go through time, you can see that they do not occur at the same sites on the surface of that protein. So it's probably one of the explanations for antibodies being a bit puzzled in their recognition, and possibly also an explanation for the difficulty of making a vaccine.
The same, even worse, applies to HIV, where you have here the surface protein and the glycosylation is much heavier. This is why it's called the shield. There's no other word for it than a shield wrapping the surface glycoprotein.
So this is again a potential explanation for vaccines being so difficult to design for HIV.
That being said, we go back to the cell surface. Here is the representation, again in ViralZone, of the host membrane with the receptor or the glycolipid sticking out. I would like to give you the explanation for that purple diamond, which is really the core of the representation that I'm going to talk about. Sugar, carbohydrate, glycan, polysaccharide: all these names are used to designate more or less the same molecules. And these molecules are made of monomers, the monosaccharides: here a sialic acid, another sialic acid here, a galactose here and an N-acetylglucosamine, N-acetylgalactosamine, sorry. This chemical representation is a bit cumbersome if you are not so knowledgeable in carbohydrate chemistry, so glycobiologists and carbohydrate chemists have joined efforts to put together a nomenclature. You can see that you have these different symbols to characterize each monosaccharide. This is called the SNFG notation, and this is the link to get all the details. It means that the molecule here looks like this as a cartoon. And you can be sure that the linkage information, which is really important here, is written whenever possible. So this is the sort of cartoon I'm going to show you in the resources that we have, because it is widely adopted, at least in glycobiology.
I would also like to say a word on glycomics as it is. This is from a review published last year by the most prominent analytical chemists in carbohydrate chemistry. It shows an overview of the analytical techniques. There are quite a few of them, and not one that solves all the problems of resolving the structure of glycans. There's no template, as you know, with glycans, as opposed to macromolecules like proteins.
So a lot of the work, and this is why it's difficult to automate, has to be done with different types of techniques. The most common at the moment, especially for high throughput, is mass spectrometry, although this dash means that it's not completely solved. Mass spectrometry does not completely solve the problem of identifying the glycans or the glycoconjugates, but it helps. For my little molecule that I showed before, here is an example of a mass spectrum that gives you the identification of the released O-glycans that are there, and of this one in particular.
The plot thickens if we go to glycoproteomics, because this is what is actually applied in virology. Here is a spectrum and here are the peptides, and you can see the glycosylated asparagine here. But the problem is that you have very broad information on the glycans. You don't have the very defined structure that I showed before. Instead of that, you have this. It is flattened completely because you cannot distinguish any of the hexoses, so you have a colorless circle, and a colorless square here for the hexosamines that are there. The only color that remains is the sialic acid. I mean, there are many, many sialic acids, but they can be represented with that single symbol. A lot of the data I'm going to show is this type of data, not as fun as this one. Sorry, this is all we've got at the moment.
So if I go back to my description of the cell surface, this is what we showed in the video. The actual sialic acids are probably sitting on much more sophisticated structures than what was shown. And viral entry is known to involve the interaction with the sialic acid. However, here we have a totally bare glycoprotein at the surface. And this is what it's all about: we'd like to know, from the point of view of the virus, what is conditioning the recognition.
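The loss of detail described above, where mass spectrometry reports "colorless" compositions instead of full structures, can be illustrated with a minimal Python sketch. The residue masses are standard monoisotopic values; the names `MASS_CLASS`, `to_composition` and `glycan_mass` are illustrative only and do not come from any tool mentioned in the talk.

```python
# Monoisotopic residue masses in daltons for the classes a mass spectrum can tell apart.
RESIDUE_MASS = {
    "Hex": 162.0528,     # glucose, galactose and mannose all weigh the same
    "HexNAc": 203.0794,  # GlcNAc and GalNAc all weigh the same
    "dHex": 146.0579,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic acid, a sialic acid
}
WATER = 18.0106  # one water is added for the free reducing end

# Illustrative mapping from a named monosaccharide to its mass class.
MASS_CLASS = {
    "Glc": "Hex", "Gal": "Hex", "Man": "Hex",
    "GlcNAc": "HexNAc", "GalNAc": "HexNAc",
    "Fuc": "dHex", "Neu5Ac": "NeuAc",
}

def to_composition(monosaccharides):
    """Collapse a list of named monosaccharides into counts per mass class."""
    composition = {}
    for name in monosaccharides:
        mass_class = MASS_CLASS[name]
        composition[mass_class] = composition.get(mass_class, 0) + 1
    return composition

def glycan_mass(composition):
    """Monoisotopic mass of a free glycan with the given composition."""
    return sum(RESIDUE_MASS[k] * n for k, n in composition.items()) + WATER

# The small molecule from the talk: two sialic acids, a galactose, a GalNAc.
talk_glycan = ["Neu5Ac", "Neu5Ac", "Gal", "GalNAc"]
# A hypothetical isomer that swaps in glucose and GlcNAc.
isomer = ["Neu5Ac", "Neu5Ac", "Glc", "GlcNAc"]

print(to_composition(talk_glycan) == to_composition(isomer))  # same composition
print(round(glycan_mass(to_composition(talk_glycan)), 4))
```

Both lists collapse to the same composition and therefore the same mass, which is why the glycoproteomics data shown later carries compositions rather than linkage-defined structures.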
So now I can start talking about COVID-19. The spike protein is made of two subunits, subunit one up to here, subunit two here. There are 22 glycosylation sites that are really spread over the two subunits. I compared the sites with those of the SARS-CoV-1 of 2003 to see the potential for change. The color code is here: the yellow ones are fully conserved, exactly the same sequence. The green ones are semi-conserved in the sense that the asparagine is conserved, but the rest of the amino acids are different. The new ones that appeared are here in pink, so 74, 149, and 657 are new on SARS-CoV-2. And two have disappeared, the gray ones. On SARS-CoV-1, there are glycosylation sites on 111 and on 368 that have disappeared, here because the asparagine is gone and here because the sequon is gone. So this is the linear sequence information.
The work over the past few months has been incredible, and this is thanks to Max Crispin's team from the University of Southampton. Actually, all my pretty pictures of glycosylated 3D viral glycoproteins are made by his group and published in papers from his group. You can see here all the 22 sites. This is the top view, this is the side view. We have subunit 1 in light gray, subunit 2 in dark gray. And this is the look of the protein. This is a model. It was submitted to the PDB, so we'll see another view of that PDB model in the presentation. So this is really what we are guided by.
Regarding the molecular interaction, I would like to lighten the presentation maybe by talking about Twitter and our most devoted ambassador on Twitter, Gordan Lauc, who tweeted in March this comment on a tweet which was talking about the interaction between the receptor and the spike protein of COVID-19. And for him, this is simply wrong. So this was actually corrected.
I mean, not the fact that it's wrong, but Gordan Lauc was happy to post another tweet a week later, where an Austrian group showed a simulation of what the spike protein would look like and what the receptor would look like, since the ACE2 receptor is itself heavily glycosylated as well. So this is oversimplified indeed. What it looks like in reality is this. We have to bear this in mind, and this is why it is important to have the data on the glycosylation on both sides. This is what this demo is all about.
Just to finish, I would like to say that everything I'm going to show now would not happen if I didn't have the dedication and enthusiasm of Julien Mariethoz and Thibault Robin for developing very nice resources. And I cannot fail to mention that this is all with the blessing of, and a long history with, Nicki Packer, with whom the whole story started. So I'm going to exit that presentation and start going into the heart of things. You can still see my screen. Monique, you confirm?
Yes. Yes.
Cool. So we have really made sure to say that we're talking about draft data. This is not published. All of the references I'm going to talk about today are on bioRxiv. They are in the process of being reviewed. So in all likelihood, all these pages will be modified once the papers are out with the proper data sets that correspond. But there is the emergency, and a lot of people are curious and want to know what's there. So this is the Max Crispin paper that we just recently put into the database, and this is a recombinant protein from a derivative of HEK 293. It's a Thermo Fisher cell line, which is not exactly HEK 293, so we made a special page for that. But we also have HEK 293 here with two references, again draft data. And you can see that the origin of these is different. Here you have commercially available recombinant SARS protein, whereas Max Crispin's data is on native-like protein.
So that's a very important difference. Here you have 22 sites, of course, as I said. In those papers here, you have a bit more because they also have O-glycosylation. I will not spend too much time on these today because the bulk of the data is really on asparagines. But you have a serine here, and a serine and a threonine here and there, that are also glycosylated on the glycoprotein. You have the number of peptides, and this is the summary of this page. But we can actually look and see the data. So we have those boring compositions, the colorless ones. And you have all sorts of little boxes. Of course, source is the source of this one, so it's difficult to see that there are different numbers.
So I took an example of a well-studied protein that we have in the database to explain why these boxes are there. This is a blood serum protein, but it's not only in blood serum, it is in plasma; it's found in a lot of body fluids. You have a number of papers going back to 1980 that describe some of the glycans that were found in association with the protein. You can see that most of the time we have the structures in the database, the 2D structures with nice cartoons. And then we have the most recent papers with the glycoproteomics experiments, where we have only the composition. So the idea is to try to reconcile the two. For instance, take this one, which is very well defined: you have all the linkages, everything is perfect here. But there's no site information because this glycan was released from the protein. After being released, the glycan is free and there's no possibility of knowing which site it was cut from. So what we offer in GlyConnect is to suggest some sites. We have two sites that are suggested here because in a glycoproteomics experiment a bit below, we have exactly that composition, Hex5 HexNAc4. You can count: you have four squares and five circles. And so here, this is it.
And so I have suggested structures, of course, for the composition of the glycoproteomics experiment because I have them in the database. And I have two sites where this composition was reported, which helps me suggest the two sites that are there. So we try to combine the best of both worlds: the precision of glycomics, which goes to the extent of mapping a structure very precisely but loses the information on the site, and glycoproteomics, which is coarser in terms of the grain of the data, but gives very precise information on the site, so we can suggest structures. This is why you have all these little boxes there that do not make sense with the spike glycoprotein of the COVID virus, but this is an explanation of GlyConnect in itself.
So you have here the PDB structure. We use the LiteMol software for visualization, and for LiteMol there's a plugin, written by David Sehnal, author of LiteMol, and Oliver Grant of the GLYCAM group, to see from a PDB record the information on the sugar ligands that are often in the PDB. So this is the structure that was proposed by the Max Crispin team, and you can see it in this way here. Here you see it with the fog, the density, so it's a different view.
You can see here, as I said, that you have the origin. So this is the paper. This is a bit cryptic maybe: if you see 2917 or 2915, these are the reference identifiers. The reason why I show this page is that we show the identification software that was used to match the masses of the intact glycopeptides with those of the database put together by Byonic. As it turns out, the three papers that we have, these two here and this one, have all used Byonic, but in different ways. And this also explains why different parameters are giving different results. So what is interesting is possibly to look at a page.
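The reconciliation just described, where a site-less glycomics structure and a structure-less glycoproteomics composition are linked through their shared composition, can be sketched in a few lines of Python. This is only an assumed, simplified model of the idea; the record layouts, identifiers and site names below are hypothetical and are not GlyConnect's actual data model.

```python
from collections import Counter

# Illustrative mapping from named monosaccharides to composition classes.
MASS_CLASS = {"Man": "Hex", "Gal": "Hex", "Glc": "Hex",
              "GlcNAc": "HexNAc", "GalNAc": "HexNAc",
              "Fuc": "dHex", "Neu5Ac": "NeuAc"}

def composition_key(composition):
    """Hashable form of a composition such as {"Hex": 5, "HexNAc": 4}."""
    return frozenset((k, n) for k, n in composition.items() if n)

def structure_composition(residues):
    """Composition obtained by forgetting linkages and stereochemistry."""
    return dict(Counter(MASS_CLASS[r] for r in residues))

def suggest_sites(structure, site_records):
    """Sites where the composition of this released structure was reported."""
    key = composition_key(structure_composition(structure["residues"]))
    return sorted({rec["site"] for rec in site_records
                   if composition_key(rec["composition"]) == key})

def suggest_structures(composition, structures):
    """Known structures compatible with a site-specific composition."""
    key = composition_key(composition)
    return [s["id"] for s in structures
            if composition_key(structure_composition(s["residues"])) == key]

# Hypothetical glycomics records: defined structures, no site.
structures = [
    {"id": "S1", "residues": ["Man"] * 3 + ["Gal"] * 2 + ["GlcNAc"] * 4},
    {"id": "S2", "residues": ["Man"] * 5 + ["GlcNAc"] * 2},
]
# Hypothetical glycoproteomics records: site and composition only.
site_records = [
    {"site": "Asn-A", "composition": {"Hex": 5, "HexNAc": 4}},
    {"site": "Asn-B", "composition": {"Hex": 5, "HexNAc": 4}},
    {"site": "Asn-B", "composition": {"Hex": 5, "HexNAc": 2}},
]

print(suggest_sites(structures[0], site_records))
print(suggest_structures({"Hex": 5, "HexNAc": 4}, structures))
```

The suggestions are only as strong as the composition match: several distinct structures can share Hex5 HexNAc4, so a suggested site is a hypothesis to check, not an assignment.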
So if you click from here on a page, like asparagine 657, you have the details of all the compositions. Here we have 152 compositions associated with that particular site. You can see that sometimes there's only one reference, but sometimes two references. So you can compare the content of each reference and see, for one particular composition, whether it's seen twice or once on exactly the same site. And you can check the peptides on which it's been seen in the two references. So this is one level of detail further than just the general information in this page or in this page.
Then what we do, and I need to explain a few things there, is that we also try to compare site by site in the two or three different datasets. So let me introduce GlyConnect Compozitor. This is the default webpage when you get to this URL. You can actually go to the non-beta one. It's just that I use beta, but it should work without beta. Let's try and take risks. Yes.
So I'm going to take a very well characterized protein with not too many glycans. There are a few sites that are known, some that are unknown, so we have information here. We have 20 glycans. If I compute the graph, I can show it, whoops, not that much. This would be fine. So these are all the reported compositions associated with that particular protein, so there are 20, this is the last one, and there is the connectivity between them. Let me go back to this list here. This is at site 657, and you have a list of compositions here as if they were independent items, but they're not. They're not because, please do not show that. I can't get to, ah, here we go. I'm back here. So for instance, from this composition to this composition, you just have to add one hexose. From this one to this one, you just have to add one hexosamine.
So you can see that they are related, and it makes sense because this is a gross approximation of the natural biosynthesis of glycans. There is always one enzyme that is going to add a monosaccharide to another monosaccharide. And of course here, we are totally oblivious of the linkages, which is a gross approximation, but we start with what we have. You can see that on this particular alpha-fetoprotein, there's a consistent set of compositions. They are all related to one another, except for this one.
And you have the label. Each time you look at a composition, you have the potential structures there. You can go to GlyConnect and see the page for the composition. You can also go and see the structure in GlyConnect. The size of the node reflects the number of publications that are available to back the information in the graph. The label here is the number of structures that you can see: there are two structures here, there are three structures there. And we also have the edges. If you want to see everything that is adding a hexose, it's here; everything that is adding a sialic acid is here. You can also look at the root of the graph, so everything stems from this particular composition. You can see here where the root is. In a small graph like this, it's easy to see. Sometimes in the big graphs, you can't see it, so you need that help here. And you have here a sort of summary of the compositions and their properties. If you want to see all the fucosylated nodes, they're there. If you want to see the sialylated nodes, they're there, et cetera.
So we try to have a consistent representation of the content of the glycome associated with one protein, but we can do it also with, for instance, a cell line. The CHO cells are very commonly used in glycobiology because their glycosylation is comparable to that of other mammalian cells, for instance.
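The graph described above, where two compositions are connected when one is obtained from the other by adding a single monosaccharide, can be sketched as follows. This is a minimal illustration under that stated rule, not the Compozitor implementation; the function names and the example compositions are hypothetical.

```python
def added_residue(a, b):
    """Return the residue class added when going from a to b,
    or None if b is not exactly a plus one monosaccharide."""
    delta = {k: b.get(k, 0) - a.get(k, 0) for k in set(a) | set(b)}
    changed = {k: v for k, v in delta.items() if v}
    if len(changed) == 1:
        (residue, count), = changed.items()
        if count == 1:
            return residue
    return None

def build_edges(compositions):
    """Directed edges (i, j, residue): composition j is composition i plus one residue."""
    edges = []
    for i, a in enumerate(compositions):
        for j, b in enumerate(compositions):
            if i != j:
                residue = added_residue(a, b)
                if residue is not None:
                    edges.append((i, j, residue))
    return edges

def roots_and_isolated(n_nodes, edges):
    """Roots have outgoing edges only; isolated nodes have no edge at all."""
    has_in = {j for _, j, _ in edges}
    has_out = {i for i, _, _ in edges}
    roots = [i for i in range(n_nodes) if i in has_out and i not in has_in]
    isolated = [i for i in range(n_nodes) if i not in has_in and i not in has_out]
    return roots, isolated

# Hypothetical set of compositions reported for one protein.
compositions = [
    {"Hex": 5, "HexNAc": 4},
    {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
    {"Hex": 5, "HexNAc": 4, "NeuAc": 2},
    {"Hex": 5, "HexNAc": 4, "dHex": 1},
    {"Hex": 9, "HexNAc": 2},
]

edges = build_edges(compositions)
roots, isolated = roots_and_isolated(len(compositions), edges)
sialic_edges = [(i, j) for i, j, residue in edges if residue == "NeuAc"]
print(edges, roots, isolated, sialic_edges)
```

Filtering the edge list by residue class mirrors the "everything that is adding a sialic acid" view, and the isolated node plays the role of the one composition that is not related to the others.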
And here we have 79, so I've pre-calculated it. You can see that it's a little bit more scattered, and you have clusters of very consistent compositions linked to one another. So you see the seven structures associated, et cetera. And we have introduced virtual nodes. The virtual nodes are the nodes corresponding to compositions that do not exist. They are not found in CHO cells, as far as the data sets that we have in the database report, but they somehow connect existing nodes. So we increase the connectivity of that graph by introducing those virtual nodes so that we have a potentially more consistent picture, probably reflecting some constraints of synthesis. And then you can ask yourself: why is that one in particular missing, for instance, when it would make this cluster far more consistent if it were present? So that's pointing at potentially missing information that may or may not have a biological explanation. I would like to insist on the fact that we are offering some visualization of the data; that doesn't mean that you can jump to conclusions with anything we have. We are trying to provide you with the means of asking relevant questions rather than answering all possible questions. So this is an important point to make.
Having illustrated this tool with lots of data, we can go back to the, I want to get rid of the menu from, here we go. So I can go back to this graph that I have here, where I said I tried to compare, from three datasets, how asparagine 17 is glycosylated in the two different cell lines, HEK 293 and the derivative of HEK 293. You can see here that with the first data set, there's a bias towards neutral. So the overlap is relatively good.
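The virtual nodes introduced above can be illustrated with a small Python sketch. The rule used here is an assumption made for the example: an unobserved composition becomes a virtual node when it sits exactly one monosaccharide above one observed composition and one below another, and no observed composition already bridges that pair. The talk does not spell out the actual rule used by Compozitor, so treat this as one plausible reading.

```python
def as_key(composition):
    """Hashable, order-independent form of a composition."""
    return tuple(sorted((k, n) for k, n in composition.items() if n))

def virtual_nodes(observed):
    """Unobserved compositions that would bridge two observed ones
    lying exactly two monosaccharide additions apart (assumed rule)."""
    seen = {as_key(c) for c in observed}
    virtual = set()
    for a in observed:
        for b in observed:
            gain = {k: b.get(k, 0) - a.get(k, 0) for k in set(a) | set(b)}
            gain = {k: v for k, v in gain.items() if v}
            # b must be a plus exactly two residues, with nothing removed.
            if not gain or min(gain.values()) < 0 or sum(gain.values()) != 2:
                continue
            bridges = []
            for residue in gain:
                middle = dict(a)
                middle[residue] = middle.get(residue, 0) + 1
                bridges.append(as_key(middle))
            # Only propose virtual nodes when no observed bridge exists.
            if not any(m in seen for m in bridges):
                virtual.update(bridges)
    return virtual

# Hypothetical observed compositions: the mono-sialylated form is missing.
observed = [
    {"Hex": 5, "HexNAc": 4},
    {"Hex": 5, "HexNAc": 4, "NeuAc": 2},
    {"Hex": 9, "HexNAc": 2},
]

missing = virtual_nodes(observed)
print(missing)
```

A virtual node found this way is exactly the kind of "why is that one missing?" question raised above: it may point to a gap in the data or to a genuine biological constraint, and the sketch cannot tell which.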
The overlap is necessarily there because, just to remind you, they are both using Byonic as software, with a composition file provided with the software that is probably comparable. And we have mapped a number of virtual links that are also interesting with regard to using Byonic, because maybe you can add or discard some compositions in the original file to keep the graph consistent. Here you can see that the pink nodes are the common nodes, those that are shared by the two datasets. So somehow the pink nodes are central, the red nodes are central but a bit shifted here, and the blue nodes are more peripheral, although a few are there. So this is the outlook for asparagine 17. So I've looked, no, am I still sharing?
No, not anymore, you have to go back to share again.
Okay, and again, it's done something here. Are you there now?
Yes, we see your screen again.
Cool. And do you know how I can get this menu from Zoom not to bug me?
No, not really, when you're sharing it's on the top, yeah.
Okay, so I am now comparing. Ah, this is what happens all the time because it's so close to what I have to click on.
I think you can just grab the green field and move it on your screen so it doesn't bug you anymore.
Okay, thank you very much. So I'm going to grab it. You are right, I grabbed it and it's not bothering me anymore. So I'm here, back. If you remember, I had 17 here. Now I have 74. So 74 shows a graph which is a bit different. You still have the bias towards neutral here and you have a mix. You can see how the red ones and the pink ones, which are the common ones again, are somehow not separable in that particular graph. These were on subunit one, and this is on subunit two. So on subunit two, this is another site that I've compared, and there are a lot of hanging compositions and still quite a few virtual nodes.
We still have a good mix of the red and the pink, and the blue tend to be more on the outside, if at all included in the graph. And a last example, where of course there are very few glycans in the second set, so it's a little bit biased. You can see that there are 73 in the first data set and 15 in the other. So 66 are unique to the first one, seven are shared, and you have 31 virtual nodes. So you really need them to piece the thing together. This is yet another distribution and another look, and it's the start of your comparison. If these do not belong here, you could put them there. You can move the nodes yourself if you don't want edges to cross; everything is modifiable, et cetera. So this is a view to give you some thoughts on how to deal with these compositions and how they are distributed on each site.
Now, I've taken all of the sites from the two spike proteins. So of course, I more or less reconstitute the composition file of Byonic. And I have one very specific composition here that is not in, is this actually? Yes, so it is only in the second set. And then you see that again, the common ones are in the center and the peripheral ones are all the blue ones. But you can see that the overall distribution is not as before. Before, per site, we had a lot of bias towards neutral, whereas here you have a bias towards fucosylated. And in the second set, you have about equally as many fucosylated and neutral. So this is yet another view.
I have also calculated the graph for those three references independently, and this is making it really very, very crowded. In GlyConnect Compozitor, we have an advanced tab where we can select by data type. This is the DOI of each of the papers. And so there are 165 for the first paper, 257 for the second and 109 for the third one.
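The counts quoted above follow from plain set arithmetic: with 73 compositions in the first set and 7 shared, 73 - 7 = 66 are unique to the first set, and 15 - 7 = 8 remain unique to the second. A minimal Python sketch of this comparison is given below, using made-up composition strings rather than the real datasets; the function names are illustrative.

```python
def compare(first, second):
    """Split two sets of compositions into unique and shared parts."""
    return {
        "only_first": first - second,
        "shared": first & second,
        "only_second": second - first,
    }

def common_to_all(datasets):
    """Compositions reported in every dataset of a list of sets."""
    return set.intersection(*datasets)

# Hypothetical compositions reported at one site in three references.
ref_a = {"Hex5HexNAc4", "Hex5HexNAc4NeuAc1", "Hex5HexNAc2", "Hex6HexNAc2"}
ref_b = {"Hex5HexNAc4", "Hex5HexNAc4dHex1"}
ref_c = {"Hex5HexNAc4", "Hex5HexNAc2"}

overlap = compare(ref_a, ref_b)
counts = {name: len(members) for name, members in overlap.items()}
print(counts)
print(common_to_all([ref_a, ref_b, ref_c]))
```

The same operations extend to any number of references, which is all that is needed to list the compositions shared by every paper before feeding them back into a graph.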
So what I'm going to show you here is not really the graph itself, which is so crowded, but the differences between the papers in the distribution of the compositions, which is different. I mean, not very different, but yet different. The bias on fucosylation is definitely there. This one is not so different from this one. So these are the three papers. This is the intersection between two of them, and this is the intersection between the three. And I'm going to show you how we can visualize this intersection. Here we have an export button and we can select the intersection, which is ABC, okay? So I want to see only ABC here. No, not the virtual, I don't want the virtual. And I have here the list of the corresponding 85 compositions. You can have it in our format, you can have it in the Byonic format, or you can have it in the very condensed format. Either way is possible. So you can put this in the clipboard, go to custom here and paste, give "common 85" as a label and add it to the selection. It's telling you that you have 85, and we're going to compute the graph. But since it's going to take a little while to compute that graph, I'm going to switch to the other side of the story.
So I compute the graph and I go to UniProt. The other side of the story is the proteins that are listed on the UniProt website for COVID-19, and these are the human proteins. I mean, you have of course the viral proteins, and we've talked about them, but there are also the receptors and all the proteins of the host, so the human proteins that interact with the virus. And this is slow, so maybe I should have stuck to the previous one. Here I am going to go to UniProt directly because there are no cross references there and we need to see the cross references. So this is one of the proteins that interacts.
And you see here that in the PTM section, you have a lot of glycosylation sites that are only inferred by sequence analysis. There's only one publication, which is a proteomics study that is only interested in detecting the glycosylation sites without the structure. So there's no information at all on the structures. We have some information, not much, but we do. You can go from here and see the cross reference. So this integrin has a number. This is again the result of compositions. You can see some different 3D structures, again with some trace of glycosylation, so it seems that the glycosylation does take place. Nonetheless, the glycoproteomics could only find one asparagine that is glycosylated and bears all these different glycans qualified by compositions here. So this is really poor information. And this is hardly better.
So I'm going to do it the other way around because I know the number of the protein in GlyConnect. So here is the ACE2 protein. There's something that I should have said from the beginning that I didn't say and that I will add, I'm sorry. But anyway, here you have the UniProt entry, so we can go back and forth between the UniProt entry and GlyConnect. Here you have far more information on glycosylation. You can see that you have publications that document the glycosylation, except for the last site. The two publications are basically a proteomics study and an X-ray structure. And the X-ray is the one, whoops, no, here, that you can see here. This is a pretty heavily glycosylated reference, and you see how heavily glycosylated the protein is. Actually, the two proteins were probably mixed in the Twitter picture, but whatever, you'll find it again. So there is one thing I have not mentioned that I should mention from here, for instance, because it's really important.
You can see a sugar structure here. We have sugar structure pages. And we are really, really trying to have glycobiologists move towards associating an accession number with each structure, exactly in the same way as no one would think now of citing a protein without giving either a UniProt accession number or a RefSeq accession number. In the same way, we need to be able to refer to each of the structures in a database, and the database for that is called GlyTouCan. It's developed in Japan by Kiyoko Aoki-Kinoshita. She is really maintaining it, and each time you have a new structure, you can submit it. So it's really important that you have this information.
So I would like to go back to my, oh, it's still running. No, it's there. Here you can see that the 85 common compositions are extremely well documented in the database. When you do the custom search with a set of compositions, it's going to retrieve all the information in the database irrespective of the species and so on. So it means I have 15 structures associated with this composition, 32 with this one. You can see that the 85 common compositions that are found on the surface of the glycoprotein of COVID-19 are really extremely common among the glycans in the database that we have, and probably in the literature as well, since we try more or less to reflect the literature. And so this is the actual distribution, and it corresponds more to the paper on the derivative of HEK 293.
I think I have almost finished. I just want to mention that this is the homepage of GlyConnect, and we have advertised this octopus that is not working very well. I have to apologize for that because we need to change it. This is in the pipeline and going to happen in the weeks to come. We have a new GlyConnect octopus, and it might be of interest to you if you want to select some structures, for instance some very sialylated and fucosylated structures.
You can search them in GlyConnect. For those who have ever used the octopus, you'll have it back, with the compositions in the middle linking proteins by default. This is the default view, proteins with structures, and the structures here are shown by numbers, but you can see which structures are there. And we have changed the octopus so that in the middle you can put just about anything. If you want to put the proteins in the middle, you can, and then you can put the taxonomy on one side, and here you can see which are the taxonomies, and here you can put the diseases. So you can link the proteins with the taxonomies and the diseases. You can do your own shopping in these lists and visualize the data in any way. So this is really in development. It will be online, and it is meant to replace the situation at the moment that we know is not good, when you go to this part of the database and there are a lot of problems for those who are actually using it. With this, I'll take questions. I hope I was not too dense in my explanations.