All right, so I will start today with a topic that is somehow the centre of attention at the moment, although it is not the topic itself but its object that is the centre of attention. I would like to talk about the glycan shield. The glycan shield is the expression used to explain what is going on at the surface of some viruses. So if you take the flu virus or HIV, and I should mention that these are drawings from the resource called ViralZone. It's a SIB resource developed by Philippe Le Mercier, a colleague at SIB, and if you don't know it, I advise you to check it out. It's really an encyclopedia of viruses, very interesting. So I borrowed the pictures from his encyclopedia.
If you think of the hemagglutinin at the surface of the flu virus, which is here and repeated, the actual situation is that this spike, this hemagglutinin, is covered with glycans. There are four to six potential N-linked sites, and what is actually interesting is that they vary with time. You can see here the changes since 1918, for which they have samples from the time. I'm not absolutely sure how they mapped the glycans, but they sequenced, they had the N sites, and presumably they inferred that the glycans were there. So these are models of the change of sites. You've heard more than enough about virus variants to know that there are sequence changes, and so sites are appearing and disappearing with time. That is a partial explanation for the not totally successful treatment with antibodies, because the recognition sites are all of a sudden covered with glycans.
The situation is a little bit more complicated with HIV, where there are many more potential glycosites. So the envelope protein is really very much camouflaged under the glycans.
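Editor's sketch: the inference of potential N-linked sites from sequence alone, mentioned above, usually relies on the canonical N-X-S/T sequon (asparagine, then any residue except proline, then serine or threonine). The Python below is a minimal illustration of that idea and not the method used in the studies shown on the slide. The function names and the toy sequences are hypothetical, and comparing positions directly assumes the two sequences are already aligned and of equal length.

```python
import re

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead keeps overlapping sequons (e.g. "NNTS") as separate hits.
SEQUON = re.compile(r"N(?=[^P][ST])")


def potential_n_sites(sequence):
    """Return 1-based positions of Asn residues sitting in an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


def compare_sites(old_seq, new_seq):
    """Report sequons lost, gained and kept between two aligned sequences."""
    old = set(potential_n_sites(old_seq))
    new = set(potential_n_sites(new_seq))
    return {
        "lost": sorted(old - new),
        "gained": sorted(new - old),
        "kept": sorted(old & new),
    }


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    older = "MKNGTAANPSLLNKSA"
    newer = "MKNGAAANQSLLNKSA"
    print(potential_n_sites(older))
    print(compare_sites(older, newer))
```

A sequon only marks a potential site. Whether a glycan is actually attached has to be confirmed experimentally, which is the caveat raised in the talk.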
And that is also a possible explanation for the limited success of treatment with antibodies: the surface is not as accessible as it looks. And this is actually a small animation that I made from ViralZone. So the entry of the virus into the host is very much mediated by glycans, and this is what we are trying to understand.
And of course, in SARS-CoV-2, this is usually the way you are shown the spike protein. And this is actually how it was modeled by a group of people at UCSD in collaboration with Maynooth University. Carl, hello. And this view, as I told the leader who is working on this and who made it, looks like a nanodrome picture of the spike protein. And this is actually the virus view, I mean the host view of the virus. So it's really covered. It's interesting to see that so far 22 sites and 20 sites have been mapped on it, so it's really covered with glycans.
What is interesting is that we have a Twitter community, and the spearhead of our interactions on Twitter about glycobiology and the importance of glycans is Gordan Lauc. He's a professor at Zagreb University and has a company doing glycan profiling. He is of course very strongly defending the viewpoint that the spike protein should not be shown the way people usually show it, without a single sugar. And in an exchange between Gordan and another researcher came the question of the interactions with the receptors at the surface of the host, which are also heavily glycosylated. So how does this all work with the glycans interfering? And you may not know it, but the glycans are extremely flexible at the surface of cells, because they are sort of waving in the wind created by the liquid, I suppose.
And so this is a real conundrum that is often brushed under the carpet. There are a lot of questions that remain unanswered and that should be considered a bit more seriously, or rather, it's not that it isn't serious, but it's not looked at enough from our perspective.
What I'm going to talk about now is the last part on GlyConnect. The reason why I mentioned viruses is that we have created dedicated data sets, and the first one we created was the COVID-19 data set. We actually pulled out from the database everything that is relevant to the study of SARS-CoV-2. As you can imagine, in glycobiology, because of the situation I just explained, there has been a wave of publications on mapping the glycome and looking at different situations. With our terminology, as I explained yesterday, the taxonomy count is two, because we have the human and the virus. We have nine proteins because, as I mentioned yesterday, we consider each recombinant protein as a different entry depending on the expression system. I'll make a little demo after that anyway. We have different sources, of course, because we have different cell lines, and we also integrate the receptor information when we have it. And you can see that we have a lot of structures, which correspond to quite a few compositions as well. This is just building up: when we have new information from a new paper, we try to include it. So I'll show you a few things after that. I'll finish on the special data sets and we'll demo a bit after.
HMOs, that's human milk oligosaccharides. Often, when you are given the composition of milk, whether it's cow's milk or human milk, you are told that there is a lot of fat and a lot of caseins, and you are told that there's lactose, but it is a little bit more than just lactose. There are quite a few molecules based on lactose.
So you can see here the lactose, which is just the glucose and the galactose. The lactose is common everywhere, but there are variations on the theme. The situation in the database at the moment is that, by looking at a lot of references, we are reaching quite a number of structures that correspond to these molecules. And it's not related to disease. On the contrary, it's supposed to be a sign of good health, because it is in all likelihood fighting microbes.
We also have a link to a resource called Glycologue, which I will also show you briefly. We are developing the HMO module of Glycologue with a team. Andrew McDonald, you saw his face yesterday, is partially in my group and partially in Dublin at Trinity College, working with Gavin Davey. They have both worked on and developed Glycologue, but we focus on HMOs together. I'll show you how a bit later.
Just to explain why that might be really interesting: you can see the sort of papers that relate the microbiome and HMOs, and the fact that there's a high chance of HMOs playing an intermediary role in fighting pathogens in the microbiome, in the milk microbiome. And maybe you have not heard of it, but there is already one HMO, maybe one or two, that has been accepted by the FDA and is now part of the formula milk for babies, with of course the idea that it brings protection for the baby. So there's more and more research. There are controversies on the number of HMOs. We have taken a wide array of papers, and maybe they are over-optimistic in the numbers that they report, but this is the situation as it is, and we keep on going.
So with the O-monosaccharides, this is also a completely different distribution. As you can see, we have a lot of species, many more proteins, a lot of sources, but not very many structures.
And the reason is that these are not very interesting structurally speaking. They are just one monosaccharide attached to a residue, often serine or threonine, but also tryptophan, so different residues than usual. It's proven that they have a modulatory role at the surface of proteins. The most famous one is O-GlcNAc, and O-GlcNAcylation is now known to be very widespread. It was said before that it was competing with phosphorylation on serine and threonine, so there's antagonism between the two modifications, and there are now some databases that are focused only on O-GlcNAc. So we had some data from partners that we integrated. We will probably not put in that much more but refer to these atlases. But it's interesting to have, for instance, this O-fucosylation, all these different small monosaccharide glycans that play a functional role at the surface of proteins.
And then the last one we have recently integrated is the human immunoglobulins. Again, the amount of glycosylation at the surface of immunoglobulins is massive. The most studied are IgG and IgA1, but there's more and more, and you can see that for IgM it's quite a challenge. What is interesting in the case of immunoglobulins is that you have variations. There's only one site on each heavy chain of the IgG here, and for decades it's been mapped and remapped in different contexts, especially with respect to the effects of monoclonal antibodies: you have to have the right glycosylation for the antibody to be effective. So companies have been created for the quality control of this glycosylation, which is really important. There are probably around 70 or 80 possible structures at the surface. I mean, this is more or less what we have in the database. And what is interesting, of course, is to measure the quantity of each of the possible structures, especially those that are galactosylated or fucosylated.
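Editor's sketch: measuring the quantity of each structure at the IgG site, and the share that is galactosylated or fucosylated, amounts to normalising intensities and summing over a trait. The Python below illustrates this under stated assumptions. The intensity values are invented, the function names are hypothetical, the condensed H/N/F/S notation stands for hexose, HexNAc, fucose and sialic acid, and the galactosylation rule (complex-type glycan with more than three hexoses) is a simplification that holds only for typical biantennary IgG glycans.

```python
import re


def parse_composition(text):
    """Parse a condensed composition such as 'H5N4F1S1' into {'H': 5, 'N': 4, ...}."""
    return {letter: int(count) for letter, count in re.findall(r"([A-Z])(\d+)", text)}


def relative_abundances(intensities):
    """Normalise raw intensities so that they sum to 1 for the site."""
    total = sum(intensities.values())
    return {comp: value / total for comp, value in intensities.items()}


def trait_fraction(abundances, predicate):
    """Sum the relative abundance of all compositions satisfying a trait."""
    return sum(
        frac for comp, frac in abundances.items() if predicate(parse_composition(comp))
    )


def is_fucosylated(comp):
    return comp.get("F", 0) > 0


def is_galactosylated(comp):
    # Assumption: complex-type glycan (at least 4 HexNAc) whose hexoses beyond
    # the three core mannoses are galactoses.
    return comp.get("N", 0) >= 4 and comp.get("H", 0) > 3


if __name__ == "__main__":
    # Invented intensities for one IgG Fc site (G0F, G1F, G2F, G0, G2FS1).
    site = {"H3N4F1": 300, "H4N4F1": 400, "H5N4F1": 200, "H3N4": 50, "H5N4F1S1": 50}
    abundances = relative_abundances(site)
    print(trait_fraction(abundances, is_fucosylated))
    print(trait_fraction(abundances, is_galactosylated))
```

The same two functions can be applied per site and per condition, which is the kind of comparison across disease or tissue conditions that the talk goes on to describe.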
So all of these properties can be quantified. And the only way to make sense of the situation is to have quantitative data. So this is really the objective of GlyConnect for 2021 and the beginning of 2022: to have some profiling, at least of IgG, to show how, in different conditions, different disease conditions, different tissue conditions, the variations are impacting the function. I borrowed this image from Joe Zaia, who is a glycobiologist in Boston. He has been pushing me to have in GlyConnect a representation like this, where the protein is described with each of its sites, and at each site you have the relative quantities of the different structures, the different glycans that are sitting there, whether they're O-linked or N-linked. Probably with N-linked it would be easier, considering the data that we have and what is published. But at the moment not so much is published on this site-specific quantification, not for many proteins at least. There certainly is for immunoglobulins, and this is why we focus on them. Of course, the idea is that it's tissue-dependent, so each time this has to be a function of the protein and the expression. And of course we need the tools for that, so we're working on it, and you can see that we would also have data for the spike protein. This is the first paper, it's still on bioRxiv or maybe just accepted, I don't know, and it was released a couple of months ago. It's really the site-specific quantification of the spike protein sites. So you see, that's really a key aspect that is missing from what we're doing at the moment and that needs to be integrated.
So before I move to that, I would like to show you those different data sets live. For instance, if I look at the proteins of this particular data set, you see that we have the receptor here, ACE2, which is very glycosylated.
So we have two references that actually talk about it and mention the composition, so you can browse and have a look at that. And here are the different spike proteins: this one is expressed in this cell line, this one is expressed in CHO cells, this one is expressed in a certain type of HEK293. And we even distinguish when the HEK293 cells are not the same. You are potentially familiar with the Cellosaurus. The Cellosaurus is the most recent and most comprehensive database of cell lines, and the actual accession number for the FreeStyle 293-F is different from that of the HEK293. Here this is 0045 and here this is D603. So we make sure we're not putting everything in the same basket. We can compare them with Compozitor, and this is why we distinguish. Here you have only two structures, because it's only the mapping of O sites. That was done with only a GalNAc or a core 1 O-glycan, and we can still compare this with a few others that have more detailed O-glycans.
So I have actually pre-computed some data on COVID to show you. For instance, the difference between the... where is it... yes, this is it. This BTI-TN blah, blah, blah is an insect cell line, and this is HEK293. I've taken all the N sites, so you can see there's quite a list of them here, and this is the mapping between the two. The blue nodes are the insect cell, the red nodes are the HEK293. The partition is relatively clear: you have all the HEK293 in that area, you have the blue in that area, and you have the common ones, for instance the usual suspects in the high mannose, that are there. There are a few more here, and around here we have just a few fucosylated versions.
So you can see in the distribution here that there is a bias towards neutral for the insect cell, whereas there are fewer neutral in the HEK, where you have more fucosylated and sialylated, and sometimes both. It's a trend. Of course the partition is not absolutely clear-cut, but you can see an obvious trend. So glycosylation in the insect cell is not the same as glycosylation in the model HEK293 cell line.
I had another example, which is here, with the RBD that was singled out. There's also the difficulty of expressing the spike protein, which of course has a number of subunits. Sometimes you have subunits expressed together or not, and some domains are singled out and expressed on their own. So here we have different papers: one is expressing the RBD in CHO cells, the other one in 293, and then this is the control, the reference, the basis. And you can see again that whether the full protein or the RBD is expressed, you don't necessarily have the same results. And if you express the RBD in HEK293 or in CHO, you don't have the same results either. In the middle you have the common structures. They all obviously contain a lot of GlcNAcs. And then you see the distribution there: here, obviously, a bias towards fucosylated, here a bias towards neutral, and then you have more high mannose in this one, and so on. Compozitor is just bringing out these differences. Then it's for the biologist or the virologist to interpret and see which system is more relevant to use to express the protein, et cetera.
I also did a comparison at the site level, looking for instance at one particular site in the insect cell and in the HEK293 cell. Asparagine 122 is particularly glycosylated.
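Editor's sketch: the kind of comparison described above, between the glycan compositions reported for two expression systems, can be approximated with set operations and a simple property count. This is only an illustration and not how Compozitor is implemented. The function names are hypothetical, the two composition lists are invented, and the classification rules (for example "neutral" meaning neither fucose nor sialic acid, and "high mannose" meaning two HexNAc with five or more hexoses) are simplifying assumptions.

```python
import re
from collections import Counter


def parse_composition(text):
    """Parse a condensed composition such as 'H5N4F1S1' into a dict of counts."""
    return {letter: int(count) for letter, count in re.findall(r"([A-Z])(\d+)", text)}


def classify(comp_text):
    """Assign one coarse property to a composition (simplified rules)."""
    comp = parse_composition(comp_text)
    fucosylated = comp.get("F", 0) > 0
    sialylated = comp.get("S", 0) > 0
    if fucosylated and sialylated:
        return "fucosylated+sialylated"
    if fucosylated:
        return "fucosylated"
    if sialylated:
        return "sialylated"
    if comp.get("N", 0) == 2 and comp.get("H", 0) >= 5:
        return "high mannose"
    return "neutral"


def compare(system_a, system_b):
    """Split compositions into those unique to each system and those shared."""
    a, b = set(system_a), set(system_b)
    return {"only_a": a - b, "shared": a & b, "only_b": b - a}


def property_profile(compositions):
    """Count how many compositions fall into each coarse property."""
    return Counter(classify(c) for c in compositions)


if __name__ == "__main__":
    # Invented composition lists, for illustration only.
    insect = ["H5N2", "H6N2", "H3N2", "H3N2F1", "H3N3"]
    hek293 = ["H5N2", "H6N2", "H5N4F1", "H5N4F1S1", "H5N4S2"]
    split = compare(insect, hek293)
    print(sorted(split["shared"]))
    print(property_profile(split["only_a"]))
    print(property_profile(split["only_b"]))
```

Restricting the input lists to the compositions reported at a single site gives the site-level version of the same comparison.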
And you can see that the partition is actually reproduced, with the different distributions and the bias towards neutral for the insect cell. And I think I have yet another one. I also did the FreeStyle and the HEK293 with Asn165. Again, you see some trends and a different partition, though with comparable distributions. B and C are a bit more overlapping, which is reassuring because one is somehow a derivative of the other. And this is again food for thought, seeing that you tend to get rather one or the other type of glycans depending on which expression system you use.
I can see there's something in the chat. No, there is no glycosylation data for the native virus, as far as I know. We can check again, but that would be ideal. There's a very interesting paper that is related to the glycosylation of immunoglobulins. Actually, I should say that the FreeStyle is the closest to the native: they try to get as close as possible to native conditions. But to go back to the antibodies: by testing different types of samples associated with more or less severe symptoms, they found that the more severe the symptoms, the more heavily fucosylated the antibodies that actually bind the virus. I leave it to you to think about it. If you're interested in the paper, I can dig it out for you. But that's an interesting point. So this is from the point of view of the antibody. The immune system works with a lot of components, and because the glycans are not always taken into account, it's interesting to see that there are seeds of explanation for a number of phenomena. There is also a paper by people who have looked at the different vaccines and the effect of their glycosylation, which is an important paper as well.
If you're interested, I have that in my list. Okay, so this is for COVID.
I'll go quickly on the O-monosaccharides. You've already seen the structures that I showed. You can see here, for instance, that O-fucosylation is in a number of different species. Here we have only slime mold for this one, but the O-GlcNAc that is here is listed in different organisms, including viruses. You see that we distinguish when we know the linkage, and sometimes we don't know the linkage. This one is sometimes C-linked. We are not quite sure whether we can distinguish the C-linked from the O-linked, so sometimes it's ambiguous. So, again, if you are into proteomics and you're looking at PTMs: if I go to the structure, we have the mass data for these, and they are well known. And you can see that it makes a difference in your peptide if it's modified like that. So that is for monosaccharides.
And if I go to the human Ig, this is really the work of Catherine, who is helping me with the course. She has gone through a number of references. As you can see, the immunoglobulins in human are described. We have papers up to 2021, so this is relatively recent, and we have the old ones as well. In the proteins we distinguish the chains. Of course we have IgG, and there are four genes for IgG. This is gamma 1, gamma 2, gamma 3. There are some old papers, as you can see here, relatively old papers from the 1990s down to 1987. At the time, as you probably know, the human genome was not sequenced, and it was not necessarily known that there were so many genes for IgG. Or maybe it was known, but it was not easy to associate the data.
So when it's actually specified, we take gamma 1, gamma 2, gamma 3, and we have distinct entries for them. And you can see that for each of them you have, more or less, 40 structures here, 42 here, 36, so this is the average, and 50 for this one, which is delta. So you see there's a really broad collection here. And we have a few hybrid ones, as you could see, I think, here. Or maybe we didn't keep the hybrids there. No, we don't. It's in the database. Okay.
So last, the HMOs. It's a different section. And really, as you saw, protein is zero because they are all free, they are free saccharides. Yet in GlyConnect you can look at milk, and you will see that we have a lot of proteins. There are some recent papers, especially from the Albert Heck lab, which did a really extensive glycoproteome of human milk. So we got a lot of structures and proteins from human milk that are described. It's all mixed here, with the lactose, with everything. This is why we created the special section for HMOs, so that we see only the free ones. If I go to one of them, this one possibly, this is a typical one. We have four references that actually describe it. I should add that sometimes we have these complicated ones, and there's usually only one reference that describes the more complicated ones. Still, it's always more comforting when you have four references, or at least more than one reference, that establish the existence of a particular molecule.
So here we are linked to Glycologue, as I mentioned. This is the way the molecule is represented there, with the orientation we saw yesterday afternoon. You can have the pathway, and then the idea is to have the mapping. The glycogenes are mentioned here, so all the enzymes.
So this is the alpha-3 FucT that actually catalyses this reaction. In the same way, you have ST3Gal3 that catalyses this one. So we have the list of enzymes, and the purpose of Glycologue is not only to reconstitute the molecules step by step, but also to simulate how many structures you can actually build with that set of enzymes, and why we have only 200 or 260 when you could actually create millions of them. Maybe not millions, but at least quite a few thousand. So there are some rules. Andrew McDonald is a biochemist, you know, he's an enzymologist at heart, and so he has applied some rules for not using any old reaction in any old way. We are currently working on the simulation. It would take a while to explain what we're doing, and we're trying to write the paper at the moment. I'm just saying that this is one of the purposes of what we're doing with HMOs: possibly to help synthesis, having some synthetic HMOs that maybe are not found but that could have an interesting role in blocking pathogens. So what else did I want to say about these? I think this is it.
So it would make sense to have a little bit of a break now. Before that, oh yes, there's one thing I thought of that I didn't tell you yesterday. I think it was in one of the exercises, but this is for those who don't do the exercise and look at Compozitor. If you take any protein, for instance, we have serotransferrin by default because its glycome is quite extended, and there are a number of papers again that document this glycosylation. So, here you go. I wanted to make a point about virtual nodes, which I didn't really insist on. I told you yesterday in the presentation how a virtual node can actually be confirmed with further experiments and further papers included in the database. And here is what we usually do to see whether a virtual node makes sense or not.
There's a very simple procedure that I wanted to show you, which is to select the virtual nodes here, copy them, go to the Custom tab and paste them here. So these are my 10 virtual nodes, and they are N-linked here. So I add them to the selection. The reason why it's taking a while is that it's actually going through the whole database to see what these different compositions correspond to, so it's trawling the whole database with these compositions to find out whether they have matching structures. That's why it takes a little bit longer. However, it has an end, I can assure you. And it then highlights the virtual nodes in your network. Whether I include them or not is not making any difference. It's not going to find any new ones. Sorry, it takes a while. Okay.
Maybe, Frédérique, just on the virtual nodes, the example of Catherine down in Australia? We had done a graph like this, and that's a really good example of how it works.
No, but she has challenged our definition of a virtual node, in the sense that she found a two-step path. I think it has happened. No. I even kept it, it's open here. What happened is that I sent her a Compozitor network graph, asking her to justify a virtual node. The virtual nodes are in red here, exactly the way I would like to see them over here, and it's not happening. I said: you told me you have this composition, this composition and this composition, and my Compozitor suggests this intermediary or this intermediary as a possibility to bridge between H7N4S1 and H6N3S1 that you have identified. She went back to the raw data, she looked in the mass spectra, and she could actually find another path here that goes through H6N4, H7N4 and back. So there was indeed something: this node, without those virtual nodes, was isolated, and it was not making sense.
So she found in the data a two-step process for connecting H6N3 with H7N4S1. The problem is, as I said, we're doing only one step at a time. If we start putting in two steps, we're going to have overpopulated graphs and it's going to be crazy. So we need some rules, and I'm actually working with this person in Australia, trying to see how we could, under constraints, open the possibility of putting in two virtual nodes, in very strict conditions, so that we can possibly make sense of linking two nodes that are not linked. And in the meantime, it happened. Thank you, Catherine.
So you can see here that the red nodes are the 10 compositions that were virtual nodes before, so I don't have any virtual nodes anymore. And I can justify nine of them, because they happen to actually correspond to existing structures here or there, so in all likelihood they could be there. Maybe, maybe not, but it's a fair assumption. And maybe it's also a fair assumption to favour this one, which is more common, over this one, which is less common, but who knows. This one here is not associated with any data in the database. As I've said before, the database is of course not foolproof, it's biased and it's not comprehensive, so it could be that this one is connected to a structure. However, it's a NeuGc, so this is probably an erroneous identification. I would then consider ignoring that virtual node, because it is not bringing more, whereas this one would be connecting, and it's actually related to a number of structures. So this is like a quality check. Once you have a glycome that has virtual nodes, exporting only the virtual nodes here, well, not in this case because there are none left, and pasting them in the Custom tab is a good quality check for the reality of the virtual nodes, if you don't have someone at hand in Australia who is going to go through the mass spectra and dig out what is missing. Okay.
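Editor's sketch: the one-step rule and the database check described above can be illustrated with a few lines of Python. This is a simplified reading of the idea and not Compozitor's actual algorithm. It assumes that two compositions are linked when they differ by exactly one monosaccharide, that a virtual node is an unobserved composition sitting one step from each of two observed compositions that are two steps apart, and that no virtual node is proposed when an observed intermediate already bridges the pair. All function names and the small lookup table are hypothetical.

```python
import re
from itertools import combinations

RESIDUES = ("H", "N", "F", "S")  # hexose, HexNAc, fucose, sialic acid


def parse(text):
    """Turn 'H7N4S1' into a count vector ordered as RESIDUES."""
    counts = dict.fromkeys(RESIDUES, 0)
    for letter, number in re.findall(r"([A-Z])(\d+)", text):
        if letter not in counts:
            raise ValueError(f"unsupported residue code: {letter}")
        counts[letter] = int(number)
    return tuple(counts[r] for r in RESIDUES)


def fmt(vector):
    """Turn a count vector back into condensed notation, skipping zero counts."""
    return "".join(f"{r}{n}" for r, n in zip(RESIDUES, vector) if n)


def distance(a, b):
    """Number of single-monosaccharide steps separating two compositions."""
    return sum(abs(x - y) for x, y in zip(a, b))


def intermediates(a, b):
    """Compositions exactly one step away from both a and b."""
    found = set()
    for index in range(len(RESIDUES)):
        for delta in (-1, 1):
            candidate = list(a)
            candidate[index] += delta
            candidate = tuple(candidate)
            if min(candidate) >= 0 and distance(candidate, b) == 1:
                found.add(candidate)
    return found


def virtual_nodes(observed_texts):
    """Propose unobserved bridges between observed nodes two steps apart."""
    observed = {parse(t) for t in observed_texts}
    proposed = set()
    for a, b in combinations(sorted(observed), 2):
        if distance(a, b) != 2:
            continue
        bridges = intermediates(a, b)
        if bridges & observed:
            continue  # an observed composition already connects the pair
        proposed |= bridges
    return sorted(fmt(v) for v in proposed)


def check_against_database(candidates, structures_by_composition):
    """Quality check: list the known structures matching each virtual node."""
    return {c: structures_by_composition.get(c, []) for c in candidates}


if __name__ == "__main__":
    candidates = virtual_nodes(["H6N3S1", "H7N4S1"])
    # Hypothetical lookup table standing in for a database query.
    known = {"H7N3S1": ["structure A", "structure B"]}
    print(check_against_database(candidates, known))
```

A candidate with no matching structure is not necessarily wrong, since the database is incomplete, but a candidate with several matches is easier to justify. Allowing paths with two consecutive virtual nodes, as discussed in the talk, would need extra constraints to keep the graph from becoming overpopulated.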
I've said everything I wanted to say about GlyConnect, and the idea now for the second part of this morning is to concentrate on glycan binding, unless you have burning questions on GlyConnect or Compozitor or anything else.