 Welcome, all, to this introduction to glycoinformatics, which will be given today by Frédérique Lisacek and Catherine Hayes. I am going to start. So, what is new? Okay, this should disappear. Okay.
What is new since the last video concerns Expasy in particular. Maybe I am assuming too much, but since you are familiar with SIB and enrolled for SIB training, you must have heard about Expasy, not only from us but from other groups, because it is the portal of the Swiss Institute of Bioinformatics. We have posted most of our resources on Expasy for the past six years. In particular, we managed to create a glycomics section that lists our tools. They are not all fully functional, I have to apologize, because we have so much to update, so some are working better than others. Please send us feedback: if there is one you are interested in that should be fixed urgently, we will definitely take your opinion into account.
So we have this section, and also our own portal within the portal, which used to be called Glycomics@Expasy but which we shortened to Glyco@Expasy, where all of these tools are also available. There, above all, we have tried to develop a methodological interface for people who have no knowledge of glycobiology. This was presented in one of the videos last year. I will quickly demo it after this presentation anyway, but the idea is that you have a menu on the left where you can tick a box, and this automatically zooms into a bubble, because, as you can see here, everything is represented as a bubble: every tool is a green bubble, every database is a yellow bubble and every portal is a red bubble.
We try to contextualize all of these resources with a very short summary of the content of each resource. Then you can click on a bubble and zoom in again, and for instance, for this particular "site" theme, you have three tools that either predict or display information on glycosylation sites on proteins. That is one tab. The new feature compared to last year is that we have included a dependency wheel. We are trying to help you by categorizing the theme behind each database and each tool: what they are meant to do, what the input is, what the output is when you have a tool, and so on. But they are all explained individually, so we thought it might be interesting to have a dependency wheel where you can see which tool depends on which database, or which portal hosts which database.
We keep the same color code, with red for portals, green for tools and yellow for databases. If you mouse over a particular resource, here GlyTouCan (I remind you that GlyTouCan is the glycan structure repository hosted in Japan in the GlyCosmos portal), you see the GlyCosmos portal. GlyTouCan is also used in GlyGen, which is another portal. It cross-references other glycan structure databases, namely UniCarb-DB and CSDB, which covers bacterial, fungal and plant carbohydrates, so with that type of specificity. Then you can see that there is another database called GlyGen, which is the main database of the GlyGen portal, and that GlyConnect also cross-references GlyTouCan. All the references between databases are in yellow, and in green you can see a number of tools that rely on the GlyTouCan database to be functional.
This is just an example, and I will demo it again after the presentation, but it helps to see the distinctive features of each database and tool in this sea of databases and tools.
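The dependency wheel described above can be thought of as a small typed graph. The following is only a minimal sketch of that idea, not how the Glyco@Expasy wheel is implemented: the relation names (`hosts`, `uses`, `cross_references`, `relies_on`) and the resource `ExampleTool` are assumptions made for illustration, while the other resources and links are the ones mentioned in the talk.

```python
from collections import defaultdict

# Kind of each resource, following the talk's colour code:
# portal = red, tool = green, database = yellow.
KIND = {
    "GlyCosmos": "portal",
    "GlyGen": "portal",
    "GlyTouCan": "database",
    "UniCarb-DB": "database",
    "CSDB": "database",
    "GlyConnect": "database",
    "ExampleTool": "tool",  # hypothetical placeholder, not a real resource
}

# (source, relation, target); the relation labels are illustrative assumptions.
EDGES = [
    ("GlyCosmos", "hosts", "GlyTouCan"),
    ("GlyGen", "uses", "GlyTouCan"),
    ("GlyTouCan", "cross_references", "UniCarb-DB"),
    ("GlyTouCan", "cross_references", "CSDB"),
    ("GlyConnect", "cross_references", "GlyTouCan"),
    ("ExampleTool", "relies_on", "GlyTouCan"),
]


def neighbours(resource):
    """Group every resource linked to `resource` by kind, ignoring edge direction."""
    grouped = defaultdict(set)
    for source, _relation, target in EDGES:
        if source == resource:
            grouped[KIND[target]].add(target)
        elif target == resource:
            grouped[KIND[source]].add(source)
    return {kind: sorted(names) for kind, names in grouped.items()}


# What a mouse-over on GlyTouCan would highlight in this toy graph.
wheel = neighbours("GlyTouCan")
```

Grouping by kind mirrors the three colours of the wheel, so one lookup gives the portals, databases and tools connected to the resource under the mouse.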
I always start talking about the resources from the glycobiologist's point of view, because we rely on knowledge that comes from many fields. It really takes an interdisciplinary approach to understand what glycans are doing at the surface of cells. Here I only talk about proteins. I could talk about glycolipids, but that is another type of problem, so at the moment I am restricting myself to proteins. If I speak about proteins, I call on proteomics and protein science; if I talk about glycans, there is a lot of carbohydrate chemistry, structural biology and biochemistry behind them. All of that is behind this picture, which should not be cut into pieces as it usually is: you look at glycoproteins and their sites from a protein science point of view, but you don't care about the glycans. In the same way, you look at the glycans, but you don't really look at the protein that carries them. And then you have the glycan-binding proteins, the lectins, that recognize the glycans. Because the experimental work behind all this requires very different skills, the problem is often broken into pieces, and we really mean to keep the pieces together, because otherwise it makes no sense functionally. So I picture here the N-glycans, the O-glycans and the glycosaminoglycans that are sticking out at the surface of cells and that are recognized by carbohydrate-binding proteins.
Now, going back... whoops, sorry, excuse me, I am going to make you seasick. I want to give you an update on GlyConnect. GlyConnect is something we co-develop: Julien Mariethoz is the main developer, and Catherine Hayes, whom you will hear later, is also doing some development and some curation. You can see here last year's figures, or the figures from the beginning of this year, with 4,720 structures and 1,157 compositions.
These are the new numbers. We keep growing, with more glycosylation events to display in the database, and we still have the dedicated data sets. We update the COVID-19 set much less often, but the next release will include a new COVID-19 data set. The HMO set is not moving that much. The O-linked monosaccharide set, with O-GlcNAc for example, is going to grow because we are going to integrate more data from a partner database. The human immunoglobulin set is also growing: we have been asked to annotate a lot of therapeutic antibodies, and their specific glycosylation is reported in a number of papers, so Catherine is actively working on that. The idea is that, hopefully next year, if there is an SIB training on glycoinformatics, we should be able to report quantitative data so that we have glycosylation profiles associated with immunoglobulins. That is really the point of updating what we are doing.
One of the issues with accumulating data is shown on this page for a composition. You can see here glycans that all match the composition: they all have six Hex, six HexNAc, one fucose and two sialic acids. This one is a very well-defined structure, with all the linkages shown, but this one is far less distinctive, and here you have the representation of the composition only, with the shapes of the SNFG notation. So we are accumulating a lot of ambiguous data, because glycoproteomics and most of the high-throughput methods used in glycoproteomics and glycomics do not allow a high level of precision in determining the structure. This is what we end up with, and we still need to be able to compare those structures with one another to find motifs in them.
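To make the point about ambiguity concrete, here is a minimal sketch of why several distinct structures can sit on the same composition page. It is not GlyConnect code. The target composition (six Hex, six HexNAc, one fucose, two sialic acids) is the one quoted in the talk; the residue lists and the residue-to-class mapping are simplified assumptions, and linkages are ignored on purpose.

```python
from collections import Counter

# Composition quoted in the talk: Hex6 HexNAc6 Fuc1 NeuAc2.
TARGET = Counter({"Hex": 6, "HexNAc": 6, "Fuc": 1, "NeuAc": 2})

# Simplified mapping from named residues to composition classes (assumption:
# only the residues used in the toy structures below are listed).
RESIDUE_CLASS = {
    "Man": "Hex", "Gal": "Hex", "Glc": "Hex",
    "GlcNAc": "HexNAc", "GalNAc": "HexNAc",
    "Fuc": "Fuc", "Neu5Ac": "NeuAc",
}


def composition_of(residues):
    """Collapse a list of named residues into a composition (a multiset of classes)."""
    return Counter(RESIDUE_CLASS[name] for name in residues)


# Hypothetical residue lists; the topology and linkages are deliberately left out.
structures = {
    "a": ["Man"] * 3 + ["Gal"] * 3 + ["GlcNAc"] * 6 + ["Fuc"] + ["Neu5Ac"] * 2,
    "b": ["Man"] * 3 + ["Gal"] * 3 + ["GlcNAc"] * 5 + ["GalNAc"] + ["Fuc"] + ["Neu5Ac"] * 2,
    "c": ["Man"] * 5 + ["GlcNAc"] * 2,
}

# Structures "a" and "b" differ in their residues but share the same composition.
matches = [name for name, residues in structures.items()
           if composition_of(residues) == TARGET]
```

Because the composition only counts residue classes, any information about which hexose or which linkage is present is lost, which is exactly the loss of precision described for high-throughput data.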
Think, for instance, of this motif, which usually corresponds to a glycan epitope. You can find it in this well-defined structure and in this other well-defined structure, but you can also find it in this structure that is not well defined, where the composition at least makes it possible. So you would like a tool that matches substructures not only in a very precise way but also in a very loose way, so that it fits the data we currently collect in glycoproteomics and glycomics. Likewise, you would like to be able to take a very loose definition of a motif, an ambiguous glycan-binding pattern for instance, and find the structures that match it.
For that purpose, we relied on semantic web technology, so we built an ontology, which is described here, and some of the SPARQL queries that you can run on the RDF can be tested there. This will be the content of the second hour of this course: Catherine is going to explain this very busy slide, which I am probably going through very quickly for you, but you will see, it will be crystal clear at five o'clock today. If I may just tell you what it is about: with this ontology, the semantic web technologies and the SPARQL query language, we have the means of asking a question such as: can you find me any N-linked glycan that is categorized as complex, that has no undetermined region, that is biantennary, and that contains a terminal sialic acid? It is like magic, it works. You can ask this sort of question, and Catherine is going to explain how that works with the new tools that we have put together over the past year.
So this was for GlyConnect, and because, as I said, I never separate the glycan-bearing protein from the glycan-binding protein, I have to talk about UniLectin. We have updates on UniLectin as well.
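The question quoted above for the ontology (N-linked, complex, no undetermined region, biantennary, terminal sialic acid) is a conjunction of conditions on annotated properties. The sketch below shows only that logic, in plain Python rather than SPARQL, and it is not the actual ontology or query: the record fields, the class `GlycanRecord` and the example accessions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GlycanRecord:
    # All field names are illustrative assumptions, not terms of the real ontology.
    accession: str
    linkage_type: str              # e.g. "N-linked" or "O-linked"
    core_class: str                # e.g. "complex", "hybrid", "oligomannose"
    antennae: int                  # number of antennae
    has_undetermined_region: bool  # True if part of the structure is ambiguous
    terminal_residues: tuple = ()  # residues found at non-reducing ends


def matches_question(glycan):
    """N-linked, complex, fully determined, biantennary, with a terminal sialic acid."""
    return (
        glycan.linkage_type == "N-linked"
        and glycan.core_class == "complex"
        and not glycan.has_undetermined_region
        and glycan.antennae == 2
        and "Neu5Ac" in glycan.terminal_residues
    )


# Hypothetical records with made-up accessions.
records = [
    GlycanRecord("example-1", "N-linked", "complex", 2, False, ("Neu5Ac", "Gal")),
    GlycanRecord("example-2", "N-linked", "complex", 2, True, ("Neu5Ac",)),
    GlycanRecord("example-3", "N-linked", "hybrid", 2, False, ("Neu5Ac",)),
    GlycanRecord("example-4", "O-linked", "complex", 2, False, ("Neu5Ac",)),
    GlycanRecord("example-5", "N-linked", "complex", 3, False, ("Neu5Ac",)),
]

hits = [glycan.accession for glycan in records if matches_question(glycan)]
```

In a semantic web setting each condition would become a triple pattern or a filter over the RDF data, so that the same question can be asked without loading the records into memory first.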
We have developed a new module called TrefLec for predicting β-trefoil lectins. And yes, of course, François did the first version of UniLectin and Jala took over last year. There is a nice story attached to the TrefLec module that we put together, and it is a paper under revision at the moment, which will hopefully be resubmitted very shortly. We looked at the different classes of TrefLec, so we have a distribution and we can see how these different β-trefoils are spread between animals, plants and bacteria, versus eukaryotes, and so on. Out of these predictions, Anne Imberty, with whom I co-lead the development of UniLectin (she is in Grenoble and she is a lectin specialist), found one that was very interesting because it was in a primitive protozoan that is often used as a model for animal evolution. Besides the lectin domain we are looking for, repeated three times, it had another domain, an aerolysin domain. She has a team that does X-ray crystallography, so they actually solved the structure of the lectin domain by X-ray crystallography. This showed that we had predicted the right sites with TrefLec, and they could identify the pockets where the glycans fit. Then they looked at a model including the aerolysin domain, which gave a beautiful carousel-like protein model, with the lectin domains being the horses of the carousel turning around. It is a pore-forming protein, so it still has to bind very specifically this ganglioside here and this α-GalNAc in this way. The point is that we could prove that the prediction we made was a real protein, and that it has some interesting properties as a pore-forming protein.
Another aspect we have looked into is LectomeXplore, where we have updated the lectin predictions. These were last year's figures: we were at almost a million candidate lectins, and we are now over a million, at 1,200,000 candidate lectins. With a score threshold you go down to only 750,000 or just about, and if you are really restrictive in the scoring system, you have 150,000 to check. The number of species is of course also growing, and as new genomes come in we find more. If you are interested in exploring lectins, you are quite welcome to check the update.
The other application we have managed to build combines our lectin prediction with glycan-binding prediction in this software. This is the work of the team of Daniel Bojar, who is at Gothenburg University in Sweden. With this prediction we are trying to refine the classification we have in UniLectin, which is based on folds, and glycan binding is of course related to folds. We can do some fine-tuning of the classification with this prediction, so hopefully by next year we will have more predictions included in UniLectin.
The highlight of UniLectin for this year will be the release of the human lectome. This is a new module on the human lectome, that is, all the lectins in the human genome. I will demo that as well. We had to merge a number of sources, including Swiss-Prot but also different reviews that were published in the literature or online. In the end it looks like we can stick to the number of 271 lectins in Homo sapiens, some curated and some putative. Again, the idea of having everything in the same spot is that we can see everything on a well-known lectin like galectin-1, including its specificity, all in one go. We have more of that in preparation, but we already have the cross-references to expression databases so that we can see where these lectins are actually expressed.
We have also included all the AlphaFold predictions, and we have made some models ourselves, to support the annotation of human lectins in cases where there is no three-dimensional structure. I will demo that shortly.
Finally, regarding tools. This was a real thorn in my side, because since last year I kept announcing that we would have an updated substructure search on Expasy. This has happened at last with GlySTreeM, which Catherine will talk to you about, and she will probably mention the substructure search. So this is online now. It is still a bit raw, in the sense that if you don't have a GlycoCT code for your glycan structure you will not be able to use it, so we will definitely integrate a graphical interface where you can draw your glycan and ask for the substructure. It is not as fully operational as it used to be, but it is really getting close. Stay tuned.
The last tool I want to mention, with an update and a feature that is possibly interesting for you, is GlyConnect Compositor, which is closely related to GlyConnect. I just remind you of the principle: instead of taking what the glycoproteomics output gives us, that is, a list of compositions associated with sites, we try to connect those compositions by the addition of one monosaccharide at a time, here a fucose. You can then see that those compositions are in fact related to one another just by adding one monosaccharide, so that we can have a glycome for a site, a glycome for a protein, and a glycome for a tissue. The last new feature I would like to advertise appears when you query, for instance, by source, which is the tissue. Let's say you want to do a glycoproteomics experiment, and you want to build a data set of compositions that are relevant to the analysis of urine samples.
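The principle of connecting compositions by the addition of one monosaccharide at a time can be sketched as a small graph construction. This is an illustration of the idea only, not the GlyConnect Compositor implementation; the five compositions below are hypothetical examples chosen so that one of them stays unconnected.

```python
from itertools import combinations

# Hypothetical list of compositions, as a glycoproteomics output might report them.
COMPOSITIONS = {
    "Hex3HexNAc2": {"Hex": 3, "HexNAc": 2},
    "Hex5HexNAc4": {"Hex": 5, "HexNAc": 4},
    "Hex5HexNAc4Fuc1": {"Hex": 5, "HexNAc": 4, "Fuc": 1},
    "Hex5HexNAc4NeuAc1": {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
    "Hex5HexNAc4Fuc1NeuAc1": {"Hex": 5, "HexNAc": 4, "Fuc": 1, "NeuAc": 1},
}


def one_residue_apart(first, second):
    """True if the two compositions differ by exactly one monosaccharide."""
    residues = set(first) | set(second)
    deltas = [second.get(r, 0) - first.get(r, 0) for r in residues]
    nonzero = [d for d in deltas if d != 0]
    return len(nonzero) == 1 and abs(nonzero[0]) == 1


# Edges of the composition graph: one edge per pair separated by a single residue.
edges = sorted(
    (a, b)
    for a, b in combinations(sorted(COMPOSITIONS), 2)
    if one_residue_apart(COMPOSITIONS[a], COMPOSITIONS[b])
)

# Compositions that take part in at least one edge.
connected = {name for edge in edges for name in edge}
```

In this toy set, `Hex3HexNAc2` ends up isolated because it is four residues away from its nearest neighbour, while the other four compositions form a square linked by one fucose or one sialic acid.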
We usually use GlyConnect to do that, and here we have 84 compositions. Now for the new feature. I have been preaching everywhere, in my talks and in my courses, that you should have an accession number for each of the entities that you are dealing with. If you are studying a protein, it would not occur to any of you not to provide a UniProt or RefSeq accession number to characterize it. We need to have the same built-in reflex of giving an accession number for a composition or for a glycan structure, whatever it is. The problem is that very often this step is disconnected, so it requires an effort from the user to go to GlyTouCan and extract a glycan accession number. We are trying to simplify your life, and now you can export your selection of compositions with the associated GlyTouCan numbers. Sometimes we don't have the cross-reference, but for such a simple one as this, maybe there is just a bug and we will have an update. We definitely expect glycoproteomics people to report each and every composition with a GlyTouCan ID so that we know what we are talking about. I will keep on preaching, and you will have fewer and fewer excuses, because we will make more and more tools where the ID is easy to get. So this is the urine glycome. That doesn't mean that you cannot just use a simple list of compositions if you don't want the GlyTouCan IDs; this is not compulsory. If you go up in the window you get the GlyTouCan IDs, and if you go down you get the list without them. It is that simple.
Now, the great advantage of looking at the relationships between compositions is that you have profiles, and you would like to be able to compare those profiles. Let's take a glycoprotein. This is the profile of that protein: it has 16 compositions, and they are related in that way. What I call the profile is, here, the distribution of properties.
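A profile in the sense used here, a distribution of properties over the compositions of one protein, can be sketched as follows. The four property names come from the talk, but the rules that assign them are assumptions made for this illustration (for example, calling a composition oligomannose when it has exactly two HexNAc, at least five Hex and nothing else), and the four compositions are hypothetical.

```python
PROPERTIES = ("sialylated", "fucosylated", "neutral", "oligomannose")


def properties_of(composition):
    """Assign property labels to one composition (rules are simplifying assumptions)."""
    hexose = composition.get("Hex", 0)
    hexnac = composition.get("HexNAc", 0)
    fucose = composition.get("Fuc", 0)
    sialic = composition.get("NeuAc", 0)
    labels = set()
    if sialic:
        labels.add("sialylated")
    if fucose:
        labels.add("fucosylated")
    if not sialic and not fucose:
        # Assumed rule: HexNAc2 with five or more Hex is treated as oligomannose.
        labels.add("oligomannose" if hexnac == 2 and hexose >= 5 else "neutral")
    return labels


def profile(compositions):
    """Fraction of compositions carrying each property (labels may overlap)."""
    counts = dict.fromkeys(PROPERTIES, 0)
    for composition in compositions:
        for label in properties_of(composition):
            counts[label] += 1
    total = len(compositions)
    return {label: counts[label] / total for label in PROPERTIES}


# Hypothetical glycoprotein with four compositions.
example = [
    {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
    {"Hex": 5, "HexNAc": 4, "NeuAc": 2},
    {"Hex": 5, "HexNAc": 4, "Fuc": 1, "NeuAc": 1},
    {"Hex": 5, "HexNAc": 4, "Fuc": 1, "NeuAc": 2},
]
example_profile = profile(example)
```

Since a composition can be both sialylated and fucosylated, the fractions are not expected to sum to one; they describe how often each property occurs across the protein's compositions.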
We have a majority of sialylated and a little less of fucosylated compositions, and we have no neutral and no oligomannose in that particular profile. So the question we have been asking ourselves is: can we pull out proteins with a similar glycome profile? Are they there or not? We have been struggling for a while with defining the measure of similarity, which I am not going to describe at the moment because it is still experimental, but we are getting there. For instance, this particular peptidase has a similar profile to two other proteins that you can see here: another enzyme, a phosphatase, here, and a receptor here. You can see that the profiles really cannot be any closer. You can also see that the compositions may not be exactly the same. In the first one there were two extra compositions, and in this one there are more than two extra that remain in blue. The intersection between A and C, for instance, is very limited, down to zero, and between B and C it is only one. So it probably makes more sense that two enzymes would have more glycans in common with each other than with a receptor. Possibly; this is off the top of my head, I have no idea, I haven't really checked those proteins. These are only examples, I wanted to illustrate the idea, and I am going to show you some more examples. We are starting with proteins, but we are also looking at comparing the glycomes of tissues or cell lines and of diseases with this new measure, hopefully being very strict so that we only get very strict similarities and don't end up comparing loads of profiles with loads of profiles.
That is it for my presentation. What I would like to stress, compared to what we explained last year, is that glycoinformatics does have a number of resources, databases in particular, that are specialized in proteins, in glycoproteins, in glycosaminoglycans, in lectins, and in glycan structures.
So we are really relying on these. We have tools that we keep trying to improve in order to exploit and visualize the knowledge that is in the databases, and we keep on building a complicated picture of glycoinformatics and glycobiology. This is done with my group, whom I thank very much: François has gone but he has been so instrumental, and Jalla has not provided any picture but is on the UniLectin front. I also thank our advisory board and our collaborators in glycobiology, and the Expasy people who have helped us very much with hosting Glyco@Expasy as it is now. I also remind you that we have put the GlySpace Alliance together with Japan and the US, that is, with GlyCosmos in Japan and GlyGen in the US, and that we keep on trying to hold glycoinformatics together. So, this is the funding, and I thank you for your attention. I will take the next few minutes to show you some examples.
Back here, I want to show you how you can directly zoom into a bubble and have all the boxes ticked here. Here, under structure, you can see that you have GlyTouCan and CSDB. GlyTouCan is the universal repository, and CSDB covers the bacterial, archaeal, plant and fungal carbohydrates. You can zoom out by clicking here. Oh yes, and we have the white bubbles. We have added those because we have Glycopedia, which is a very useful resource to learn about glycobiology and carbohydrate chemistry in general, and Essentials of Glycobiology, of which a new edition was released this year. When I change screen, can everybody see? Monique, can you see? Yes? Okay. The new Essentials of Glycobiology is here, with a new cover. Guess what, it advertises how important glycobiology and the knowledge of glycoscience are in virology in particular. So it was released very recently, an updated 2022 edition. If I move to here, I mentioned as well... sorry.
We have this particular GAG-DB here, for instance, which is a glycosaminoglycan database related to MatrixDB on the protein side, where you can see the binding of glycosaminoglycans with some proteins, and to Glyco3D. We can look at a tool like GlyS3, which was the substructure search and which is integrated in UniLectin, in GlycoStore and in SugarBind. And here Compositor is related to all these pieces of software that are glycoproteomics data analysis software. This is wishful thinking: it is not actually related, I mean they are not cross-referencing each other. It is just that the topic is related, in that you can view all of your glycoproteomics data with GlyConnect Compositor. Here you have GLAD, which is an array analysis software, which is related to the CFG portal, to a database of glycan-binding protein reagents (the R is for reagent), which is an NIH resource, and to SugarSketcher, which is the tool that is used there. You can browse all of these and see for yourself. Then UniProt, of course. Again, this is not necessarily cross-referencing: UniProt does not cross-reference all these analysis tools for glycoproteomics, but all these tools rely on UniProt to identify glycopeptides. This is why the relationship is there.
Now, I wanted to show you some more complicated profiles. Here, for instance, I have three proteins, lactotransferrin, prosaposin and fibronectin, all N-linked, with their sites. You can see I have quite a tight network, because each time we have close to 100 compositions. You can see the fucosylated structures, and the ratio between neutral, oligomannose, fucosylated and sialylated is really very close in those three proteins. You can see the 51, so about 50% of these structures are common to the three proteins. They each have their own specificity, and they also have a lot in common pairwise.
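The comparison of three proteins described above (what is common to all three, what is shared only pairwise) comes down to set operations on their compositions. The sketch below uses hypothetical placeholder proteins and compositions, not the lactotransferrin, prosaposin and fibronectin data shown in the demo.

```python
from itertools import combinations

# Hypothetical glycomes: each protein is reduced to its set of compositions.
glycomes = {
    "protein_A": {"Hex5HexNAc2", "Hex5HexNAc4", "Hex5HexNAc4Fuc1", "Hex5HexNAc4NeuAc1"},
    "protein_B": {"Hex5HexNAc2", "Hex5HexNAc4", "Hex5HexNAc4Fuc1", "Hex5HexNAc4NeuAc2"},
    "protein_C": {"Hex5HexNAc2", "Hex5HexNAc4", "Hex5HexNAc4NeuAc1", "Hex6HexNAc2"},
}

# Compositions found in every protein, and in at least one protein.
common = set.intersection(*glycomes.values())
union = set.union(*glycomes.values())

# Compositions shared by exactly one pair of proteins (not by all three).
pairwise_only = {
    (a, b): (glycomes[a] & glycomes[b]) - common
    for a, b in combinations(sorted(glycomes), 2)
}

# Share of all observed compositions that is common to the three proteins.
shared_fraction = len(common) / len(union)
```

The same three quantities (common core, pairwise overlaps, shared fraction) are what the overlapping regions of a three-way Venn diagram display.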
So this is one way of exploring, again. I have another, smaller example where you see a big bias towards neutral, less oligomannose, and absolutely no sialylated structures. This is in the mouse. So this is again a way of investigating the data and the common profile. I am sorry if this sounds interesting and you can't use it now; we will need another few months before it is online, but it is really very close to being completed.
Let's go back to UniLectin. This is the human lectome, and this is how you can browse it, so it is really close to ready. I wanted to show you first what the curated set looks like, and to focus particularly on this class, which you can see here. We have 23 lectins with this β-sandwich, immunoglobulin-like fold. These are usually adhesion molecules, and most of them are sialic acid-binding Ig-like lectins, also called Siglecs. In the Siglec category, you can see that Siglec-3 has a UniLectin entry. We have seven PDB structures of Siglec-3, which is CD33, and you can see the specificity changing: we have sialic acid linkages here, some larger. This is what UniLectin3D would tell you. However, in many cases, such as Siglec-14, Siglec-11 and Siglec-10, there is no PDB structure, so we have to go to LectomeXplore to see the entry. If you remember, there you can check whether the binding sites are conserved, and a score of 0.5 means that most of them are conserved, but not quite all. Anyway, this is the LectomeXplore entry, and the human lectome entry looks like this. There is more information on the protein: you have the sequence and, as I said, you have the AlphaFold model in that case, because we have nothing else to offer. You can see that the lectin domains are pretty well predicted.
It is a blue score, so it is a relatively reliable prediction, but you have some orange, very low-confidence regions in the loose ends of the Siglec, which is understandable. We have a link to SWISS-MODEL, and we are currently gathering all the model predictions for those unknown human lectin folds. Here we are also expecting to have the Human Protein Atlas cross-referenced, and we have ProteomicsDB here to see the expression, and then there will be more on RNA and DNA. This record is a little more complete than our usual records, which, if we go to a UniLectin entry here, usually focus on the binding, with a closer look at it. We are also moving from LiteMol to Mol*, and we have this possibility of looking at the very close interactions here. So this is for UniLectin3D.
The last thing I wanted to show you is a page on LectomeXplore for people who are using commercial lectins. We have put together this table, ranked by specificity in columns and by fold in rows. For each specificity you have a blue button when there is a UniLectin3D entry, which opens the same sort of entry as we saw before, and a green button for a predicted lectin, which means that there is no UniLectin3D entry and you get a LectomeXplore entry instead. Oops, that doesn't work. Let me try another one to see if I can identify the problem. Here there is no UniLectin link at all. And this one... This page is not advertised yet; again, it is in preparation, and we obviously have a problem. It is working with all the blue buttons, but the green buttons are obviously in need of something, so I have to look into this. Anyway, if I go back to a blue button, you have a link to two databases.
These are the glycan array data repository on the CFG site and a very new database called CarboGrove. CarboGrove summarizes the specificity of some lectins with different box plots and so on, so we give direct access to it from here. This, again, is in the making, almost there but not quite. We have used the Vector Laboratories guide and a review that was published this year, so as to help people who are using a given lectin to be sure they have the right specificity for it.
The last thing I wanted to tell you is that we also have a new website coming. I should hide the URL, please don't look at it. In particular, we will have education and training material summarized here, on GlyGen, on GlyCosmos and on what we do, so it is really about what everybody does. You have a map here, and maybe we will add some partners in the future; it is really not restricted to the founders of this alliance. We hope to improve things, because the current website is not very functional. It was just there to advertise and put our foot in the door, so now comes the next generation, which will also have some news. So stay tuned: it will be at glyspace.org, as it is at the moment, and hopefully this will work soon as well. That's it.