Good morning, everyone. I'm Monique Zahn from the SIB training group. Today and tomorrow we have Frédérique Lisacek, who will give the course on introduction to glycoinformatics. Just so you know, this course is being recorded, so please mute your audio. And I guess we'll be using the chat for questions. So over to you, Frédérique.
Thank you, Monique. So the idea for this morning is that I will give you an introduction to glycoinformatics to begin with. Then I will take some time to demo a few of the things that I have been talking about. Then I'll have a special focus on glycoproteomics and the bioinformatics that goes with it. And then, with some examples of a few more ideas in software, we'll probably reach the end of the morning. So I will start with sharing my screen. Everything starts here, my presentation. Okay, I think I'm going to reduce that too.
I'm going to talk about the landscape and what we call the GlySpace Alliance, which you may never have heard of. I will start with some concerns, not that it's negative, and I would like to start with the example of proteomics. Once upon a time in proteomics, concern number one was to release data. Releasing data, almost 20 years ago, meant producing lots of lists of proteins, and the longer the list, the better you were at the time. Since the bioinformatics behind it was using, for instance, the UniProt database, probably still called Swiss-Prot at the time, or just beginning to be UniProt, you had those long lists of protein names with masses, the peptides identified (sometimes not necessarily in that list), and the accession number referring to that protein. Then time went on, and the second concern of proteomics was data presentation: how should we represent a proteome? So you had a lot of pie charts, and the pie charts were actually defined as a function of the protein function.
And there were alternatives where you had Venn diagrams showing how many proteins and how many peptides were matched from the databases, for instance here comparing different search engines, which gave different identifications. So there was some kind of effort in trying to show the content of a proteome, or the content of an identification, with some charts. Concern number three was data visualization and, of course, data interpretation. This is when the networks of interactions and all the proteins that were identified were really put together in a way that made sense from a biological point of view, so you could interpret the data and start building a scenario of what's happening in the biology when you had a particular sample to study. So this was a real evolution over a number of years.
And this is a good example for us to start with, because this evolution was powered by bioinformatics. Not only by bioinformatics, of course, but also by technology. I'm not putting aside the mass spectrometrists, who put a lot of effort into identifying proteins. But it wouldn't have happened if the search engines had not been fine-tuned by bioinformaticians and, most importantly, if the different people who were developing databases and data repositories to collect the peptides had not been speaking to one another, starting to describe the data in the same way, agreeing on standards, and creating consortia so that they would talk about the same things in the same or similar terms. So this is a very important aspect, and it's a key to success in an omics field that different groups start shaping the way data is presented, visualized and interpreted, as I just showed.
Now, in glycomics, we are stuck with concern number one at the moment. We're still in the process of producing large tables of glycans. And I would say it's even without having solved concern number zero, which is to have a precise accession number for each of the structures, each of the glycans that are identified.
So this is happening more and more, but not quite. We still have a lot of publications in glycobiology journals where you have this sort of table. And if you are a bioinformatician, I can tell you it's hard, because programs are not good at reading images. So you get stuck looking at these pictures and trying to put them in a format which is machine readable, and this step should be skipped, or at least handled easily. We shouldn't have to do that. This is why we are trying, with a number of people, to go directly to concern number three, which is really the heart of the matter of what we're doing, being interested in biology and glycobiology.
So we formed the GlySpace Alliance with a Japanese group, our group in Switzerland, and an American group, under the supervision of Nicki Packer, who is in Australia, so it's a very international alliance. It groups GlyCosmos in Japan, GlyGen in the US, which is actually a consortium, and Glycomics@ExPASy in Switzerland. This is possible because we speak the same language. We have ways of describing glycans. We can have several ways, and we are happy with other ways. We are not ayatollahs telling you that it has to be one way and not any other way. But we are certainly narrowing down the possibilities.
We agree on a notation, for instance the SNFG notation. It's a nomenclature of monosaccharides that is now more and more widely accepted, so that we can describe a complicated molecule as a simplified cartoon that everybody can visualize easily. This is another way of describing a glycan: a composition, for when you don't know the structure. Hexoses are here: the white circle is a hexose, and these are all the possible hexoses, so glucose, mannose, galactose, etc. So we are trying to shape this. Even here, in the composition, we can count the hexoses by using Hex as the shortened name, but it can be H. So this is a condensed way of saying it.
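The two spellings of a composition mentioned here (Hex versus H) can be reconciled mechanically. What follows is a minimal Python sketch, not a tool from any of the resources discussed. The function name `parse_composition` and the one-letter table are assumptions made for illustration; only the H-for-Hex shorthand comes from the talk. Residue names that themselves contain digits (for example Neu5Ac) are deliberately not handled.

```python
import re

# Assumed one-letter shorthand. Only "H" for "Hex" is taken from the talk;
# the other entries are illustrative and may differ between tools.
SHORT_TO_LONG = {"H": "Hex", "N": "HexNAc", "F": "dHex", "S": "NeuAc"}

# One residue name (letters only) followed by its count.
_TOKEN = re.compile(r"([A-Za-z]+)(\d+)")


def parse_composition(text):
    """Return {residue: count} for strings such as 'Hex3HexNAc2' or 'H3N2'."""
    text = re.sub(r"\s+", "", text)  # tolerate spaces between terms
    counts = {}
    pos = 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unexpected text at position {pos}: {text!r}")
        name = SHORT_TO_LONG.get(match.group(1), match.group(1))
        counts[name] = counts.get(name, 0) + int(match.group(2))
        pos = match.end()
    if pos != len(text) or not counts:
        raise ValueError(f"not a composition: {text!r}")
    return counts


# Both spellings describe the same composition.
long_form = parse_composition("Hex3HexNAc2")
short_form = parse_composition("H3N2")
print(long_form == short_form)
```

Normalising to one spelling before comparing is what lets two tables that use different shorthands be merged without reading them by eye.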
So there are variations on the theme, but still we speak that language. You can see that a lot of us published this paper in 2014, saying that we really need to standardize and systematize the way we study glycans so that computers can be helpful. These are examples of the way you can describe a glycan. This is a linear way to do it. This is another linear way. This is a two-dimensional way. This is a text way of doing it, and another text way of doing it. And these coexist; they have been produced over the years. The great thing is that the Japanese group has produced a glycan format converter, so that you can translate from here to here to there, etc. This table in that paper tells you the name of the standard, like IUPAC, which is linear text, KCF or GlycoCT, which are connection tables, and other linear texts, along with the properties, and so you can convert one format to the other.
Another thing that has happened over the last 10 years: if you are familiar with other omics, you know that there are standards, especially in proteomics. For instance, there's the MIAPE standard to describe a mass spectrometry experiment, from what is called the Proteomics Standards Initiative. They have other standards. It's related to the MIAME standard that was proposed in transcriptomics already 20 years ago. So we have the MIRAGE standard, which is the Minimum Information Required for a Glycomics Experiment. There's a committee, a group of several people, hosted by the Beilstein Institute. This is really the initiative of the Beilstein Institute, which said we need that standard for glycomics and now glycoproteomics. There are reporting guidelines for glycan sample preparation, for mass spectrometry analysis, and for glycan arrays.
And so really, this is evolving, and the group keeps producing these guidelines so that people who are reporting experiments can do it in a similar way and the bioinformatics can act behind that. We also share a concern for presenting and modeling the data in a common way. We have chosen to go for semantic web technologies because in bioinformatics this is becoming extremely common. So ontologies are defined; this is the model of the data. This is very useful to then cross-talk with a database like UniProt, because UniProt has an RDF model, and then we can query the databases in a much more detailed way. And in glycoinformatics, we are producing those ontologies. Here, this is an example of the glyco-ontology for glycoconjugates that was just recently published. So this is also ongoing work. So we speak the same language, and we can really exchange data, information and annotation between our different sites.
This is the GlyCosmos homepage, and we'll have a quick look at what's there. This is the GlyGen homepage, and I'll go very quickly over that as well. GlyCosmos is dedicated to the resources that are in Japan, by definition, and it's really focused on glycoinformatics and glycobiology data. GlyGen is also totally focused on glycobiology, and is trying to connect a lot with the NCBI. GlyGen is in a very privileged situation to be cross-referencing more and more with NCBI resources.
And this is what I had in mind with Glycomics@ExPASy. I was using the fact that ExPASy had a certain reputation. It was created in 1993 and became the proteomics server in 2001: from 1993 up to the turn of the century it was dedicated to molecular biology, and then it evolved as a proteomics server. One of the changes happened in 2011, when this became the SIB ExPASy server, with a lot of different tabs in which each omics was described.
So it was not the proteomics server anymore. It was also for genomics, for transcriptomics. And so we jumped on that opportunity to create the glycomics tab, where we were presenting the data in the same way as in the other tabs, with the databases on the one hand and the tools on the other hand. We published that description in MCP at the time. And until 2020, this was Glycomics@ExPASy. Then ExPASy decided to change, which is wonderful, because it's actually a much cleaner site that was introduced last year. There are no more tabs, really. You can tick boxes. You have a left menu here that shows the different parts that were tabs before. The downside for us is that it's only concentrating on SIB resources. As a result, we can't show our partner resources as we used to in the previous version. But I'll tell you more a bit later today. I'm going to stop my phone from saying things.
So this is really the landscape. And we have a certain division of labor. We are all concerned with data quality. We are also concerned with the integration of information, with NCBI but also with EBI. So we have respective specialties. The specialty of the American site is going towards automatic data curation. They also developed a GlycoMotif database based on our contribution; we had one as well, but they sort of centralized it. The GlyCosmos specialty is really data submission, providing repositories where you can submit your experimental data, whether it's glycan structures or mass spectrometry data for the identification of intact glycopeptides or glycans. And we are specializing in manual data curation. We certainly have the influence of UniProt to thank for that, being in Switzerland. And we are also contributing whatever is useful to guarantee data quality. So this is a rough view of what we're doing.
So what glycobiology says is that at the level of the cell surface you have proteins that are glycosylated. This is an N-glycan. This is an O-glycan. This is a glycosaminoglycan. These glycans are there not just to decorate the outside of a cell. They carry a message. They are read by other proteins, carbohydrate-binding proteins, possibly called lectins as well, which may be a subset of all the carbohydrate-binding molecules. And so we are trying to capture this information. You also have glycolipids, of course, at the surface of cells, and I have not represented them because I wanted to focus a bit more on proteins. And since last week we also have glycoRNA: RNA seems to be glycosylated with some N-glycans. This needs to be substantiated with more experiments, but it's quite exciting to see that this is spreading to RNA.
GlyCosmos, at this website, has a focus on glycogenes. The glycogenes are those genes that encode enzymes that synthesize or degrade glycans. They also have some information on glycolipids, and they import the relevant pathways from Reactome. There is no restriction on organism coverage; whatever is available is stored in the databases. They also have a strong focus on binding. There is a profile database for glycan expression based on glycan array data. And there's also the GlycomeAtlas, which maps the different tissues of the tested organisms to glycan expression. I'll go briefly over that in a minute. And for glycan motifs, they have the GlycoEpitope database.
GlyGen has a restriction on organisms, really focusing for the time being on humans, rodents as animal models, and viruses. I mean, we all got into viruses very recently, as you might imagine. This is the GlyGen URL. And they have a specificity in the structure browser, which is called the GNOme browser.
So you can find the related structures when inquiring about structures. And very recently, this is really new, just out of the box, they have the GlyGen Sandbox, which also looks into the glycogenes: what are the synthetic pathways behind a structure. And they are working on an international microarray data repository; at the moment this is not released, but it should be shortly.
Then there's us, Glycomics@ExPASy, which used to have that form and which now has that form. But this is the announcement, and this is the exclusivity of this presentation: we have been remodeling, and this is the new site. It's a new site where we can talk about resources other than ours, which is nice. One of the things we wanted to get away from is lists, because lists are not really telling you what's going on. That you only have a list of databases was one of the main reproaches we were given when we had the former version of Glycomics@ExPASy. You have the databases on the one hand and the tools on the other hand, but we don't really know how these relate to one another. We don't really know how the tools are related. So we benefited from the categorization in the new version of ExPASy, because that was one of the main points of ExPASy as well: they wanted to categorize the resources. So we worked with the ExPASy people on the categorization of resources, and we also thought that we should have the possibility of instantiating the lists. Here, for instance, if I click on glycoconjugates, I will focus on the bubble that describes glycoconjugates, and I will navigate; I'll show you how. So my list in the middle is reduced to glycoconjugates, and I have a short description for everything. If it's a SIB resource, I have the SIB logo, no problem. But then I have a bit of nuance in having all the different categories of tools.
You have the tools in green, the databases in yellow and the portals in red. And here you can see that all these categories are ticked. So this is what we're talking about, and we can zoom in even more if we need to. I'll show you that later. This is one way, which we hope is a bit more pedagogical, of telling you about glycoinformatics resources.
Our two main projects in Glycomics@ExPASy: the first is GlyConnect. I work with Julien Mariethoz, who is the main developer and really heading all the developments of GlyConnect, and Catherine, who is with us today, who is doing the curation of the data. With everything changing in the new website, we're also changing logos; we're going to introduce the new logos soon. This is the homepage of GlyConnect. I will of course talk about it at length later this morning. And this is the website.
The second main resource that we are developing is UniLectin, in collaboration with Grenoble, with François Bonnardel, who is starting a new job today, so I can't really say that he's still part of the group; he just left yesterday. And Jala Hela-Madi, who is taking over, but did not provide a picture, so I can't show you a picture of Jala. We are really working on having detailed information on lectins so that we can have the fuller picture of glycobiology. Likewise, we'll change the UniLectin logo. And you can see that there's meaning in the logo, with the glycans in the middle and the sites of the proteins.
What glycobiology says is now supported by a lot of databases that I will spend a bit of time talking about this morning, and by tools as well, that we develop and that we are storing and making available in the different resources, whether GlyCosmos, GlyGen or Glycomics@ExPASy. I have to add that I won't be talking about another whole world of complication, which is the glycosylation of bacteria, plants and fungi, which is a bit different.
So fungi I can talk about a little bit. There's a dedicated database, and I forgot to put the URL; I should have. This is hosted in Russia. There is this group headed by Philip Toukach, who is collecting very properly all the information on the glycosylation of bacteria, plants and fungi. I really recommend this, but I can't go over that here; it probably needs a dedicated day. And there is another resource, which is not yet online, but is also a very interesting view on the glycogenes that I mentioned in GlyCosmos and in GlyGen. Not with us: there are so many resources that there's no need to be redundant, so we'll benefit from this work. In all likelihood, we'll have the cross-links soon. There's no URL yet. This is probably happening this year, because it's been described at length very recently, and the website is available on a temporary URL, but not yet publicly.
So as you see, we have a lot to put together, and we really try to. I certainly acknowledge my group, with the people I've mentioned and others I will mention in another context. This wouldn't be possible without very active glycobiologists who have been really involved with us, in particular in shaping resources and advising us. And I have to thank the ExPASy people; Elisabeth Gasteiger is our go-between with UniProt, so that we can really have the best integration between UniProt and our resources. And we are on the ExPASy side, and the main people of the GlySpace Alliance are Kiyoko in Japan, Raja in Washington DC, and Mike Tiemeyer at the CCRC in Georgia, which is one of the most renowned institutions for carbohydrate research. We have a bit of funding to do that, which I have to acknowledge, of course, and I thank you for your attention for this part. And you can always follow us on Twitter for updates on what we're doing.
I have finished this presentation, so I thought I would take the next 20 minutes or so to demo these different websites I've talked about, to give you an idea, and then of course I invite you to do it on your own. We have here the landscape. You can see it's very rich: the genes area, the proteins area, the glycans area. And here you have the binding, so lectins; motifs and lipids are here. And you have the side menu here to show you how you can look at the information by category.
Of course, the most relevant for us at the moment is to be able to search glycans by text. If everything goes well. No, this is not going to work. We just have a structure here to show you the different formats of a structure, so maybe this is the one you're most familiar with. Here you can search with the linear code; the IUPAC code is usually recognized. And so, with this code, I should be able to see something. It doesn't want to. I just picked that before. This is the demo effect, by definition. Maybe my text is too long, so I'll get rid of this NeuAc here, and this, and see whether I have better luck. No. I should go for a smaller structure. Okay, I'll just type something very simple. I still have an error, so I'll get rid of my few codes. I'll just put this extremely simple one here. And if it doesn't work... Okay, so Catherine, am I making a stupid mistake? Should it be condensed? I'm just checking it here. Oh, yes, sorry, sorry. Okay, we're good. I was beginning to worry. So you see, even when you're aware, you make a mistake. There are different standards, and indeed I thought I used IUPAC condensed and I used IUPAC extended. Now I'm good. So no reflection on GlyCosmos; GlyCosmos is working.
And you can see here my small sugar, so maybe if I redo the big one I'll see the big one as well. But no, okay. I don't know why it doesn't work. Anyway, you can also search by graphics, which means that you're going to have to draw the structure yourself. There will be a practical this afternoon, for those who stay with us, on doing it this way.
There is some collaboration between GlyGen and GlyCosmos, so that you can use the GNOme browser as well. The GNOme browser is like this. It should be doing something that it's not doing. No, here you go. Here, you just click. You have to know your SNFG by heart, so you have to know that this is a GlcNAc: you add GlcNAc, then you go to mannose and you put two mannoses, three mannoses, and you get to your core. This is one way of showing, from a composition, all the possible architectures of the glycan structure, and each time you have the GlyTouCan identifier. GlyTouCan identifiers start with a G, followed by a mix of digits and letters. This is for glycans.
Here you have glycomes, and you can have a look at the GlycomeAtlas that is there and that I advertised in the presentation. If you want to see all the glycans that are in lung, you have the details of the glycans here, and you can see that in fact a glycan is shared by other tissues in the atlas. So it's a nice way of navigating the data and seeing where the glycans are expressed, and you have similar mouse and zebrafish data. So this is for GlycomeAtlas. Again, it's not my site; I'm hinting at the interesting parts, and at what we don't have in particular. So here you have the glycolipids.
And so it's inspired by different resources; here you have to know about lipid classification. In the description you can have a look at all the entries of glycolipids, so the famous GM... this is basically KEGG information that you can browse from here. But that's at the moment really in the making and gathering information. So this is just a short overview, as I said.
The submissions are here. With GlyTouCan you can submit new structures; with GlycoPOST you can submit your mass spectrometry experiments. If you are familiar with proteomics, for instance, there's PRIDE, and there's jPOST, which is the Japanese equivalent of PRIDE, I should say. And UniCarb-DR is the glycomics experiment repository. This is happening right now: these projects are running, but they are not fully operational; we are still hitting some obstacles, but we're trying. To be honest, there's no funding to do that, so Kiyoko, Niclas Karlsson, whose project UniCarb-DR especially is, myself, and Julien of course in my group, are doing that in our spare time. So it's a bit complicated, but ultimately it will be a service that will run more smoothly and that can be used. So this is for GlyCosmos.
Then I would like to talk about GlyGen. You can do a glycan search, a protein search, a site search. I showed you how to explore with the GNOme browser, but you also have the possibility of building your own composition. So, for instance, we will go for three hexoses and two hexosamines, and we search the glycans. Okay. So it should be... Oh, sorry, yes. You see, I had not... That's why we work as a team. So by definition, there will be quite a few. And the ambiguity comes also.
As you can see here, you have very well defined linkages for this particular one, so this is the best one in terms of precision. But for this one, for instance, this linkage is not specified as beta, so there's a question mark. This is why several entries exist, and these are all the GlyTouCan IDs. And of course this also matches O-linked, and you can have a linear form, and so on. And there is the composition version, where of course you don't have linkages at all. These are the results of the search you can get from just asking for a composition. So this is from GlyGen. The GNOme browser, exactly the same as we saw before, is here, and here you see it's very quick. And you have all the possibilities, and so on.
This is a new window. Of course you can look for a protein, so you can take the example here and search for it. In the details for the protein, you see that we work with GlyGen: they import some of the structures that we have, along with the different sources that they have, and each time you have a PubMed reference. If you click on GlyConnect, you follow the link to GlyConnect. And this is all the information that is imported from UniProt regarding the protein, and then from dbSNP for the variants. There is the GO annotation, there are also the ligands, when there's information, and you also have proteoform annotation, the pathways, and so on. So it's a very thorough description of the protein. For the expression, they are using Bgee as a reference, and other references. You have the menu here, like in the UniProt entry. They gather this protein information in an automatic way; they do the integration of the data, trying to really combine different sources in that way. So this is a GlyGen view. And the other thing I wanted to mention is that they have a site search also.
And the interesting part that I mentioned is the Sandbox. Where is the Sandbox? This is also an interesting tool, where you can have your accession numbers mapped from one resource to the other, say if we want to have GlyConnect mapped to some other database, for example to GlyTouCan. So there's a possibility to get the cross-references from this mapper. But I have lost my Sandbox and I don't understand where it's gone. So, the Sandbox is really interesting. The presentation is not great, but you can explore a particular structure. You see this residue: it gives you the composition, of course, and it gives you the enzymes that are potentially synthesizing it. You can see this structure, and you can see the different enzymes and how these enzymes are acting on different structures. Again, we would need a whole hour to go through it, and it is the GlyGen people's responsibility to explain this, but I just wanted to indicate the existence of those options so that you can explore for yourself. And needless to say, if you have any questions understanding what's going on, the GlyGen people are very helpful, the GlyCosmos people are very helpful, and I can be an intermediary, whatever. This is the GlySpace Alliance; it works, and we talk about what we do together and share the different resources.
So, the last part, and it's my pleasure now to show you something. Our new site is here. As I explained, you have the complete list, so if we want to focus on portals, we go to portals. Here, this is the CFG portal, which hosts the resources of the Consortium for Functional Glycomics. We can zoom out, and the list of portals is here, and you can also see some reference about CFG.
So there's an article that was published; it's an old portal, from quite a while ago. You navigate in the bubbles by clicking on one, and each time you have the corresponding boxes that are ticked. And you can see, for instance, in expression, there's the GlycomeAtlas that we just saw. And there is another tool that we have developed, which is called Glynsight, so we have the SIB logo here. If you want to see the intact glycopeptide bubble, you see it's a very busy bubble with lots of tools that are described here. The oldest is GlycoMod, which has been sitting on ExPASy for almost the last 20 years, since 2002 or something like that, and it's widely used. And this is the most recent, a new search engine released in 2020 to identify O-glycopeptides. You can always zoom out. Here we have GlyGen and GlyConnect as the glycoprotein-related resources, so this is the glycoprotein bubble here. And we have information on glycosides, also with different resources described here.
So you see the correspondence between the list that updates, the categories that are ticked, and the bubbles that are zoomed in or out on, and this is hopefully a more pedagogical way of showing the resources in glycoinformatics. So stay tuned, because next month we'll have the information linked, that is, which database is calling which tool and cross-referencing which other database. That is the key information for understanding: these bubbles are not independent. We need another representation to show that, and we're working on it; we're close to the end, and you'll have this information in the next release of the website, at the end of June. Okay. So I think that's a lot already, and I'm ready to take questions if you have any.
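The behaviour described in the demo, where clicking a bubble reduces the list and ticks the matching boxes, can be pictured as a filter over a small catalogue. The sketch below is in Python and is only an illustration: the `Resource` fields, the `select` and `ticked_kinds` helpers, and the kind assigned to each resource are assumptions, not the data model of the actual site. The bubble names follow the ones mentioned in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str      # "tool", "database" or "portal" (assumed labels)
    category: str  # the bubble the resource sits in


# Illustrative entries only; the kinds are assumptions, not site data.
CATALOGUE = [
    Resource("GlycoMod", "tool", "intact glycopeptide"),
    Resource("GlycomeAtlas", "tool", "expression"),
    Resource("Glynsight", "tool", "expression"),
    Resource("GlyConnect", "database", "glycoprotein"),
    Resource("GlyGen", "portal", "glycoprotein"),
]


def select(catalogue, categories=None, kinds=None):
    """Keep resources matching the clicked bubbles and the ticked kinds."""
    return [
        r for r in catalogue
        if (categories is None or r.category in categories)
        and (kinds is None or r.kind in kinds)
    ]


def ticked_kinds(selection):
    """Kinds that should appear ticked for the current selection."""
    return sorted({r.kind for r in selection})


# Clicking the "expression" bubble reduces the list to two entries.
shown = select(CATALOGUE, categories={"expression"})
print([r.name for r in shown], ticked_kinds(shown))
```

A flat record per resource is enough for the filtering shown in the demo; the announced links between databases and tools would need an additional relation between records, which this sketch does not attempt.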