I want to focus a bit on glycoproteomics, because it is an important part of what we're doing. In fact, the philosophy of GlyConnect in particular is to benefit from what Nicki Packer and Daniel Kolarich are calling glycomics-inspired glycoproteomics. There's a very good review, published a little while ago, on the very different methods that are used to analyze glycans. So there's a real toolbox available to decipher the information in glycans and, of course, the most common and widely used tool is mass spectrometry.
In mass spectrometry, you take your glycoconjugates and you shave your glycoproteins, for instance, with an enzyme or chemically, so that you release all the glycans, and then you try to separate your glycans and measure their masses. It's a very typical strategy that is used in proteomics with peptides, so it's the same idea. Glycomics has now reached a point where a high level of accuracy and refinement of the structure is really possible, so you can have very precise structures assigned by mass spectrometry.
In contrast, in glycoproteomics you have another level of precision: you try to keep your peptides and your glycans together. This is what is called intact glycopeptides. You measure the mass of this complex and you can precisely identify the mass of your peptide, but the mass you have for the glycan only allows you to identify a composition. So you can count the number of hexoses, the number of HexNAcs, and so on. Of course, with tandem mass spectrometry, if you dig in a bit further, you can start assigning with fragments. So you can make educated guesses of what the structure could be, given the composition and given the fragments that you have. So it's a bit of a puzzle.
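To make the point about compositions concrete, here is a minimal sketch of how a measured glycan mass maps to monosaccharide counts rather than to a structure. It uses standard monoisotopic residue masses; the function names `glycan_mass` and `candidate_compositions`, the tolerance and the search limits are assumptions made for illustration, not the method of any particular search engine.

```python
from itertools import product

# Standard monoisotopic residue masses in Da (monosaccharide minus one water).
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "dHex": 146.0579,
    "NeuAc": 291.0954,
}
WATER = 18.0106  # added once for a free, released, underivatized glycan


def glycan_mass(composition):
    """Monoisotopic mass of a free glycan given its residue counts."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items()) + WATER


def candidate_compositions(observed_mass, tolerance=0.01, max_count=8):
    """Brute-force every composition whose mass lies within the tolerance.

    Hypothetical helper: the tolerance and max_count are illustrative choices.
    """
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        composition = {n: c for n, c in zip(names, counts) if c > 0}
        if composition and abs(glycan_mass(composition) - observed_mass) <= tolerance:
            hits.append(composition)
    return hits


mass = glycan_mass({"Hex": 5, "HexNAc": 4})  # about 1640.59 Da
print(candidate_compositions(mass))
```

The search returns counts of residues only. Every isomer that shares those counts has the same mass, which is why fragments or glycomics data are needed to go from a composition to a structure.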
And this is why the idea is to combine the two. Glycomics allows you to precisely identify the glycans but, unfortunately, since you have released them from where they are attached, you don't have the site-specific information. In glycoproteomics you do have the site-specific information, but with roughly defined glycans. The idea is to try and map the two types of results obtained from the same sample. You have what has been found as the glycome of that particular sample, so in all likelihood you can corroborate the two sources. This is what we are really interested in doing bioinformatically.
One of the key issues, which has been solved quite well, if not completely, in glycomics but not in glycoproteomics, is reproducibility. It has been a concern in glycobiology for quite a while, and so there have been some challenges, often within the HUPO context. These challenges had the same sample analyzed by independent labs to see whether they would identify the same glycans. So the stability of the glycome was the first concern of these experimental challenges, which had nothing to do with bioinformatics. More recently, NIST also suggested doing a kind of interlaboratory assessment to see how reproducible the identification of glycans is.
I think I don't have to advertise the power of having a challenge, that is, a data set that is given to people who are developing software. Take the example of the prediction of the 3D structure from the sequence, a challenge which was launched in 1994 and ended up in 2020 with a wonderful result that maybe some of you have heard of: there is now a deep learning method that predicts the 3D structure from the sequence with very high confidence, and it was a long way to get there.
So the importance of these challenges is that each time, software improves. It doesn't mean that you have to use the one piece of software that is at the top at the moment; the other ones are emulated and they are probably going to improve as well. So this is the great advantage of these challenges. This is happening also for analytical methods, and these analytical methods depend very much on glycoinformatics software. You can see, for instance, that in many instances you have pretty good reproducibility if you test different methods: you have liquid chromatography, you have capillary electrophoresis, all sorts of separation methods that are used, and you can see that reproducibility is reachable, yet not in all cases. So from the technological point of view there is still some fine-tuning to be done, and this is taken care of by different labs.
We also need to acknowledge all the software there is. I showed you in the bubbles on the Glycomics@ExPASy website how much software there is. It's actually mushrooming, and there's more than what I have put in the bubbles. Some tools have been dropped and are no longer maintained. The oldest one is Protein Prospector. It's actually a proteomics search engine that has been fine-tuned to identify intact glycopeptides, and it performs very well, especially with O-glycans. You can see that last year alone three tools were proposed. There is a very popular one, which is licensed, called Byonic, and you can see that in many glycoproteomics papers this is the one that has been chosen. And not to forget that you can use other proteomics search engines like Mascot, OMSSA or SEQUEST that allow for the detection of sugars.
So the panorama of glycopeptide analysis software is very extended, and this is why Morten Thaysen-Andersen, who is based in Sydney, Australia, decided that there should be a challenge to evaluate the performance of these glycoproteomics software tools and the different strategies that are used with them. This is why he invited not only software developers but also users of software, because the strategies of using the software are also important to get an overview of what's going on in glycoproteomics and how reproducible any approach is. The questions are: can we identify the correct peptide, with the correct site, with the correct structure on it, with possible modifications on the glycans and on the peptide? So that's a real challenge.
The challenge was set a few years ago, and the results are just being published now. It's a bioRxiv paper still under revision, and it will be published very soon. They started two or three years ago by selecting the sample, and decided that they would go for N- and O-glycopeptides. As I said, they invited not only developers but also users to participate. Morten's lab did the analysis, and what they provided to the participants were the raw data, so that everybody started on the same footing with the same data. It was the responsibility of the developers and the users to process that data with the tools that they are developing or using. They had different files because different methods were tested: of course, the software is sensitive to the different fragmentation methods that are used, so you had HCD, CID and EThcD data. This was also part of the challenge.
The organizers collected the participant details and the identification strategies. They asked which N-glycan and O-glycan databases the participants were using and, of course, the participants had to report their identifications. For testing and evaluating, they had spiked in some synthetic N-glycopeptides to have a sort of control, they had some very specific glycoprotein identifications, and they measured the glycopeptide coverage, so they had all sorts of evaluation criteria. Then they scored according to a number of criteria, which I'm going to talk about as well.
For N-glycopeptides, you can see here that the software developers are in orange and the users are in blue. You see an incredible variation in the number of identified N-glycopeptides and in the different search spaces that were used. I did not check, but when you have exactly the same search space here, it's likely that they are using Byonic with the default parameters. Just guessing. So this, you can see, is already a bit of a worry in terms of lack of reproducibility. The situation is hardly better with O-glycopeptides, and you see that the distribution of results is quite erratic compared to what you would expect if they were doing the same thing.
For the evaluation, the organizers decided that they would separate different tasks. This is for N only; we will do the O after. They would evaluate the capacity of each group to identify the synthetic N-glycopeptides, to properly identify the N-glycan composition, the N-glycoprotein identity, the coverage of the N-glycopeptides, and how common the N-glycopeptides were across teams. And they had a special treatment for the special sialic acid NeuGc, which is not supposed to be in human but is sometimes proposed as a possible identification.
They also had a list of what they thought would be important for each of these assessments, they based the score on these criteria, and so each team was assessed, depending on the task, on how well they performed. These are the scores, depending on the different scoring criteria that are specified in this table. You can see it's heterogeneous, that's the least you can say: some teams that do well in one task can do less well in another task, and this is the same whether you're talking about the developers or the users.
In the end, they tried to correlate the success of a piece of software with a list of properties. They had really asked the participants to thoroughly describe their approach: for instance, which enzyme they are using to digest, whether they use trypsin or another enzyme, so they have all this reported information for the experiment. They tried to see whether there was a correlation between high performance and those properties, so they could single out the importance of the protease specificity, for instance. The search space is obviously important too. So this is a table that goes to show what seems to contribute to the score, so that one method performs better than another. In the end, they had these negative and positive correlations between the two types.
They did the same with O-glycopeptides, with the same idea of assessing the different tasks. They didn't have spiked-in synthetic O-glycopeptides, so it was a little bit more difficult, but it was the same thing. They could see that some teams would perform well in one task and less well in another, and that some teams would consistently perform relatively well, or relatively poorly, and so on. So they had different results, and all the same they tried to correlate.
So they had a few positive correlations. Here the protease specificity was more important than it was with N-glycans, for instance, and the inclusion of retention time data also contributed, and so on. Of course, I advise you to go to the paper that describes all of that in detail. It is extremely telling about a number of criteria and properties that should or should not be considered, and it can also guide you in choosing the software. There will be other challenges. As far as I know, Morten is just getting out of this one but he's preparing the next one because, as you could see, the methods are mushrooming and three were published last year already. So they need to be part of it as well.
The problem is that database search, like you may be used to in proteomics, is impossible. We don't have a database, yet we need a reference database. This is what we're trying to build with GlyConnect. We have different approaches and different types of information, and we are not restricted in terms of taxonomy: we take the papers if they are interesting and have good data, irrespective of the species. But because a lot is published in human glycobiology, we have a lot of human sugars. At the moment we have quite a few structures that are really annotated, and our compositions are also growing. Importantly, what we do is correlate with the tissue and with the conditions of expression, because this is a key to understanding glycosylation. Sometimes we have disease information. And we have a visualization tool that we call Octopus, which I will show you.
I wanted to insist on the fact that Morten's evaluation, as I said, is still a bioRxiv paper. We have it as one of our references in GlyConnect, as they decided to share the consensus data with us. The consensus data is limited: there are only 54 compositions, only one source, which is blood serum, and 37 proteins.
That entails 69 peptides and 76 glycosylation sites, O or N. So this is a benchmark, and I'm happy to advertise it as a really important milestone in what we're trying to do with GlyConnect.
One of the things we have always noticed, especially with the use of Byonic which, as I said, is a very popular piece of software, is that it has a default composition file. The effect of selecting the composition file was highlighted in Morten's challenge. We have been aware of that for a while, from looking at the data and trying to curate data for GlyConnect. This is why we put together a piece of software which we call GlyConnect Compositor. This is where you can find it. The idea is that in most databases, and in a glycan composition file like you find in GlyConnect, you have a set of compositions that seem to be unrelated because they are just a list of independent items. However, you can see in this simple example that these compositions are not unrelated. This composition is related to this one through the addition of a fucose residue. It can be dHex, if we're not talking mammalian sugars. So what we thought was that we should not give a list of compositions but a network of compositions. Here you have different fucosylated compositions linked to one another just by adding a fucose. We can generalize that to any monosaccharide: we can take a list of compositions and, whether you add a fucose, a hexose, a HexNAc or a sialic acid, you are going to see which compositions in the set are related to one another.
So a glycome, according to glycoproteomics, is going to be a set of compositions. This is, for instance, what Byonic proposes, although I think they use Fuc instead of dHex, but anyway it means the same in that case. And so a glycome can be a set of compositions at the level of a site.
It can be a set of compositions at the level of a protein, and it can be a set of compositions at the level of a tissue. The more connectivity there is between those different compositions, the more consistent your data set is going to be, because all we're doing is roughly approximating what the activity of the glycogenes is doing in synthesizing a glycan. Glycans are synthesized step by step: one glycosyltransferase adds one monosaccharide at a time. So it would be totally absurd to expect that, if you have a full glycan with a set of monosaccharides that compose it, they would just randomly be put together. Indeed there is a hierarchy, and this is what we're trying to capture by looking, at a site, at a protein or at a tissue, at the set of compositions that we are given from a glycoproteomics experiment.
Here is an example of what is in GlyConnect, with the protein called tissue-type plasminogen activator, which has at least three glycosites that have been identified where you have structures. Here, when you have a number, it is the number of structures, two in this case, that have been associated with that composition. Yet you see that you have a very disconnected graph, and we were wondering: this is strange that they would be so far apart. So we introduced the idea of a virtual node. Maybe there's just one missing step, missing because in the identification the signal of the intermediate structure was so low that it was not picked up by the program, or maybe it was not expressed. God knows what the explanation is. Yet we're thinking of reintroducing some consistency, and you can see that, who would have thought, all those parts were in fact connected, just because there's one monosaccharide missing.
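The composition network and the virtual nodes described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Compositor implementation: compositions are tuples of counts in the order (Hex, HexNAc, dHex, NeuAc), an edge links two compositions that differ by exactly one monosaccharide, and a virtual node is taken to be any missing composition that sits one step away from two compositions in different connected components. All function names are hypothetical.

```python
from itertools import combinations


def neighbours(comp):
    """All compositions exactly one monosaccharide away (no negative counts)."""
    out = set()
    for i, count in enumerate(comp):
        for step in (1, -1):
            if count + step >= 0:
                out.add(comp[:i] + (count + step,) + comp[i + 1:])
    return out


def build_edges(comps):
    """Edges between observed compositions that differ by one residue."""
    comps = set(comps)
    return {frozenset((a, b)) for a in comps for b in neighbours(a) if b in comps}


def components(comps, edges):
    """Connected components of the composition graph (simple flood fill)."""
    adjacency = {c: set() for c in comps}
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, parts = set(), []
    for start in comps:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(adjacency[node] - part)
        seen |= part
        parts.append(part)
    return parts


def virtual_nodes(comps):
    """Missing compositions that bridge two separate components in one step each."""
    comps = set(comps)
    parts = components(comps, build_edges(comps))
    label = {c: i for i, part in enumerate(parts) for c in part}
    bridges = set()
    for a, b in combinations(comps, 2):
        if label[a] != label[b]:
            bridges |= (neighbours(a) & neighbours(b)) - comps
    return bridges


# Toy glycome: two islands separated by a single missing HexNAc step.
glycome = {(3, 2, 0, 0), (4, 2, 0, 0), (5, 2, 0, 0), (5, 4, 0, 0), (5, 4, 1, 0)}
print(virtual_nodes(glycome))
```

In the toy set, the only bridge is the composition with five Hex and three HexNAc, which reconnects the two islands once it is added.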
In this case, for connecting these two, you just need a hexose or a HexNAc. So you re-establish some kind of consistency of the glycome associated with, in that case, a protein. For 33 compositions, we need six virtual nodes. How is that useful? If we choose to customize a glycan composition file, then we are going to advise users of Byonic, for instance, if they are focusing for some reason on the tissue plasminogen activator, to take all of these compositions as a default file. And we can more and more prove that the virtual nodes are real.
Here is the glycome for human urine that we have accumulated over time. Six months ago we had 75 compositions. Then, in the last six months, we added more papers to the database and we have reached 84 compositions. Each time we were drawing that composition graph, with nodes for compositions and edges for the addition of a monosaccharide. I just took two excerpts. This is an old picture, so the resolution is not as good as in the most recent picture, but anyway you can see that I've mapped this composition here, which is H3N3F1S2: three hexoses, three HexNAc, one fucose and two sialic acids. And I wanted to compare the environment. In this one I have highlighted the virtual nodes in red; that's another matter altogether. In this network the virtual nodes are in gray, like I showed you before.
So I did the correspondence for you, because it's a little bit complicated otherwise. Sorry, it's a bit busy, but we can put these two in correspondence, and these two, and these two. What you see, for instance, is that a new paper is bringing a structure that wasn't there in the previous version. Same with this one, and same with this one. So we had a really nice paper from Katalin Medzihradszky.
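The shorthand used above, H3N3F1S2 for three hexoses, three HexNAc, one fucose and two sialic acids, is easy to turn into counts. The sketch below is an illustration only: the letter-to-residue table follows the reading given in the talk, and the function names `parse_composition` and `one_step_apart` are hypothetical.

```python
import re

# Letter codes as read out in the talk: H = hexose, N = HexNAc, F = fucose, S = sialic acid.
LETTERS = {"H": "Hex", "N": "HexNAc", "F": "Fuc", "S": "Sia"}


def parse_composition(code):
    """Turn a shorthand such as 'H3N3F1S2' into a dict of residue counts."""
    tokens = re.findall(r"([A-Z])(\d+)", code)
    # Reject strings that contain anything other than letter+number pairs.
    if not tokens or "".join(letter + number for letter, number in tokens) != code:
        raise ValueError(f"cannot parse composition: {code!r}")
    counts = {}
    for letter, number in tokens:
        if letter not in LETTERS:
            raise ValueError(f"unknown residue letter {letter!r} in {code!r}")
        residue = LETTERS[letter]
        counts[residue] = counts.get(residue, 0) + int(number)
    return counts


def one_step_apart(code_a, code_b):
    """True when the two compositions differ by exactly one monosaccharide."""
    a, b = parse_composition(code_a), parse_composition(code_b)
    return sum(abs(a.get(r, 0) - b.get(r, 0)) for r in LETTERS.values()) == 1


print(parse_composition("H3N3F1S2"))
print(one_step_apart("H3N3F1S2", "H3N3F1S1"))
```

With counts in hand, the one-step test is the same relation that defines an edge in the composition graph.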
And we could introduce all sorts of new structures, because it was glycomics experiments that we were mapping. Then there's this environment here, where you see that H2N3F1S1 is here now with three structures instead of one. All the others are still without structures, just the same. Here no structures came. And here a virtual node in the previous version is in fact a real node now, with a structure associated. So it was not so crazy to suggest that just one monosaccharide was making the difference and that you could connect those nodes: now it's a reality. Unfortunately, for the second one we could not; it still remains a virtual node, but maybe when we reach 95 compositions for urine we'll have this sorted out. So you can see that it is possible to assess the consistency of a glycome through those related compositions, just by mapping them and seeing what a one-monosaccharide difference does.
This is my old urine graph. Sometimes we have funny shapes: animals, fishes, all sorts of funny things that crop up in our graphs. Anyway, that's anecdotal. This is the same network I was showing, so this is Hex3 HexNAc3 and so on. I will demo Compositor, but each node has a tooltip that tells you you can click on it and get to the composition. If you click on this composition, a GlyConnect page opens and we can use the annotations. In GlyConnect we have the number of references: for this composition, we have five references that are associated with it, and we have a number of proteins and so on. What I was interested in when looking at the composition is that it is reported in urine, so it's part of my urine glycome here. I have some sites that are reported here, and this is my fourth reference.
It's a glycoproteomics experiment, whereas these are coming from the other references here, which are glycomics experiments. What is interesting is if you look at the structures: you see that it's a fucosylated and sialylated structure, but you can see in the differences here that your fucose is either alpha-3 here or alpha-4 here. That can make a really big difference. Certainly in terms of recognition, whether the fucose is on one side or the other makes a difference. Then we look at the source. Here, this is expressed in granulocytes, and this is expressed in neutrophils, whereas the alpha-4 fucose is expressed in urine in that case and in amniotic fluid in that case. So you see a clear difference between the expression of these sugars, and the position of the fucose seems to be correlated with the source: alpha-3 if we're talking blood cells, and alpha-4 if we're talking urine or amniotic fluid.
I am not drawing any conclusion. But what I put forward with GlyConnect is that you have a possible hypothesis. You can make an assumption and say: in my glycoproteomics experiment, where I have only the composition, because my sample is urine I can guess that my structure would look like that, and that the fucose would be alpha-4 and not alpha-3. Again, it's an assumption, it's a guide. We are not pretending that GlyConnect is giving you the answer, but we are making suggestions and providing food for thought for people interpreting glycoproteomics experiments.
With Compositor, we have also been trying to assess the glycan composition files that are used in software, for instance in a study on prostate cancer.
The authors used Mascot, and they used a composition file that, as you can guess from the graph, is extremely regular, in the sense that they have mechanically said: these are N-glycans, so we take the N-glycan core, with the two HexNAc and the three mannoses, and we build on that. They mechanically add one monosaccharide at a time, so that the glycan composition file is a very tight mesh, almost fully connected. There are some large glycans that we can hook into that area of the graph. So this is a very regular data set. There is no judgment there; it's just an observation.
I've also looked at Byonic, used in studies of cerebrospinal fluid, of cancer and of a cell secretome. The Byonic default set is, I think, 309 N-glycans, so we're talking N-glycans again. Of course it's a very busy graph. You can see the main properties of the glycan compositions, whether they are oligomannose or high mannose or neutral or fucosylated, with the bar chart that corresponds to each. Here the intersections are in magenta and gold: there is the intersection between A and C; between B and C there's absolutely no intersection; and ABC is common to the three sets, so ABC is probably the most regular N-glycans that you expect to see in a sample. You can see that the magenta nodes, which are common to the CSF and the secretome, are sort of central and going a bit external here, whereas the other nodes are located in one particular area. Again, no real conclusion from there. You also see that we need 35 virtual nodes to complete the Byonic composition set. Maybe they could include those 35 virtual nodes, or we could provide Byonic users with the 35 virtual nodes to add to their composition file.
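The comparison of three composition sets described above amounts to sorting each composition into its Venn region. Here is a minimal sketch; the sets A, B and C below are made-up toy data, not the compositions from the studies mentioned, and `venn_regions` is a hypothetical helper name.

```python
from collections import defaultdict


def venn_regions(files):
    """Group compositions by the exact combination of sets that contain them."""
    membership = defaultdict(set)
    for label, compositions in files.items():
        for composition in compositions:
            membership[composition].add(label)
    regions = defaultdict(set)
    for composition, labels in membership.items():
        regions["".join(sorted(labels))].add(composition)
    return dict(regions)


# Toy data for illustration only.
files = {
    "A": {"H5N2", "H5N4", "H5N4F1", "H5N4S1"},
    "B": {"H5N2", "H5N4", "H6N5"},
    "C": {"H5N2", "H5N4", "H5N4F1", "H9N2"},
}

regions = venn_regions(files)
for region in sorted(regions):
    print(region, sorted(regions[region]))
```

A region such as "AC" holds the compositions found in A and C but not in B, and a missing key, here "BC", means that the two sets share nothing beyond what is common to all three.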
Anyway, what I'm getting at is that each time you analyze a glycoproteomics experiment with a composition file, it is very important to do several tests with that composition file. For instance, if you are doing a urine sample, select only the urine compositions that we have in GlyConnect. That set is of course not comprehensive, and we are fully aware of it, but at least you'll have something that will guarantee that you find what you expected to find. That could be your baseline. Then you add further compositions that are not in that set to see whether you can extend it. This would be really important to assess the quality of identification and to prevent a lot of the false positives that are identified on both sides.
So I'm done with that part. I will go for a demo, but I would like to give you another little break before I start refilling your brains. I will start by showing you simple things that we can offer, so I'll share my screen.
This is a protein entry in UniProt. This is the famous fetuin-A, which has been known to be glycosylated for quite a while, so it's a glycoprotein. There's not even the keyword glycoprotein. Anyway, if we go to the PTM section, you can see that it's annotated as glycosylated, with a history: these cross-references are rare, there are a few hundred of them. This goes back to the time prior to UniProt, when it was Swiss-Prot and glycosylation was annotated in conjunction with a database called GlycoSuite. These were the accession numbers of GlycoSuiteDB. GlycoSuiteDB does not really exist any more, yet it's reborn in GlyConnect, so we have kept those historical links. If you click here, you go directly to a special section of GlyConnect with the information on that glycan. Well, this is it.
So you have the glycosylation information but, as I said, these are rare, so it's a bit anecdotal. What is interesting here is that you have the PTM databases section, and now you have the systematic cross-referencing of GlyConnect and GlyGen, and recently the number of structures has been specified. So in GlyConnect we have two N-sites and two O-sites, with 60 N-glycans and four O-glycans on these different sites. GlyGen, I'm not quite sure, has 15 sites, with 29 N-glycans on two sites and 11 O-glycans on 10 sites. So you see that we are not necessarily taking the same data, and we are complementing each other. That's great, and this is why the two resources are there.
If I go to the link here, you can see that this can happen to you. In this area here there is the possibility of choosing SNFG, but we also have the text, this is IUPAC condensed, and we can click on SNFG and we'll have our usual content. This happens, I'm not quite sure why to be honest, but it has something to do with the browser and going to the site for the first time or not. There's some re-initialization of the process, so it happens.
You can see that in GlyConnect we have the information on the taxonomy, we have the protein, and we have the sources: we know that this protein is found in blood, but it has also been found in other body fluids or in a gland, and it has been associated with some diseases, cancer most of the time. We have 15 references that are talking about this. All in all, four sites as we said, two N-sites and two O-sites, and for some glycoproteomics experiments we have the peptides that carry them. And since there are so many references, each time you can know which reference has identified which structure.
We can click on the structure, and sometimes you have two references that have seen it, sometimes more. What is also an important part is that you can link to the glycoproteomics data. For instance, this structure was resolved not very precisely, but precisely enough in the sense that you know at least the beginning of the structure, and it was identified on this site. But there are glycoproteomics experiments, listed below, with compositions. Here, for instance, this composition was identified in several large-scale experiments, suggesting that there are two sites for it, and so it suggests a structure, which is up there, because it matches the composition and it has been identified before. So this is exactly the example of what I said before in the presentation about matching glycomics and glycoproteomics experiments, that is, glycomics-inspired glycoproteomics, and we can navigate between the two types of experiments. So, going back to this structure here, we have a suggested site, in addition to the site that was identified, simply because there's one glycoproteomics experiment that says that it was also seen on that site, and then you have the peptide to back this information in this case.
So this is the sort of information you find in a particular protein page. We can navigate all across, so we can go and see milk, for instance. We can see that we have many proteins involved in milk. I can go and see another protein that is also in blood, and we'll see another profile. This is slow. We have the same sort of thing, but this one may be a bit more interesting because we have, on the side, a structure that we can have a look at, and if there are more, we can see more.
Here we are using the LiteMol software, which is now used in UniProt and also in PDBe, and here it uses a LiteMol plug-in that allows you to see the SNFG symbols in 3D. You can see that for that particular structure there were a few possibilities: it could be this structure, or this one, or this one. So there are quite a few, and this is it. We can also investigate further: here there are quite a few structures, and some glycoproteomics experiments at the end here. Each time you can also, of course, click on a composition and see all the structures behind that composition, which proteins they are seen on, in which species, in which tissue, what the references are, and the sites and peptides.
Very recently, no later than yesterday, Julia added something, so if I go and see the proteins here: we are starting to give the user the possibility of focusing on N-linked or O-linked and not mixing the two, because sometimes it can be a bit confusing. We are aware of that, and we decided to do this upon the request of one of our favorite beta users, who is Daniel Kolarich. So you see, we are trying to be responsive when people suggest we should do things.
So this is for looking at the data in GlyConnect. I should also say that, if you are interested in exporting, you can export to CSV all the information that is in that page, regarding the taxonomy, the protein, and so on. You can also copy the compositions here, if this is exactly the glycome you want for haptoglobin: you copy them to the clipboard by clicking on this, and it's done. That can be very useful; we're really trying to help with building composition files. I said the 3D, so I think I've covered more or less everything, and you see you can change between the different views.
And we have, of course, cross-references, that's what I wanted to say. We have cross-references with UniProt, of course, with neXtProt when it's a human protein, and with GlyGen. So GlyGen points to us, and we point to GlyGen. And we have GeneCards, because GeneCards also points to us for information on human genes in this database. And we also always have the link to Compozitor, if you instantly want to see whether the glycome of your protein is consistent or not.

Let's look at the N-linked here. It tells us there are all these sites, you add them to the selection, and you can include or not include the virtual nodes. There are 86 compositions associated with haptoglobin; as you can imagine, it's a well-studied protein. And here is the glycome. You can see the distribution of properties. There's a majority of fucosylated ones that are located there. The neutral ones are also at the extremity, and the centre is rather isolated. So if we include the virtual nodes, we'll have a slightly more tightly connected network, and we see how much we need to complete the picture.

The network is not working. Compozitor usually goes faster than that. It's only because I had to migrate. "You need to recompute the graph for the link?" Yes. "I think you didn't click on the button." No, no, I can see the wheel spinning. It's a little bit of a technical issue. Here you go. So you see that the graph is a little bit more connected. I have three leftovers here. This one is always a bit of a leftover because it's not one that I know very well. This one is only one Hex, one HexNAc, so it's a little bit lost. And this one is a lost cause. I don't know why, but that's what it is. What we also have with Compozitor, as I showed you, is that each time you have a label you can see on the node the structures that are behind it. But we also look at the paths that are connecting the nodes.
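To make the virtual-node idea concrete, here is a minimal sketch of a composition graph. It assumes, and this is an assumption for illustration rather than a description of Compozitor's actual code, that two compositions are linked when they differ by exactly one monosaccharide, and that a virtual node is proposed when two observed compositions are two units apart with no observed intermediate. The residue order, function names, and example compositions are all hypothetical.

```python
from itertools import combinations

# Assumed residue order for every composition tuple (illustrative only).
RESIDUES = ("Hex", "HexNAc", "dHex", "NeuAc")


def distance(a, b):
    """Total number of monosaccharide units by which two compositions differ."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_edges(nodes):
    """Link every pair of compositions that differ by exactly one unit."""
    return {(a, b) for a, b in combinations(sorted(nodes), 2) if distance(a, b) == 1}


def bridging_candidates(a, b):
    """Compositions one step from a towards b (bridges when distance(a, b) == 2)."""
    out = set()
    for i in range(len(a)):
        if a[i] != b[i]:
            step = 1 if b[i] > a[i] else -1
            out.add(a[:i] + (a[i] + step,) + a[i + 1:])
    return out


def virtual_nodes(observed):
    """Propose one unobserved intermediate for each pair separated by two units."""
    observed = set(observed)
    virtual = set()
    for a, b in combinations(sorted(observed), 2):
        if distance(a, b) == 2:
            candidates = bridging_candidates(a, b)
            if not (candidates & observed):
                # Arbitrary but deterministic choice among possible bridges.
                virtual.add(min(candidates))
    return virtual


# Hypothetical compositions, in the residue order given above.
observed = {(3, 2, 0, 0), (5, 2, 0, 0), (5, 3, 0, 0)}
virtual = virtual_nodes(observed)
print(len(build_edges(observed)), len(build_edges(observed | virtual)))
```

Comparing the edge count with and without the proposed nodes gives a rough feel for "how much we need to complete the picture".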
So for instance, if I want to zoom in on a node, I look for the node with the root, which is here. So, sorry, I have located the root. Now I zoom out to see my complete graph. We're not responsible for the shape; we're using a graph library. You can see here that the vast majority of the nodes are stemming from this one, which is normal. This is the N-linked core. So all the N-linked compositions that are derived are coming from this one, except for this one, which is funny. Obviously, it's an independent one that we have. So we have two roots for that graph. This is quite unusual, but it happens. Other than that, everything is reachable except for that little path here.

All the same, you can see that if you have a terminal node somewhere, like this tetra-antennary one here, you know the path that is followed. There are not many paths; there is a unique path in that case to reach this one, which is not necessarily the case for all of them. So here you see, for this one, you have several possible paths. And for this one, there's a unique path. So you can have a look. If you want to see, for instance, all the fucosylated structures, you can also mouse over the +F, and then you'll see everywhere fucosylation happens, but you can also see all the fucosylated structures from here. So you have different ways of considering this graph and looking at it.

I will come back to Compozitor, but what I wanted to do now is go back to the homepage I showed you. We'll go through the dedicated datasets tomorrow, so I'm not going to talk about that, but I will talk about the Octopus. The Octopus was built to help people who have very limited knowledge of glycans and who could not be bothered.
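Coming back to the path discussion above (a root, a terminal node, and either a unique path or several possible paths), the counting can be sketched as follows. This assumes a simplified model in which each step adds exactly one monosaccharide to an observed composition; the function name, the residue order (Hex, HexNAc, dHex), and the example compositions are hypothetical and not taken from Compozitor.

```python
from functools import lru_cache


def count_paths(nodes, root, target):
    """Count the distinct routes from root to target, adding one residue per step."""
    nodes = frozenset(nodes)

    @lru_cache(maxsize=None)
    def paths_to(node):
        if node == root:
            return 1
        total = 0
        for i in range(len(node)):
            if node[i] > 0:
                # A parent is the same composition with one residue removed.
                parent = node[:i] + (node[i] - 1,) + node[i + 1:]
                if parent in nodes:
                    total += paths_to(parent)
        return total

    return paths_to(target)


# Hypothetical compositions as (Hex, HexNAc, dHex); the root is the N-linked core.
core = (3, 2, 0)
nodes = {core, (3, 3, 0), (3, 2, 1), (3, 3, 1), (3, 4, 1), (6, 5, 0)}

print(count_paths(nodes, core, (3, 3, 0)))  # a node with a unique path
print(count_paths(nodes, core, (3, 3, 1)))  # a node with several possible paths
print(count_paths(nodes, core, (6, 5, 0)))  # a node that is not reachable from the core
```

A count of zero corresponds to the "independent" node that does not stem from the core, which is where either a second root or a virtual node would come in.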
They are not going to be drawing a glycan and interrogating a database and things like that, but they know, because they read the literature, and they see, for instance, a glycoproteomics paper where the comment of the authors is that they have complex glycans. These are the categories, of course, and for instance, they have a lot of bisecting glycans. So you don't know what bisecting is, you don't really know what complex is, but you see that vocabulary. And you can put non-sialylated, but in fact you change your mind, so you don't. So we just search GlyConnect with these terms. We limit to Homo sapiens at the beginning, but if you want to see any other species, you can do it. Anyway, I think I had a less populated octopus, but I lost my sheet of paper where I had this, sorry.

So here I see all the proteins, and I can zoom in a bit and have a look. I have the compositions in the middle. So I have my proteins, and that gives me an idea of the density of information in GlyConnect. Here, each time, I see all these bisecting ones. We are in the process of having categories here where we can ask for mammals, as opposed to only Homo sapiens, so we can see different things. The idea is that I see a high density in some proteins here.

If I don't want to have compositions in the middle, I look at the tissues. Already it's a little bit easier to see that I have tissues that are particularly represented, and this reflects of course the bias of the database. We have a lot of blood-expressed glycans. But we can see, for instance, which proteins are expressed in the kidney, and we can see which bisecting GlcNAc structures are in there, and we can browse. And of course, each time, if I'm interested in seeing what's in the placenta, I click here and I get to the protein, and these are the hormones here that are heavily concentrated.
So, I go back here and I change: instead of wanting to see the structures, I want to see the sites. So I have the sites and the expression. I can see that in some cases I don't have sites. I obviously have an association, but I don't have the sites. So I can also check how well defined my proteins are, and I can also try with disease, and I have this information as well. I can go back to composition and see the connections, how dense they are with composition, and so on. So this is a starting point for exploring the data. It's not actually giving you clear results, although we're looking at having an export function to export all the compositions from there, if you're interested in having them. That is one thing we can do with the Octopus.

We can do a bit of O-linked. For instance, we look at core 2, and instead of looking at the properties that are there, I would look at determinants. For instance, I'm interested in all the O-linked structures that have core 2 and contain a substructure which is a ligand that is there. And I have this answer, where I can do exactly the same: instead of composition, I'd like to see in which tissue it's expressed and what the density is. You can see that I have a lot in there, all linked with this particular determinant here, and we can see that there are many structures associated with this here. So again, this is just an example of what you can do to assess the density of information in GlyConnect and also query GlyConnect with very basic information, as often expressed in the databases.

So, one interesting protein that we have, because it is attracting a lot of attention. If I go back to GlyConnect, and I just search for the protein here, I have erythropoietin. You can see that with erythropoietin we have 18 references. The most recent were added in 2020, and it was another large-scale study.
So, again, erythropoietin has served us very much in establishing the reality of virtual nodes, because each time we have new glycans, with new papers and new compositions, we validate. So you see here, this is what we have, and you can select. I've done it already and I'm not going to. Oh, no, I have not done it already. It's lost. That is not a good idea. So let me see. I had prepared stuff and it's gone. This was the other one. Okay.

So, let's look at the N-linked. What I was thinking is to show you the difference with the undefined. So here I can look at undefined. You see here, in my first selection, I have erythropoietin with the three well-known sites, so assigned sites, and I have 29 compositions with undefined sites. That means that they cannot be assigned because the experiment could not assign them. So if I compute the graph, which is going to take a little bit of time, unfortunately, because I'm a bit too far, I have to move closer to my source and it will work better.

Okay, so we have a very busy graph, because 129 is quite a few. And you can see where there's a possibility of assigning some sites, because there is an overlap. There are 10 undefined that are not included in the glycoproteomics experiments, so unfortunately we still cannot assign them to a site. But there are all the others. So that's one third we cannot assign and two thirds we can assign to a site, because there's an overlap between the site information and the compositions and the possible structures that are there.

So, you see, we still need quite a few virtual nodes. In some cases, they are not absolutely required, because here, for instance, if I don't have this virtual node, I will still have the connection of these nodes together, so it's not destroying anything. The idea is, when you think network biology, you always look at a node and ask: if you take it out of the network, is it going to make the network collapse or not?
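The question raised here, whether taking a node out breaks the network into parts, can be checked with a short sketch. This is a generic connectivity test on a toy adjacency map, not Compozitor's code: it simply compares the number of connected components with and without the node. The node names are hypothetical.

```python
from collections import deque


def components(adj, removed=frozenset()):
    """Count connected components, ignoring any nodes listed in removed."""
    seen = set(removed)
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return count


def is_required(adj, node):
    """True when removing the node splits the graph into more parts."""
    return components(adj, {node}) > components(adj)


# Toy graph: A, B and C form a triangle; D hangs off C through the node V.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "V"},
    "V": {"C", "D"},
    "D": {"V"},
}
print(is_required(adj, "V"), is_required(adj, "B"))
```

In this toy case V plays the role of a virtual node that holds D in the graph, while B can be removed without disconnecting anything.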
Will you break it into parts or not? This one will not. But this one, which is here, will, because if I get rid of these two, then these cannot be connected with the rest of the graph. So this is the role of these two. As for why they are there, you can see the massive structures here. Here you have a very big structure, tetra-antennary with 12 Hex. So it's a really big one, and that's maybe why intermediary structures are hard to detect, if they exist. So that shows you how to use Compozitor with a particular protein, even looking at the different sites of the protein, and what you can guess from a certain type of information versus another.

I could use Compozitor again to do a comparison. So I'm not going to take erythropoietin anymore. I'm moving back to the top of the list, where I have very similar proteins, yet encoded on two different genes: alpha-1-acid glycoprotein 1, which you can see here has only N-linked glycans, and alpha-1-acid glycoprotein 2, which also has similar sites, yet one extra in position 88. So I can compare site by site, or I can compare all sites together. I start with comparing all sites together. This is not a reflection on the tool but unfortunately on my poor bandwidth.

So you see that they have a lot in common, yet from the distribution of one or the other you see that there is a strong bias towards sialylation with the first protein compared to the second, so most of the blue nodes disappear if I do that, because the blue nodes are the first one. So there's a bias towards sialylation, and that can be observed. For the second one, we have more neutral ones. All the magenta nodes are common to the two, so it is the red nodes that remain as non-common. In fact, you can always pull on things when they are sort of stuck together.
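The colour coding described above (nodes only in the first protein, only in the second, or common to both) amounts to set operations on the two composition lists, and the sialylation bias can be estimated per group. The sketch below uses made-up compositions, an assumed residue order (Hex, HexNAc, dHex, NeuAc), and hypothetical function names; it is not Compozitor's implementation.

```python
def compare(first, second):
    """Split two composition sets into common and protein-specific groups."""
    first, second = set(first), set(second)
    return {
        "common": first & second,       # shown in magenta in the talk
        "first_only": first - second,   # shown in blue
        "second_only": second - first,  # shown in red
    }


def sialylated_fraction(compositions, neuac_index=3):
    """Share of compositions carrying at least one NeuAc (0.0 for an empty set)."""
    compositions = list(compositions)
    if not compositions:
        return 0.0
    hits = sum(1 for c in compositions if c[neuac_index] > 0)
    return hits / len(compositions)


# Made-up compositions as (Hex, HexNAc, dHex, NeuAc).
protein_1 = {(5, 4, 0, 2), (5, 4, 1, 2), (5, 4, 0, 1), (5, 4, 0, 0)}
protein_2 = {(5, 4, 0, 0), (5, 4, 1, 0), (5, 4, 0, 1)}

groups = compare(protein_1, protein_2)
for name, members in groups.items():
    print(name, len(members), sialylated_fraction(members))
```

The same comparison can be run per site by restricting each input set to the compositions reported on that site.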
It's always a bit easier that way. As I said, this is automatically generated, but it doesn't mean you cannot adjust things for yourself. So I go back to my neutral ones here, and you see that all the red nodes disappear under the colour, because the remaining structures here are neutral. So you have the possibility of comparing like this, but you can also decide to compare by site. So we start again with glycoprotein 1, and we're going to select only one site. I unselect everything and select only 72, and I have 26 glycans for this one. And I have 72 here as well. Sorry, I forgot to add to selection. I have this one, and in this selection I have only 19. So, whether I find again this bias towards sialylation in one and neutral in the other is a possibility. I have sialylation for that site in particular that is high. And the only ones that are not sialylated are these ones, the neutral ones that are not in the second one. So, you see, the tool is really very convenient.

You have different tabs, so you can see here, if I get rid of that, I'll have the urine glycome that I mentioned before, so I have it in Homo sapiens. I don't look at everything, but I would like to consider the O-linked, and these are the 84 compositions I referred to. And this is another animal now that we have. But this is the urine glycome, and you can see the structures associated, and so on.

So, regarding O-linked, I had quite an interesting example to show you. I go back here. Most of the proteins that you see, of course, have a unique accession number, but in biology we are stuck with very difficult proteins like mucins. And there are a lot of very interesting glycomics experiments that are made from a mucin. And mucin, as you know, is the main component of mucus, so I will show you. If I go to source, I'll go back to the mucin, and I take the colonic mucosa here, which I add to the selection.
Then I take my protein here, which I add to the selection, so you see I have 102 compositions associated with those unspecified mucins. Then I go back here again, and I want to have a look at the pulmonary mucosa, where I have 82. And I will compute the graph of these unspecified mucins and see whether I can locate them in terms of which tissue they seem to belong to. You can imagine that the colonic mucosa and the pulmonary mucosa are going to be very different. So, can I see these differences at the level of my graph? It will come, do not despair. This is not a demo effect. This is a poor bandwidth effect. Here we go. So we're more fishy in this one. And I can move this a bit so that you see everything. I can only move one at a time. Sorry. Here we go.

So, you have a full view of that, and you can see again that the bar chart is really showing a different distribution for the properties. You will obviously have a very fucosylated mucin here. And you can see that you have a very sialylated colonic mucosa and again a rather highly fucosylated pulmonary mucosa. As a result, the graph is rather partitioned. The black ones are those that are common. So these are your really regular O-linked sugars with not much specificity, and they are all common. This is really the smallest, and everything stems from there, obviously. And here, if you look at these different ones, they have a bit more specific determinants. And if you look at these ones, they are a bit longer. So in the colonic mucosa, you have sort of elongated structures like that. And here, you have very fucosylated structures all around. So the partition for the pulmonary mucosa is visible, and the colonic, which is around there. And finally, there are some other mucins left here with not much to say about them. But there's hardly anything common between the unspecified mucins and the colonic. There's absolutely nothing between the colonic and the pulmonary.
Obviously, a lot of these mucins were from the pulmonary mucosa. And this is what we have in the end. So the idea there is, again, to try to assess the consistency of a set of compositions. If I were to study colonic mucosa or pulmonary mucosa, I would start with a set of compositions that seem to be specific according to these data. I think I will stop here. I would love to take some questions. I've been speaking a lot.