So, glycan binding. In fact, I realized yesterday that I spoke about mass spectrometry as if all of you had training in it, and maybe you have not, so I apologize. Somehow I imagined that people interested in glycans and glycobiology would know about proteins, and that if they were into proteins they would in all likelihood know something about mass spectrometry. If I went too fast on that, I apologize. It was an oversight, and maybe you can catch up on that side, because now we are talking about X-ray crystallography, NMR and screening, which is a completely different world.
When I talk about screening, there are for instance such things as glycan arrays, which I mentioned yesterday. They are printed arrays of glycan molecules that can be tested. All sorts of possibilities are listed here that you can assess with this sort of technique. This is actually quite a challenge in glycobiology, and it is what we have been preaching through glycoinformatics for the past decade: people have a different view of glycans depending on the experimental technique they use. In a way, people who are completely into glycan-binding screening are not necessarily interested in the fact that a protein is glycosylated. It is not their immediate concern. So the field of glycobiology is very partitioned between those who look at glycoconjugates and what decorates them, and those who look at glycans as ligands, and that is all they are to them. The partition is arbitrary in the sense that, functionally, it makes no sense not to consider together the three types of molecules: the glycoconjugates (glycoproteins, glycolipids, maybe RNA, we don't know), the glycans, and the glycan-binding molecules. They are all part of the same system, and it is only because of technology that they are separated.
Other than that, biologically speaking, it makes no sense to separate them. I'm apparently opening the door to all sorts of other problems, yet biologically they are absolutely on the same line. I wanted to make that point because it is important and it is a challenge. We are really promoting glycoinformatics as a means to reunite those different communities that are apparently not talking to one another that much. They are, but not enough.
Regarding screening and the development of microarray technology for glycans, there was a strong consortium in the US, the Consortium for Functional Glycomics (CFG). I mentioned it yesterday when I was talking about the new Glycomics@ExPASy website: there was one portal pointing to the CFG, so I'll show it to you and maybe explain why it is not active anymore. There is another source of glycan arrays: Prof. Ten Feizi at Imperial College, whose approach is to customize the glycan array. The Consortium for Functional Glycomics provided ready-to-use glycan array kits with the same glycans each time (the set evolved a little, to about 600 glycans on a plate), and anybody could test these arrays. Ten Feizi instead composes the array according to each sample she has. So that is a very different approach. She had software that was developed in-house, and now she is trying to support software that could be used by other people. As far as I know the paper is not out, but I'm not quite sure; I checked and didn't see it. But that is the name of the software related to this library. And then there is a recent tool called GLAD, part of the Glycotoolkit, which I'll demo because it has a very easy interface that you can also try with glycan array data.
I still haven't quite worked out what format they are processing, but this is a detail that can be addressed if you're interested.
There is also another category of molecules of interest when we talk about glycan binding: the carbohydrate-binding modules, which you find in a special section of the carbohydrate-active enzyme database, CAZy. CAZy is much better known for its classification of glycosyltransferases and glycosidases, but this is an important part. This kind of glycan binding is called adhesion to carbohydrate. These are different molecules because they are usually just a domain, part of an enzyme, that binds the carbohydrate. It is not as if the module is actually reading the glycocode the way lectins do, really trying to get the function out of it. Here it looks more technical: for the enzyme to work it has to hook on to something, and this is how it does it. This is also a classification, and you can find it on the CAZy website, which is a bit old but has a lot of content.
I would like to go back to viruses, because they are quite a good example in terms of glycan binding and especially glycan arrays. A lot of results were obtained by testing viruses on those arrays. In the CFG data there is a lot of flu virus testing. The same goes for Ten Feizi's customized glycan arrays: she has produced quite a number of interesting observations on the flu virus and its behavior. And I think, if I'm not mistaken, and maybe Catherine can correct me, that the famous difference between an alpha-3 and an alpha-6 linkage of the sialic acid bound by the hemagglutinin, which explained differences between human flu and avian flu, was detected with a glycan array. So this is why I'm back to viruses, not because of the pandemic.
Okay, so I'm here on a ViralZone page. As I mentioned before, ViralZone is this encyclopedia of viruses, and we have worked with them so that we would have a link at the level of the interaction. It says that the interaction is via sialic acid, and there is a link to a database called SugarBind, which we actually develop. Whoops. This is a record you can't read... that's much better.
So this is SugarBind, and I have to give you a little bit of its history. SugarBind was a database developed in the US. Its purpose was to collect information on ligands, and ligands only. We are talking about looking at the cell surface of the host, with glycosylated receptors at that surface, and the virus recognizing and binding to the glycans of those receptors. If you remember, just an hour ago I showed you the animation of virus entry via sugars: it is really those sugars we are talking about, and the ligands at the tip of those sugars. It is a whole different story to consider the sugars that are on the viral protein. They also play a role, but they would be recognized by antibodies or human lectins, and at the moment I'm not in that scenario. I mean the other scenario, of the virus recognizing host glycans. So SugarBind was developed to gather this information on the human ligands of all pathogens. That means that at this stage the hemagglutinin is considered a lectin, because it is carbohydrate-binding. You are now familiar with the color code: the protein, the tissue, the structure, the ligand and the disease that are in that database.
Unfortunately, SugarBind is not very alive. We've improved it. It was basically a super-duper Excel spreadsheet when we got it, and we made it into a decent database, cross-referenced with other databases and also cross-referenced, for instance, from ViralZone.
And we are a bit short of cash to put more data into it. What is important is that for each ligand we have the SNFG cartoon, and you can also have some results from the CFG glycan array data, which you can see here with the stars. You can see the affinity of each ligand for H3N6 in that particular case. So this is the swine flu. You can see that there is a preference for some glycans, some ligands, and we are chasing that. If you click on the record, you have some information on the properties and, of course, the disease. We have GlyTouCan identifiers as well.
What is interesting, and possibly connecting, is the structures. This is a substructure of a fuller structure, and the full structures are in GlyConnect. The idea is to be able to look at which full structures in GlyConnect contain that substructure. You can see here the agents, with the taxonomy. These are the pathogenic agents, and here are the corresponding lectins. If you click on those, you get to the full structures that contain the motif, that is, the substructure that was isolated as the ligand for these influenza viruses. And when you consider one particular structure here, which is O-linked, you can see the GlyConnect entry, and you can see that this is in a mucin and in the respiratory system. This is the sort of thing we wish we could do with far more pathogens, but we are not able to do more at present.
All of that is also based on a substructure search. Each time I give a glycoinformatics training, something has broken in the meanwhile and needs to be updated. Last year Octopus was not working and Compozitor did not exist. I had to justify the fact that Octopus had a previous version that needed to be updated. It was in process, so I couldn't demo Octopus. This year, we have reached the limits of the substructure search.
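To make the substructure idea concrete, here is a minimal sketch of one cheap check that can sit in front of a real substructure search: a full structure can only contain a motif if it has at least as many of each monosaccharide as the motif does. This is a necessary condition, not a sufficient one, because it ignores linkages and branching. It is not the search that SugarBind or GlyConnect runs; the function names and the toy residue lists are assumptions made up for illustration.

```python
from collections import Counter

def composition(residues):
    """Count monosaccharides, ignoring linkages and branching."""
    return Counter(residues)

def may_contain(full_residues, motif_residues):
    """Necessary (not sufficient) test for substructure containment:
    the motif's composition must fit inside the full structure's."""
    full = composition(full_residues)
    motif = composition(motif_residues)
    return all(full[name] >= count for name, count in motif.items())

# Hypothetical toy data: plain residue lists, not real database records.
motif = ["Neu5Ac", "Gal"]
candidates = {
    "structure_A": ["Neu5Ac", "Gal", "GalNAc"],
    "structure_B": ["Gal", "GlcNAc", "Man", "Man", "Man"],
}

# Only candidates that pass the composition test need a full graph comparison.
hits = [name for name, residues in candidates.items()
        if may_contain(residues, motif)]
print(hits)
```

A structure that passes this filter still has to be compared as a graph, linkage by linkage, before it can be reported as containing the motif.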
Our substructure search needed a real rethink so that we can also map substructures in compositions, for instance. It would be interesting to see, if you have a composition and a substructure, what the potential correlations are, and so on. This is in the making, I can't demo it, and it will be for next year. Anyway, this is just to explain why, if you try SugarBind and click on substructure search, I'm not absolutely sure it works. It is pre-calculated, so it should work, but there are issues.
Now I will focus on lectins. These are my favorite carbohydrate-binding molecules at the moment, and they are really part of that picture of the cell surface I keep showing you. Classifying these proteins is not easy. People have tried, for instance, to use the specificity of lectins as a criterion for classifying them. You can also think of different types of functions, or of subcellular locations. These are the mammalian lectins, for instance; you're probably familiar with galectins, selectins, calnexin, etc. Yet until recently it was not clear which were the best criteria to organize the knowledge around lectins. Lectins are often multivalent, which is why specificity is challenged as a criterion. They also often need many subunits to be functional. So this was dropped, and finally the idea was to choose the fold as the first criterion: a hierarchical classification that starts with the different folds available to lectins for binding carbohydrates. This is all the inspiration of Anne Imberty, with whom I work very efficiently and have for the past four years in putting this project together. She suggested the hierarchy. We also looked at what is done once you have folds: there are many structural classifications based on fold, like CATH, which is developed in the UK.
There is also SCOP, and so on. All of these start from general folds and then see whether you can find subgroups in each fold that are similar. The challenge for bioinformatics is that, at the level of the amino acid sequence, the similarity is very poor. The similarity is at the level of the 3D structure, more so than at the level of the amino acid sequence. The basic approach of CATH is to say that, within a particular fold, you can have subclasses depending on a certain percentage of identity, which is around 20%. I mean 20% similarity. Here you can see a sort of reference beta-propeller, and you have variations on the theme, with propellers containing different numbers of repeated blades. This is why they really look like a propeller. We'll see in the demo how this applies in the case of lectins. Once you have those subsets more or less well defined, the idea is to go down the hierarchy again and look at homologues, grouping together those that have a really high percentage of similarity. Usually they end up grouped by species, because within the same species or the same phylum they tend to be very similar.
UniLectin was launched in 2019, so that is relatively recent. It was based on lectin work that Anne had done before and that was part of another portal, called Glyco3D, so really focused on 3D. We started building UniLectin as a platform with one module, called UniLectin3D, which now includes over 2000 3D structures of about 500 different lectins. This is the current content. Anne keeps chasing the PDB and adding information as soon as she sees something of interest to her. And as I mentioned yesterday, this was developed by François Bonnardel, who is no longer in the group as of yesterday.
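The identity thresholds mentioned here can be pictured with a small sketch. It computes percent identity between two already-aligned sequences and tests it against a cutoff. The denominator used (aligned columns where neither sequence has a gap) is only one of several common conventions and is an assumption, as are the function names and the toy sequences; CATH and UniLectin may compute this differently. The 20% default comes from the talk, and a stricter cutoff would play the same role for the homologue level.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

def same_group(aligned_a, aligned_b, threshold=20.0):
    """True if the pair reaches the identity cutoff for a grouping level."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Hypothetical toy sequences, not real lectin blades.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(same_group("ACDEFGHIKL", "ACDEFGHIKV", threshold=20.0))
```

Because sequence similarity is so low within a fold, a cutoff like this only makes sense after the sequences have been placed in the same fold on structural grounds.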
Here are the numbers for what I just explained: she listed 35 folds, and these folds are hierarchically connected to 109 classes. I didn't put the 109 classes there, but you can see that this alpha/beta OB fold contains the class for the cholera toxin and the pertussis toxin, so it is really the toxin area. Then you have families here, which is when you have 70% or more sequence similarity. So this is the hierarchy: 35 folds, 109 classes and 350 families. You see that if I look at the beta-sandwich, and I have the galactose-binding domain-like, I have F-type lectin, H-type lectin, C-anamon lectin, and if I look at F-type lectin, I also have variations on the theme. This is really the basis upon which we're working. Of course this is not final: we are prepared to have more folds included. This is part of the game, creating more classes and possibly more families. This is certainly going to increase.
I think there's a bug in my presentation. I had something about bacterial folds in particular, to illustrate what's happening here, and for some reason it disappeared this morning when I was rehearsing. Never mind, I'll show it to you live.
In UniLectin, we're part of a big program called Glyco@Alps, and over the past three years we have come up with a number of modules. As I said, we started with UniLectin3D. Then we focused on the beta-propeller lectins, which made sense with the way we were organizing knowledge, with a classification based on the number of blades, and we're going to have a look at that. And only recently, at the end of 2020, we released LectomeXplore, which gathers all the possible lectins we could find in databases according to our prediction, which of course I will explain. We are in a bit of a vicious circle, and this is what we're trying to break with LectomeXplore.
Lectins were poorly classified, so lectins were poorly recognized, so lectins are poorly annotated, and lectins are absent from genome annotation. This is a sad observation of the situation. So we thought: if we are able to tidy the classification and make the prediction at the level of each class, which is precisely defined, then we'll be able to annotate lectins in complete genomes much more efficiently. I hope I will convince you that this is what we are in the process of achieving.
Before I go to the prediction part of the talk, I would like to go to the site so that you have an idea, but I also wanted to show you a bit of the CFG first. So I go back in time a little. You see that some pages of the website were updated a bit more recently than 2010, but hardly. It's still alive, but it's in a coma. You have the data: glycan array, glycan profiling, gene microarray. The idea was to have a combination with the transcriptomics of the glycogenes, so that all the glycosyltransferases and their expression could possibly be connected to the glycan array.
This is the interface as it was offered for searching the arrays. If you have no knowledge of glycans and lectins, this is somewhat cryptic. You could, for instance, decide to look at galectins, look at the data and submit. The results look like this: you have a big Excel table, basically, and the only way to access the data is to download an Excel table. So I think it's relatively basic. If you click here, you have some details on the galectin itself. This is a lucky situation where you have an accession number, but often enough there was none. And then you can have a look at... no, that's what I just did. Sorry. I thought I had the results somewhere. This is weird. Oh, yes. No, I had another one here.
What happened to this one? Or maybe it's another one. Here you see the result of a glycan array: you see the intensity of binding, and each time you can find out which sugar is binding, and so on. Some biology was definitely done with that, but bioinformatically speaking it was very, very difficult to exploit and to find anything unless you had a dump, and the lack of cross-references to sequence databases was always an issue. This partially explains why it did not become popular outside glycobiology: it was already limited for glycobiologists, so outside was another problem.
Now the GLAD system, which is part of the Glycotoolkit. It is a JavaScript, so really web-based, development made in the lab of Rick Cummings, who is one of the top people in glycan array technology, and he has a dedicated person for it. So you have a web interface. My internet today is depressing. I have this dashboard where I selected the example to load, and you can see the data as a bar chart. Here they have a control set with a number of lectins. They expect these lectins to recognize sugars of certain types. If we look, for instance, at this one, the sugar is named up there, but you have this fantastic little tool here that helps you see which sugar it is instead of reading the IUPAC name. You can see that different lectins have different specificities, and I assume they use this control to make sure that when they use a lectin on a particular glycan array they find these structures, because they are binders specific to that lectin.
You can also see the data as a heat map, and you don't lose information: you can expand your bar chart if you focus on a particular area. RFU is the unit for measuring the intensity of binding to glycans. And you also have a force graph.
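As a rough picture of what such a dashboard computes behind the bar chart and the force graph, the sketch below calls a glycan a binder when its RFU signal reaches a cutoff, then lists the binders shared by each pair of lectins. This is not GLAD's code or its data format. The lectin and glycan names, the RFU values and the 5000 RFU cutoff are all made up for illustration.

```python
from itertools import combinations

# Hypothetical RFU readings: lectin -> glycan -> signal intensity.
rfu = {
    "lectin_1": {"glycan_a": 12000, "glycan_b": 300, "glycan_c": 8000},
    "lectin_2": {"glycan_a": 9500, "glycan_b": 11000, "glycan_c": 150},
    "lectin_3": {"glycan_a": 200, "glycan_b": 7000, "glycan_c": 6500},
}

def binders(signals, min_rfu):
    """Glycans whose signal reaches the cutoff for one lectin."""
    return {glycan for glycan, value in signals.items() if value >= min_rfu}

def common_binders(rfu_table, min_rfu):
    """Shared binders for every pair of lectins (the links of a force graph)."""
    sets = {lectin: binders(signals, min_rfu)
            for lectin, signals in rfu_table.items()}
    return {(a, b): sets[a] & sets[b]
            for a, b in combinations(sorted(sets), 2)}

shared = common_binders(rfu, min_rfu=5000)
for pair, glycans in shared.items():
    print(pair, sorted(glycans))
```

In practice the cutoff would be chosen after normalization and basic statistics on the array, not fixed by hand as it is here.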
Here again, you have this graph that shows you the lectins, with the same color code as in the bar chart, and in the middle it tells you the common binders between the different lectins. Here it is this one; if I click on this one it's another one, and if I click on this one it's yet another. So you can see that this is infinitely more pleasant to use. You can choose to load your data and screen it. You can apply some normalization and basic statistical tests, and you have these various tools to visualize the data and see what is important, what is binding or not binding, etc. As far as I know, this is the only online tool that does that job for parsing glycan array data.
Just a note on CAZy. CAZy is here, and the carbohydrate-binding modules are there. This is the CAZy interface. Again, unless you know what you're looking for, it's a little bit cryptic. The information is easier to find from a UniProt entry linking to CAZy than from the CAZy site itself. Personal opinion.
Now UniLectin, with UniLectin3D to begin with; I will talk about the other modules afterwards. There are many ways of entering the database: you can look at the different classes, you can look at the species, you can look at the specificity. And here are our famous toxins. All these toxins really bind sugars. You have the cholera toxin here, and different other ones. If we want to focus on these, we can find them here. As soon as the ligand is known, it is of course registered, but this is not always the case. Everything here is taken from the PDB, so you always have a PDB accession number, and different specificities for each of these. We have different species, and we can decide to look at a particular one here.
We really rely on LiteMol, as you have seen in GlyConnect and in SugarBind. We used it in SugarBind before UniProt actually adopted it. And we are very keen on the plugin that was added to LiteMol by Oliver Grant, who is from the GLYCAM team, specialists in the 3D modeling of glycoproteins. I'm not really talking about that; I should, but again, a whole other session should be devoted to it.
You can see the galactose here. This is of course in the PDB, but in the PDB you have this sort of representation. Sorry if I say things that everybody knows, but the PDB is now spread over the world: there is the original site in the US, the European site, which is PDBe, and the Japanese one, PDBj. All of these share the same data but do not offer visualization on the same grounds. As it turns out, the American PDB site has now totally integrated the plugin to show carbohydrates with the 3D SNFG symbols, so you can't miss them, whereas PDBe is still not so clear on that.
This is the ligand, the galactose I mentioned, and you can have a close-up here on the interactions at the atomic level. We have all sorts of cross-references: the UniProt accession number, the PDB accession number, and the related UniProt entries with greater than 50% identity, that is, what you find as similar proteins in UniRef50. Of course we have SWISS-MODEL; UniProt is here, actually. We have the paper, and GlyTouCan for the sugar when it is there. And we have a broken link to GlyConnect, for the same reason as the one that is broken in SugarBind: we are reshaping the substructure search. In all likelihood it will be fixed a few months from now, so please stay tuned. It will work again.
Then there is the SWISS-MODEL PLIP interaction viewer. We collaborated with the SWISS-MODEL people to have this PLIP application, which really shows you details, and you can navigate in the ligands here.
And you can see information that you can reconstitute from this model. We have the sequence viewer, which is also borrowed from PDBe, and the secondary structure viewer, borrowed from the PDB. Each lectin entry of UniLectin3D is described in that way.
If I go back here, you can search by specificity, so we can look at the differences. If I look at, I don't know, GlcNAc here, we see all the possible ligands that contain GlcNAc, and then we can see all the lectins that bind that particular ligand. You will see that in this case we have different folds, like beta-sandwich here, beta-propellers here, and so on. This is well defined, and each time, if you want to explore the 3D structure that is there, you click on this and you have the information.
If I go back: with the search by field, you can ask for a specific PDB entry or a specific fold, you can look for a specific family, and you can enter an IUPAC condensed name or just a monosaccharide. You can be picky about the resolution of the 3D structure. There is nothing extraordinary in those fields; we just hope they are practical. Otherwise, of course, you can go straight to the sunburst. You see each time here that it says that the PropLec7A family, the PLLL-like, has four items, whereas the beta-propellers have 17 families, and so on. You can go from there in the same way you can click from here. You can also decide, instead of filling in the field for taxonomy, to go to the taxonomy here and look at the different species. If you want to have a look at all animal lectins here, we have human of course, but we also have toads, rodents and so on. So these are the various options you can use to search UniLectin3D.
Since the prediction is another big chunk, I think it would be better to have another break now, before I explain the prediction and how to explore predicted lectins.
All right, the prediction part now. I share my screen again. I will start with what we did with the beta-propellers, because it was our training exercise before we got into predicting lectins. As I said, one of the clearest objectives was to detect lectins in genomes where we know they are hidden and often not seen. The reason the propellers are interesting is that they are tandem repeats, and so they are a challenge. Maybe not all of you are in bioinformatics, but in bioinformatics repeats in sequences can become a nightmare. That is why mucins, for instance, are often not so well annotated in [unclear], and we can't determine which UniProt entry would correspond, because the repeats are in the way and people are not quite sure which one is which. There are a lot of pathogens with long repeated segments and a lot of virulence proteins that have repeats, and these toxins and these propellers are certainly representative.
Because the classical bioinformatics tools are challenged by repeats, we thought: okay, we're going to look at the repeats of the structures that we know. We're going to look at the blades, each blade composing one of these propellers, align them together, and see whether we can improve the delineation of those domains with these alignments. So we took the 3D structures of those that had five blades, six blades (different types of six blades) and seven blades. We had examples each time, and we worked with this alignment.
We specified it for each class and built models using a very classical hidden Markov model system, which just looks at the alignment: which amino acid is substituted with which other, whether you have insertions and deletions, and so on, and it models this system. We came up with a classification and the possibility to predict on that basis. We also found confirmation of the assemblies that we thought were there. What we knew at the time was that six blades can be assembled three by two, two by three, or six by one. The six by one is this type and the two by three is this one; the remaining kind we had never seen before. So the prediction was an opportunity to determine a new assembly.
What we did was take the whole of UniProt TrEMBL, run the hidden Markov models, and put the results in boxes depending on six blades, seven blades, five blades, etc. And we found a six-blade propeller whose predicted assembly did not look like the others. Because Anne Imberty has a crystallography lab, they got the sequence from this bacterium, crystallized the protein, and found that the assembly was in fact, whoops, like this, cut in half. Sorry, it's a bug: I did this a bit quickly in Prezi and did not animate it as I should have. But you can see, if you twist your head a little bit, that you have the assembly in two pieces. This was unknown until we predicted that lectin. That made us very confident that we could probably do a complete database-wide screening, using all the lectin classes that we had and taking all the lectin conserved motifs. There is a tool called HMMER that we can use to run the prediction from the hidden Markov models. So we took the whole of the NCBI non-redundant database, and we took Swiss-Prot and TrEMBL.
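To give a feel for how a model built from aligned blades can pick up a repeat in a new sequence, here is a deliberately simplified sketch. It is not a hidden Markov model and not what HMMER does: it has no insertion or deletion states, only a per-column log-odds table built from the alignment, slid along the sequence as a fixed-width window. The toy blades, the uniform background and the pseudocount are assumptions for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned_blades, pseudocount=1.0):
    """Per-column log-odds scores from gap-free aligned segments."""
    length = len(aligned_blades[0])
    if any(len(blade) != length for blade in aligned_blades):
        raise ValueError("all aligned blades must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background, an assumption
    profile = []
    for col in range(length):
        counts = Counter(b[col] for b in aligned_blades if b[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def best_window(profile, sequence):
    """Best-scoring window as (score, start); (-inf, -1) if the sequence is too short."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i].get(sequence[start + i], 0.0) for i in range(width))
        best = max(best, (score, start))
    return best

# Hypothetical toy "blades" and query, far shorter than real propeller blades.
blades = ["GWYDG", "GWFDG", "GWYNG"]
score, start = best_window(build_profile(blades), "AAAGWYDGAAA")
print(start, round(score, 2))
```

A real profile HMM also models where insertions and deletions are tolerated, which matters a great deal for tandem repeats whose blades vary in length.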
We took extra genomes and proteomes to be sure, and ran the whole pipeline to detect lectins. In other words, we had, if you remember, 35 folds, 109 classes and 350 families, so we built the same type of profile for the first class, the second class, and so on until the 109th class. You can see that each of them has a different profile. Here you can see the predominance of some disulfide bridges, and not here, where you have a very prevalent tryptophan. Here you have prevalent aromatics, and so on. Each one is really specific, and if you run them through the databases, you get a lot of results. This is what we stored in LectomeXplore.
The first thing we did was to look at this. You can't really read it here, but there will be a close-up. This is a big matrix where we have the 35 folds here and the superkingdoms here: animal lectins, bacterial lectins, fungal lectins, etc. When you see a blue square, it means that this type of fold is expected in that kingdom, because we have a 3D structure. When you have a green one, it was only predicted. For instance, in this close-up, the C-type lectins are expected among animal lectins, but in fact the C-type-like are found in bacteria, fungi, plants, protists and viruses. And the other way around, here you have another close-up. This one was really thought to be animal, and in fact you find a lot of bacterial ones. This one was found everywhere but in bacteria, and it turns out to be found in bacteria as well, and so on and so forth. So you have expectations depending on what is in the UniLectin3D section of the database, and all of a sudden we can open our horizon by looking at the predictions and finding the patterns in those other kingdoms.
I will demo LectomeXplore after I finish with a nice story that we have on the bacterial lectins of the vaginal microbiome.
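The blue and green squares of that matrix boil down to a comparison of two sets, which the following sketch illustrates. The cell labels and the tiny example sets are assumptions that only mirror the C-type example from the talk; they are not an export of LectomeXplore data.

```python
def classify_cells(known_3d, predicted):
    """Label each (fold, kingdom) cell of the matrix.

    'expected'       -> a 3D structure exists for that fold in that kingdom (blue)
    'predicted only' -> found by the profiles without any 3D structure (green)
    """
    cells = {}
    for cell in known_3d | predicted:
        cells[cell] = "expected" if cell in known_3d else "predicted only"
    return cells

# Toy sets following the C-type example mentioned above.
known_3d = {("C-type", "animal")}
predicted = {
    ("C-type", "animal"),
    ("C-type", "bacteria"),
    ("C-type", "fungi"),
    ("C-type", "plants"),
    ("C-type", "protists"),
    ("C-type", "viruses"),
}

matrix = classify_cells(known_3d, predicted)
new_kingdoms = sorted(kingdom for (fold, kingdom), label in matrix.items()
                      if label == "predicted only")
print(new_kingdoms)
```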
We have worked with several teams at Imperial College in London who are trying to understand how dysbiosis of the microbiome creates problems with pregnancies and, in the end, childbirth. On their advice, we collected the different strains that constitute the microbiome. Some, of course, are labeled commensals and some are labeled pathogens, and those pathogens are obviously more prevalent in cases of dysbiosis. This is a usual phylogenetic grouping. These are mostly Lactobacillus: you have Lactobacillus jensenii, you have crispatus, and so on, which are found in the microbiome. Here you have the pathogenic strains, where you also have Lactobacillus iners, which is sort of in between, sometimes on the good side and sometimes on the bad side. You have Gardnerella, you have [unclear], you have Prevotella and streptococci. You can see the color code here for commensals and pathogens, and here you can see the folds that we reported.
François ran all the predictions on 90 genomes corresponding to these species, and we tried to map the folds and classes of lectins in this big chart. So he ran the 109 class profiles on the genomes, and what we found was rather interesting. We looked at the specificity of these lectins, whether they were associated with recognizing N-acetylglucosamine, galactose or mannose, and whether they were oriented in that way. You can see that there are far more colors in the pathogen section than in the commensal one. What the chart tells you is that the commensal side is rather boring: the lectins you have in the commensals are not very varied, and there are not many of them. Whereas if you consider the pathogens and their lectin profile, there is a variety of them. In particular, they more readily recognize sialic acids or fucose in combination with others, like galactose.
And you see here that these profiles are very well known for being the profile of pathogenic strains in bacteria in general. This is a paper that was just accepted; it's not in bioRxiv anymore, it's actually in npj Biofilms and Microbiomes. We can identify some tendencies, and obviously some richness in the lectin profile of pathogens versus commensals. There was already a little bit of a hint of this in other microbiomes, and we showed it yet again in that microbiome. So we are very much interested in screening more genomes and finding the lectins, we call that the lectome now, of each species, to see how potentially interesting it is. So I'm done with the slides.
I have a question in the chat, maybe. Where's my chat? "Well, in which environment?" Sorry, Carl, in the vaginal microbiome. Is that your question? Yeah. Well, it's the specificity of the lectin; I don't know. Maybe there are other things to find, and other specificities. It's a relationship between the host and the pathogens, so I don't know what the pathogens are doing. And a lot of them also have some sialic acids at their surface. So it's a mix. Unfortunately, at the moment we are again only looking at the bacterial lectins recognizing the host glycans, and not the opposite way. We can discuss that in the afternoon if you want.
Now I would like to go to LectomeXplore, with all these Zoom things in the way. So I'm back on the UniLectin home page. As I said, we have the predicted lectins in some fungal genomes. We have been working with some fungi specialists, and we will release that very soon, or maybe it is already released, I don't know. Then there is TrefLec, which works the same way as PropLec; it is also repetitive. So here you see the different distributions of the beta-propellers.
So with five blades, with six blades or with seven blades. You can see the proportion of seven blades there. And then the blade distribution, if you look at a particular species. We couldn't find any beta-propellers in human. We thought that maybe we would create a bit of a revolution, but we did not. So let's have a look at these particular bacteria here. It happens that we miss a couple of propellers, probably because they are just below threshold, or because the blade is not as repeated as it seems. So a few are missing, sometimes also because we have a lot of partial and fragment sequences in the databases that we screen. If you do not want fragments or partial sequences, this is the way to get rid of them; it doesn't seem to have been applied for this one.
One thing I failed to explain about the prediction, and that is of importance, is the score. We score each prediction, and the score is a little bit tricky because, as I explained, we have 109 classes and 109 models, and each model is run independently. The HMMER software, which runs the profiles for us, of course provides a p-value. You all know what a p-value is. We collect those p-values to get the significance of the prediction, but we have 109 independent p-values. So François had to think of a way of choosing the best p-value, or the top p-values if several classes come out as highly significant. Sometimes we do have a dual prediction: we're not actually sure whether it's one lectin or the other, so we keep two. The score is normalized across all these p-values, and here we have the normalized score. The minimum score in all our requests is 0.25 by default.
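To make the selection logic concrete, here is a minimal sketch of picking the best class among independent model results, keeping a second class when the two are too close to call, and applying the default 0.25 cutoff. The normalization used here (a capped negative log10 of the p-value scaled to 0..1), the cap `MAX_LOG` and the `dual_margin` parameter are all assumptions made for illustration; the talk does not give the actual UniLectin formula.

```python
import math

MAX_LOG = 50.0  # assumption: cap on -log10(p) used to scale scores to 0..1


def normalized_score(p_value, max_log=MAX_LOG):
    """Stand-in normalization, NOT the UniLectin formula."""
    if p_value <= 0:
        return 1.0
    raw = -math.log10(p_value)
    return max(0.0, min(raw, max_log)) / max_log


def select_predictions(hits, min_score=0.25, dual_margin=0.05):
    """Choose the lectin class(es) to report for one protein.

    hits: dict mapping class name -> p-value from that class's model.
    Returns a list of (class name, score): the best class, plus the
    runner-up when its score is within dual_margin of the best.
    """
    scored = sorted(
        ((normalized_score(p), name) for name, p in hits.items()),
        reverse=True,
    )
    kept = [(name, score) for score, name in scored if score >= min_score]
    if not kept:
        return []
    result = [kept[0]]
    if len(kept) > 1 and kept[0][1] - kept[1][1] <= dual_margin:
        result.append(kept[1])  # dual prediction: keep both candidates
    return result


# Hypothetical p-values for three class models on one protein.
example = select_predictions({"class_A": 1e-30, "class_B": 1e-29, "class_C": 1e-3})
print(example)
```

Raising `min_score` trades fewer entries for more confident ones, which is the same trade-off as the score filter in the interface.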
But you can raise it and get really good predictions: from 0.35 it's a really good prediction, most of the time. As a result, we'll have fewer entries, with very high scores starting here. You can view the information here below, or by opening it. Let's say I asked for it below. No, okay. So I asked for it here, and the information is the prediction. Here we go. We show what we have for this particular hypothetical protein of [unclear] solani. This is its sequence and how it matches the profile. This is the profile here, and we can see which amino acids matched the profile. This is a good score because, especially when you have tryptophans matching, it's always good: tryptophans are rare, so the match is unlikely to happen by chance. You have a glycine and an aspartic acid here, but unfortunately the two are not matched. This is why it's not a top score, but it's a good score. This is again in my way.
Back to other things. In PropLec, I can list pathogens with propellers here, and we have the detail. What is interesting is that sometimes, when we have the information, and we don't always have it, the binding residues are mapped as well. You can see here that this one gets a decent score because there is some conservation of the residues that are involved in carbohydrate binding. It is not ideal, but it is good enough. Here we also map some CATH information; CATH is that database of domains. We see that in fact they map the entire protein where we map only the propellers, and we can explain exactly the structure of that particular propeller. So this is a useful list to go through to see the propellers that we have for bacterial pathogens. As I said, this is mainly found in bacteria, in archaea and in eukaryotes.
Looking closer, we have some fish, some mollusks, and no mammals in this particular category. Then I'll go for the big chunk, which is LectomeXplore. You can explore LectomeXplore by classification here, by taxonomy, or by type. You can see that the ficolin-like class is very, very common; we have the distribution of the top classes that are being mapped and predicted. Then you can search by field. By field you can choose the lectin class, so it's the same, and we have a few more possibilities with this expand menu here. For instance, imagine you have found a PDB accession number in some paper and you'd like to know whether there's a lectin prediction for it, or you have a protein accession number: you can use that window to ask LectomeXplore whether it is there or not. You can also search for domain architecture using Pfam, in a particular species, and do some filtering on the basis of the score. So if I look at the table that I showed you, and I look for instance at animal lectins that are supposed to be like the cholera toxin, I want to explore those lectins. In the end I have two, and they are found in some species of Caenorhabditis. You can see that it's a repeated pattern. It's a normal-size protein. So we can have a look. You see that the score is just above 0.26. If you look at the conservation, it's not very convincing, and this is why the score is not good. You have here a cysteine instead of a glycine; you have an alanine, a threonine; the important residue for binding is the tyrosine here, but we have a lysine. We have the glutamic acid that is preserved here. We do have the serine and the glutamine, and that's all that is conserved. Another glutamine here. Maybe, maybe not. It's difficult. It means that you have to investigate the information further.
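The residue-by-residue check just described, comparing what the profile expects at the binding positions with what the predicted sequence actually has, can be sketched as follows. The residue strings are invented and only loosely echo the substitutions mentioned in the talk (glycine to cysteine, tyrosine to lysine); they are not the real alignment, and the function name is an assumption.

```python
def binding_site_conservation(expected, observed):
    """Compare expected and observed residues at aligned binding positions.

    expected, observed: equal-length strings of one-letter amino acid codes.
    Returns (fraction conserved, list of (position, expected, observed)).
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    mismatches = [
        (i, e, o)
        for i, (e, o) in enumerate(zip(expected, observed))
        if e != o
    ]
    if not expected:
        return 0.0, mismatches
    fraction = (len(expected) - len(mismatches)) / len(expected)
    return fraction, mismatches


# Hypothetical example: G and Y are replaced, E, S and Q are conserved.
fraction, mismatches = binding_site_conservation("GYESQ", "CKESQ")
print(fraction, mismatches)
```

A plain fraction treats every position alike; as the tyrosine example shows, one substitution at a key binding residue can matter more than the overall figure suggests, so the list of mismatches is worth reading as well.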
In exactly the same way that I told you, yesterday and again this morning, that Compozitor is there to parse the information and present it in a way that gives you some ideas on what to think and what to do next, it is the same with UniLectin. We do not pretend that our predictions are proven, so that you can take this home and say, okay, I have predicted a lectin in Caenorhabditis nigoni. It is really up to you to check the literature and to check other databases. This is why we have the genome proximity: you can check the neighboring genes, whether it is a properly defined gene, and so on and so forth. There are really so many things you can check.
If I go back to here... Oh, yeah, good. Again, you can explore: why would plant lectins have a cholera toxin? There's one. So is that a true one or not a true one? It has an incredibly high score. For this one, I'm not sure what the species is. I'm always looking at new things in LectomeXplore, I really like it. Musa is banana; I guess it's banana, Musa. Okay, so anyway, it shouldn't have a cholera toxin, and we don't know whether the gene is really there. The genome browser can tell you: you can click on that, and it's a plasmid. Okay, so already you have doubts. This is really food for thought. You browse, you look at the predictions, or if you don't want to browse it that way, you query by field and you look, for instance, at a particular lectin class. We could look at this particular class here and see whether it's well represented. We have 24, all in bacteria. So it's all clean in terms of prediction. So let's take a bit more risk by looking at this one.
Another thing that is interesting, and comforting in a way, is that here we collect the most frequent protein names, and you can see that quite a few are called either hypothetical protein or uncharacterized protein. Sometimes you have carbohydrate-binding. And you have all sorts of possible combinations with other domains. Really, the idea is to see whether it makes sense to have a lectin there, and to browse here and choose one species or another, or whatever. So I think on this you have a question on exploration. You cannot, at the moment, feed in a new sequence. I know this is going to be... We're working on it. As I said, we released LectomeXplore only six months ago, or eight months ago, so it's really a very recent addition to UniLectin. We definitely want to provide the service of running through a sequence and finding the lectins. At the moment, we can organize things together in a sort of collaboration: we can run the profiles for you. If you're working on a genome, you are keen to know what the lectins in your genome are, and your genome is not annotated and cannot be found anywhere else with full confidence, we can parse your genome and find the lectins for you. This is what we can offer at the moment. Until we actually provide the service online, where you can do that yourself with your own sequences, this is the best we can do.
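The tally of most frequent protein names mentioned at the start of this part is simple to reproduce on any list of predicted entries. The sketch below uses invented names and a hypothetical helper; it only shows the idea of normalizing the names before counting them.

```python
from collections import Counter


def most_frequent_names(names, top=3):
    """Count protein names case-insensitively and return the most common."""
    normalized = (name.strip().lower() for name in names)
    return Counter(normalized).most_common(top)


# Hypothetical names attached to the predicted lectins of one class.
toy_names = [
    "Hypothetical protein",
    "hypothetical protein",
    "Uncharacterized protein",
    "Carbohydrate-binding protein",
    "hypothetical protein",
    "Uncharacterized protein",
]

top_names = most_frequent_names(toy_names)
for name, count in top_names:
    print(count, name)
```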