So we're going to give you a duo: I will start, and Daniel will step in. To give you a little bit of background, I'm going to set the scene at the level of cellular interactions. You can see here that you have all sorts of receptors and membrane proteins at the surface of the cell. You have antibodies. And you have those very interesting little shapes here that are assembled together to constitute glycans. Glycans decorate the surface of these proteins, and glycans are also recognized by a number of other proteins; antibodies, for instance, recognize glycans. What we are going to focus on is the soluble ones. GBP means glycan-binding protein, otherwise called a lectin. A lectin is a protein that has at least one non-catalytic domain that binds to a specific glycan. You will see that I use glycans, I use oligosaccharides, I use sugars. These are all synonyms here.
And this is the simplified representation. Otherwise, you have them like this, with a lot of carbon rings, and this is just to simplify. What is interesting about these lectins (carbohydrate-binding protein is another synonym for glycan-binding protein) is that they are most of the time assembled into multimers, and the multivalency of these proteins is very important for glycan recognition. So this is just an example. If I take a category like C-type lectins, they are usually assembled like a nice little bouquet here, or they are assembled with different domains, as shown here, et cetera. In C-type, the C means that they bind calcium and recognize sugars, glycans, at the same time. So you see that there's a whole variety of these proteins.
And I'm sorry for this not very good picture, but to illustrate the biological role, I took an excerpt from a talk Laura Kiessling gave recently. That was a streamcast, so a person filmed like this is never at their best.
But anyway, she showed this, and it was very telling. She's a lectin specialist. You can see the human lectins, and you can map them onto the different tissues in the human body. There are a number of papers showing that you can track the expression of some genes in different circumstances, here SARS-CoV-2 infection. And you have here this interesting lectin that is obviously overexpressed and of interest. If you look at the UniProt entry, it will tell you that this is pulmonary surfactant-associated protein D. You can see down there that it is considered a lectin; it's carbohydrate binding. And you can see that there's a cross-link to a database called UniLectin, which we co-develop with a team in Grenoble. There you can see that this is a C-type lectin, and you have all sorts of different information: its architecture and so on, and its biological role, defined by a number of publications.
So we have put together this UniLectin platform, which has different modules. The most curated module is UniLectin3D, which is really based on the 3D shape of the lectins that are known. We also have large-scale predictions, for instance of beta-propeller lectins, of fungal lectins, and of beta-trefoil lectins. The idea is that we are trying to organize and structure that information, based on folds. One of the big challenges with lectins is that they are not very well annotated in databases. They're not very well characterized, and they appear very often as hypothetical proteins. The effort in UniLectin3D was really to create a hierarchical classification based on fold and sequence similarity, so that you could at least look at different classes of lectins, and their role is usually related to their fold. Of course, they differ between organisms.
What would also be interesting is to have a classification based on the molecules that they bind to. This is a challenge, and one we are going to discuss further. The first round of prediction we did with the Grenoble team was LectomeXplore. We have 35 folds, 109 classes, and 350 families. And we processed any sequence under the sun that we could find in UniProt, at the NCBI, in proteomes, translated genomes, and so on. We ended up with half a million lectins in 17 plus 100,000 species. And we are trying to make sense of this data. It's hard, but we try to use the classification to do so. For instance, if you look at the human lectins, it's interesting to see that in our prediction the distribution in terms of classes is not what you find in UniProt. So there's a definite bias in UniProt, and that means in public databases in general, towards C-type lectins, less so for the I-type lectins, which are a kind of immunoglobulin-fold lectin. And we have this potential artifact with ficolin-like lectins and so on. The galectins are about the same, and so on. So we have a greater variety in LectomeXplore, and this needs to be taken further.
What we also need is to understand what lectins are doing, and there are a lot of screening methods for that. The functional glycomics consortium was created at the beginning of the 21st century, and they particularly accumulated data with glycan arrays. A glycan array is simply synthetic molecules, glycan molecules, at the surface of the array. You can test your lectins, your antibodies, your whole virus, or some other modules, carbohydrate-binding modules, that exist. And glycan array data have accumulated on that particular platform for the last 20 years. So to learn about that, I hand over to Daniel.
Thanks, Frédérique. Hi. So what is our big picture here?
The idea is that bacteria, viruses, and many other proteins and organisms interact with glycans. In the case of bacteria and viruses, they use glycans as a point of entry into cells, so this is critically important for infection and for all kinds of pathologies. So wouldn't it be nice to have a model that tells you, given a protein sequence and a glycan sequence, whether they bind? Is there information in the sequence, and potentially in the future also in the structure, et cetera, to make these kinds of predictions?
For that, I need to take a little detour into how you actually do model building with glycans. Glycans are a biological sequence, of course, but they are not your typical biological sequence. You may be used to other things, like linear types of sequences. Glycans are decidedly not linear. They have branches. They are the only biological sequence that very commonly sports branches. However, if you rotate a glycan, and we are not the first to point this out, it looks very much like a tree. And trees are just special cases of graphs. We have heard great things about graphs already, including their ability to be used for all kinds of fancy deep learning methods. People in chemistry have realized that as well, of course: you can depict chemical molecules as graphs at various levels of resolution, and you can use that for all kinds of property predictions with, for instance, graph neural networks. And you can do the same thing, of course, for glycans. So this is a deep learning graph neural network that uses these convolutional operations to learn properties of glycans, which you can then connect to many functions or other properties that these glycans have.
Just one small example here: about a month ago, the first kidney transplants from pigs to humans were performed. This is not with your typical pig.
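As an aside on the tree-and-graph view of glycans described above, the following is a minimal Python sketch of the general idea, not the model from the talk. It assumes a toy branched glycan (the N-glycan core), one-hot monosaccharide features, a single round of neighbor averaging standing in for a learned graph convolution, and sum pooling as the readout. All names (`build_adjacency`, `message_pass`, `readout`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy branched glycan (the N-glycan core), written as nodes and edges.
# Nodes are monosaccharides; edges are glycosidic linkages.
# Node 2 is the branch point: it carries two mannose branches.
nodes = {0: "GlcNAc", 1: "GlcNAc", 2: "Man", 3: "Man", 4: "Man"}
edges = [(0, 1, "b1-4"), (1, 2, "b1-4"), (2, 3, "a1-3"), (2, 4, "a1-6")]

def build_adjacency(edges):
    """Undirected adjacency list; linkage labels are ignored in this sketch."""
    adj = defaultdict(list)
    for u, v, _linkage in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def one_hot_features(nodes):
    """One-hot encode each monosaccharide type (assumed featurization)."""
    vocab = sorted(set(nodes.values()))
    feats = {
        n: [1.0 if name == v else 0.0 for v in vocab]
        for n, name in nodes.items()
    }
    return feats, vocab

def message_pass(features, adj):
    """One unweighted round: each node averages itself with its neighbors.
    A real graph convolution would apply learned weights here."""
    updated = {}
    for n, feat in features.items():
        group = [feat] + [features[m] for m in adj[n]]
        updated[n] = [sum(col) / len(group) for col in zip(*group)]
    return updated

def readout(features):
    """Sum pooling over nodes gives one vector for the whole glycan."""
    return [sum(col) for col in zip(*features.values())]

adj = build_adjacency(edges)
features, vocab = one_hot_features(nodes)
hidden = message_pass(features, adj)
embedding = readout(hidden)
print(vocab, embedding)
```

The point of the sketch is only that branching costs nothing in this representation: the branch point is simply a node with three neighbors, and the same update rule applies to every node.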
This is with a transgenic pig, because pig glycans are unlike human glycans, and therefore your immune system recognizes them. You may remember that Frédérique pointed out that antibodies bind glycans, so antibodies may bind these pig glycans. Transgenic pigs don't have these components anymore, and therefore you can more easily transfer their organs into our bodies. If you use a model to predict glycan immunogenicity, so this deep learning graph neural network, and apply it to pig glycans, which the model hasn't seen before, you can score glycans based on likelihood of immunogenicity. And the glycans predicted to be immunogenic are among those that people have tried to modify with transgenic methods to get organs that are easier to transfer into humans.
Going back to protein-glycan interactions, or lectin-glycan interactions, this is a model where we use information from glycans as well as information from proteins. The glycan module here is, broadly speaking, the same graph convolutional neural network that I showed you on the last slide, whereas for proteins we use a large transformer-based model, so a language model type that has been pre-trained on a lot of different protein sequences. As input data, we use the exact same glycan array data that was spoken about before: a large data set of probings between proteins and glycans, do they interact or not? So we have quite a bit of diversity as well as quantity here to predict their binding, and then hopefully get some interesting results from that. We did lots of things with that, and unfortunately I can't go into all of them, but of course we also validated our predictions with independent experimental data sets to show that we can generalize to new proteins, new glycans, and their presentation in different contexts.
So different linkers, et cetera, as well as various design parts on the protein sequence, and also maybe some indications that structural features have been learned by the model. But I just want to show two brief biological vignettes.
The first one is about bacteria. The microbiome, and the bacteria that constitute the microbiome, the biota, interact with glycans. Frédérique has some previous work on that, so this was a really cool opportunity to look further at the vaginal microbiome, at pathobiont and commensal strains of the vaginal microbiome, to find out which lectins they have in their genomes and then use these sequences with LectinOracle to get binding predictions. And that's really cool because you couldn't do that before. Before LectinOracle, you could only work with proteins that had either been characterized before or at least had some kind of experimental data associated with them that you could mine, whereas with our generalizable model we don't need that. Using that, we can look at what pathobiont strains bind and what commensal strains bind. And what we find across the board is that pathobiont microbes in the vaginal microbiome are predicted to bind a greater diversity of glycan motifs, as well as more human-like glycan motifs, which potentially endows them with the ability to stick better to mucosal surfaces and potentially lead to infection in some cases.
The second example is with viruses. So this is influenza virus. You may know that there are avian influenza virus strains and mammalian influenza virus strains. Of course there are many differences, but one very salient difference is the specificity of glycan binding. Like basically every virus, influenza virus binds glycans. It binds glycans via its hemagglutinin protein, which is on the surface of the virus. And the avian influenza virus binds this diamond here. It's called sialic acid.
It binds it in a specific configuration, whereas the mammalian one binds it in a different configuration, which leads of course to a different geometry, a different structure, a different 3D structure, which can endow these viruses with their specificity. Now you can go ahead and mutate the hemagglutinin of avian influenza virus, and in some cases that may be sufficient to lead the virus to infect mammalian hosts, so they make this zoonotic jump. And this premise, that the hemagglutinin sequence determines its binding specificity, is exactly the premise that LectinOracle was built on. So we thought we could maybe use that to have a second look at epidemiological data, to see whether we can contribute anything from a glycan perspective. What we did is we got strain data, so H3N2 influenza strain data from Taiwan for a given time period, and predicted the binding of these strains to human glycans, which fluctuates over time. That is already one observation, and it matches experimental data on the same strains very well. What we did then is we took excess mortality data for H3N2 in Taiwan, same region, same virus, and overlaid that. At least that gave us an indication that there could be a correlation between predicted binding to human glycans and the corresponding excess mortality of these viruses. So potentially, if they bind better to our cells, it might be easier for them to infect us in greater numbers and therefore kill us better. We are currently working further on that, because we see trends that this generalizes to more cases. Before we end, I just want to thank all the people here and our funding sources, but I'll turn it over to Frédérique for one last slide. Thank you.
Well, this one is actually related to classification, and this is where I said I was going to go back to that. This also relates to Kasten's presentation, where he said that after we've used the method, we have the challenge of interpreting the data.
And this is exactly where we are too. If we predict a lot of results, then we have to make sense of them. Here is an example of the before and after of running the model that was created by Daniel. This is the sequence classification with our different classes of lectins, and you can see that the landscape changes after training. So this is of course something we want to add to our thinking about classifying lectins and relating them to their function: having more of a functional classification, as opposed to the structural one we have at the moment, and trying to see how the two overlap. This is really the next step in UniLectin, and it justifies even further, if that were needed, a collaboration between our groups.
On my side, I want to thank the Grenoble team for the work that was done and presented: François Bonnardel, who was a PhD student and has successfully finished, and Anne Imberty, with whom I co-supervised the work. And I thank my group, the Proteome Informatics Group, which is a big.