Thank you to the organizers, especially Frédérique, for inviting Raja and me to present to you a resource that we hope you will find useful, and that we hope the whole community will find increasingly useful as we continue to develop its features and functions. My name is Michael Tiemeyer. I'm a faculty member at the Complex Carbohydrate Research Center. I'm going to start today and talk for about 20 to 25 minutes, and then Raja Mazumder, the other PI on this NIH-funded resource, will talk. (The slides are not advancing, I'm sorry. Is it the arrows, the arrow up and down? There we go. Okay.) I'll start by demonstrating why and how to use GlyGen to acquire useful information, and then Raja will follow me, describing the data structure at the core of GlyGen and giving you some interesting examples of information that you can currently mine from the resource.
So I know you all got a wonderful introduction to glycobiology and glycoscience a little earlier today. I just want to remind everybody of the complexity of the glycan structures that we find at the cell surface on glycoproteins, across multiple classes of glycans: N-linked glycans, O-linked glycans, large complex glycosaminoglycans, of course glycans on glycosphingolipids, and soluble glycans in the form of hyaluronic acid. Glycans also exist in every compartment of the cell, from the nucleus and the cytoplasm through the secretory pathway out to the surface. And we know that these glycoconjugates, whether they're secreted, found at the surface, or found inside cells, are essential for regulating very important functions and contribute to both normal tissue function and disease.
Glycans are essential not just for normal cellular function, but also as components of very important biologic pharmaceuticals. Antibodies, growth factors, and enzyme replacement therapies are all centered around glycoproteins, and the glycans are important for the stability and function of those biologics. So there's a lot of data showing that glycan structural features influence the functions of the proteins or lipids to which they're attached. So where is this data captured? And can it be harvested in a way that is informative and that may reveal new knowledge?
Jamie Marth wrote a very nice review a few years ago pointing out the four key molecules of life: the nucleic acids, proteins, lipids, and of course glycans. For three of these four classes of molecules, there are really good, well-curated, robust, well-funded databases that allow people to look at information about these molecules and draw conclusions and linkages between different types of information. But what about glycans? Unfortunately, the generation of glycan databases has lagged behind that of the familiar databases for the other molecules of life. So where are these glycan databases, and what has hindered their development?
I think there are really two main reasons that glycan database development has been slower, other than there just being fewer people working on it. One of the reasons, which we all appreciate, is the structural complexity of these molecules. It outcompetes the structural complexity of any of the other classes of molecules: branching, linkage position, anomericity. All of these things mean that glycans are not simple linear polymers but have a much more complex chemical nature, which requires a different way of describing the molecules than a linear polymer, which is more easily represented.
So another thing that has hindered the development of databases is the question of how you represent this complexity in a way that a machine can understand, store, retrieve, and deal with. Efforts to address this machine issue go back to the late 1980s, in fact, when some of the first efforts at building carbohydrate databases, such as CarbBank, tried to come up with a common way to describe a glycan so that a machine could understand it. Since that time, many people have contributed to building a common language that could be useful for everybody, so that we're all talking a similar language and our machines can talk to each other. That's really an important aspect of the development of glycan databases: it has required effort from around the world. So what we do at GlyGen is try to build on the shoulders of those who've come before us, use the tools that have been developed already, and bring these things together and integrate them into a stable, broadly useful, innovative resource for interrogating glycoscience data.
So a little bit about GlyGen. It's funded by the U.S. National Institutes of Health Common Fund through the U01 mechanism. This support was initiated in mid-2017 after a year of a planning grant that was, I think, essential for launching GlyGen towards usefulness. We're currently ending the third year of five of our initial round of funding. That first year of the planning grant was important. It engaged stakeholders, people that we presumed would use GlyGen, to ask them what they thought the resource could be and what needs and priorities it could address: so-called use cases. How would you use this resource if it was in place? And we continue to engage users both inside and outside the glycoscience community to understand how best to provide functionalities that would make GlyGen a broadly useful resource.
It's important to point out, and I think both Raja and I will at several points, that the data available through GlyGen is data generated by the community. GlyGen is not generating data. We're trying to integrate and provide access to data. So the more we can collaborate with users to capture data, the more valuable this resource will become.
So GlyGen at its core describes proteins, glycans, and glycans on proteins, that is, glycoproteins. For each of these three core data domains, there is data behind it. For proteins, for instance, that is the mass of the protein, the diseases it might be involved in, where it's expressed, and the sequence of the protein. All these sorts of data come from other resources and are brought together in GlyGen. Similarly, for glycans, there are all sorts of bits of data describing a glycan that we try to bring together. We work with many different partners around the world to bring this data together: NCBI in the United States, and other informatics groups such as ChEBI and PubChem. We talk back and forth, providing data to them and them providing data to us, to build a richer resource. Of course, we also work with people who are mining data, especially glycoprotein data, to come up with descriptions of how proteins are glycosylated, and that's an important core of what GlyGen tries to integrate.
So we're all familiar with UniProt accession numbers. These are incredibly valuable resources, like RefSeq accession numbers, that provide ways to catalog proteins. And surprisingly, or maybe not so surprisingly, similar accession numbers for glycans were not available until within the last decade or so. The central resource for providing accession numbers for glycans is GlyTouCan, which Kiyoko, if she has not already told you about it, will tell you about in great detail. It's a great service to the community.
It provides a way for individual glycans to have their own accession numbers, and so to harmonize how we access glycan structures across many different databases. So I encourage you to go to glytoucan.org and understand how this works. Anybody can submit a glycan to GlyTouCan and acquire an accession number that can be used in publications. Increasingly, these accession numbers are becoming the standard in the field for referring to glycans. GlyGen takes advantage of these accessions to harmonize references to glycans across our resource, and you'll see that in a few minutes. This is also an example of how glycan database and glycan knowledgebase efforts around the world are working together as much as we possibly can: Kiyoko in Japan with GlyCosmos, GlyConnect at SIB, and GlyGen. Together we've formed an alliance where we talk with each other and try to make sure that we're not pulling in different directions, but rather rowing the boat in the same direction.
So again, just to reinforce, there are three core domains of information that GlyGen integrates: protein data, glycoprotein data, and glycan data. And these are the ways that one can enter the database, which I'll show you. I also want to point out, and Raja will take you through this in more detail, that the data in GlyGen is freely available. The only requirement is that you acknowledge GlyGen and the original source of the data, which is visible in the resource. GlyGen operates under the terms of the CC BY 4.0 license. So GlyGen data is freely available in the data portal, data.glygen.org, which I'll show you how to get to in a second, or through the web interface portal. So I'm going to go live now and hope that works just fine. I don't see any reason why it shouldn't. All right. So this is the home page for GlyGen, and I want to point out a few things right off the top.
First of all, we are on a relatively rapid versioning cycle, every two to three months, which allows us to incorporate features suggested by users relatively quickly. Feedback is really important to us. Your opinion absolutely does matter. So there's a feedback form that you can fill out here to give us some general information about how you got to GlyGen, what you think of it, and who you are. There are also other opportunities for providing more specific feedback, and I'll show you where those are as we go through a demonstration of the resource.
I mentioned the data. All the data in GlyGen is freely available. The database statistics are here, describing what's currently held within the database. To get to the data, you just click on the data button here, and it will take you (okay, there we go) to all of these different datasets. I could scroll down forever. Each one of these cards is a different dataset, and you're free to download it and interrogate it using your own resources as you wish.
So I'm going to go back to the home page now. If you scroll down on the home page (oops, sorry), you arrive at a section that we like to think of as late-breaking news and information: interesting features within GlyGen that we've developed, or data that's been published. Many of these items link to wiki pages that we're developing to provide additional information about various pieces of data. So there are three main ways into the database here from the home page. One of the simplest ways, if you just want to see what GlyGen can do, is to try these "Try me" questions.
The "Try me" questions are questions that have already been answered. You can click on them and see what kind of information can be mined for these sorts of questions, such as how a particular glycan is made, what proteins carry a particular glycan, and what kinds of glycans might be synthesized by a particular enzyme. So this is a way to see what GlyGen can do.
Another way to gain access to GlyGen's data is a quick search. These are essentially the use cases that we heard from users would be important: the sorts of questions one might like to ask. So for instance, if you have a glycan of interest, your favorite glycan, and you'd like to know which proteins that glycan is on, you can enter the GlyTouCan accession for that glycan and learn which proteins carry it. I'll show you an example of that as well. So these are questions that are common to a lot of glycoscience-based studies, and this is an easy way to get to answers to those questions.
Okay, the other way to enter GlyGen's data space is through these searches: a glycan-centric search, a protein-centric search, or a glycoprotein-centric search. You can enter these by clicking on these panels on the home page or by pulling them down from the Explore menu. So we're going to take a look at the glycan-centric search. Raja will give you a little deeper view into the protein-centric searches. I'll show you the nuts and bolts of how to move through them, and you'll appreciate a little more of what they can do for you when Raja talks after me. So here's the glycan search page.
The first thing to note is that there are tutorials available, so at your leisure you can look at some of the ways that you can use the glycan search space. And on every page, actually, you will see on the right a feedback button. If you click on the feedback button, you get the opportunity to give us some more information about what you think or what you need. The nice thing about the feedback buttons on these pages is that the information you provide is linked to the page, so that we can understand a little about the specific need, problem, or suggestion for improvement that you have. So we encourage you to help us improve the tool by giving us feedback as frequently as you can.
Okay, so a glycan-centric search can be a so-called simple search, where you enter a GlyTouCan accession, such as this number right here, a protein accession, perhaps the name of an enzyme (if you'd like to know what glycans can be made by that enzyme), or some other keyword. So let's try just entering this preset GlyTouCan ID for a particular glycan. Now, this is an explicit glycan. This GlyTouCan ID refers to a particular instance of a glycan entered into GlyTouCan. So here's the cartoon representation of that glycan. The results give you the cartoon representation, the mass as a native glycan or as a permethylated glycan, and how many glycoproteins that glycan has been described on. If we click on the accession number, you again see the cartoon and some basic general information about the glycan. And over here on the left is a menu that lists all the different sorts of information one can learn about this glycan. So let's look first at motifs. This was a biantennary, disialylated, complex glycan. And it's comprised of motifs that have specific meaning. These are motifs that we can look at across all glycans.
So for instance, there is the core of an N-glycan, and a complex glycan that's been extended with N-acetylglucosamines and capped with lactosamine units or sialyl-lactosamine units. So one can search by motifs to see which glycans contain these particular structural epitopes.
Clicking on "found glycoproteins", one gets a list of proteins on which this glycan has been described in the literature. Each of these instances carries badges that allow you to go to the original published data or to the database from which the data was acquired. You can, in fact, go to the literature and ask yourself: is the data that supports this assignment sufficient? So you're free to navigate to the original descriptions of this particular glycan and glycoprotein.
If we go and look at the entry for this particular protein, uromodulin, we have now jumped from a glycan-centric search to a protein-centric search. So we're now looking at a protein detail page for uromodulin that we got to through the glycan. Again, all the different types of information that one can look at in GlyGen related to this protein are listed here on the left. Now, we're interested in this glycoprotein and how it's glycosylated. If we click on the glycosylation tab, we're taken to a table that describes the glycans that have been assigned to this glycoprotein. You can sort this table by various parameters, such as GlyTouCan accession. And I want to point out that on this page, where I've simply sorted by GlyTouCan accession, there are compositions within this database, not just explicit structures. So GlyTouCan is able to house compositions in addition to explicit structures.
That can be useful, for instance, if one is using an MS-based tool to look at glycopeptides and finds a glycopeptide with a given composition, and would like to track down what proteins might have that composition on them. Let me get back to my search page. I hit a funny button. Okay, right.
So we looked at the glycoproteins that carry that particular glycan. We went to uromodulin as an example, and we looked at the glycosylation table for uromodulin and all the glycans that have been reported on that protein. Another way to look at any particular protein in GlyGen is to look at the sequence. You can click over here to see where the N-linked sites are, where the potential O-linked sites are, where the sequons that define N-linked sites are, and the mutations that have been associated with this particular protein. Raja will show you some ways that you can get to interesting information through mutations. Another way to look at the protein is as a linear sequence (let me go back to protein detail) across which the glycosylation sites are mapped. For instance, here is asparagine 80, and you can just hover over it to get a table that describes all the glycans associated with that particular glycosylation site.
Okay, so I'm going to go back to the protein detail page, and I want to scroll down just to show you the different sorts of information one can look at: isoforms, homologs, diseases, mutations, expression of the protein, and disease-related expression. I also want to point out the cross-references table, where you can link out to other resources to get additional information about the protein of interest.
Right, so that was a glycan search, a simple search where we simply entered a GlyTouCan accession. There are also advanced searches. For an advanced search, perhaps you know a limited amount about your glycan.
You don't know its GlyTouCan accession, but you've got some information from, say, a mass spec experiment. So maybe you know that you have a glycan with a particular mass, you know that the glycan is an N-glycan, and in fact you know it's a complex glycan. You can search based on that information, and the search will return the GlyTouCan accessions that meet the search criteria you entered. So now you have a GlyTouCan accession on which you can click to learn more about the proteins that carry that glycan or about other features of that glycan.
Right, so that was an advanced glycan search, and all these different parameters can be used to search glycans. Then there's a composition search. Perhaps you've done an MS experiment and you've been presented with a peptide that has a post-translational modification, with a composition associated with that modification suggesting that it has, say, five hexoses, four HexNAcs, a few fucoses, and two sialic acids. You can do that composition search and end up with the results. The composition searches are a bit slow because they're searching through a lot of glycans, and I'm not able to click on my pre-computed search, but the search will come up with a table that again returns GlyTouCan accessions: sometimes compositions that are consistent with the query, and then structures that are consistent with the composition. And again, now you have GlyTouCan numbers that you can click on to learn more about the proteins that may carry that glycan.
Okay, so let's go back home again. We've talked about different ways that one can access GlyGen through glycans. I'm just going to spend a minute talking about access through protein searches and glycoprotein searches. So what's the difference between a protein search and a glycoprotein search?
Well, a glycoprotein search does everything that a protein search does, but it adds glycosylation-specific features as additional search parameters. So a glycoprotein search narrows the scope of the search to proteins that have been identified as glycosylated. For the protein search, just like the glycan search, there are tutorials available. One can enter a protein ID, or even a key term, and get access to particular proteins. In the advanced search, you can see the different parameters that one can use to search proteins: protein names, organisms, GO IDs, etc. And if we take a quick look at a glycoprotein search, the glycoprotein advanced search is essentially similar, except it adds a few parameters: glycans that may be associated with the protein, particular amino acids that may be glycosylated, and the type of evidence that supports this as a glycoprotein. So for instance, let's select human, enter a protein name, hepatocyte growth factor, and say we're interested in glycosylated asparagines. Searching with those parameters, we end up with a list of proteins related to the search terms. Here's hepatocyte growth factor. Clicking on the UniProt accession brings you again to a glycoprotein detail page, where all the same sorts of information are available as you saw for previous searches, including the ability to look at the glycosylation that's been reported on this particular protein.
Okay, so we've navigated around a portion of the integrated datasets that comprise GlyGen, hopefully demonstrated some basic entry points into this resource, and at least provided a foundation or motivation for you to look around GlyGen on your own and appreciate what it can do for you.
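As an aside on the mass and composition searches walked through above, here is a minimal Python sketch of how a monosaccharide composition relates to a native glycan mass, and how an observed mass can be matched to candidate compositions. It is an illustration only, not GlyGen's implementation: the function names, the candidate list, and the tolerance are assumptions, and the residue masses are standard monoisotopic values rounded to four decimals.

```python
# Monoisotopic residue masses in Da (monosaccharide minus one water),
# rounded to four decimals. One water is added back for the free reducing end.
RESIDUE_MASS = {
    "Hex": 162.0528,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.0794,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.0579,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic acid (a sialic acid)
}
WATER = 18.0106


def glycan_mass(composition):
    """Return the native (underivatized) monoisotopic mass of a composition."""
    return sum(RESIDUE_MASS[res] * n for res, n in composition.items()) + WATER


def find_candidates(observed_mass, candidates, tol=0.02):
    """Return names of candidate compositions within `tol` Da of the observed mass."""
    return [
        name
        for name, comp in candidates.items()
        if abs(glycan_mass(comp) - observed_mass) <= tol
    ]


# Hypothetical candidate list, for illustration only.
CANDIDATES = {
    "Hex5HexNAc2": {"Hex": 5, "HexNAc": 2},
    "Hex5HexNAc4NeuAc2": {"Hex": 5, "HexNAc": 4, "NeuAc": 2},
    "Hex5HexNAc4dHex1NeuAc2": {"Hex": 5, "HexNAc": 4, "dHex": 1, "NeuAc": 2},
}

if __name__ == "__main__":
    # A biantennary, disialylated complex N-glycan composition.
    print(round(glycan_mass(CANDIDATES["Hex5HexNAc4NeuAc2"]), 4))
    print(find_candidates(2222.78, CANDIDATES))
```

A composition only constrains the structure: several explicit structures can share one composition, which is why a composition search returns both composition-level and structure-level accessions.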
I just want to finish up with two slides summarizing what I've just told you. GlyGen is conceived to present novel opportunities for exploring links that cross disciplinary boundaries, and this will become easier and more informative as we put new and different types of data into the resource, which we're constantly trying to do. The value of GlyGen is going to be directly related to the quality and the type of data that gets integrated into the resource, so ultimately our success is in the hands of all the people that use it and help us improve it. As for improvements and new features we're looking forward to in the near future, we're always interested in adding new types of data: additional species, protein variants, phenotypes associated with proteins and glycosylation, binding proteins, protein interactions, and protein-glycan interactions. We're also interested in new types of annotation: more types of glycosylation, how glycans are related to each other structurally and biosynthetically, and biosynthetic enzymes and how they map to biosynthetic pathways. And there are some new features we hope to bring online soon: enhanced ways to filter and sort data, enhanced hot topics entries that allow people to learn new things about glycoscience, and more statistics and intuitive displays of the data statistics to show how deep the database actually is. I'm going to stop there and turn it over to Raja, who will take you on another tour through what GlyGen can do for you. So I will stop sharing and turn it over to Raja.
Hi, good afternoon, and good morning for us. Thank you, Mike. So I'm going to just jump into the demo. I'm Raja Mazumder. I'm a professor at George Washington University. And again, I want to thank Frédérique for inviting us to do this. My part of the demo, or the webinar, will go over some types of searches and how you can use GlyGen to answer some interesting questions.
So I will first explore a couple of glycoproteins. I will use interferon gamma receptor 1 as one example, and then I will use the spike protein of SARS coronavirus 2. After that, I will show how you can explore data.glygen.org. We actually have a unique way of exploring the data. The front end is definitely what the majority of our users will use, but we also see value where people may want to just download tables: give me a table of glycosyltransferases in humans. Sometimes these tables are not completely available in the front end, in the sense that you cannot do the exact search you are interested in. In that case you can just download the table and explore it in an Excel file or something similar. And in the last part I will highlight programmatic access. I'll talk about APIs and SPARQL endpoints.
So let's start. I'm going to show here a search that searches everything in GlyGen, which basically means that it is really easy to use, but it may retrieve items that you are not thinking about. So if I just type in IFNGR1 here, it will tell me exactly where that text is found: in glycans, proteins, and glycoproteins, in gene, in function, and so on. I know that IFNGR1 is a gene symbol, right? So I'm going to go ahead and click in the protein section, and it pulls out IFNGR1 in Homo sapiens, in rat, and in mouse. So this is one of the easiest ways, if you have a gene symbol or a species name, to pull records from GlyGen. If I click here on the UniProtKB accession, I end up on the protein page that Mike showed before. I'll spend a little more time on this protein page, and we're going to explore a few things. Of course, Mike showed you the feedback mechanism. You can tell us if there's a problem on the protein page, or if you have a question or a suggestion. And one good thing is what it tells us when we get it; several of us get it, actually.
We know which URL you are referring to, so you don't need to type it in or copy and paste it, and we can easily tell where your feedback came from. That allows us to troubleshoot, and also to provide you answers or improve our resource.
So on this protein page, I'm going to explore a few things. As you can see, we have of course the UniProtKB ID and the RefSeq accession, the RefSeq name, the RefSeq summary, and so on. Later on I'm going to go through a few other sections of this protein page. I will click on the glycosylation section, on the mutation section, and also on the sequence section to show how you can explore the protein. So let's first click on the RefSeq accession. These cross-references are really, really critical, because they allow a user not to be tied just to GlyGen; they can explore other resources. Here we are at NCBI. We are working very closely with NCBI, with RefSeq, with Terence Murphy and others, to improve our individual resources, and users should be able to get information from both of these places. In this particular case, you can of course go through the information that is in RefSeq, but I also want to show that you can come back to GlyGen. So if you had started off with RefSeq, you have a cross-reference back to GlyGen and you get back to the same protein page.
Once you are looking at this entry, the first subsection that I want to go to, of course, is the glycosylation subsection. In the glycosylation subsection, you will see two tabs: "with reported glycans" and "without reported glycans". We will have an additional tab here, which will be called "literature-mined data". And that's really important. One of the biggest challenges of having information in a database is that there are lots of publications that talk about the glycan, the protein, the site, and even the glycan structure.
But that information is not in a format that can be easily captured by a database. It's in a publication. It can be in an abstract, in a legend, in a supplementary table, and so on. So for the last year or so, we have been working with Dr. Vijay-Shanker to mine the literature and add to our knowledge of glycans and proteins. And that's extremely important. Of course, there will be some false positives in there, because it's mined completely automatically, but nonetheless you will find quite a bit of information by exploring the literature-mined data.
So in this case, I'm now going to click on one of these residues. There is asparagine 34, asparagine 79, and so on. There can be lots of pages, by the way. In this case there are two pages, but for some proteins you can have lots of pages, so it's sometimes useful to explore a particular site. So I click here on Asn 79. When I click here, I see that for that particular position there are a number of structures. The other thing I want you to pay attention to is this: this N is 79, and then there's I and there's S, isoleucine and serine. This is position 80, and this is position 81. So at 81 there's this serine. Whether there is a mutation at this serine is what we're going to explore next. But before we do that, I want to scroll down a little and show that we also get N-glycosylation data from UniCarbKB, and also the sequon data. In this case, we have data from UniCarbKB. We have also integrated into our database data from Frédérique's group, GlyConnect. I'm going to show you that later on. It is not in the front end yet the way we want to see it, but it will be there shortly. So if I go back to my entry page and click on sequence, I then want to click on N-linked sites and then mutation.
So when I click on mutation, remember, this is position 79, the next one is 80, and then 81. Position 81 actually has a mutation, and that mutation leads to loss of N-linked glycosylation at that particular site. If I explore that mutation, here is position 81: in colorectal cancer this is mutated, a point mutation from S to Y, which would lead to loss of glycosylation. So this type of information is quite valuable for exploring your protein and seeing how mutations, or even polymorphisms, can potentially affect the function of a protein.
Next, I'm going to go to "without reported glycans". Remember, I said GlyConnect is not yet in the place we want it to be. The GlyConnect data will show up under "with reported glycans". I think the beta site is already doing that, and it is something we are going to fix. But here we are getting data mostly from UniProtKB, and in some cases we are also getting data directly from users, submitters, who have given us site-specific information.
Next I'm going to go to the GO annotation, because that is also something you can use in your exploration. In the GO annotation there are two terms in the cellular component category, for example plasma membrane. You can use these GO terms to get additional proteins that are in the plasma membrane, for example. So if I go to Explore, then to protein search and advanced search, I can put in my GO ID, and I can also select an organism. Let's say I want only Homo sapiens. When I search proteins, it will retrieve all the proteins with that specific GO term. So 4,500 or so proteins were found. This is a good approach if, for example, you want all the proteins that are extracellular, or all the proteins that are nuclear. You can do these types of searches, and of course you can download the list as an Excel file and save it.
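The reasoning behind the position 79/80/81 example above can be sketched in a few lines of Python. An N-linked sequon is the motif N-X-S/T, where X is any residue except proline, so replacing the serine at the third position with tyrosine destroys the sequon at the upstream asparagine. This is a minimal sketch under stated assumptions: the toy sequence and the function names are hypothetical and are not taken from GlyGen or from the actual IFNGR1 sequence.

```python
import re

# N-linked sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of the Asn in every N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence)]


def apply_point_mutation(sequence, position, ref, alt):
    """Return the sequence with the residue at a 1-based position replaced."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[: position - 1] + alt + sequence[position:]


def lost_sequons(sequence, position, ref, alt):
    """Return Asn positions whose sequon is destroyed by the point mutation."""
    before = set(find_sequons(sequence))
    after = set(find_sequons(apply_point_mutation(sequence, position, ref, alt)))
    return sorted(before - after)


if __name__ == "__main__":
    # Hypothetical toy sequence with N-I-S at positions 4-6, N-P-T at 9-11
    # (not a sequon because of the proline), and N-G-T at 13-15.
    toy = "MKANISAQNPTLNGT"
    print(find_sequons(toy))
    # An S-to-Y change at position 6 removes the sequon anchored at Asn 4.
    print(lost_sequons(toy, 6, "S", "Y"))
```

The same check applies to any mutation table: a variant matters for N-glycosylation if it hits the asparagine, introduces a proline at the middle position, or removes the serine or threonine two residues downstream.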
So next I'm going to show how you can also use the GlyTouCan IDs. In this particular case, I use the GlyTouCan IDs to explore other resources. Let's say I want to see this structure in PubChem. We have been working for quite some time with PubChem to harmonize our resources and also to make exploring glycans in PubChem easy. If I take this GlyTouCan accession and do a search, I end up with this result. And if I explore the compound CID in PubChem, we end up with a record that has an SVG image of the structure. It provides the IUPAC condensed form, and of course it leads us back to GlyTouCan from this record and also provides some motif information. Another interesting feature when you integrate data into PubChem or ChEBI is that they also run their automatic algorithms, which provide the related compounds and related records. So there are eight records with the same connectivity and the parent connectivity, and of course almost 10,000 records of similar compounds. Exploring this can be quite useful when you're trying to find other compounds that are similar. In the same way, through our collaboration with ChEBI, if I search for the same GlyTouCan accession, I end up in the ChEBI record for that GlyTouCan entry. So this way, you can explore in different resources. And here, too, you can get the automatic cross-references from ChEBI to see what other resources and cross-references you can explore inside ChEBI.
So now that I've gone through a couple of these examples, and I have some time, I want to explore the spike protein. SARS coronavirus, of course, is on everyone's mind, and I'm going to use this example to show what we have in GlyGen for SARS. We do not integrate just glycans. The whole idea is that our users need to be able to explore the glycans within the context of the gene and protein. That's what our project is all about, and what our proposal was all about.
So in this particular case, when I typed in SARS, I have 29 records, because there are 29 proteins. Almost half of them are in SARS coronavirus 2, and some of them are in coronavirus 1, which is the older SARS. And the reason we put these two together is that users may want to compare the two SARS coronaviruses. What I am interested in here is the spike protein. So I'm scrolling down, and I see this is the SARS coronavirus spike protein, the spike glycoprotein. If I click here, it shows the protein record. Now in the protein record, you will have glycosylation information. In this particular case, this data is coming from this particular bioRxiv paper. This is a paper by Max Crispin, and we felt it was important enough to make available in GlyGen as soon as possible, even before it is formally published. So I'm going to close this record, this web page. Under "without reported glycans" it's the same thing: you have positional information for the different glycans. You can also go to the sequence to quickly see how many N-linked sites there are. So there are 22 N-linked sites and two O-linked sites. We do not have the mutation information here yet, but if there were mutation information, one could see how any of the mutations potentially affects the glycosylation sites.
So if you want to explore what is present on our beta site, you can do the same thing. I want to show you a couple of things on the beta site for the SARS protein which have not made it to the front end. You don't need a login or registration for the beta site; you can go directly in there. So let's see. I want to go back to the same entry. And here I want to show that, for the disease, this is mapped to COVID-19, the SARS coronavirus 2. And we now have homologs for the SARS coronavirus 2 protein, which here is just the spike protein from coronavirus 1.
And for this homolog, we used UniRef90 from UniProt. You know, UniRef clusters similar proteins and subsequences into clusters at, for example, 90% identity and 50% identity. So in this particular case, for the viruses, it allows us to identify the homologs quite easily.
So next, I'm going to go to the data portal, data.glygen.org. This portal is not supposed to be as interactive as the portal that I was in earlier. data.glygen.org is the place where you can download individual tables. So you may want to download the human proteome master list or the canonical sequences or the EBI, UniProtKB, and T5. There are several things you can download here; for example, you may want to download the human glycosyltransferases. If you click on "view details", it tells you in great detail how this table was created. And I think that matters, because provenance is really important in any database. Tables can come from so many different places, and curators may collect them, but if it is not recorded exactly how a table was created, you may end up with rows in the table that you don't really know the reason for. So here, it will tell me exactly why this row merits being in this table. All of this information we have recorded as a BioCompute Object, and it can be parsed into nicer-looking text like what is shown here on this page.
The other thing in data.glygen.org that I want to show you is how users can also submit their tables. So for example, I will type in the name of one of the submitters, Christina, and search. This is the table that Christina Wu's group submitted to us. So this data set provides information on O-glycosylation sites in human proteins. The data is submitted by Dr. Christina Wu, and so on and so forth. The GlyTouCan accession is annotated to the glycan composition based on the author's recommendation.
So not only can we get data from papers, but if there are labs who are willing to submit their data, along with why they chose the particular structure that is on the protein, we need to give users that information. So this is an easy way to submit tables. In her case, she just submitted a four-column table, and we added the additional information to the table. So with this example and the other examples that I showed here, one can explore the different tables that we already have.
Next, I will show you some programmatic access, api.glygen.org, and I will also show sparql.glygen.org. In api.glygen.org, you see glycan search, glycan details, glycan image, synthesized enzymes, and so on. So there's a list here. I'm going to click on GET, and next I'm going to click on "Try it out", and it shows the query JSON. I'm going to execute, and this gives you the server response. It provides information about the glycoprotein, the residue, the canonical sequence, the protein name, and so on. For programmatic access, let's say you want to get the data from GlyGen and integrate it into your website; you can use these APIs. These APIs also support the front end, and what that means is that any data that we collect, anybody else in the world can use. I think that's our model, and we want to make it as easy and simple as possible.
So next I want to show a SPARQL query. Here you load examples, and here's an example SPARQL query. It will retrieve the enzymes involved in the biosynthesis of a glycan, given by this GlyTouCan accession, in mouse. You can submit the query and get the results in XML. You can also get the results in JSON format or in CSV format. So as you can see, there are multiple ways to get to the data, and from the front end you can also search and download any results that you get in the search. So those are simple ways.
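Editor's sketch, not part of the talk: the choice of XML, JSON, or CSV results described above corresponds to the standard SPARQL protocol, where the query travels in a `query` parameter and the result format is requested through the `Accept` header. The endpoint path below is an assumption (the talk gives only the host name), the query is a generic placeholder rather than the GlyGen example query, and whether the server honors each media type is not confirmed by the source. The code only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the talk names the host only; the "/sparql" path is hypothetical.
ENDPOINT = "https://sparql.glygen.org/sparql"

# Standard SPARQL result media types for the three formats mentioned in the talk.
FORMATS = {
    "xml": "application/sparql-results+xml",
    "json": "application/sparql-results+json",
    "csv": "text/csv",
}

def build_sparql_request(query: str, fmt: str = "json",
                         endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL-protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": FORMATS[fmt]})

# Generic placeholder query, not the example query shown on the site.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
req = build_sparql_request(QUERY, "csv")

# To actually run it, one would call urllib.request.urlopen(req).read().
```

Keeping request construction separate from sending makes it easy to switch formats or endpoints without touching the query text.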
So everything that I mentioned here is also in the slides, which you have access to. I tried my best to follow exactly what I showed, and there are screenshots for everything that I did, so you can easily try it out later if you want to. There's one more thing that I want to show, which is our contact page. If you have any questions, please contact us. It can be a general comment, a technical issue, or anything else. Enter your email and we'll try to get back to you as soon as we can. The other thing that I always want to emphasize is that everything is under CC BY 4.0, which means that we would like you to cite us if you use GlyGen, and there are two papers that are already published. Paper number one is the GlyGen computational and informatics resource paper in Glycobiology, and paper number two is the GlyGen data model and data processing workflow. This one is a recent paper, and it gives you more information about the APIs and the SPARQL queries. And finally, of course, we have the acknowledgments. For funding, GlyGen is supported by the NIH Glycoscience Common Fund program. And that's it for today. If you have any questions, Mike and I will be happy to answer.