So same slides, but this is about databases for biological interpretation. There are really two types of interpretation in metabolomics. One is the multivariate statistical approach. It's the same thing that you do in proteomics, transcriptomics, genomics. You have lots of data and lots of parameters, so you need to use advanced statistics. But that's only one part. The other part is to actually understand the biology. Getting nice clusters, heat maps, or network diagrams doesn't necessarily tell you about the biology. And this is why databases have been created. So I want to teach you guys a little bit about the organism-specific and purpose-specific databases that help with metabolomic and metabolite data interpretation. I want to talk about some of the applications of these databases. And I also want to show that there are other paths or processes where you can use pathway databases to understand the connections of metabolites to genes, to proteins, to physiology and biology. From my perspective, pathways and pathway databases are probably the most important component of understanding or interpreting biology. Now, there are limitations with pathway databases, but they're evolving, and I think this is where the field has to go. So we spent the lab and part of the day learning how to go from spectra to lists of metabolites and concentrations. Those are the CSV files that hopefully most or all of you have uploaded to the Dropbox. What we're going to focus on now is going from those lists to biology. We're going to do some of that tomorrow, and we're doing some of that today. So what we normally get when we do genomics, proteomics, transcriptomics, microbiomics, or metabolomics is a list of names. It could be genes, proteins, or metabolites, and their relative values, concentrations, whatever: a name and a number.
So in the case of metabolomics, how do we determine if a certain metabolite level is too high or too low? What's normal and abnormal? How do we determine the origin of those compounds? So if you say, you know, what's gamma-aminobutyric acid doing here? Is that exogenous? Is it something produced by the body? Is it produced by bacteria? Is it coming from the food? If we have some unusually high metabolites, or some unusually low ones, or some unusually strange ones, what pathways are those associated with? What pathologies or diseases are they involved with? And then, if we've been looking at metabolites, we have to remember that these are usually transformed, catabolized, absorbed, processed, or bound by proteins. And proteins are encoded by genes. So what are the genes and proteins that are responsible for these metabolites? Which ones are responsible for producing them, catabolizing them, modifying them? That's also what connects metabolomics to genomics and proteomics. The answer to all these questions really lies in databases, and that's why I'm talking about databases today. There are lots of chemical databases, and many of you have probably heard of PubChem. Some of you have heard of ChemSpider. There's also ChEBI. These are generally structure and nomenclature databases. PubChem tries to cover all known chemicals. It doesn't care whether they're synthetic or not, or from snails, or from humans, or from the ocean, or something else. It's a collection of all known chemicals. So there's very little biology associated with them, and very little organism specificity. Unfortunately, a lot of people make the same mistake, which is to say: I found a peak, I found a mass, I'm just going to do a ChemSpider or PubChem search. In those cases, 99.9% of the time, your search will be wrong. You'll find the wrong compound.
So I'm not sure why people keep on doing something where you're guaranteed to be wrong, but a lot of people do that. And the reason why you're going to be wrong 99.9% of the time is that 99.9% of the compounds in PubChem or ChemSpider are not metabolites. They are synthetic, man-made chemicals that are used in drug screening libraries. Some of them are theoretical compounds. In fact, many of them are. They have never been released into nature. They don't exist outside the lab. So there's no way they could be in you, or in a rat or a mouse or a snail or a fish or an insect, or in grass or soil. That's something that people need to remember. Now, ChEBI is a little different: it covers Chemical Entities of Biological Interest. So a lot of them are metabolites, but again, some of them could be from exotic plants in Indonesia or Botswana. Some of them could have been isolated from the bottom of the Mariana Trench. So again, these are compounds that you're not going to find in you or in any other organism you're working with. What we need are metabolomics databases that link the biology to the chemistry, and also link the chemistry to specific applications. So if something's a drug, or a toxin, or a cosmetic, or if something is plant-only or human-only, that's the type of database and the type of information we need. This was brought up again about six or seven years ago, when a number of people wrote that we need to focus on model organism metabolomes and model organism metabolomic databases. This is just highlighting that trying to find a human metabolite by looking in a plant database, or trying to find a plant metabolite by looking in the Human Metabolome Database, just doesn't make sense. There are food databases, there are microbial databases, or at least there are microbial metabolomes and food metabolomes. And so we need those kinds of organism-specific databases.
So we've been working on this at TMIC and in my lab for many, many years, since 2006. We started with the Human Metabolome Database, and at the same time we also developed DrugBank. We subsequently developed a yeast metabolome database, an E. coli metabolome database, a toxic exposome database, a contaminant database, and a food database. But we've also tried to create pathway databases like SMPDB, the Small Molecule Pathway Database, and the PathBank database. Exposome and polyphenol databases have also been created over the years to help make the understanding of metabolomics and metabolomic data much easier. So I'm going to talk about some of them in more detail. The Human Metabolome Database, or HMDB, has been around since 2006. This is the website. Most of you have probably used this database or seen it at one time or another. And if you haven't, this is sort of the standard database for metabolomics. It has information about human metabolites: their descriptions, structures, pathways, origins, concentrations, and functions, and their reference NMR, GC-MS, LC-MS, CCS, RI, and RT data. There's a quarter million metabolites, about 130,000 pathways, millions of NMR and mass spectra, and then biomarker data and concentration data covering 1,600 diseases. So it's not just a list. This is an encyclopedia. It's designed to provide those reference values for human diseases, human exposures, and population health. So it's intended to be useful not only for targeted but also for untargeted metabolomics, and also for things like exposure studies. When you pop open the database, you'll find very detailed descriptions of metabolites, and you'll get their reference spectra: NMR, ESI, EI. And you'll get information, as I said, about the retention times, retention indices, and collision cross sections. Some of it's predicted, some of it's measured.
So you can search chemicals, and you can search for similar chemicals through something called a Tanimoto score, which measures the overall structural similarity. You can match formulas, you can match masses, you can match spectra. So: here's my spectrum, does this match something in the database? This is used a lot to help identify or confirm compounds that people have either isolated or purified or think they have found. So here are the actual searches. Here's a list of masses to two or three decimal places, and we tell it what adducts we're looking for. Hopefully these have been de-adducted, but you can also choose certain adduct tables, and it will produce a list of possible matches, indicating whether these are adduct forms or just the pure MS forms. So you know what you put in, and you can also sub-select to make sure that you were choosing the right mass or mass-to-charge set. It'll give you the match, the compound name, a link, and how close you are in terms of a match. Generally, mass matching alone isn't the best way to go; I indicated the problems earlier. So typically you might want to use MS/MS matching. You can do the same thing: submit a list of masses from a spectrum that you've collected on a triple quad, a Q-TOF, or whatever, along with their intensities. Feed that in, and again it'll search and list the best-matching metabolites, give you essentially a match-factor-like score, and rank them from most similar to least similar. You can do this with NMR as well. If you have an isolated pure compound and want to know what it's similar to, you can compare it to the experimentally measured and even computationally predicted NMR spectra. In some cases you have a structure, but you want to know if it's similar to other structures, or you've determined a structure and you want to know whether it's unique. And as I said, most of the time it's not, because the structures of almost every metabolite have already been determined.
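As an aside on the Tanimoto score mentioned here, the following is a minimal sketch of how such a structural similarity ranking can be computed, assuming each compound has already been reduced to a set of "on" fingerprint bits. The fingerprints and the names `compound_A`, `compound_B`, and `compound_C` are made up for illustration; this is not HMDB's actual fingerprinting or search code.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits.

    Defined as |A and B| / |A or B|: 1.0 for identical sets, 0.0 for disjoint ones.
    """
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints: each integer is the index of a structural feature bit.
query = {1, 4, 7, 9, 12}
library = {
    "compound_A": {1, 4, 7, 9, 12},
    "compound_B": {1, 4, 7, 20},
    "compound_C": {30, 31},
}

# Rank library entries from most similar to least similar to the query.
ranked = sorted(
    ((tanimoto(query, bits), name) for name, bits in library.items()),
    reverse=True,
)

for score, name in ranked:
    print(f"{name}: {score:.2f}")
```

In practice the bit sets would come from a cheminformatics toolkit rather than being typed in by hand; the ranking step is the same idea as the most-similar to least-similar hit list.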
So here are essentially the hits based on the Tanimoto score, listed from most similar to least similar. There are databases within HMDB that have the reference metabolite concentrations for different diseases as well as normal values, so there's normal and abnormal. There are 15 different biofluids, including blood, CSF, and urine, and it covers up to 5,000 different metabolites depending on the biofluid. So it's important for helping clinicians, chemists, and researchers working on human or, largely, mammalian metabolomes. As far as we know, this is the largest and most complete resource of its kind. So people use the HMDB to learn about metabolites, because there are detailed descriptions of them. It tells you about the biology, the chemistry, the functional roles. They use it to determine normal and abnormal concentrations for different tissues or different biofluids. They use it to determine the origin of compounds: is it a microbial compound, is it endogenous, is it a food compound? That information is available, usually through HMDB. And because it also has links to the pathways, enzymes, and receptors that are involved with metabolites, it gives you that information in terms of their biology. You could do a lot of searching on a given compound and read all about it in textbooks and other sources, but the idea is to try and consolidate this into a single resource. Now, that's HMDB. There's another database which is less widely known, but it is, I think, useful, and this is called MarkerDB. It was developed as a biomarker database, with a focus primarily on validated biomarkers. Biomarkers can be diagnostic, prognostic, or predictive, or they can simply be exposure markers. Now, MarkerDB is not restricted just to metabolites; it includes protein markers and gene markers. There are a lot of metabolite markers in this database, and that's growing.
There's a small number of protein markers, and that's sort of a reflection of the challenges of doing proteins and proteomics. There are a lot of genetic disease markers. These are just the literally thousands of mutations that are found. So the CF, or cystic fibrosis, gene has about 2,000 mutations. That one gene gets 2,000 markers. So the total number of genes with markers is probably a couple hundred. Overall, the number of markers is overwhelmingly metabolites, and that's just a reflection of how important metabolites are. You can search the database by structure, by sequence, and by text. So a lot of people can use a resource like this to find disease associations for metabolites. So what is butyric acid? What's it a marker for? We found a sample that had very high values. What does that mean? In other cases, people may find that they're seeing a marker consistently for lung cancer in a sample. Has anyone else found the same markers? That helps you save on the research and also on exploring it. There are exposure markers. There are a lot of interesting compounds that tell us whether you've had citrus fruits or asparagus or wine. And when these compounds show up, this can give you a lot of information about diet. It also allows you to assess your newly discovered markers against known markers for the same conditions. Have I found something truly novel here? Or have I found something that everyone else has found? Now, when you're looking at large populations in metabolomics, you're not only going to measure endogenous metabolites, you're going to measure drugs. If you take a population of a thousand people, you may actually see about a thousand different drugs. If you take one person, you may see zero or one or two drugs, because most of us hopefully aren't taking a thousand drugs. But in large population studies you need to know that, and it was partly because of that that we established this drug database called DrugBank.
Now, it was sort of naive, because our focus was on how we find these drugs and, if we're doing metabolomics, how to identify them. But we also started including not only the small molecules but also the biologic drugs, the antibodies, the nutraceuticals, the experimental drugs. And then we started including information about the drug targets. And that's when the database took off. So this is actually our most popular database, and it's actually been commercialized, although a public version is available now. It took off because of that drug target information: that was the first time it had been released or made available. Now most of the pharma industry uses this to rediscover and repurpose drugs. We've estimated that the information in this database has generated about a billion dollars in terms of repurposed drug discovery. But as I said, it was originally developed for metabolomics. It's got information about how drugs are metabolized and which enzymes are metabolizing them. It has spectral data. And it has quite a bit of information about sequences, spectra, structures, text, and so on. So if you are doing metabolomics, DrugBank allows you to find the origin of these drugs or drug metabolites or drug-like molecules that might be found in urine or blood or other biosamples. In some cases, by looking for similarities between a product that you found in a plant and a drug that's known, or vice versa, you might be able to identify the roles of certain metabolites. You say, oh, this looks a lot like a drug; maybe it has the same target or the same physiological effect. Because it also has pathways, it helps you understand mechanistic or pathway processes. And as I said, the main purpose of DrugBank, which wasn't its original goal, is drug discovery and drug repurposing. Now, humans don't live on air and water and light. We have to eat food. And it turns out that food compounds, food additives, and food metabolites make up a substantial portion of what's in our metabolome.
And if you're looking at a population of 1,000 people, there are going to be 1,000 different diets and 1,000 different weird foods they like. And there are going to be compounds reflecting that. So we've started gathering this information for about 720 of what we call fundamental foods. We had pizza today, and there are all kinds of recipes for pizza, but fundamentally pizza is flour, cheese, tomato, and then different types of meats or vegetables. So the total number of ingredients on the pizza, even though the recipe may differ, is probably about 15 to 20. Those are fundamental foods. Just about everything you eat can probably be described as some combination of these 700 or 800 fundamental foods. That includes, you know, milk and butter and tomatoes and apples and whatever else. But we wanted to include information not only about what's in them but also about their flavor, their aroma, their color, and the effects of some of these food components on health. We wanted to get information about the concentrations so that people could identify certain food source compounds and identify which foods are rich in what: are cashews rich in selenium, or are tomatoes rich in carotene? Like our other databases, it has tools to support sequence searches, spectral searches, and text searches, but it is intended to cover not only the food components but also food additives and micronutrients. So, as I say, if you've got a metabolomic study and you've got these interesting molecules and you're not sure where they come from, this can help you find the origins of certain molecules or food metabolites or unknowns that are found in food samples. It's not unusual to find that the unknowns you're seeing, or some of the other compounds that seem to be rising and falling, are actually of food origin. Again, people are identifying drug-like roles for food compounds. People talk a lot about polyphenols and how they help cure all kinds of diseases. The fact is that most polyphenols don't last much beyond your stomach.
They're transformed into a whole bunch of smaller molecules, and a lot of those actually do have drug-like effects. That's been the fascinating discovery. And I think the same sort of thing applies to trying to understand which compounds in food make them taste or smell really good or really bad, and which compounds really are healthy and which are not so healthy. So that's the intention of FooDB, and that's what it's being used for. Another database that we've been working on for a while is called the Natural Products Magnetic Resonance Database, or NP-MRD. It's a database focused on NMR spectra, although we will be adding MS spectra. It contains edible plant, medicinal plant, and microbial natural products, about 300,000 of them. And it has millions of predicted, experimentally measured, and simulated NMR spectra. So it has NMR spectra, their assignments, descriptions of the compounds, potential medicinal uses, chemical properties, and the species of origin, and often there are many species that produce these compounds. So it has what we call chemotaxonomic information. Like all the other databases, people can do spectral searches, structure searches, and text searches. And we're using a variety of methods to improve, curate, annotate, and maintain the database. So if people take natural products, if they have really unusual diets, if they've been exposed to unusual things, you can find the origins of these unknowns or these strange molecules that might be in someone or in some animal. Again, because we can do structure comparisons, we can compare a natural product with a known drug or suspected drug and see how different or similar they are, which might then give us an understanding of why or how it could be causing some health effect. So natural products are critical to understanding what's in plants, what's in microbes and other exotic organisms, and, as I said, to this chemotaxonomy approach, which is finding the biological species of origin of compounds.
Humans aren't the only organisms on the planet, and we don't produce all the chemicals. Plants produce a lot, microbes produce a lot, and they're pretty interesting. And as I say, a lot of them serve as leads for new drugs. Natural products are one thing, foods are another, drugs are another, but then there are other things like pesticides and herbicides and solvents and cosmetics and dyes. These are things that you're exposed to all the time. They may be in your water, they may be in your clothing, they may be something that you put on your face or skin. These are sometimes innocuous, but some of them aren't. And certainly if you work in the agriculture industry, or garden, you're going to be exposed to some of these things, like pesticides and herbicides, which also end up in the water system. So we wanted to get information about these things: about their effects on biology, their targets, their protein receptors. This is data that we collected from the EPA. We collected spectral data and toxin targets, and like DrugBank and HMDB, it has sequence and spectral searches. So again, if you're finding something that's strange, unknown, or not expected, that's how you can find out about these compounds: large production volume chemicals (LPVs), household chemicals, pollutants, drugs (maybe not for you, but maybe for dogs or cats), and other unknowns. So these are helping people identify, again, the origins of certain peaks in their untargeted studies. In many cases you could use the same compound list to start targeted studies, to better understand the human exposome and to measure it consistently at very low concentrations. Again, because we have structural data and target data, people are learning about how these things work, why they're toxic, why they're dangerous, and at what levels they're dangerous. Another one that's really interesting is yeast. Yeast is used in bread, but it's also used in beer and wine. In fact, yeast accounts for about a $1 trillion business in food, and yeast grow on all kinds of things.
And they take unusual substrates and convert them to unusual compounds. So yeast are not just simple microbes. We've determined there's a list of about 16,000 different yeast metabolites that can be produced, with yeast growing on grapes, yeast growing on malt and barley, yeast growing on wheat. We've generated reactions, pathways, NMR data, and protein and enzyme associations. So it's a rich resource, with a lot of information that's really useful for understanding flavor and aroma in fermented beverages. As we went into yeast, I think we realized there's also a need to understand more about the gut and the microbiome. So we've been working on an E. coli database; it's the classic model microbe. We've got almost 4,000 metabolites. E. coli can grow on a lot of different substrates, and you'll find that there are lots of reactions and pathways and reference spectra. It's a pretty complicated organism. It's much more complicated than I think people realize, and its metabolism is quite diverse. And that's because it can grow on different substrates. If you grow your microbes or bacteria in, you know, minimal media, you're not going to get a lot of metabolites. E. coli can grow on something like catechol, which is moderately toxic, and it produces an amazing number of metabolites because of that substrate. It can grow on almost anything. And again, the list of metabolites that bacteria can produce keeps on growing as we try growing microbes on different substrates. Now, microbes live in our guts. That means they also grow on the food we eat and the food chemicals we eat, and they process it too. So that's a pretty diverse collection of compounds that we give our gut microflora. Now, because of the work on yeast and E. coli, we started getting more and more involved in the human microbial metabolome. The gut isn't just one microbe; there are thousands. In fact, according to our databases, there are at least 2,000 unique microbial species in our gut, on our skin, or in our mouths.
We've been able to identify about 25,000 metabolites that correspond to microbial metabolites. And so we've tried to put this into a database called MiMeDB. These are microbially derived metabolites, with information about their microbial origins, their food origins, their reactions, and their effects on human health. And we've been able to connect them to a whole bunch of different diseases as well. So this is a fairly comprehensive resource, because it's trying to connect metabolites to, you know, microbial genomes and microbial genomics and human conditions, and to the fact that most of the things that microbes produce come from the food we eat, which they then grow on or convert into a variety of very strange chemicals. There's a whole list of resources feeding into it, and it's still evolving. We have, you know, data from BacMap and MarkerDB and HMDB and PathBank and BioTransformer. We have information about medications, environment, and diet. There's a variety of databases that have been published that have microbe-metabolite connections, things that are consistent, or sometimes inconsistent, across different body sites. We've created a number of different views: information about the host and microbe, a health effects view, a reactions view, and a pathway view, to describe and explain how different metabolites are produced and what they do in the body. So like HMDB, every metabolite has a detailed description. It has structures, synonyms, and other physiological data. You can also look at metabolites in terms of their host source, where they've been found, whether they've been found in blood or urine, at which concentrations, and which sex and which health condition they're associated with. And then there are specific health effects: again, whether some are neuroinflammatory, whether some have agonistic, antagonistic, or inductive effects, whether some are pro-inflammatory, whether some will bind to certain proteins. And then there are references to tell you why or how that actually happens.
Many microbial metabolites start off with a precursor molecule and end up as a product. So suppose we wanted to know how tryptophan helps with the production of indoxyl sulfate. We know that tryptophan is converted to indole, indole is converted to indoxyl, and then indoxyl is converted to indoxyl sulfate, and these are the enzymes that are used. Some of them are mammalian, some of them are microbial. So if we're talking about certain types of metabolites: where do they come from, and what are they associated with? We can get exposure sources or food sources. Some of these are coming from foods like crabs or cocoa beans. Again, the food source view gives you that information. When you have all this information covering genes, microbes, metabolites, health effects, and food sources, you typically have to generate some kind of network view. And this is what the MiMeDB network viewer looks like, although it still needs some refinement. So the point of MiMeDB is to help people understand the microbially derived metabolites. There are some that are purely microbial, and there are others, like hippuric acid, which are called co-metabolites. This is a compound where both the human endogenous enzymes and the microbes work together to produce it. We also wanted to use it to help identify microbially derived food metabolites. Take tryptophan, something that we don't naturally produce; we have to eat it. But tryptophan is typically converted into a whole range of compounds, some of which are healthy and some of which are dangerous. We're using this information to get the metabolite data for thousands of microbial genomes, to associate these compounds with specific genes and specific genomes. And by connecting microbes to microbial metabolites, we can also connect things to health and disease. There are healthy bacteria and there are unhealthy bacteria, but bacteria are only healthy or unhealthy because of what metabolites they produce.
Some microbes produce toxic proteins, but most of the harm or most of the benefit from microbes comes from the chemicals they produce, not just from them being bacteria. So microbes are microscopic chemical factories. I've given you examples of, you know, microbial databases, yeast and E. coli databases, food databases, drug databases, toxic exposure databases, and natural product databases. These provide information about the origin or provenance of many metabolites. They're specific to the biological system. So if you're studying E. coli, use the E. coli database, not the yeast database, not the food database. Having organism-specific databases avoids the mistake of identifying biologically impossible or chemically infeasible compounds, like searching through PubChem or ChemSpider and saying, oh, I found a hit, it must be this obscure cocaine metabolite that only mice smoke. It's not possible. Yet mistakes like this are constantly being made. By having organism-specific or, I guess, purpose-specific databases, you can get a more complete and direct interpretation of your data. It gives you biological context. So what we try to do in metabolomics is not only take our lists and learn more about the metabolites themselves, but also try to gain contextual information, and that's going from lists to pathways. Now, you'll get some of that tomorrow as well, with some of the work that's done in MetaboAnalyst. So let's talk about pathway databases. These are typically a little different from organism-specific or purpose-specific databases, because those are text based: they're lists of, you know, numbers and text tables. Pathway databases are image databases. They give you visual data that relates metabolites to genes, proteins, diseases, signaling events, and processes.
So that's a lot of what molecular biology or physiology is. Good pathway databases give you tools for better visualization and for gene mapping, protein mapping, and metabolite mapping. And many pathway databases will cover multiple species. The best known and most popular is the KEGG database; that stands for the Kyoto Encyclopedia of Genes and Genomes. I think many people are familiar with the type of structures and pathways that are generated in KEGG. They have short synopses about specific compounds. There are about 500 canonical pathway diagrams that cover about 6,000 organisms. So when you multiply everything together, there's about 600,000 pathways in KEGG, but that's just because they've done 535 times 6,000, so it's really a small set. There are really about 170 metabolic pathways in KEGG overall that include the compounds of interest. They have some disease pathways, they have some signaling pathways, and they have some biological process pathways. The pathways they have are quite schematic; they're, you know, essentially wiring diagrams, and they're limited only to catabolism and anabolism. KEGG was actually not developed specifically for metabolomics. It's just that people found it and said, oh, I can use it. There's another database, the Reactome database, which is maintained by both the EBI and the University of Toronto. It covers 15 model organisms, and instead of having maybe 150 pathways per organism it has about 1,500 pathways. Now, most of the pathways in Reactome are protein-protein pathways. So there's not so much focus on metabolites, but there are a lot of protein signaling pathways. There are disease pathways, signaling pathways, apoptosis pathways, transcription pathways. There's another database called BioCyc, which has been around for a while, and it covers a collection of things like bacterial databases. They've got seven manually annotated databases, including an E. coli database, 71 semi-manually generated ones, and 14,000 other pathway collections for many, many other organisms, where they've just taken the genome and said, here's the pathway. These are metabolic pathways. Their concept of a pathway is basically a reaction. So if A goes to B, that's a pathway, whereas in KEGG, you know, it has to be the citric acid cycle or something like that. Within BioCyc they've got about 250 pathways for humans. In EcoCyc there are about 400 pathways, and they cover a relatively small number of compounds. And this is true for KEGG, this is true for Reactome, this is true for BioCyc.
So all these pathway databases, which typically have a couple hundred pathways for animals or humans or bacteria, only cover a few thousand metabolites. Yet I've told you a few times that the number of metabolites in the human metabolome is 250,000 at least, that in yeast it's 16,000, and that in E. coli it's 4,000. So the pathways and metabolites covered by KEGG, BioCyc, and Reactome are a tiny, tiny fraction, 5% at most. Most of the pathway databases that have been created over the last three or four decades are just for catabolic and anabolic events. And that's a problem, because metabolites are more than just the bricks and mortar. Metabolites are really critical for signaling. The most important signaling molecule in the body is glucose, but probably none of you knew that. And that's because the only way that glucose is depicted in KEGG is as a fuel for the TCA cycle. And that's it. Metabolites are important in immune function, and I highlighted some of those discoveries about succinate and fumarate. There are the important roles that branched-chain amino acids have in inflammation and homeostasis, or that 2-hydroxyglutarate has in epigenetic events, or roles in disease processes, in drug action, or in tissue repair. None of that is in KEGG, none of that is in Reactome, none of that is in BioCyc.
That gap is a problem, because all of the most significant discoveries in metabolomics weren't about catabolism and anabolism. They were about signaling, immune function, disease and drug action. And unfortunately, there are no pathways in KEGG, Reactome or BioCyc for that. So, again, the people who made those discoveries didn't use those pathway databases, because they're irrelevant. It's because of the irrelevance of KEGG, Reactome and BioCyc that we started working on another database specifically for metabolomics. This is called the Small Molecule Pathway Database, or SMPDB.
So, initially SMPDB had about 45,000, almost 50,000, hand-drawn pathways, and we started by drawing drug action pathways, because drugs have an effect. But we also had a lot of disease pathways and a whole bunch of metabolic pathways, not 100, not 200, but 27,000, and then a whole bunch of signaling pathways. In addition, we wanted to depict more than just the wiring diagram. We wanted to put in things like organs, cell compartments, organelles, protein locations and the quaternary structure of proteins, because that's often important in understanding processes. We also wanted to allow you to map microarray, transcript or gene chip data, and metabolomic or proteomic data, onto these pathways. And we wanted to provide facilities that convert gene, protein or chemical lists into potential pathway sets or disease diagnoses.
So this is a typical pathway in SMPDB. This is alanine metabolism, showing how a cell processes alanine. It has links to HMDB, it has links to UniProt, and it has detailed descriptions of the pathway with references. And it's colorful. It shows the nucleus, it shows the mitochondria, it shows what's happening inside the mitochondria and what's happening inside the cytoplasm. You can provide lists of metabolites and it will map those metabolites to pathways. These are highlighted in red, as opposed to the sort of blue highlights.
And so you can see in which regions or domains the compounds are active. You can put in concentration data, and that also lets you see a map colored orange, yellow or red depending on the concentration of certain metabolites for certain processes. Metabolites are linked to genes and proteins, with the proteins shown in green, but there are also depictions of the organs where these things happen and where the process is important.
The pathways are generated by hand, but they are converted to machine-readable formats: Systems Biology Markup Language (SBML), Systems Biology Graphical Notation (SBGN), BioPAX or pathway ML. You can save them as SVG or PNG, and you can change how they look: in color, with a white background, with a blue background, in black and white, or KEGG-like and printer friendly. They all have text descriptions, they all have references, and they're all linked to HMDB or UniProt, because it's a pathway database for human metabolites. They're all downloadable, and they can all be edited with a software tool that was developed in our group.
So again, this is a depiction of the citric acid cycle. Most people don't know that the TCA cycle takes place in the mitochondria, because that's not depicted in KEGG, it's not depicted in Reactome, and it's not depicted in BioCyc. This shows you that it all happens in the mitochondria. These pathways can also show neuron function and neuroactivity: how metabolites are used as signaling molecules, what they do, how they're transported, and how that information is moved from the chemical level all the way to the brain. You can look at gastric acid function and how these things play a critical role there. This one is actually muscle contraction rather than gastric acid function. Pancreatic function is the same thing: the organs are depicted, the islet cells are depicted, and the functions, roles, chemistry and biochemistry that they carry out are also shown.
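The list-to-pathway mapping and the concentration coloring described above can be sketched in a few lines of Python. This is only a toy illustration, not how SMPDB is implemented: the two-pathway table is a hand-made subset, and the function names, the color order and the cut-off values are all assumptions made up for the example.

```python
from collections import defaultdict

# Toy pathway table for illustration only (a hand-made subset, not SMPDB data).
# A real analysis would load pathway membership from a pathway database export.
PATHWAYS = {
    "Alanine metabolism": {"L-Alanine", "Pyruvic acid", "L-Glutamic acid", "Oxoglutaric acid"},
    "Citric acid cycle": {"Citric acid", "Oxoglutaric acid", "Succinic acid", "Fumaric acid"},
}


def map_to_pathways(metabolites, pathways=PATHWAYS):
    """Return {pathway name: sorted list of the submitted metabolites it contains}."""
    hits = defaultdict(list)
    for name in metabolites:
        for pathway, members in pathways.items():
            if name in members:
                hits[pathway].append(name)
    return {pathway: sorted(names) for pathway, names in hits.items()}


def colour_for(concentration, low, high):
    """Bin a concentration into a display colour.

    The cut-offs `low` and `high` and the yellow < orange < red ordering
    are assumptions for this sketch, not values taken from any database.
    """
    if concentration < low:
        return "yellow"
    if concentration <= high:
        return "orange"
    return "red"


if __name__ == "__main__":
    measured = {"L-Alanine": 350.0, "Oxoglutaric acid": 12.0, "Caffeine": 3.0}
    for pathway, names in map_to_pathways(measured).items():
        shaded = {n: colour_for(measured[n], low=20.0, high=300.0) for n in names}
        print(pathway, shaded)
```

A compound with no entry in the table, like caffeine here, simply maps to nothing, which is the coverage problem with small pathway databases in miniature.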
During COVID, we learned a lot about the roles of metabolites in COVID processes and sepsis, which is the viral sepsis that killed a lot of people. There are some very clear metabolic processes involved, various signaling processes and various organs that are affected, and this is a collection of all that data from all of those papers that depicts the process.
It's also possible not only to have our people generate the pathways, but to have you generate pathways. So we've developed software called PathWhiz. This is a web server that allows you to create your own biological pathways that are machine readable and interactive. You can replicate pathways and you can propagate pathways. Replicating means the pathway looks essentially very similar, but it's just in a different system or a different organism. Or you can propagate a pathway and say, I've drawn it for human, and I want to generate one for chimpanzee, orangutan and gorilla. Again, you just put in different protein names or sequences. There are palettes for drawing things, palettes for rendering and palettes for annotation. It also allows you to convert these things to BioPAX and SBML.
So there are certain methods and protocols, and if you want to learn more about this, you can all go to Eponine. She'll raise her hand now. She is the expert on PathWhiz and has been working on it for many years. And she's got some videos and tutorials about how to generate pathways and do it properly. There are SOPs for this: add reaction elements, add reaction enzymes. There are lots of very simple pull-down menus and hundreds of different images that you can use to make sure that you're consistent. This is an example of the rendering in PathWhiz. You can zoom in and zoom out. In most cases, you don't have to draw a structure. You just have to choose a name or type in the first few letters. And it's the same thing with proteins.
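The idea of propagating a pathway, keeping the chemistry and swapping in the other organism's proteins, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PathWhiz's actual implementation: the `Reaction` and `Pathway` classes, the `propagate` function and the placeholder enzyme names are all hypothetical, and the ortholog table is something you would have to supply yourself.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reaction:
    substrate: str
    product: str
    enzyme: str  # protein name in the organism the pathway was drawn for


@dataclass(frozen=True)
class Pathway:
    name: str
    organism: str
    reactions: tuple


def propagate(pathway, target_organism, orthologs):
    """Copy a pathway to another organism by swapping in orthologous proteins.

    `orthologs` maps a source protein name to the target organism's protein
    name. Reactions whose enzyme has no known ortholog are left out of the
    copy and their enzyme names are returned so they can be checked by hand.
    """
    kept, missing = [], []
    for rxn in pathway.reactions:
        if rxn.enzyme in orthologs:
            kept.append(replace(rxn, enzyme=orthologs[rxn.enzyme]))
        else:
            missing.append(rxn.enzyme)
    return Pathway(pathway.name, target_organism, tuple(kept)), missing


# Placeholder enzyme names; a real table would hold actual protein identifiers.
human = Pathway(
    "Alanine metabolism",
    "Homo sapiens",
    (
        Reaction("L-Alanine", "Pyruvic acid", "enzyme_A_human"),
        Reaction("Pyruvic acid", "L-Lactic acid", "enzyme_B_human"),
    ),
)
chimp, unmatched = propagate(human, "Pan troglodytes", {"enzyme_A_human": "enzyme_A_chimp"})
```

Returning the unmatched enzymes separately is a design choice for the sketch: a propagated pathway is only a draft, and the gaps are exactly where manual curation is needed.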
In the editor you can drag and drop, put arrows in, customize components, change the rendering, add organs, add organelles, play around with receptors, and type in certain cofactors and add them in various ways. There are YouTube videos. There's a Journal of Visualized Experiments, or JoVE, article about how to use PathWhiz. As I said, you can propagate or replicate pathways. Propagation is mapping to other organisms. Replication is making similar pathways. Antibiotic pathways, for example, all have similar mechanisms. People can actually generate hundreds of pathways at a time through propagation. And as I said, Eponine is the expert. If you're interested in contributing, we're happy to have you help in this long process of providing resources for the community.
Now, SMPDB was our first attempt. We realized that people are interested in more than just humans. There are other model organisms. So we created PathBank, which is an extension of SMPDB. It includes eight other model organisms: yeast, rat, mouse, cow, E. coli, Drosophila, C. elegans and Arabidopsis. For some of you that may not cover all the model organisms you want to see, but it's grown from about 50,000 pathways to about 120,000 pathways now. It includes disease, drug, metabolic, signaling and protein signaling pathways for these organisms.
Microbes are different from humans. The cell is different. They usually have a double membrane. They don't have a nucleus. So you have to depict things differently. This is an example of a microbial pathway, and it shows how microbes degrade cyanide and cyanate. That's not what you'll find in a human pathway. They can also make some unusual acids like colanic acid. And so, again, here is a bacterium synthesizing colanic acid, and again, highlighting that there's no nucleus and there are no mitochondria. It's just a bacterial cell with this double membrane. This is sucrose metabolism. Again, something that microbes are pretty good at.
And again, it's a pretty complicated pathway, but it is different. It's not found in humans. So these are pathways that are specific to microbes. Here's an example of protein signaling. So it's not just a metabolite database. It also includes fairly complicated protein-protein interaction pathways.
So I think it's important to know that databases are important in metabolomics. They're important in all fields of science. The databases we've been working on have been organism-specific and purpose-specific. We've also been building pathway databases, and by including pathway databases and organism-specific databases, the biological interpretation of processes is much easier. I think it's important to know that a lot of the pathway databases that you mostly use or have mostly heard about, including KEGG and Reactome, really were not developed for metabolomics applications. They give you a keyhole view of metabolism and metabolomics. And that's unfortunate, because it's not helping the community a whole lot. Most of the discoveries that are highly impactful in metabolomics have nothing to do with what you'll see in KEGG. So I think the metabolomics community really needs its own databases, and it needs databases that are designed to meet the needs of the community, and the community's needs are pretty extensive.
I tried to show you some of these specialized metabolomics databases. These are not the only ones, but for a long time they were. And I think these types of tools are really ideal for doing biological interpretation. And I'm hoping, because we do have some time left, although lots of you are pretty tired, that you'll take some of the data that you measured with your quantitative metabolomic analysis, look at some of these databases and see if you can interpret some of the data and understand some of the data. It's not going to be trivial, but you've got a single sample or two samples.
How are they different from normal or abnormal? What's unusual? What are these compounds? What sort of pathways might they be involved in? I'd encourage you to use or access some of these databases if you have time. As I say, most of you are pretty tired. It's been a long day. But that was the point I wanted to leave you with. And tomorrow, you guys will be taking some of the data, some of the tools and learning about how to more fully interpret the data. And I'll wrap up and thank you for listening.