So, as Anne said, I'm David Wishart and I've been teaching this course, I guess, going on six or seven years now. I've been involved in metabolomics for about 20 years. I started actually before it had a name. Our original term for it was actually chemical genomics, and we started a company called Chenomx, which was a merger of chemical and genomics. But that name didn't stick, so metabolomics is what most people now call it. Before that it used to be called metabonomics, with an N. And over the years I've been involved in a variety of things relating to the development of databases and software, and also technology and techniques to make metabolomics a little easier. And it was because of the work we've done that we started this program or course with CBW. I've actually been involved in CBW for 20 years as well, teaching things like proteomics and microarrays and general bioinformatics. But this is sort of a continuation of something that I'm quite interested in, and I'm certainly glad that there are a lot of you who are also interested. So this is a casual course. I think you can tell by the fact that we're eating and drinking and sitting around tables. So if you have questions, feel free to interrupt me. You might have to wave your arms because I think some of you will be out of my field of view. It's a little unusual format in terms of the structure of the classroom, but hopefully it's conducive for people to talk. So I'll start, because this is a fairly long initial lecture. As Anne said, all of these slides are covered by Creative Commons licensing, so you can use them as you wish. This is an introduction to metabolomics and, as Anne said, it's really an opportunity to get everyone up to speed. There are some of you with more of a background in computing, others more in analytical science, others perhaps more in statistics.
And I just wanted to try and give an introduction so that people are all aware of what we're talking about and have the same set of knowledge. Each of our presentations typically has some official learning objectives. So we're going to be talking about metabolomics and defining the sizes of the different metabolomes. Everyone seems to be working on very different metabolomes, from microbiomes to plants to animals. We're going to talk about some of the applications of metabolomics. We're going to talk about some of the operational principles of metabolomics techniques. Some of you do NMR, some of you do GC-MS, most of you probably do LC-MS if you are in the lab. And then we're going to talk about the differences between targeted and untargeted metabolomics. Most of our work for this workshop will be targeted, partly because of the enormous resources that are required to do untargeted metabolomics and the limited time that we have. This is just a reminder about the schedule, and I think most of you have already seen it. So I'll dive right in. This is a slide I use a lot that I usually call the pyramid of life. It's a way of relating the genome to the proteome to the metabolome. The base of the pyramid is the genome; that represents the DNA from which all things are encoded. DNA is transcribed and translated into proteins, and proteins are essentially designed to act on or transport metabolites. Metabolites then also act back on the genome and the proteome, and so, in fact, there's a nice feedback loop. Of course, the study of genes is genomics, the study of proteins is proteomics, and the study of metabolites is metabolomics. One reason why I've put it into a bit of a hierarchy or pyramid is to illustrate that there is a difference in influence: as you go up this pyramid, there is an increasing influence of the environment on the different omes. So what you're eating or drinking hopefully has no influence on your genome.
Otherwise, you'd all be turning into green slime. So our genome is well protected. What you're eating and drinking will also slightly modify your proteome. It's going to change the levels of insulin or ghrelin and a couple of other hormones, but not many. But what you are eating and drinking and breathing is changing your metabolome every second, and quite substantially. So the metabolome is influenced by the environment. It's also influenced by the genome. And anything that lies at the interface between the environment and the genome makes for a really good tool for measuring the phenotype. So metabolomics is actually the best method for measuring a phenotype. It's technically a chemical phenotype, but it's a wonderful way of exploring those things. The other thing to remember is that as you climb up the pyramid, there's also a change in physiological influence. We tend to think of things in a reductionist way. We often look at even humans as if they were single-celled organisms; we look at cells and think only of cells. But in fact, humans, mammals, most vertebrates have very complicated physiology. Most of the organs in your body are designed to have very distinct and very specific metabolism. So your heart has a very different metabolism than your brain. Your skeletal muscles have a very different metabolism than your liver. The kidney has a different metabolic function than the lungs. Each of those is unique. Each of them expresses a unique metabolome at the tissue level, and each contributes to the metabolome that you ultimately see. Genetically, all your different organs are identical. At the protein level, there are subtle differences in your proteome. But metabolically, there are profound differences in each of your organs. So this again is something which makes metabolomics quite useful. It allows you to explore physiology generally more precisely than either genomics or proteomics.
One way of relating metabolomics: a lot of people struggle over the definition, but all of you, I think, are familiar enough, so I won't dwell on it. Typically, we're using high-throughput technologies, whether it's for genomics or proteomics or metabolomics, and generally trying to characterize as many or most of, in the case of genomics, the genes, or in the case of metabolomics, the small molecules or metabolites. And typically, we're looking at either cells, tissues, organs, or organisms. The collection of all of those components is the metabolome, just like the collection of all the genes is the genome. Now, there's a lot of debate about what a metabolite is, and there probably never will be perfect clarity. Generally, we include essentially all small molecules. Most people just include organic molecules, but metal ions are also important. We use a cutoff: generally, the cutoff is about 1,500 Daltons, although some people use a cutoff of 2,000 Daltons. At that level, it includes some peptides and some short DNA and RNA fragments. But it mostly includes things that we largely believe to be metabolites, like sugars, nucleosides, organic acids, amines, and amino acids. Those are sort of the natural or endogenous things that we think of in terms of the metabolome. But there are a lot of other things that we also need to include, particularly things from food. This could include plant alkaloids and food additives. It could include things you're not supposed to eat, like toxins and pollutants. It also includes the drugs, or drugome, and the drug metabolites. And as many of you are also studying the microbiome, it also includes a lot of microbial products that are produced in your gut or in other places where bacteria grow. Now, a metabolite, and the metabolome, is essentially defined by what we can detect. And so the size of the metabolome grows with our improved ability to detect things. Right now, the absolute lower limit of detection is perhaps a few picomolar.
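As a toy illustration of that working definition, here's a minimal Python sketch of applying the roughly 1,500 Dalton cutoff. The compound list and masses are illustrative placeholders, not drawn from any database:

```python
# Hypothetical (name, monoisotopic mass in Daltons) pairs, for illustration only.
compounds = [
    ("glucose", 180.06),
    ("ATP", 507.18),
    ("cholesterol", 386.35),
    ("insulin", 5808.0),  # a peptide hormone, well above the small-molecule cutoff
]

CUTOFF_DA = 1500.0  # the ~1,500 Da cutoff mentioned above; some groups use 2,000 Da

# Keep only the compounds under the small-molecule cutoff.
metabolites = [name for name, mass in compounds if mass <= CUTOFF_DA]
print(metabolites)  # insulin is excluded
```

Under a 2,000 Da cutoff the result would be the same here; the choice of cutoff mainly affects which larger peptides and oligonucleotide fragments get counted.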
Most things are generally nanomolar and above. But as instrumentation improves and our techniques improve, the number of metabolites that we can see and measure, and what we define as the metabolome, will invariably grow. So the metabolome, as I said, is the collection of all those small molecules. It includes both endogenous and exogenous molecules. It includes transient molecules that are intermediates in various reactions. In many cases, it includes theoretical molecules, because in most cases, actually 99% of cases, we have never fully isolated those molecules, yet we do know they exist. In fact, a good portion of the molecules you learn about in biochemistry are not purchasable. Perhaps they were once isolated but then lost. The size of the metabolome is technically ill-defined. It's ill-defined because of the differences in detection technology. So unlike the genome, where we're trying to get a pretty firm number (we have an exact number of genes for E. coli, an exact number for yeast, and a rough number for humans), the metabolome will never have a firm number. So one way of thinking about it is that there are different metabolome sizes for different types of organisms. Mammals have about 80,000 endogenous metabolites that have currently been tabulated. If we look at a collection of all microbes, it's probably about 100,000 compounds that they produce, and any given microbe might have around 3,000 to 5,000 molecules associated with it. The greatest diversity of metabolites is found in plants: estimates are around 300,000 different chemicals across all members of the plant kingdom. Those numbers are always under revision, and likely they are roughly one-tenth of the true numbers. But this is what we know right now. This is what we can detect right now.
Now if we look at the human metabolome, this is something that we've been working on in my lab, and Jeff has been working on since he was a little boy. Some of my staff who are also here have also been working on this. These are databases and resources that we've created to try and understand what is in humans and, in many cases, many other mammals. Right now, among the endogenous metabolites, we have this resource called the Human Metabolome Database. It includes both endogenous and exogenous compounds: about 80,000 endogenous and 30,000 exogenous, so about 114,000 compounds in all. Concentrations of those metabolites range from low picomolar to almost molar levels; in particular, urea in urine can get up to that level in some cases. There's another set of compounds, the drugs or drugome, whether they're prescription, over-the-counter, or illicit drugs. There are about 2,300 that we have tabulated. Those are kept in a resource called DrugBank. And again, those range typically from micromolar concentrations down to pico- or femtomolar. Another resource called FooDB, or the Food Database, that we've been developing includes almost 30,000 chemicals that are found in the roughly 700 common foods we've tabulated so far. It includes the phytochemicals and the food additives as well. In addition to the drugs, there are the drug metabolites. Every drug is metabolized in some way or another. Generally, they're at lower concentrations; they range from the submicromolar or high nanomolar level down to pico- or femtomolar. In some cases, drug metabolites are just as important as the drugs themselves. In fact, some drug metabolites are actually the active form of the drug. At the lowest level, hopefully, for healthy people, is the collection of toxins and environmental chemicals. We keep track of these in a database called T3DB, or the Toxic Exposome Database. There are about 3,600 that we've tabulated in there.
But we've also tabulated another 60,000, which are considered contaminants, not necessarily toxic but simply there. And so this represents another collection of exogenous molecules that we'll learn about a little later. So those are the things that we know. What about the things that we don't know? Well, there are large collections of lipids. More and more of the Human Metabolome Database is including those, but there are also lots of other lipids, and we estimate there are anywhere between 100,000 and 200,000 lipids and lipid derivatives. They're at very low concentrations, often below nanomolar, even picomolar. We know that there are 1,200 drug metabolites that have been reported; there are probably closer to 10,000 drug metabolites that exist for all the drugs. We also know that while there are maybe 30,000 food chemicals, there are also food metabolites, or derivatives of those food components. Again, there could be anywhere from 100,000 to 200,000 of those. And then within our own metabolome, the endogenous molecules are also transformed through phase one and phase two biotransformations. We call these the secondary endogenous metabolites, and there could be anywhere from 100,000 to a million of those molecules. So these largely represent theoretical metabolites. These things have not been formally identified. They're not in the databases yet. But they're all part of that dark matter that some of you see when you do untargeted mass spec, or that people talk about when they find things they just simply can't identify. Currently, if we do untargeted metabolomics, we're generally lucky to identify more than about 2% of all of the peaks or features. So that tells us there's a large unknown or theoretical collection of metabolites, and that's true in humans, plants, and other organisms. Most of you are doing metabolomics, and obviously you're attracted to it because it's of interest and we think it's important.
And I think there are some useful statistics. When we look at the common clinical diagnostic assays, more than 95% of them measure small molecules. In Canada, there are about 330 approved metabolite tests that are performed. When you look at genomics, there are about 130 gene tests. When you look at protein tests, there are about 110 clinically approved. When you look at transcriptomic tests, there are about five. And when you look at true proteomic tests, it's officially zero. So in fact, metabolomics, in terms of clinical use, is more widespread than any other field of omics. We also know that almost 90% of all drugs are small molecules. Yes, there are big molecules like antibodies, but 90% of the drugs that are sold today and 90% of the new drugs that are discovered are small molecules. Of those drugs, half are derived from known metabolites. So most drug discovery is inspired by the metabolites in either ourselves or other organisms. Even in the realm of genetic diseases, a third of all the identified genetic disorders involve disorders of small molecules called inborn errors of metabolism. Yet the thing we tend to forget is that small molecules play a critical role in signaling, and they are cofactors for many enzymatic processes. Unfortunately, most of that information is not captured in existing databases like KEGG, and it's not yet captured in MetaboAnalyst. Metabolites are really the canaries of the genome: they amplify small changes in the genome. A single mutation, a single base change in a specific gene, can lead to up to a 10,000-fold change in metabolite levels, either up or down. And it's because of that that historically the first genetic tests, done in the early 1900s, were not looking at genes, but were simply looking at metabolites.
These were diseases like alkaptonuria or phenylketonuria, all discovered by Archibald Garrod in the early 1900s: genetic diseases all characterized by metabolites and metabolite levels. So the metabolome is not only sensitive to the genome, it's also sensitive to time. This is an example, more theoretical than real, of how the metabolome, proteome, and genome would change over time. So if you get a big meal and eat it really quickly, like this person, you'll see rather profound changes at many levels in the metabolome, relatively modest and slower changes in the proteome, and essentially no changes at the genome level. So the metabolome is sensitive not only to the environment, but also to time and temporal courses. This is both good and bad. Metabolism itself, that is, catabolism and anabolism, is well understood. Some of you are perhaps old enough to have seen these types of charts stuck on walls. This is a chart illustrated in the 1960s, and it really hasn't changed, and there's no need to change it, because by then most of the metabolic pathways were well understood. As I say, the focus is on catabolism and anabolism. But what we don't understand, and I'll emphasize this a lot over the next day or two, is how metabolites play a role in signaling, which is ultimately their main role. The other thing to remember is that the metabolome is connected to all the other omes. It's connected to the proteome, the transcriptome, and the genome. Each of them informs the others, and this is something fundamental to systems biology or multi-omics integration. So if we look at it in detail, we can say that small molecules like AMP, CMP, GMP, UMP, and TMP are obviously the components that make up RNA and DNA. Without these small molecules, we wouldn't have genes or transcripts. Likewise, the 20 naturally occurring amino acids are the constituents of the proteome.
The lipidome is what gives all of our cells their shape, their integrity, and their structure. Of course, other small molecules are the source of all cellular energy, which keeps you going. They are also the cofactors and signaling molecules, essentially the intermediates, that allow signaling to go on between the proteome and the genome. One way that people have sort of turned things on their head is to say that basically the genome and proteome have largely evolved to catalyze chemistry. Most life processes probably could have or would have occurred without proteins or genes, and perhaps actually did, on a very slow basis. But to accelerate things, genes and proteins appeared to help move that chemistry along. The other thing we're going to emphasize over the next few days is that metabolomics essentially helps enable systems biology. We see this over and over again, where people come to us with genomic data, proteomic data, and metabolomic data. They'll first bring it to the genomics people, who'll say they can't touch it because it's got protein data in it. Then they'll bring it to the proteomics people, who'll say they can't touch it because it's got genomic and metabolomic data. But usually when they give us all three data sets, we're actually able to handle all three, because in many respects metabolomics lies at the root of all of these other omics. And we're going to learn about some of those tools, typically used in metabolomics, that help connect you to the proteome and the genome. By connecting through bioinformatics, statistics, and cheminformatics, you can kind of blur the boundaries between metabolomics, proteomics, and genomics. Again, that's something that we're going to try and do over the next couple of days, and it essentially allows you to do things like systems biology.
There are lots of applications, and all of you have, I think, given some really fascinating examples of how metabolomics can be used. I'm not going to go into all the details of what you can use it for, but some of them are perhaps surprising: imaging, oil and petrochemical analysis, water quality testing, and obviously lots in food and beverage testing. I'm going to switch over now to talk about methods. Again, many of you are already doing metabolomics, so this may be review, but there are a few of you who are relatively new to the area, and so we're going to try and get you all up to speed. In terms of a workflow, metabolomics typically starts with a biological sample, often with tissues. If you are working with tissues, you will have to extract metabolites from them. Tissues are alive; they have to be quickly frozen so that you don't get any perturbations, and they have to be handled well frozen. Even the extraction process has to be done so that you quench all of the metabolism. If you don't do that, you're going to get garbage for data. Now, another route, rather than going to tissues, is to go to biofluids and let the body do the extraction itself. The body is pretty good at extracting metabolites, especially into the blood or urine or cerebrospinal fluid, or tree sap with plants. In those cases, you can generally work with those fluids for a few minutes, even up to a couple of hours, without having to panic about metabolic quenching. But again, they have to be handled with care. In the end, what you're trying to do is get a fluidized collection or mixture of metabolites. It's when you have a fluid that you can start using the analytical equipment, like NMR, GC-MS, or LC-MS, to start analyzing those things. The chemical analysis tools have been around for 50 or 60 years.
So most of the steps, from tissue extraction and biofluid collection to chemical analysis, are pretty standard. Where the revolution has been over the last 10 or 15 years is in the software and the databases that allow you to work with mixtures, as opposed to pure compounds, to extract and interpret data, knowledge, and information from those spectra. So it's the last step which has been the most important, and it's the last step that we're going to focus on for most of the next two days. Now, there is a weakness to the different omes, particularly with metabolomics. I've been talking about how great it is, but it's also pretty weak right now in terms of what you can cover. There are literally hundreds of thousands of molecules that we know of, or think we know of, in the metabolome, but most of you, if you are doing metabolomics, will never measure or quantify or identify more than 200 molecules. That represents a fraction of 1% of the known metabolome. If you're doing proteomics, most methods can give you pretty comprehensive coverage, anywhere from one quarter to one half of all the proteins. And with genomics, we routinely characterize the entire genome. So the coverage goes from 100% to maybe 50% to less than 1%. That is an issue, and one that many people are trying to resolve, but it is a challenge in metabolomics. The reason why coverage is so difficult has to do with the diversity of the chemical space we're looking at. In genomics, there are just four bases. The chemistry for those is pretty well known, and in fact chemists worked out many of the techniques to do sequencing as far back as the 50s and 60s. Same thing with proteomics: there are just 20 amino acids, all joined by a single type of amide bond, and the chemistry to characterize and sequence proteins was worked out in the 1950s.
Right now, there's something on the order of 300,000 known chemicals in the different metabolomes, and probably more than that, probably close to 3 million. It's a vast number, and none of these compounds necessarily fit into nice, neat categories. Currently, we estimate you could group them into maybe 2,500 very broad categories and tens of thousands of subcategories. Each of them requires its own type of separation and chemical analysis to be able to identify and quantify it. So the fundamental reason why metabolomics is difficult is that the chemical diversity is greater. The result is that we have to use lots of different technologies in metabolomics. We're not wedded to one tool, unlike, say, proteomics, where it's just LC-MS, or genomics, where it's just a sequencer. Typically in metabolomics, you have to know something about UPLC and HPLC and capillary electrophoresis. You have to be able to work with liquid chromatography mass spec, gas chromatography mass spec, FTIR, as some of you have mentioned, and NMR spectroscopy. Some people even use crystallography. We'll use different types of mass spectrometers, MS/MS, MS to the n, single MS, with all kinds of configurations: single quad, triple quad, QTOFs, Fourier transform ion cyclotron resonance, Orbitrap. So it is difficult to do metabolomics, and there are very few people I know of on the planet who could handle all or most of those instruments. So it makes for a challenging protocol to be able to characterize the metabolome. I'm going to talk about these technologies now. Some of you can zone out because you know them fairly well; others, I know, will be pretty new to this, so if you want to tune in, that's great. The first thing we're going to talk about is chromatography. This is the separation of mixtures. Whenever we're doing something in metabolomics, we are dealing fundamentally with mixtures.
And in many cases, we need some kind of chromatography to separate those components out. Chromatography has been around for centuries, literally. It's basically separating components by passing mixtures, dissolved in a mobile phase, usually water, which could include blood or urine, through a stationary phase. That's the matrix, and it could be powdered ceramic, silica, or Sepharose. That solid stationary phase is what allows the analyte to either temporarily or transiently bind, thereby allowing it to separate. So the differential partitioning between the mobile and stationary phases is the key to compound separation. You can do chromatography through columns or through thin-layer plates; there's liquid chromatography, gas chromatography, ion exchange, size exclusion, reverse phase, HILIC, gravity-fed, high pressure, ultra-high pressure. Today, most of the separation done for small molecules is done through high-pressure liquid chromatography, also called high-performance liquid chromatography, or HPLC. It's been around for more than 40 years. Typically, the pressure is around 5,000 to 6,000 pounds per square inch, and the columns use very small, about 5 micron, pressure-stabilized particles. If any of you have tried to do separations on columns with proteins, the particle sizes there are much, much larger. Chromatography has been used to detect things at the parts-per-trillion level, depending on the detection technique, and it can be used to separate both polar and non-polar compounds. With HPLC, we have three different modes: reverse phase, which is the most common; normal phase, which is the least common; and HILIC, or hydrophilic interaction liquid chromatography, which is the second most common. Reverse phase is used to separate non-polar, hydrophobic molecules. Usually, you're working with a polar mobile phase, often things like acetonitrile-water mixes.
And the matrix, the column matrix, is very hydrophobic. In normal phase, generally, you've got a polar stationary phase and an organic mobile phase. For HILIC, you're basically working with both a polar stationary phase and a relatively polar mobile phase. HILIC, as I said, is great for separating hydrophilic molecules, and a lot of metabolites in the metabolome are hydrophilic. There are different types of columns that are typically used in HPLC. Some are metal. Some are made from a plastic called PEEK, polyether ether ketone. Others, rarely, are made out of glass. They all have to be able to tolerate high pressures and, in some cases, moderately high temperatures. They can be anywhere from 20 millimeters long to half a meter long, and they can be as thin as 1 millimeter or as wide as 5 centimeters. Every HPLC column is packed with these tiny beads, about 5 microns in size. They're very round, and on their surface they are decorated with a variety of hydrophobic, or in some cases hydrophilic, compounds. The slide shows a schematic of what a reverse phase column bead would look like, along with an electron micrograph of some of them; they're very, very round. Below that are some of the molecules, the aliphatic or phenyl chains, that can be attached, depending on the type of column and the type of separation you want to do. Now, the efficiency of separation is typically measured by how well you can separate peaks, and generally, the longer the column, the better the separation. So you can see on one side at the top there's a 50 millimeter column, where two peaks are barely separable. With a 100 millimeter column, the two peaks are nicely separable. The problem with long columns is that it takes a long time to run the separation. Another approach is to shrink the bead size from five microns to about one or two microns.
And so for the same length of column, 50 millimeters, you can get modest separation with the 5 micron bead, or very good separation with the 1.7 micron bead. The 1.7 micron bead size requires much higher pressures, and this is what you find in UPLC, or ultra-high-performance liquid chromatography. Now, with HPLC, what you'll typically have is a buffer or solvent, called the mobile phase, and a pump which pushes things through. Then you'll have a sample injector or autosampler, which will take tiny amounts of the sample, feed it into the line, and push it onto the column. Every HPLC column will have some kind of detector, sometimes multiple detectors. Often you'll have a UV-visible detector; in other cases you'll have a mass spec, or maybe both. Those detectors are attached to computers, and you generate a chromatogram. Now, rarely do we do HPLC with just a single buffer. Most HPLC and UPLC separations are done with gradients, meaning that we mix two different buffers, solvent A and solvent B, together with a special mixer. We'll go from high levels of solvent A to high levels of solvent B over the course of the run; over that time you'll mix them from 95% A, 5% B to 5% A, 95% B. That mixing of solvents allows people to get much better separations. And there are some people that will use three buffers, or even four, to come up with some pretty complex but elegant separations. But the rest of it is very much the same: the same sort of injection and autosampler, same column, same pressure, very similar detectors. The result, especially with HPLC, is some very nice separations, and these are some of the peaks. Typically an HPLC run will take 30 to 40 minutes; UPLC runs can be done in as little as 10 or 15. The separations you get by HPLC are much better than you get by gravity feed, but not quite as good as what you get by gas chromatography.
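To make the gradient idea concrete, here's a small Python sketch of a simple linear gradient running from 95% A / 5% B at the start to 5% A / 95% B at the end. The function name and the run time are illustrative, not from any instrument software, and real methods often use multi-segment rather than purely linear gradients:

```python
def percent_b(t_min, run_min, b_start=5.0, b_end=95.0):
    """Percentage of solvent B at time t_min for a linear gradient.

    Runs from b_start %B at t=0 to b_end %B at t=run_min;
    solvent A makes up the remainder (%A = 100 - %B).
    """
    if t_min <= 0:
        return b_start
    if t_min >= run_min:
        return b_end
    return b_start + (b_end - b_start) * t_min / run_min

# A 30-minute linear gradient: 5% B at the start, 50% B halfway, 95% B at the end.
print(percent_b(0, 30), percent_b(15, 30), percent_b(30, 30))  # 5.0 50.0 95.0
```

A three- or four-buffer method would extend this to several such curves whose percentages sum to 100 at every time point.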
So liquid chromatography is certainly the most common one, but the older method, and still arguably the best method for separation, is gas chromatography. How many people have used gas chromatography? One, two, three, four, five, six, seven, eight, nine. So maybe about half of you. How many people use liquid chromatography mass spec? About 80% of you. Anyway, gas chromatography requires that the sample be vaporized to a gas, and that's its main limitation. Most things in our bodies are not that volatile, but if you can make them volatile, then you can use gas chromatography. Samples are injected into a column, but rather than using a liquid as a mobile phase, an inert gas, usually helium or nitrogen, is the mobile phase. The column itself has a polymer stationary phase that is stuck to the surface of the column. It's not packed with small beads; it's actually a plastic coating. The columns, rather than being 5 or 10 centimeters long, are measured in meters, up to 10 meters in length, and while they may not be half a centimeter or a centimeter across, they're more like a couple of millimeters across. To make GC-MS useful for metabolomics, you usually have to derivatize the compounds. Derivatization is a chemical reaction that typically makes them volatile. The chemical group most frequently added is called trimethylsilyl, or TMS. The reagent will react with a variety of things, and you can also modify it so it will react with a variety of things. When you put silyl or trimethylsilyl groups on just about any compound, it will lower its boiling point, or its melting point, considerably, so that it can then be volatilized at modest temperatures. So you can take an amino acid, which is very hard to melt, or even some polymers or sugars, and volatilize them by attaching TMS. That chemical derivatization is what enables gas chromatography to happen.
And it's essentially an adsorptive phenomenon, so things that have a high affinity for the stationary phase come out slower. Things that have a low affinity for the stationary phase come out faster. The separations you get by gas chromatography are 10 to 100 times better than what you get by HPLC. So the columns, as I say, are very long. They're often wound into coils, because you don't want to have something stretching for 10 meters from one end of the room to the other. So if you wind it up into a small coil, it'll fit into something the size of a telephone. And the column itself is illustrated here. There's a plastic coating on the outside. On the inside, the stationary phase is this polysiloxane polymer. So again, it's got silicon in it. It looks a little bit like polyethylene glycol, but with silicon instead of carbon. And it'll have either benzene or methyl groups along its length. So this is a hydrophobic plastic, if you want. What you measure in gas chromatography, and even in liquid chromatography, is a retention time or a retention index. Retention time is the time it takes for something to pass through a column. It's affected by a variety of things. So retention times are measured for the liquid phase as well as gas. It can be changed by the flow rate, the carrier, the pressure, the stationary phase, everything. If you want to be able to compare retention times, what you do, in at least gas chromatography, is convert it to a retention index. And that's essentially a retention time that's been normalized to the retention times of experimentally measured alkanes. And they'll range from, say, hexane all the way up to, say, dodecane and beyond, and you use those elution times to calibrate retention times and normalize them to a retention index. So with gas chromatography, because it's quite reproducible and because the plate count and resolution is very high, you can actually identify and quantify compounds just from a gas chromatogram. 
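The alkane normalization just described can be sketched in a few lines. This assumes the linear (van den Dool and Kratz) form of the retention index used for temperature-programmed GC; the function name and the alkane retention times in the test are made-up example values:

```python
def retention_index(t_x, alkanes):
    """Linear retention index of a peak eluting at time t_x, given an alkane ladder.

    alkanes maps carbon number -> measured retention time, e.g. {6: 3.0, 7: 5.0}.
    By construction, the index of an n-alkane itself is 100 times its carbon number.
    """
    carbons = sorted(alkanes)
    for n, n_next in zip(carbons, carbons[1:]):
        t_n, t_next = alkanes[n], alkanes[n_next]
        if t_n <= t_x <= t_next:
            # Interpolate linearly between the two bracketing alkanes.
            return 100.0 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))
    raise ValueError("retention time falls outside the alkane ladder")
```

A peak eluting exactly halfway between hexane and heptane, for example, gets an index of 650.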
As long as you are able to detect it, the detector could be a mass spec, it could be ELSD, it could be UV. But the position, that is, the retention index, can tell you a lot about what the molecule could or should be. And then the intensity of the peak tells you how much is there. And so if you have a calibration sample that's been added to your mix, it is possible to absolutely quantify with gas chromatography. So this is an example of a GC chromatogram. We're using a mass spec to detect things, but this is the kind of resolution you get. And as I say, it's 10 to 100 times better than liquid chromatography. So it's an exceptionally good chromatographic technique. Now, I've mentioned detectors. The favorite detector for most people, especially with liquid and gas chromatography, is a mass spectrometer. So mass spectrometry is an analytical method, as most of you know, to measure molecular or atomic weights. Many of you have already seen a mass spec, many of you have used them. The principles are pretty simple. Essentially, compounds can be identified uniquely by their mass. Arguably, we could identify every one of you if we had you stand on a weigh scale and we knew your weight, and we had a list of your weights and your names. The same thing essentially is done with compounds. We have molecular weights for every compound that we know the structure for. And so if we measure a given molecular weight, we can identify that compound, in principle. But there are obviously situations, like the sugars, C6H12O6, where many different isomers exist that have exactly the same molecular weight but are chemically different. So that's one of the challenges in mass spectrometry. If we can measure with very, very high resolution, down to perhaps 1 ppm or 0.0002 Daltons, then we can actually figure out what molecules are, at least their molecular formula, in some cases just from their mass alone. Mass specs are obviously used in proteomics. 
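The calibration idea mentioned here, quantifying against a spiked standard of known concentration, reduces to a peak-area ratio. A deliberately simple single-point sketch; real workflows usually use multi-point calibration curves and compound-specific response factors, and all the names and numbers here are hypothetical:

```python
def quantify(analyte_area, standard_area, standard_conc):
    """Estimate a concentration by comparing peak areas against a spiked standard.

    Assumes equal detector response per unit concentration for analyte and
    standard, which is only an approximation in practice.
    """
    return analyte_area / standard_area * standard_conc

# If the analyte's peak area is twice that of a 50 uM standard,
# the estimate is 100 uM.
```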
Higher resolution ones can measure the molecular weight of proteins down to 1 Dalton. So 1 Dalton, again, is just the weight of a hydrogen atom. So we will link up mass spectrometers to different chromatographic systems. Gas chromatography plus the mass spec is GCMS. It separates volatile compounds, so our volatilized, derivatized compounds, in a gas column and identifies them by mass. LCMS, which most of you have done, separates the more delicate compounds by HPLC or UPLC. And then MSMS is often used as another tool, almost as a separation tool, where you separate first by mass and then fragment the parent ion into smaller fragments, and looking at those smaller fragments allows you to identify a compound. So mass spectrometers are getting better every year, and in the old days you used to be happy if you could measure just the average mass of a compound. These days it's routine to measure the monoisotopic mass. This is an illustration of a compound. This is actually, I think, a peptide here, showing different masses and abundances for the same peptide. So there are actually four peaks for this particular short peptide, and their intensities are defined by having different isotopes, heavy isotopes, where there's a change of 1 Dalton, or one neutron, from one isotope to the next. This is illustrated here where we're looking at chlorobenzene, and we can recall that there are isotopes of hydrogen, the proton and deuterium, isotopes of carbon, C12 and C13, and also isotopes of chlorine, chlorine-35 and chlorine-37. So deuterium is a very rare isotope; carbon-13 is relatively abundant at 1.1%. Chlorine-37 is a quite abundant isotope, about 24%. And so you can see different masses going up by roughly one Dalton, starting at the most abundant combination, the one that includes the proton, carbon-12 and chlorine-35, which would account for about 70% of the intensity. But there's another peak that'll show up at 113, another peak at 114, another peak at 115, 116 and 117. 
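This isotope arithmetic can be reproduced by convolving per-atom isotope distributions, one atom at a time. A small sketch using nominal masses and rounded natural abundances; `isotope_pattern` is a hypothetical helper, not a library function:

```python
# Nominal-mass isotope tables: (mass, abundance) pairs, rounded values
ISOTOPES = {
    "C": [(12, 0.989), (13, 0.011)],
    "H": [(1, 0.99988), (2, 0.00012)],
    "Cl": [(35, 0.758), (37, 0.242)],
}

def isotope_pattern(formula):
    """Relative abundance at each nominal mass for a formula like {'C': 6, 'H': 5, 'Cl': 1}."""
    dist = {0: 1.0}  # mass -> probability, starting from an "empty" molecule
    for element, count in formula.items():
        for _ in range(count):
            # Convolve the current distribution with this atom's isotopes.
            new = {}
            for mass, prob in dist.items():
                for iso_mass, iso_prob in ISOTOPES[element]:
                    new[mass + iso_mass] = new.get(mass + iso_mass, 0.0) + prob * iso_prob
            dist = new
    return dist
```

For C6H5Cl this reproduces the pattern described above: a big peak at 112 carrying roughly 70% of the intensity, a small carbon-13 peak at 113, and a second prominent chlorine-37 peak at 114, with tiny peaks trailing off above that.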
The intensities of those peaks are defined by the abundances of those isotopes. So in the case of chlorobenzene, we see this isotopic pattern where there's one giant peak, which is the parent ion, next to it one smaller peak, and then the third peak is the second most abundant. That has to do with the chlorine. So the appearance, both the intensity and the number of visible peaks, allows you, in many cases, to determine what the chemical composition or the chemical formula is of a given compound. So isotopic distributions are useful, and with the high-resolution mass spectrometers they are very visible. So mass specs have very, very similar principles. Typically, a sample is put through some kind of ionization system. It charges the molecules. They carry either a positive or a negative charge. They go into a mass analyzer, and there are different types of mass analyzers, and then those ions are detected with a detector. So this is a typical mass spectrum of aspirin. You see a tiny peak at about 180. That is the parent ion. These are the fragment peaks. This has to do with the electron ionization process, which shatters the molecule but produces ion fragments. So unlike HPLC or even gas chromatography, mass spectra are very, very sharp, very narrow peaks. The x-axis is the mass-to-charge ratio. So we actually don't literally measure the mass. We have to measure the mass to charge. We have to produce charged ions. The height of the peak is the relative abundance. It is not useful for absolute quantitation. It always has to be compared, if you want quantitation, with some isotopically labeled equivalent molecule. The intensity in mass spectrometry is really more a measure of the compound's ability to fly or desorb. It's not a measure of abundance. With mass spec, we are always worried about how narrow the peaks are. The width of the peak is a way of measuring the resolution of a mass spectrometer. The better the resolution, the more expensive the instrument. 
Obviously, the better it is. So the resolving power is measured as the observed mass divided by the smallest difference between two masses that can be separated. So m over delta m. So here are two masses on the right side. In some cases, they're nicely separated at the 10% level. Others are moderately separated at the 50% level. We talk about a width at half-height, just as we do in NMR; the same thing is done in mass spec to measure the resolution. So this is a schematic way of measuring resolving power in a mass spectrometer. This is perhaps a more useful illustration. The top mass spectrum is from a Q-trap, an ion trap instrument. And this is from a peptide, but just for the purposes of illustration here, we see one large mound, and we get essentially unit resolution for this particular peak. If we go from a Q-trap or ion trap to a Q-TOF, a time-of-flight high-resolution instrument, we now see about seven or eight peaks, and they're all resolved down to two or three decimal places. So one just gives you an average mass. That's the ion trap. The lower one gives you a collection of all the monoisotopic masses. This is a little more detailed, and we can see the different resolving powers. So the monoisotopic mass for this particular molecule is down to three decimal places, 3482.747 plus or minus something. The average mass is 3484. That's the top of that big round blue peak. And we can see the different instruments: say, one would be a low-resolution ion trap, red might be a slightly higher resolution linear ion trap, green might be typical of a Q-TOF, and black would be typical of an Orbitrap or an FT-ICR instrument. So quite a difference in terms of the peaks and resolution that you can see. So when we think about a mass spec, we can think about it in terms of four or five different components. The inlet, which allows you to put the sample in, gas or liquid, and there's the ionizer or ion source, with different methods for ionizing things. 
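The two figures of merit in this discussion, resolving power and mass accuracy, are simple ratios. A minimal sketch; the function names are mine and the numbers are purely illustrative:

```python
def resolving_power(mass, delta_m):
    """R = m / delta-m, where delta-m is the smallest separable mass
    difference (often taken as the peak width at half-height)."""
    return mass / delta_m

def ppm_error(measured_mass, true_mass):
    """Mass accuracy expressed in parts per million."""
    return (measured_mass - true_mass) / true_mass * 1e6
```

An instrument that can just separate 500.000 from 500.005 has a resolving power of 100,000, the sort of figure associated with Orbitrap and FT-ICR instruments.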
There's the mass analyzer, with different tools for separating ions, either by having them fly for certain times or spin around for certain times. And then there's a detector. Most detectors are now electron multipliers or microchannel plates. All mass specs have to be under high vacuum, and all of them have to be connected to computers. So we're going to talk about the ion source a little bit more, because different types of ion sources are used for different things. Electron impact, or electron ionization, is what's standardly used in gas chromatography mass spec. There's chemical ionization, which can be used both with GCMS and LCMS. Both are historically used for small molecules. EI gives you lots of information about structure. Electrospray ionization is not only for small molecules, but can be used for proteins. It allows you to analyze some very big molecules. Another technique, called matrix-assisted laser desorption or MALDI, allows you to look at very, very large molecules, almost megadaltons, but also allows you to characterize small molecules as well. All of these are ways of adding charge to uncharged molecules. All of these are ways of, in some cases, fragmenting molecules so they can be characterized in greater detail. So this is a schematic of an EI, or electron ionization, system. What's shown here is you've got essentially a filament which generates electrons at a precise energy, 70 electron volts. And then you have an inlet which releases gas molecules, things coming from your GC typically. And those electrons collide with the gas molecules, essentially those volatilized metabolites, and fragment them. And as they're fragmented, they're pushed through with a strong charge, with a repeller and an extractor, which sends those ions off into the mass spectrometer, the analyzer. So what you get with EI is both ionization, that is charging, but simultaneously fragmentation. So the sample is introduced. 
It's bombarded with electrons from a filament, either rhenium or tungsten. Molecules are shattered because the electrons have such high energies, about 10 to 15 times more energy than the strength of a normal covalent bond. And those fragments are sent to the mass analyzer. So if we had a simple molecule like methanol, these are some examples of the fragment ions that would be produced. First, the methanol, which is neutral, is ionized, so positively charged. But then we also get fragmentation. We can get the formation of double bonds, or we can get cleavage, or we can get the formation of triple bonds. And the result is that what we see in an EI-MS, or GC-MS, spectrum of methanol is not one peak, but four or more. First is the molecular ion at 32 Daltons, but then we'll see one that has the double bond, and the various fragments. Now, in GC-MS, historically people have used this as a standard way of identifying molecules. There are moderately predictable fragmentation paths for GC-MS or EI-MS. And because you can look at those fragments, people who are very skilled in this area can often piece together the structures of those fragments and figure out what the actual parent molecule is. Now, those are hard ionization methods. Most people work with soft ionization methods, either MALDI or electrospray ionization. These don't fragment molecules very much. Often the parent molecule is still intact. In electrospray, what we do is send a liquid sample through a tube, and it's held at a high voltage. And it's not unlike what an aerosol spray can is able to do. So it sprays things out, and in the course of spraying things out it applies a charge. The sample then is sprayed out toward the mass spec inlet. So the droplets that emerge from an electrospray system are initially relatively large, but they quickly start to evaporate. 
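The fragment masses in an EI spectrum like methanol's can be checked with a tiny monoisotopic-mass calculator. A sketch with rounded atomic masses; `formula_mass` is a hypothetical helper, and for simplicity it ignores the tiny electron-mass correction for cations:

```python
# Monoisotopic atomic masses in Daltons (rounded)
MONO = {"C": 12.0, "H": 1.00783, "O": 15.9949}

def formula_mass(formula):
    """Monoisotopic mass of a neutral formula given as {'C': 1, 'H': 4, 'O': 1}."""
    return sum(MONO[element] * count for element, count in formula.items())

methanol = formula_mass({"C": 1, "H": 4, "O": 1})  # the molecular ion, about 32 Da
fragment = formula_mass({"C": 1, "H": 3, "O": 1})  # a CH3O / CH2OH fragment, about 31 Da
```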
And as they evaporate, the charges start to build up within the droplet, leading to an explosion, where smaller and eventually single-ion droplets start appearing. So it's an elegant way of essentially producing single ions just through the process of evaporation and charge repulsion. So typically with electrospray ionization, we don't use thermal filaments of tungsten or rhenium. We just simply pump a liquid sample, with a volatile buffer, through a stainless steel capillary at a fairly low flow rate. We apply a strong voltage at the tip, and this causes the fluid to nebulize, or spray out, just like with an aerosol spray can. And so the aerosol is then sent into the mass spec. The droplets dry, they shrink, and then charge repulsion causes them to blow up again. They dry and shrink, they blow up again. Eventually you're left with essentially single-molecule ions that end up hitting the detector. So the voltage that's applied to the tip varies with the amount of polar and non-polar solvent. So if you use a lot of water and a small amount of acetonitrile as your carrier solvent, it requires a much higher voltage to cause nebulization. If you have more acetonitrile, the nebulization voltage is lower. You can go from microspray to nanospray, very low flow rates. You can get away with tiny amounts of material. With electrospray ionization, you can't work with salts or detergents, so the sample has to be very clean. To try and get things charged in positive mode, you'll add something like formic acid. If you want to do negative mode electrospray ionization, you add something like ammonia. Now, once you've been able to ionize things, you have to send these ions into a mass analyzer. And there are different types that have been around. The first one used was called a magnetic sector analyzer. They're still used, for instance in geology. They're big and expensive. 
Commonly used in clinical labs are the quadrupole analyzers, or in some cases triple quads. Low-resolution, fast, robust, cheap systems. Some of the higher resolution mass analyzers are the time-of-flight instruments. They're relatively high throughput. And then there are things like the Orbitrap and FT-ICR, which offer the highest resolution and are generally the most expensive ones. So this is just a table of the mass accuracy that's available with different analyzers. The best, as I said, are obtained with the Fourier transform ICRs, which use big magnets, or Orbitraps. Interestingly, magnetic sectors, which are the oldest mass spectrometers, have for a long time offered the best mass accuracy. And then you can see the time-of-flights. In some cases, if you use triple quads correctly, you can get high resolution, but often they're more around 200 ppm. When you're running an LC-MS or GC-MS experiment, you'll end up with different types of chromatograms. You can end up with a total ion current chromatogram, a base peak chromatogram, or an extracted ion chromatogram. And they have acronyms like TIC, BPC and EIC. And they're shown below to illustrate the type and shape that's produced. So almost no one displays the total ion current chromatogram, because it looks really messy. The most frequently used or displayed is the base peak chromatogram. It's only displaying the most intense peak from each spectrum. And then, when people get into fine detail, they'll show the extracted ion chromatogram, which just contains a single analyte and its mass spectra, which often may have just a single peak or a couple of peaks derived from some of its fragments. So these are examples of base peak chromatograms from a mass spec, from tomato or plant extracts, where you're seeing peaks corresponding to their elution times, but also the masses as measured for those peaks. So that's a quick run-through of mass spec. 
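The three chromatogram types are easy to define over a list of scans, where each scan is a list of (m/z, intensity) pairs. A hedged sketch; the data model and function names are mine, not from any particular vendor library:

```python
def tic(scans):
    """Total ion current: the sum of all peak intensities in each scan."""
    return [sum(intensity for _, intensity in scan) for scan in scans]

def bpc(scans):
    """Base peak chromatogram: the single most intense peak in each scan."""
    return [max(intensity for _, intensity in scan) for scan in scans]

def eic(scans, target_mz, tol=0.01):
    """Extracted ion chromatogram: intensity near one chosen m/z in each scan."""
    return [sum(i for mz, i in scan if abs(mz - target_mz) <= tol) for scan in scans]
```

The BPC keeps just one value per scan, which is why it looks so much cleaner than the TIC, and the EIC discards everything except the analyte you ask for.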
You're not going to become experts if you didn't know anything about mass spec, but at least for those of you that have never seen it, you have a bit of an idea. How many people use NMR spectroscopy in their metabolomics? So we've got three people. So every year it shrinks, but we still talk about NMR. So NMR is a technique that's been around for 60 years. And it is, in fact, the original technique used for metabolomics. The whole field got started with NMR. It uses a large, super-cooled magnet, and the results are spectra that look like this. Very narrow peaks that look just like a mass spectrum or a GC or LC chromatogram. In NMR, what you do is put a sample in a glass tube and place it under a very strong magnetic field. Then you apply radio waves to it. You excite the nuclei, and the nuclei absorb the radio waves at different frequencies. The result is an absorption spectrum, not unlike an absorption spectrum that you would measure for UV, except the peaks are much, much sharper. And they absorb at different frequencies or wavelengths, just like UV absorbers absorb at different wavelengths over the spectrum. NMR is not radioactive. The N, for nuclear, comes from nuclear magnetism, which is the magnetism of protons. It's spectroscopy, and it's essentially a non-destructive technique. It sends in radio waves. You can only get the NMR phenomenon to occur under very strong magnetic fields. Basically, the magnet has to be strong enough to lift a city bus. The nuclei absorb at different energies or frequencies. The nuclei, mostly made up of protons, will have spins. Some are spinning clockwise, some are spinning counterclockwise, so we call it up-spin or down-spin. Whenever you have a charge that's spinning, it produces a magnetic field. So spin up, the north pole is up. Spin down, the north pole is down. So every molecule, every atom that has a proton or multiple protons, essentially has these tiny, tiny magnets. 
And so what we do is we help orient them by putting these tiny magnets under a super strong magnetic field. Some of them are naturally oriented down-spin, others naturally oriented up-spin. If we excite them with a strong radio pulse, a radio frequency pulse, so basically a light wave that we don't see with our eyes, it will cause some of the down-spins to flip up into up-spins. And they'll sit there briefly. And we normally had a movie here, but this doesn't work. But while these spinning tops are going around, the radio frequency will cause those spins to flip down and then flip back up. And there's this oscillation that comes from the flipping down and flipping up. And it's not unlike a carillon with many bells all ringing, big bells and little bells all ringing at the same time. Each of those bells has a specific frequency. In the case of NMR, an amide bell will have one frequency, a methylene bell will have another frequency, a methine bell will have a different frequency. So different chemical groups ring at different frequencies, and when they all ring together, you get this. So this looks not unlike what you'd get if you were talking into some sort of voice recorder that tracks all of the sounds, the timbre and tone of your voice. All of those are collections of frequencies of all the nuclei bouncing or oscillating back and forth because of the excitation. We use a technique called the Fourier transform to convert the free induction decay, which is a time-dependent signal, into a frequency spectrum, the NMR spectrum. So what we can't interpret by eye, a whole bunch of overlapping oscillations, is now converted to something with, say, three visible peaks. So bigger magnets in NMR are better. Increasing the field strength increases the frequency. It increases the dispersion, or separation, of the peaks. It also makes the peaks narrower, and that's what we typically want. 
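The Fourier transform step can be demonstrated in a few lines of NumPy. This simulates a free induction decay as three decaying sinusoids at made-up frequencies, standing in for three chemical "bells" ringing together, and transforms it into a spectrum with three peaks:

```python
import numpy as np

# One second of a simulated free induction decay, sampled 4096 times:
# three "bells" ringing together at hypothetical frequencies, each decaying.
t = np.linspace(0.0, 1.0, 4096)
fid = sum(np.exp(2j * np.pi * f * t) * np.exp(-t / 2.0)
          for f in (50.0, 120.0, 300.0))

# The Fourier transform turns the time-domain wiggle into a frequency spectrum.
spectrum = np.abs(np.fft.fft(fid))
freqs = np.fft.fftfreq(t.size, d=t[1] - t[0])
```

The three largest peaks in `spectrum` sit at approximately 50, 120 and 300 Hz, recovering the three frequencies hidden in the time-domain signal.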
So typically with modern NMR, a lot of things are automated. Samples can be injected directly into these superconducting magnets. Radio wave pulses are transmitted into the sample, and then the frequencies and free induction decays are measured and converted fairly automatically into spectra. The magnets, as I said, are enormous. They're about the size of a refrigerator. They are superconducting. They have layers of liquid helium and liquid nitrogen. The magnet itself is relatively small; what occupies almost all the space inside is the insulation and the cooling. So this is a cutaway view of an NMR magnet. This is more of a schematic, but it illustrates how there are layers and layers. It's just a giant thermos bottle designed to hold liquid helium and liquid nitrogen. And then inside of it is a small probe, which is usually a few centimeters in diameter, connected to some electronics. Inside that probe is a tiny, tiny saddle coil made of wire. It looks like an antenna. And into that coil you're able to drop an NMR tube, which is about as thick as a pencil, containing about 500 microliters of sample. It's the saddle coil that is responsible for generating the radiofrequency pulses. It's the saddle coil that's also responsible for detecting the radiofrequencies that come from the oscillating nuclei. The spectra that we see in NMR have chemical shifts. They have splitting patterns from spin coupling, and they have different peak intensities that tell us the number of hydrogen or carbon atoms that are there. All of that information allows you to characterize and determine a compound's structure. So even though most of you use mass spec, the ultimate method for determining the structure of a molecule is still NMR. It is the gold standard. Chemical shifts tell us a lot about those chemicals. Different hydrogen atoms exhibit different absorption frequencies due to the neighboring atoms, bonds or groups. 
So depending on whether something is close to an electropositive or electronegative group, based on its chemical shift we can pretty much identify which group it belongs to. From the chemical shifts, we can also extend further and figure out, through coupling patterns, where those groups are relative to one another. So this is bromoethane, and it's illustrating how we can see the methyl group, which is around two parts per million, and the methylene group, which is peak A, around 3.6 ppm. In one case, we can see a triplet, that's for B, and a quartet for A. And this has to do with the coupling of the hydrogens beside each of those groups. So by analyzing NMR spectra, we can generally figure out the structures of small molecules, but we can also use those to figure out the components of mixtures. NMR spectra have to be processed, and we'll learn a little bit about the processing in the next lecture. They have to be phased. They have to be properly referenced. They have to have water suppression to remove the water signal. They have to be shimmed, and they have to be baseline corrected. So these are processes that sometimes need manual intervention, but the result is a nice, pleasing spectrum that looks a lot like a base peak chromatogram for a mass spectrum. This slide just explains some of the techniques that we use to fix up NMR spectra. Not unlike GCMS and LCMS, in NMR the spectra all kind of look the same. There are just lots of peaks, and they're separated, in this case not by time, but by chemical shift. But in all cases there are sharp, fine peaks, and those peaks can be identified or numbered and quantified. So one of the reasons why NMR is not particularly popular is that it is the least sensitive of the metabolomics techniques. And the reason why mass spec is particularly popular is that it is the most sensitive. So LCMS allows you to get down to nanomolar or even subnanomolar detection levels, while NMR is generally good at 10 micromolar and above. 
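The triplet and quartet in the bromoethane example follow the n+1 rule: coupling to n equivalent protons splits a peak into n+1 lines, with relative intensities from Pascal's triangle. A small sketch; the function is hypothetical:

```python
def multiplet(n_neighbors):
    """Relative line intensities for a proton coupled to n equivalent neighbors.

    Builds the (n+1)-line pattern by growing Pascal's triangle one row at a time.
    """
    row = [1]
    for _ in range(n_neighbors):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row
```

The methyl protons see the two methylene protons and give a 1:2:1 triplet; the methylene protons see the three methyl protons and give a 1:3:3:1 quartet.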
So in an NMR metabolomic analysis, generally you're lucky if you measure between 50 and 75 compounds. With GCMS, which is in the intermediate zone, you can often identify or partially identify 200 or so compounds. And with LCMS, it's possible to detect 5,000, 10,000, or 15,000 features. Now, the problem with LCMS is that these things are features. These are not identified compounds. In most cases, even with LCMS, people are identifying around the same number of compounds as they get by GCMS or by NMR. So there's an issue with both sensitivity and the number of compounds detected. In NMR, because everything you detect is above the micromolar level, everything you see by NMR is known or knowable. In LCMS, 99% of what you see is not known. It makes it hard to publish if you don't know what you're measuring. And so this has been an ongoing issue for metabolomics. Still is. So what we talk about in NMR are the known knowns and the known unknowns and the unknown unknowns. In the LCMS world, while we're seeing lots and lots of peaks, we're mostly dealing with the unknown unknowns. That's hard to process. It's hard to interpret. And it is an essential limitation in metabolomics. Now, there are two routes to metabolomics: targeted and untargeted. Targeted essentially is a method for identifying and quantifying a small set of compounds. Untargeted is essentially a hypothesis-generating approach. It's a way of detecting thousands of features and using statistics, in some situations, to identify the features, and sometimes even the metabolites, that are driving the differences between groups. So targeted versus untargeted. Targeted: limited coverage, pre-selected molecules. It is not the best tool for hypothesis testing or discovery. It is the best technique for getting quantitation. It is the most rapid and automatable method, and it can be very standardized or standardizable. 
So it is the technique that's used in clinical practice. Untargeted methods essentially have unlimited or nearly unlimited coverage: 10,000, 20,000 features. Obviously, there's appeal in being able to discover something that has never been seen before. It's great for hypothesis generating. But with untargeted, you only have relative quantitation. It is not yet automated. It is not very fast, and it is not very standardized. How many of you do untargeted metabolomics? How many people do targeted metabolomics? How many don't know what they're doing? Okay, so typically you're doing one or the other. Now, in targeted metabolomics, what you'll do is take your sample and start off with metabolite identification and quantification. You'll produce a long list of metabolite names and concentrations, and then you'll do multivariate statistics. After you do the multivariate statistics and have identified the important up- or down-regulated metabolites, then you do the biological interpretation. In untargeted metabolomics, you'll collect samples, but you have to work with dozens to hundreds of those samples. You don't identify anything in particular except the peaks. Then you use the peaks to do the data reduction and the multivariate statistics, and then you identify the statistically significant peaks. And of those statistically significant peaks, having gone down from 10,000 features to maybe 30 features, you then try to identify those features. Now, in targeted metabolomics there are different methods. There's NMR, there's GCMS, there's direct injection mass spec. There are different sample volumes that are required: NMR generally requires more, direct injection mass spec less. The types of metabolites differ too: in NMR you mostly measure water-soluble compounds, whereas in mass spec they're generally more hydrophobic. GCMS is kind of intermediate between NMR and LCMS or DIMS. There are different preparation times that are needed. NMR requires very little preparation. 
Mass spec generally requires more preparation. Collection times are relatively short for both NMR and LCMS, relatively long for GCMS. Data analysis is typically shorter for NMR, a little longer for LCMS. The detection limit is about a thousand times better for LCMS over NMR. The number of compounds that can be identified typically ranges from, say, 50 to 100 by NMR, to perhaps up to 200, in some cases up to 400, by MS. Across the different platforms, you will find that they all measure something a little different. And there are usually only about 10 to 20 molecules that overlap between any two platforms. So this underlines the point that when you do metabolomics, it's a good idea to try two or more platforms: to do GCMS and LCMS, or to do LCMS and NMR, or to do NMR and GCMS, or if possible to do all three together, because together you'll get a more complete picture of the types and collections of molecules. So this just gives you sort of the upper and lower limits in terms of the number of compounds that can be identified, and their general sensitivity, for NMR, GCMS, DIMS, LCMS and lipidomics. Lipidomics is sort of the biggest win possible, because many of the compounds are very similar. So there are tools available for targeted metabolomics or lipidomics that can measure up to 3,000 lipids now. In untargeted metabolomics, what you're dealing with is thousands of metabolites and hundreds of samples. And on top of that, LCMS is not terribly reproducible. You get drift with liquid chromatography. You'll often also get drift with mass measurements, unless you're calibrating. So you get a phenomenon called batch variation. You get changes in retention time. So what you have to do in untargeted mass spec, or untargeted metabolomics, is collect a lot of spectra. And these are shown in this diagram, whether it's GC or LC. And you have to correct them. 
So if we collected things over day one, day two, day three, in some cases we'll see the LC is running fast. In some cases it's running slow. Some things are spread out. Some things are not as intense as before. So when we merge all of these things together, we get something a little noisy. And what we're trying to do typically in untargeted metabolomics is to realign and renormalize those things. So these are some examples and tools that are available for aligning base peak chromatograms in mass spec. And you can see the different colors represent the different positions over the different days and times. And you can see the spreads that are produced. And there are different tools, from COW (correlation optimized warping) to XCMS to block-shifting methods, for performing these alignments. So this is an example of the alignment. And then there's the scaling and normalization that also has to be done with untargeted metabolomics. All of this is done so that you can compare all of the features and align all of the features, so that you're essentially talking the same language. So again, over the hundreds of spectra, or the multiple days or weeks you're collecting, you then have to calibrate. So in some cases, sample intensities continue to drop. You have to rescale them so that they are all at the same intensities. Some have very unusual intensity changes. They go down, then up. Again, normalization has to be done to try and get those properly scaled and properly normalized. So with an HPLC system and an MS system, you'll have both retention time and mass-to-charge, and you'll produce these two-dimensional plots which show the positions of the peaks, their intensities, and also their mass-to-charge ratios. In an ideal situation, this is the sort of thing you'd get. If you had measured several thousand spectra, several thousand chromatograms, you'd get some very, very clear, distinct peaks. Some of the isotope patterns are visible in this. This is the ideal. 
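One very simple version of the rescaling described here is total-intensity normalization: scale every sample so its summed signal matches the median total across all samples. This is only a sketch of the idea; real pipelines typically use QC-sample-based drift correction or median-fold-change normalization instead, and the function name and data are hypothetical:

```python
def normalize_total(samples):
    """Scale each sample (a list of feature intensities) so that its summed
    intensity equals the median total intensity across all samples.

    A deliberately simple normalization sketch, not a production method.
    """
    totals = sorted(sum(s) for s in samples)
    n = len(totals)
    median = totals[n // 2] if n % 2 else 0.5 * (totals[n // 2 - 1] + totals[n // 2])
    return [[value * median / sum(s) for value in s] for s in samples]
```

After normalization, every sample has the same total signal, so a feature that looks twice as intense in one sample than another reflects composition, not injection-to-injection drift.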
This is typically the reality with untargeted mass spec. You have things with very different retention times. You have the scaling, the normalization, and overlap problems. Whereas in the ideal case there are maybe ten different compounds clearly identifiable, this might suggest that there are literally hundreds of different compounds if you haven't done your alignment or scaling properly. So what a lot of the software tools used in untargeted metabolomics do is convert the mess on the top into something simpler at the bottom, and then, from that simpler set, identify or produce a list of peaks and intensities, in some cases named compounds and quantities. So in metabolomics, whether it's untargeted or targeted, our first task is to go from spectra to lists. That's what we're going to focus on in the next lecture. So these lists, as I say, could be named compounds or unnamed features. They could be absolute concentrations, relative concentrations, or peak intensities. After you've got your lists, the next thing is to try and interpret them. That is, to go from lists to pathways, or from significant compounds to their associations with pathways. We'll talk about that later today. And then also from those lists to models or to biomarkers. And we'll talk about that tomorrow. So there are lots of informatics challenges in metabolomics, going from spectra to lists and lists to pathways. There's assessing data integrity and quality, alignment, normalization, classification, assessment of significance, metabolite identification and quantification. And those things can be done in different orders, depending on whether you're doing targeted or untargeted metabolomics. And then the lists-to-pathways and lists-to-biomarkers steps involve a lot of the biological interpretation, which we'll focus on later today and also tomorrow.