Thanks, everyone, for giving your little bios and discussing your backgrounds and your interests. I think it's really informative for me, for the TAs and for the rest of the crew, including Jeff. It's interesting to see the diversity of backgrounds that everyone has, and this is often characteristic of these workshops. So hopefully you'll find something for everyone here. Obviously, some people are interested in learning about some of the analytical techniques. Some are interested in doing data analysis and data integration. Some of you are veterans in metabolomics. Some of you are relatively new. Some of you are veterans in bioinformatics and some of you are new to it. So we'll try to accommodate all of those different backgrounds as much as we can. So without further delay, I guess we'll dive into this. Please feel free to interrupt. Most of you are muted, so you can unmute if you need to talk, but you can also use the chat and some of the other tools that Rashad has shown you. This will be the first large-scale course I've given in Zoom. I've done a lot of smaller courses. So there may be a few technical challenges as we go through, and hopefully everyone will be patient as we adjust. So these are the standard slides Rashad has already shown you about how you can share the slides, which are under a Creative Commons license. And what we're really going to do for the first part of this course is just a gentle introduction to metabolomics. As I said, some of you are relatively new to metabolomics, so we'll talk about some of the technologies in this first lecture and then I'll start diving more and more into the data analysis. This is the outline. We're a little behind schedule right now, so I might have to rush a little bit. This is on Eastern Time. So it's now a little after 11 for those of you on Eastern Time. It's a little after 9 o'clock for me.
So as I said, we'll lead with this introduction to metabolomics, then we'll go into metabolite identification. We've identified half-hour breaks throughout the course. These breaks can be used for lunch for some of you, depending on your time zone. For others, they might be bathroom breaks, but we're also hoping that you can use the breaks to discuss things with us. We can go into various chat rooms, and we can talk about whether you'd like something changed or added or if there are other questions that come up. The breaks are a good time to do that. So you can see that we've got at least three or four breaks through the day. Around three o'clock, we'll be having a short lab. That lab tends to stretch longer than the hour and a half that we budgeted, and so likely you'll see the lab going into the break. And then we'll close off with a short discussion on databases. And then we have an optional program at the end, seven o'clock your time, if you're still awake or still keen to work on this, to do spectral processing and functional analysis. That will be led by some of the TAs. Tomorrow, the focus is on statistics and MetaboAnalyst. That's a very popular part of this course, and Jeff will be mostly presenting tomorrow. And then at the very end, we'll have a very brief lecture on the future of metabolomics. So hopefully everyone can stay, and stay awake, for the rest of the lectures and labs. This particular lecture is to give you a background or a feel for metabolomics and the metabolome. We have some standard learning objectives: to try to identify some of the applications of metabolomics, including lipidomics. And I think many of you, just in your presentations, have highlighted the variety and diversity of ways that metabolomics is being used now. We're going to get into some of the techniques, the platforms that people use: LC-MS, GC-MS, NMR. And then we'll also, as you've heard, talk about the differences between targeted and untargeted metabolomics.
Some of this might be new to you, some of it's relatively old. So we're diving into a picture I often use to introduce metabolomics, and to emphasize the role that the metabolome has in connecting to the proteome and to the genome. At the base of this pyramid of life, we have genes or DNA that code for everything in a cell or in a body or in an organism. When we study DNA, we call it genomics. The collection of all genes is called the genome. Genes code for proteins, and those are kind of the workhorses of the cell, whereas the DNA is sort of the memory bank of the cell. Proteins are there to facilitate chemical reactions and to essentially create or destroy metabolites. And so metabolites are part of the metabolome, and studying the metabolome is metabolomics. When you go up the pyramid, from genes to proteins to metabolites, you'll find that there's an increasing influence of the environment and of physiology on the different components. And that interface between the environment and the cell is where metabolomics is particularly useful. Metabolites are excellent readouts of the phenotype, or the phenome as people tend to call it. The metabolome lies between the environment and the genome. And in many respects metabolites tell you what is going on, whereas the genome tells you what might be going on. So in that regard, metabolites and metabolomics give you essentially a chemical phenotype readout. What you eat, breathe or drink affects your metabolome, and that's how the environment affects it. It hopefully doesn't affect your genome. Otherwise, we'd all be really mutant monsters, I suppose. So the genome is very stable to environmental perturbations. The proteome is a little less stable, but the metabolome is quite sensitive to them. The same goes for physiology. Many of us who are taught biochemistry are taught to think of it purely in terms of cells and only cells.
And in fact, the organs in any organism, including our own bodies, are essentially metabolic organs. Metabolism in the liver is very different from metabolism in the brain. Metabolism in the heart is very different from metabolism in the gut. And so the physiological influence is also amplified as you go up the pyramid. All cells in your body have the same DNA, but all tissues and organs have very different metabolomes. So physiology and environment play a critical role, and this is why I think metabolomics is becoming more and more popular. The contrast between metabolomics and genomics is shown here. Obviously in genomics we study genes, and so it's high-throughput characterization with next-gen DNA sequencing or transcript analysis. We're looking at genes in a given cell, tissue or organism. Metabolomics is the same idea. You want to use the same concept of high-throughput analysis, using not sequencers but things like mass specs and NMR, to characterize all of the small molecules in an organism. That collection of small molecules is the metabolome. The definition we use in metabolomics covers essentially organic and even inorganic molecules detectable within an organism that are less than 1,500 daltons. It's not a hard cutoff, but it's one that's been convenient and it's one that's now standardly used. So if you use that cutoff, it can include things like peptides. Looking at small peptides is considered metabolomics. Looking at short DNA and RNA fragments is also considered metabolomics. Looking at contaminant metals and/or salts is also metabolomics. But more traditionally it includes analysis of things like sugars, nucleosides, nucleotides, organic acids, amino acids, ketones, aldehydes and steroids. It means looking at foods. It means looking at microbial products. It means looking at toxins, pollutants, drugs and drug metabolites. So it's just about anything that an organism or body can be exposed to.
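To make the size-based definition concrete, here is a minimal sketch of a mass filter. The 1,500 dalton cutoff is the one quoted in the lecture; the example compounds and their approximate average molecular weights are assumptions added for illustration, and the function name `is_metabolite_sized` is hypothetical.

```python
# Illustrative size filter for the working definition of a metabolite.
# The 1,500 Da cutoff comes from the lecture; the example compounds and
# their approximate average molecular weights are added for illustration.
MASS_CUTOFF_DA = 1500.0

approx_mass_da = {
    "glucose": 180.16,
    "cholesterol": 386.65,
    "ATP": 507.18,
    "human insulin": 5808.0,
}


def is_metabolite_sized(mass_da: float, cutoff_da: float = MASS_CUTOFF_DA) -> bool:
    """Return True if the mass falls under the (soft) metabolite cutoff."""
    return mass_da < cutoff_da


small_molecules = [
    name for name, mass in approx_mass_da.items() if is_metabolite_sized(mass)
]
print(small_molecules)
```

Since the cutoff is described as a convenience rather than a hard rule, a real pipeline would probably treat it as a tunable parameter and not a fixed constant.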
It includes endogenous things and exogenous things, which might include, if you want, the microbiome. And it ranges over a huge range of concentrations, from picomolar with the most sensitive techniques to concentrations almost up to one molar. So metabolites are what we define. The metabolome is the collection of those metabolites. And it can include looking at metabolites in a cell, in an organ, in an organelle, in tissues or in an entire organism. What we are also realizing is that what we can see in the metabolome is often far less than what we know is there. That includes things like transient molecules, intermediates that occur. It even includes theoretical molecules in the case of the lipidome. We know the structure, or can imagine or draw the structure, of literally hundreds of thousands of lipids, most of which have never been officially seen. What we detect is really defined by the detection technology. So unlike genomics, which has sort of one kind of technology for sequencing, in metabolomics you have to use a lot of different technologies. And because of the issue of the techniques, the issue of transient molecules and the limits of the technology we have even now, the number of molecules in the metabolome of a given organism is always ill-defined. We know pretty much the exact number of genes in E. coli, the exact number of genes in Arabidopsis and Drosophila, and pretty close to the exact number of genes in humans. But we still have no idea about the size of the metabolome in most animals. This is an estimate. It's not guaranteed, but this is a rough estimate of the number of metabolites that we know for different kingdoms of life. If we look at all plants, somewhere between 250,000 and 300,000 is the common number that people bounce around. It's probably more than that. If we look at the universe of microbes, we can see perhaps 150,000 products that microbes produce.
A given microbe maybe produces around 5,000 metabolites or compounds, but because microbes are in such diverse niches around the world, the total size of the microbial metabolome is quite large. Mammals, including humans, have relatively smaller metabolomes, but it's still on the order of about 120,000 chemicals or metabolites that have been enumerated so far.
David, can I interrupt? If you go back a slide. In the mammalian metabolome, how can you differentiate from the microbial metabolome?
It's really hard. In fact, there's a lot of overlap between metabolites. You're exactly right. We often can't distinguish, except usually by looking at specific organs. If we look at feces and things like that, we can generally come up with a better idea about which ones are microbial, but it's still a challenge. There's about a 95% overlap. We know about 200 or 300 metabolites in the human metabolome that we are absolutely certain are microbial. Then there are things that are called microbial co-metabolites, like hippuric acid, which is a product of both human and microbial metabolism.
These numbers, I presume, aren't complete? They don't include all the breakdown products?
I'll be getting into that. Yes, these are the original molecules. There are other things that suggest that the metabolome is probably about 10 times larger than what I'm showing here.
In terms of what we can track and what's being kept, and a lot of this is in databases that my lab has been maintaining for the last 10 or 15 years, this is the collection that we call the human metabolome. Humans have a lot of endogenous metabolites. Actually, the number is probably less than 114,000, but they range, as I say, from femtomolar, which is below what we can normally detect, to almost molar concentrations. So these are the metabolites encoded by our genes and produced by the proteins and enzymes in our body, including some from the microbes. And we also take in things.
Humans eat other metabolomes, and we also take in synthetic things. There are about 2,600 known drugs that are approved by Health Canada and the FDA. So in humans, we will find drugs. Hopefully you won't find all of those drugs in one person, but you will find a few drugs in one person, and in a population of several tens of thousands you will probably get a good portion of them. Drugs tend to be at a fairly high concentration, but at lower levels than what you'll find for many endogenous metabolites. Also in our body, you'll find foods and food metabolites. These can include food additives. They can include the xenobiotic compounds you get from plants, all those polyphenols and terpenoids and alkaloids in the coffee or the tea that you're drinking. And the diversity of those is also quite large, about 70,000 compounds that we've catalogued in a database called FooDB. They are in about the same concentration range as drugs, and this is why many food compounds essentially also act like drugs. Drugs are broken down, and they produce drug metabolites. Those are at lower concentrations on average than the drugs themselves. Some drug metabolites are actually more active than the drugs. Some are very harmful. So we track a lot of that information about drugs in a database called DrugBank. And then there's a whole variety of toxins and environmental chemicals. There are far more than the 3,600 that I've mentioned here, but these are the most toxic ones and the most abundant ones. We keep those in a database called T3DB, or the Toxic Exposome Database. I'll be talking about other databases later today, and there's much, much more in these tools and resources. And again, hopefully in the case of toxins, these are at much, much lower concentrations than food or endogenous metabolites, because obviously they're harmful.
So there's a whole range of compounds in the human body, from exposures to drugs to foods to endogenous metabolites, spanning many orders of magnitude in terms of concentration, from 10 to the minus 15 to roughly 10 to the minus one molar. Now, as I think Francis highlighted, there are a lot of other metabolites that likely exist. And the magnitude, at least in the human, is quite significant. We can look at and think about known lipids or predicted lipids. There are probably another 100,000 lipids at least that are not cataloged in databases. Given what we know about drug metabolism from the 2,600 known drugs, there should actually be on the order of about 10,000 drug metabolites. Some of those are predicted and not yet known. We also know that the food compounds that we take in are metabolized. So the roughly 70,000 different products that we know in foods are probably multiplied by a factor of six to 10. These are food metabolites, breakdown products that also include some strange and interesting compounds. Some of that processing is done by your liver, some of it is done by the gut. And then there's a whole range of promiscuous enzyme reactions and other metabolites of metabolites. So your endogenous metabolome is processed by your gut and by the liver to produce other secondary metabolites. I'm calling this the "secondome", but it's essentially another collection of metabolites that we believe exists. So when you add these up, we're looking at at least another million compounds, and probably far more than that, that are probably in your body right now as we speak. So the size of the metabolome is still growing and is still somewhat ill-defined. Does anyone have any questions about that? Because sometimes, at least in in-person classes, people have a fair number of questions about these numbers. Does anyone want to ask a question? All right, I'll carry on.
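The tally above can be sketched as simple arithmetic. This is only a back-of-envelope check using the round numbers quoted in the lecture; the variable names are hypothetical. It deliberately leaves out the metabolites of metabolites, for which no number was given, so the subtotal lands below the "at least another million" figure mentioned above.

```python
import math

# Round numbers quoted in the lecture (estimates, not measurements).
predicted_lipids = 100_000
predicted_drug_metabolites = 10_000
known_food_compounds = 70_000
food_metabolite_factor = (6, 10)  # each food compound yields roughly 6 to 10x products

# Subtotal of "expected but uncatalogued" compounds, low and high bound.
# Metabolites of metabolites are NOT included because no count was quoted.
low_subtotal = (predicted_lipids + predicted_drug_metabolites
                + known_food_compounds * food_metabolite_factor[0])
high_subtotal = (predicted_lipids + predicted_drug_metabolites
                 + known_food_compounds * food_metabolite_factor[1])

# Dynamic range of concentrations: 1e-15 M up to roughly 1e-1 M.
orders_of_magnitude = round(math.log10(1e-1 / 1e-15))

print(f"subtotal: {low_subtotal:,} to {high_subtotal:,} compounds")
print(f"concentration range: about {orders_of_magnitude} orders of magnitude")
```

The gap between this subtotal and a million or more compounds is what the secondary products of the endogenous metabolome would have to fill.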
The next slide really just addresses a quick question: why is metabolomics important? Obviously everyone who's here thinks it is. There are some interesting statistics about small molecules. When you look at the clinical assays that are run if you go to the doctor's office or to a hospital, most of the diagnostic clinical assays that are done are for small molecules. So if they're doing a blood or urine test, they're measuring things like glucose or cholesterol or bilirubin or creatinine. We know that almost 90% of the known drugs are small molecules. About half of them are derivatives of existing natural products. So they're either inspired by natural products that have similar functions or, in some cases, they are the natural products and they're still used as drugs. If we look at the number of genetic disorders that we know in humans, about a third of them involve diseases of small molecule metabolism. So yes, they're genetic, but they're also metabolic disorders. And then when we think about the body of any organism, ours or others, small molecules are the cofactors and signaling molecules for most of the proteins in a cell or in our bodies. And we tend to forget that. We generally look at small molecules as things to be either consumed, destroyed or assembled in catabolism or anabolism, but small molecules play a much, much more important role. One way of looking at them is that metabolites are the canaries of the genome. A small change, a single mutation in, say, the phenylalanine hydroxylase gene, can lead to a disease like phenylketonuria. So one mutation can lead to a 10,000-fold increase in certain metabolite levels. Historically, the measurement of small molecules has allowed us to detect genetic changes which, up until relatively recently, were almost impossible to detect.
So the fact that you can have a single chemical change in DNA amplified to the point where you see a 10,000-fold change in certain metabolite levels is quite striking. And this is where the concept of the canary in the coal mine comes in: coal miners used canaries to help detect very low levels of toxic gases in the 1800s and early 1900s. Metabolites can serve the same role in detecting those changes in the genome. As I mentioned before, what you eat and drink and breathe changes your metabolome. And so in that regard, metabolomics is very time sensitive. So like this person, if you're really hungry and you're eating a lot of spaghetti, and we metabolically measured you, we'd see wild variations in a whole range of metabolites over the first few seconds, to the first few minutes, to several hours. Some going up, some going down, some oscillating. So temporally, the metabolomic readout just from eating a meal would be quite impressive. If we looked at your proteins, we might see changes in a few proteins like maybe insulin, ghrelin and a couple of other small ones, but there won't be a whole lot of change proteomically over that period of time. And your genome, hopefully, shouldn't change at all based on the meal you ate. In fact, the genome should not change at all during, after or over multiple meals. So that time sensitivity makes metabolomics both useful and challenging. If you aren't designing your experiments to include certain aspects of diet or time after consumption or exposures, that makes interpretation of metabolomic data a little more difficult. Likewise, samples that you collect, whether blood or urine or any other tissue from other organisms, are metabolically active. There are enzymes that are converting metabolites, there are metabolites that are spontaneously converting, and those change over time.
And so if you don't quench, or metabolically quench, a sample, you're going to get the same kind of changing readout that I'm showing here in terms of these metabolic responses. People who collect things for DNA or RNA or protein don't worry about those time changes very much. Sometimes they'll leave samples sitting out for several hours just because they know that DNA, RNA and even proteins are relatively stable. But you can't do that for metabolomics. If you collect the sample, you have to quench it fairly quickly, typically within a few minutes, and you have to store it, typically frozen, if you want to get a real measure of the metabolome. Metabolism is relatively well understood. Pathway diagrams existed for metabolism as far back as the 40s and 50s. This is a picture of one of those wall charts that some of you, if you were born in the 60s or 70s, might have seen when you were in university. Some people still have them. But the point is that this was something that was known as far back as the early 70s. These are pictures of metabolism and metabolic pathways. So in that regard, our understanding of metabolism is actually probably better and deeper than it is for proteomics and genomics. One of the strengths of metabolomics is that we can go back to a vast body of literature and use that to help interpret and understand the results we measure. Now in the pyramid that I showed you at the beginning of this talk, I had these three different groupings and colors: the genome, the proteome and the metabolome. To a large extent, they are still silos that we often don't connect. But I think one of the hidden strengths of metabolomics is that it helps connect the other omes. In order to understand a metabolic pathway, you have to know something about the enzymes. And to know about those enzymes, you have to know about the proteins and also the genes.
And so people doing metabolomics inherently have to understand or integrate both proteomics and genomics. Now people doing genomics don't necessarily have to think about the metabolome. They can focus on the genome or the transcriptome. And in many cases, people doing proteomics also don't think a lot about small molecules. They're not measuring them. But I think more and more, and as many of you are seeing, it's important to be able to connect these three different omes together. So when we look at the connection, we can think about small molecules like AMP, TMP and ATP, which are the constituents of the genome and the transcriptome. They are the nucleotides and nucleosides. The amino acids, of course, are the constituents of the proteome. They all have to be strung together to make proteins. The lipids, or the lipidome, are responsible for giving cells their shape, integrity and structure. So they're crucial. So the small molecules make the genome, the small molecules make the proteome, the small molecules make the lipidome. They are also the source of all energy in our cells, or any cell. Cells need sugars, lipids, amino acids and ATP. And then, of course, I've mentioned the importance of many small molecules as cofactors and signaling molecules. What people don't often realize, and this is sometimes considered scandalous if you walk into a room of geneticists, is that the genome and the proteome largely evolved to catalyze the chemistry of small molecules. It is likely that the very first things we might call living organisms didn't have much in terms of either DNA or RNA. They were simply a collection of chemical reactions that allowed other things to proceed in some enclosed body or cell. And so genes and proteins emerged to help speed up the chemical reactions that were essential to life. So the chemicals really drove the evolution of the DNA, the genome and the proteome.
The other thing that we're trying to do in this modern world is something we call integrative biology or systems biology. That collection of different silos of genomics, proteomics and metabolomics needs to be integrated. The way to do that is to use tools like bioinformatics and, as you're going to learn in the next few days, cheminformatics. These are two related branches, one primarily dealing with genes and proteins and the other dealing with small molecules. But even bioinformatics and cheminformatics are subsuming each other. The applications of metabolomics are diverse. Many of you are using it in microbiome research, some in environmental metabolomics, some in cancer, some in a variety of organisms from small to large. Metabolomics really got its start in the world of genetic disease tests. Many of the first chemical or metabolite tests were specifically for them. Food analysis and food chemistry use metabolomics a lot. All the clinical tests, whether it's blood tests or urinalysis, use a lot of the instruments of metabolomics. There's drug compliance and drug monitoring, and transplant monitoring, which looks at the drugs and drug levels that you see in people with organ transplants. There's imaging: CAT scanning and magnetic resonance imaging, which use chemical shift imaging, or PET scanning, to see specific metabolites. There's tox testing, clinical trial work, and fermentation monitoring, which is becoming bigger and bigger, whether it's for wine and beer or for the production of pharmaceuticals. Drug phenotyping, water quality testing, petrochemical analysis. The list goes on and on. So this is basically why we've been seeing more and more people enrolling in these metabolomics courses, and more people from more diverse backgrounds realizing that metabolites are really useful, and that it's not just about metabolites but about all of the chemical universe that our world is immersed in.
So I'm going to switch over to metabolomics methods, but I'll stop here briefly to find out if there are any questions about what I've just talked about. Is everything at about the right pace or am I going a little too fast? I'm not getting a lot of feedback, so I assume I'm not talking just to myself.
I'm going to put the green tick at least, yes.
Thumbs up.
Yeah, I enjoy it. I'm just listening.
Okay. All right. We're going to dive into the workflow a little bit. This is trying to describe the processes by which we do metabolomics. For some of you this is old hat, so you might fall asleep, but for a number of you, based on your introductions, this will be kind of new, so I'll dive into it and hopefully get everyone up to about the same speed. In metabolomics we often start with biological samples. Sometimes we'll start with cells. We can start with soil, we can start with tissues, organs, plants, animals, whatever we want actually. But we usually convert those solid things into liquids. So we might use a sonicator, or we'll do some kind of solvent extraction. You have to work quickly. We talked about this idea of metabolic quenching, and it's important so the enzymes aren't converting all those metabolites into something else. So the extraction will produce a fluid. Now we can save ourselves a lot of effort if we collect fluids directly. In the case of animals and humans we can get, you know, blood or urine, and that's a lot faster. A tissue extraction on an organ will give you something that kind of looks like blood, a tissue lysate. But in the end what we're trying to do is create a set of liquid samples, whether they're extracted components or the actual biofluids, because working with liquids is really best for chemical analyzers.
And the chemical analysis we typically do in metabolomics is either liquid chromatography, gas chromatography, mass spectrometry, NMR spectroscopy or a range of other things. The chemical analysis we do in metabolomics is actually looking at mixtures. We're not actually looking at pure compounds; we've got a collection of mixtures. What I'm showing here is just the tools, but the technological leap that happened in metabolomics wasn't the introduction of mass spec or NMR, which have been around for 60 or 70 years. It was the introduction of data tools. It's this arrow here that allowed people to go from analyzing a single molecule at a time to dozens, to hundreds and even thousands. So a lot of the strength of metabolomics isn't necessarily in the technology, but in the software, and this is why this course is being taught. It's also important that you understand how to look at and use the technology, so we're going to focus on this for the next hour and then this for the next day and a half. So when we look at the different omes and the different omic techniques, with genomics through next-generation sequencing it's pretty routine to be able to sequence an entire organism: about 22,000 genes in a human, around 4,000 or 5,000 genes in microbes. With proteomics, with different techniques, you can get up to around 8,000 to 10,000 proteins. So that's not quite as many proteins as all the genes, and we know that genes code for probably around 100,000 different proteins, at least in humans. So proteomics doesn't give us all the coverage we'd expect. A typical metabolomics run, if you're doing a pretty good job, will allow you to identify about 200 metabolites. Some techniques can get a little higher. But whether it's targeted or untargeted, most people are happy if they can identify that number. So in terms of coverage with omics right now, if you want something that's really complete, go to genomics. Metabolomics still has a long way to go.
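The coverage comparison above can be put into rough percentages. This is a sketch only: the measured counts and the gene and protein totals are the round numbers quoted in this lecture, and the metabolite denominator of 114,000 is the upper figure mentioned earlier for endogenous human metabolites, used here as an assumption. The names `coverage_inputs` and `percent_coverage` are hypothetical.

```python
# Rough coverage per omics layer, using round numbers quoted in the lecture.
# Each entry is (number typically measured in one study, estimated total).
# The metabolite total (114,000) is an assumed upper estimate, not a fixed count.
coverage_inputs = {
    "genomics (human genes)": (22_000, 22_000),
    "proteomics (human proteins)": (10_000, 100_000),
    "metabolomics (human metabolites)": (200, 114_000),
}


def percent_coverage(measured: int, total: int) -> float:
    """Percentage of the estimated total that a typical study measures."""
    return 100.0 * measured / total


coverage = {
    layer: percent_coverage(measured, total)
    for layer, (measured, total) in coverage_inputs.items()
}

for layer, pct in coverage.items():
    print(f"{layer}: {pct:.2f}%")
```

Even with generous assumptions, the metabolomics figure stays well under one percent, which is the point being made about how far coverage still has to go.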
And so, as you go up this pyramid, the completeness of coverage is still woefully weak in the field of metabolomics. We'll talk about some of the techniques that are improving that or can improve that. But we'll leave it at that and just say that there's rather limited coverage of the metabolome. The reason why has to do with the complexity of the chemistry. Sequencing genes means you only have to analyze four different bases, and we really understand the chemistry of those bases, and the fact that they're in polymers also makes it easy. Proteins are made up of 20 amino acids, and so the chemistry of proteins is a little more complicated. That's why we can't sequence proteins quite as easily as we can sequence DNA. In the case of metabolites we're dealing with hundreds of thousands of different chemicals and around 3,000 to 4,000 different chemical classes. Looking at that wide diversity of chemicals and chemical classes intrinsically makes metabolomics more difficult. This is why we have to use a whole bunch of different techniques and tools and platforms to figure out what we're seeing. So the techniques in metabolomics include chromatography. They include capillary electrophoresis and microfluidics. They include liquid chromatography mass spec, with a variety of mass spec platforms: triple quads, TOFs, Fourier transform. You can also use gas chromatography and GC-MS, infrared spectroscopy, NMR spectroscopy and crystallography. There are even efforts now using electron crystallography or electron microscopy to identify metabolites. So there's an incredible diversity of techniques just to be able to handle the wide diversity of chemical classes that you see. None of us can be experts in all of these, but to do metabolomics comprehensively you often have to access them, and this is why a lot of centers have emerged which offer essentially all of these techniques. The Metabolomics Innovation Centre, which is one that I run, offers most of these technologies right now.
So in terms of chromatography, this is a separation process to separate small molecules, and chromatography is often coupled to mass spec, as in LC-MS and GC-MS. It's an old technique. It's been around for more than 100 years and was traditionally used to separate chemicals for organic synthesis. On the right you're seeing an example of a chromatogram, showing what you can see as you separate individual molecules and the peaks corresponding to them. Typically the definition of chromatography is to take a mixture and have it dissolved in a mobile phase, a solvent, which could be water, acetonitrile or methanol, and then to pass that through a stationary phase, which is usually a powder or a mixture of certain molecules. As the mobile phase passes the stationary phase, things separate based on their interaction with the stationary phase. They partition between the mobile and stationary phase. So with chromatography you can have column chromatography, thin layer chromatography, gas or liquid. You can have affinity chromatography, ion exchange, size exclusion, reverse phase, normal phase, hydrophobic interaction or hydrophilic interaction, gravity, high pressure and so on. These are all techniques that are used, and the types and methods can vary tremendously. Most people use high pressure or high performance liquid chromatography in metabolomics. This has been around for about 50 years. You use relatively high pressure, 6,000 pounds per square inch, and very small particles, about five microns, to fill the column. That's the stationary phase. With the high pressure, you're able to get greater separation. You're also able to detect more because you're able to use much smaller columns. In some cases people can detect things at the parts per trillion level. Now you can adjust and modify high pressure chromatography to separate polar compounds with things like HILIC, H-I-L-I-C, or nonpolar compounds with reverse phase.
So the different modalities are reverse phase for non-polar compounds, which uses a non-polar stationary phase, aliphatic compounds attached to a bead substrate, and normal phase, which is not that commonly used anymore. Then there's HILIC; hydrophilic interaction liquid chromatography is what it stands for. It uses a polar stationary phase and a mixed polar and non-polar mobile phase, and this is ideal for separating polar molecules, which are quite common in biofluids and tissues. As I mentioned, the columns are generally small and designed to sustain very high pressure. Most of them are made out of stainless steel. Some can be made out of a plastic called PEEK. Very few now are made out of glass. They can be analytical or preparative. Analytical columns are the most common ones, very small columns, very narrow. Preparative columns are used to prepare things at large scale. So if you're a natural product chemist you typically use preparative HPLC columns. Dimensions are given here: 1 to 50 millimeters inside diameter, and some of them are up to half a meter long, but most are much smaller, on the order of about 20 or 30 centimeters. So a reverse phase column is typically made up of silica beads, typically five microns in diameter, and they're typically decorated with hydrophobic aliphatic chains. So a C16 or a C18 or a C4 column tells you the number of carbons that are on the aliphatic chains that are attached to the silica beads. And you can change or replace those lipids or fatty acids or alkanes that you've stuck onto them with things that are aromatic. So you can have a biphenyl group. You can also have divinylbenzene. All kinds of column substrates can be attached to change what you're going to be separating. And essentially you have to remember like dissolves like, so an alkane prefers interacting with an alkane. A biphenyl prefers interacting with aromatic molecules. A cyano group prefers interacting with sort of polar groups.
So these are things that you can do to modify what's in your column. You can also play around with the separation efficiency by adjusting the length of your column. So a longer column, which takes more time, will give you better separation. You can also adjust the bead size. So smaller beads will also give you better separation. But when you get smaller beads, you actually have to use much higher pressures. And so the development that happened over the last 10 years was to switch from 5 micron beads in HPLC to 1.7 micron beads in UPLC, or ultra high pressure. So those are the simple tricks: either longer separation times or smaller beads and higher pressures. So basically an HPLC separation is done where you'll have a solvent, that's the mobile phase. You'll have a pump that provides the high pressure. You'll have an injector, which allows you to insert or inject your sample. The column separates the material in the mixture, and then you'll have some kind of detector. It could be a UV detector, a fluorescence detector or a mass spec detector; even an NMR can be attached to an HPLC or UPLC. And that is then connected to a computer, and then of course the solvent is either collected or sent off to waste. You can get more enhanced or improved separations if you use more than one solvent. You can use two solvents, you can even use three solvents. And you create a gradient or a mixture. And by changing the solvent, the mobile phase, over time, you're able to enhance the separation. Generally, you can't change the column over time, but certainly the solvent can change. And so by changing the solvent, you can either improve or enhance the separation and in some cases improve the resolution. And a lot of work is done by different groups to play around with the different mixtures and optimize the solvents or mobile phase to get better separations.
This is a picture, a typical diagram of a chromatogram, where you've got a biological mixture of probably several hundred components, maybe even several thousand components, running on an HPLC column for about 50 minutes. And that's a long separation, and many people in metabolomics like to get separations down to 10 or 12 minutes using UPLC. And what we're seeing is the intensity in milli-absorbance units. So this is a UV HPLC separation, and the peaks correspond to individual chemicals or in some cases dozens of chemicals under a peak. But this is why separation is important, in that it helps simplify the mixture to either individual compounds or a small number of compounds that can then be analyzed by mass spec or NMR. Another form of chromatography is gas chromatography. So in this case, the mobile phase is not a liquid, it's a gas. And typically, gas chromatography occurs with a column, a very long column, in an oven at relatively high temperatures. And you can change the temperature of the oven just like you can change the solvent. So oven temperature is a technique not unlike liquid chromatography, where you have gradient liquids: changing the oven temperature allows you to create a gradient and improve the separation in gas chromatography. You can have a flame ionization detector, or you can have a mass spec detector in gas chromatography. So this is why we got the term GCMS. So for gas chromatography to work, the sample has to be vaporized; it has to be something that boils reasonably well and is relatively stable at that temperature. It moves through the column, and it's pushed by gas, usually hydrogen or argon. And then the column itself isn't packed with beads, but it's actually a hollow column where certain chemicals are attached to the surface of the column, and you're getting interactions of the small molecules with the surface.
That's just like how small molecules interact with the beads and the aliphatic or hydrophilic components in a liquid chromatography column. The columns are very, very long, usually around 10 meters in length, not on the order of 20 centimeters as in HPLC. And the inside diameter is also tiny, a few millimeters, whereas most HPLC columns are measured more like a centimeter in diameter. In order to make compounds boil or vaporize, sometimes we will derivatize them, or often we have to. And derivatization is a technique that's critical to GCMS. We can use a compound called methoxyamine, which is used to sort of open up some sugars and to react with certain aldehydes or ketones. And then we can also do silylation, which is attaching trimethylsilyl groups to hydroxyl or carboxylate or amino groups. And this chemical derivatization allows many compounds that are not otherwise susceptible to heating or boiling or vaporizing to be vaporizable. Some lipids, certain terpenes, alkanes, alkaloids don't need derivatization. Many of the flavor components in food don't need derivatization. But many other metabolites like amino acids, many sugars, most organic acids, fatty acids, have to be derivatized. So if you start with a mixture of compounds and they've been derivatized, you push it through your column, and the gas flow, helium, just like acetonitrile, methanol or water in a liquid column, allows you to separate things. And the things with high affinity move more slowly, and the things with low affinity move more quickly with the gas flow towards the detector. And below is a picture of a gas chromatogram. And typically they take about 45 minutes to run. So they're not as fast as UPLC. But the resolution you get with gas chromatography is much, much better. If you look at a gas chromatography column, they're not typically straight; they're wound in a coil and inserted in a cubical oven.
On the side, you'll see that they're essentially coated; they're hollow tubes with a polysiloxane coating. That's a mixture of this fused silica with the stationary phase of the polysiloxane. And that can have methyl or benzyl groups, which the metabolites or chemicals interact with. In liquid chromatography and in gas chromatography, we measure how long compounds take to come off. And so that's called the retention time or RT. So in liquid chromatography, how many minutes or seconds it takes for something to start coming off is a measure of either how hydrophilic or hydrophobic it is. It's affected by the column type. It's affected by the flow rates, pressure, and the temperature in the case of gas chromatography. So there are many variables that go into it. So it's not something that is easily predicted, nor is it something that's consistent, at least for liquid chromatography. If we use a standard column and a standard separation process, certain labs will develop a way of comparing or standardizing retention times. In gas chromatography, it turns out things are much more standardized than in liquid chromatography. And in fact, there's now a standard protocol in gas chromatography called the retention index, the Kovats, K-O-V-A-T-S, retention index, which can be pretty much universally used. And this is the retention time in a gas column normalized to the retention time of a bunch of alkanes, usually six or seven alkanes of different carbon lengths from, say, C6 to C15. So by standardizing retention times to standard columns and standard alkanes, gas chromatography has actually leapt far ahead of liquid chromatography in being able to use retention time information to identify compounds. Now, whether it's liquid chromatography or gas chromatography, the same sort of spectrum or chromatogram is generated. You look at the time, so this says you've got something coming off at 2.85 minutes.
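As a rough illustration of the retention index idea described above, the sketch below interpolates a compound's retention time between the two alkanes that bracket it. It assumes the temperature-programmed (linear) form of the index usually attributed to van den Dool and Kratz; the original isothermal Kovats index uses logarithms of adjusted retention times instead. The function name and the alkane retention times are made up for illustration and are not from the lecture.

```python
from bisect import bisect_right

def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a compound eluting at `rt` (minutes).

    `alkane_rts` maps alkane carbon number -> retention time on the same
    column and temperature program (hypothetical calibration data).
    Retention times are assumed to increase with carbon number.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time falls outside the alkane ladder")
    # Index of the alkane eluting at or just before the compound,
    # clamped so there is always a later alkane to interpolate towards.
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    fraction = (rt - times[i]) / (times[i + 1] - times[i])
    return 100 * (n + (n_next - n) * fraction)

# Hypothetical alkane ladder, C8 to C11 (retention times in minutes)
ladder = {8: 4.0, 9: 6.0, 10: 8.5, 11: 11.0}
ri = linear_retention_index(7.25, ladder)  # elutes halfway between C9 and C10
print(ri)
```

Because the index is expressed relative to the alkanes rather than in raw minutes, the same compound should give a similar value on a different instrument running a comparable column, which is what makes it usable for identification.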
And then you have a peak, and the intensity of the peak, as measured by ultraviolet, infrared, fluorescence or mass spec, or even NMR, tells you how much. So the intensity is for quantitation, both absolute and relative. And then the retention time, if you use it properly, can be used as a consistent measure to identify a compound. So if this compound always comes off at 2.85 minutes plus or minus 10%, you can be pretty certain what the compound is without actually having to do a detailed MS or even NMR analysis. As I mentioned, gas chromatography gives you much nicer and better separations than liquid chromatography. You can see the very, very narrow peaks that you get with gas chromatography. These are peaks that have been annotated. In many cases, the peak corresponds to a single compound. Sometimes there's a couple of compounds that coelute. But the separation for gas chromatography is really quite spectacular. And for many labs, GCMS is the preferred way to do metabolomics because it's cheaper and the separations are actually predictable. Now, liquid chromatography and gas chromatography are coupled to detectors. And in many cases, one of the best detectors is a mass spectrometer. So in mass spectrometry, we try and weigh or measure the molecular or atomic weight of samples. That's fundamentally what we're using to distinguish or identify things. And it's basically saying that if we had a list of everyone's weight in this class, and then we were blinded, but all we could see was how much you weighed on a weigh scale, in essence, we could identify everyone in the class unless one or two of you had exactly the same weight. So weight alone is often sufficient to identify people as well as molecules. So mass spectrometers are expensive. They're often fairly big. This is an example of an Orbitrap mass spectrometer coupled to a liquid chromatography system.
Just as I said, different molecules, just like different people, can be identified by their weights. And unlike humans, who can go on diets, molecules don't. And so the weight stays the same based on its structure. So these are different structures, different molecules. They have very characteristic, very specific molecular weights. And so it's a simple idea. And in fact, the idea has been around for more than 120 years to identify molecules based on their masses. If you've got an Orbitrap mass spectrometer or a QTOF, you can measure the molecular weight to within about one ppm, one part per million, which is actually sufficient to determine the molecular formula of a small molecule. And the molecular formula tells you a fair bit. Now if you're looking at proteins, you can determine things down to maybe about one Dalton for, say, a 40 kilodalton protein, which is pretty good. And that's also been used to identify proteins, although not with the same accuracy. So one Dalton is equal to one atomic mass unit or one AMU. And so we'll flip back and forth between AMUs and Daltons. So we couple columns to mass spectrometers. So we can have a GCMS, we can have LCMS. So most of you have heard those terms before, but you can also couple mass spectrometers to each other. So an MS-MS is not unlike an LCMS, but it essentially separates things by mass twice: you select and fragment a parent ion, and then you separate the smaller fragment ions. That's tandem mass spectrometry. Now with mass spectrometry, you can get accurate masses or average masses. So lower resolution mass spectrometers, like a triple quad or linear ion trap, produce an average mass on a fairly broad peak. And they're able to get to about one Dalton or 0.3 Dalton resolution. A high resolution mass spectrometer can measure the monoisotopic mass. And because there are different isotopes, you will see different levels of abundance for different isotopomers.
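To make the difference between monoisotopic and average mass concrete, here is a small sketch that computes both from a molecular formula. Aspirin (C9H8O4) is used purely as an illustrative example; the helper names and the short element tables (rounded reference values for C, H, N and O only) are assumptions of this sketch, not something shown in the lecture.

```python
import re

# Mass of the lightest (most abundant) isotope of each element, in Daltons
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
# Abundance-weighted average atomic masses, in Daltons
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def parse_formula(formula):
    """Turn a simple formula such as 'C9H8O4' into {'C': 9, 'H': 8, 'O': 4}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts

def formula_mass(formula, table):
    """Sum the per-element masses from `table` over the formula."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())

aspirin_mono = formula_mass("C9H8O4", MONOISOTOPIC)  # what a high-res MS reports
aspirin_avg = formula_mass("C9H8O4", AVERAGE)        # what a low-res MS approximates
print(round(aspirin_mono, 4), round(aspirin_avg, 3))
```

The two values differ by roughly 0.1 Dalton for a molecule this size, which is invisible at one Dalton resolution but very large compared with a one ppm measurement.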
So this is a picture showing a molecule that has a molecular weight of about 1155 Daltons. The average mass would come out to a big peak, a broad peak at about 1156 Daltons. But if we have a high resolution machine, we'll see the peak for the single monoisotopic form: carbon 12, hydrogen 1, oxygen 16. But then we'll see other isotopes with carbon 13 or nitrogen 15 or combinations of carbon plus nitrogen or deuterium. And so we will see diminishing intensities corresponding to the abundance of those isotopomers. And we can see the individual masses. So let's take chlorobenzene, where we've got these known abundances for hydrogen, deuterium, carbon 12, carbon 13, and chlorine 35 and chlorine 37. So the monoisotopic mass, the most abundant one, is 112 Daltons. And that would include carbon 12, hydrogen and chlorine 35. And one Dalton higher, 113 Daltons, would still be mostly carbon 12, but it has one carbon 13, or it can have one deuterium. And based on the abundance percentages up there, you can predict there'd be an intensity of about six percent. But then at two Daltons higher, we have a case of either two carbon 13s or two deuteriums, or one chlorine 37. And because chlorine 37 is a quite abundant isotope, we get a big peak of about 32%. And then you can get other combinations. So this is how you can use isotopes to, in some cases, figure out the formula, but also identify whether you have something that has chlorine. So this is what we would see in a mass spectrometer. With this chlorobenzene, we'd see a peak at 112 Daltons. And then we'd see a fairly significant peak at 114 Daltons with about a third of the intensity. And then we'd see other ones further down with lower intensities based on the isotopic abundance. So mass specs use an ionization technique to convert molecules so that they're charged. We can't measure the mass of neutral molecules. We have to charge them. And so mass spectrometry uses an ionizer.
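The chlorobenzene numbers above can be roughly reproduced with a little arithmetic. The sketch below is a first-order estimate only: it uses approximate natural abundances that I am assuming (they may differ slightly from the values on the slide), and for the M+2 peak it counts only one chlorine 37 or two carbon 13s, ignoring the much smaller deuterium combinations.

```python
from math import comb

# Approximate natural isotopic abundances as fractions (assumed values)
C12, C13 = 0.9893, 0.0107
H1, H2 = 0.999885, 0.000115
CL35, CL37 = 0.7578, 0.2422

def chlorobenzene_pattern():
    """Estimate M, M+1, M+2 intensities for C6H5Cl, relative to M = 100."""
    n_carbon, n_hydrogen = 6, 5
    # M+1: exactly one 13C among six carbons, or one 2H among five hydrogens
    m_plus_1 = n_carbon * (C13 / C12) + n_hydrogen * (H2 / H1)
    # M+2: the single chlorine is 37Cl, or two of the six carbons are 13C
    m_plus_2 = (CL37 / CL35) + comb(n_carbon, 2) * (C13 / C12) ** 2
    return {"M": 100.0, "M+1": 100 * m_plus_1, "M+2": 100 * m_plus_2}

pattern = chlorobenzene_pattern()
for label, intensity in pattern.items():
    print(f"{label}: {intensity:.1f}")
```

Under these assumptions the estimate lands close to the figures quoted in the lecture: an M+1 peak of a little over six percent and an M+2 peak of about a third of the base peak, the latter being the usual signature of a single chlorine.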
And there are different ways of ionizing things. Most use electrospray or electron impact. And when something is ionized, it has charges, and then it goes into a mass analyzer, like a TOF or a quad or an Orbitrap. And then it's sent off to a detector to produce a signal. So if we're looking at an MS-MS spectrum where something has been fragmented, or an electron impact spectrum from a GCMS, we will see fragment ions. So this is the parent ion. The parent ion has the weight of aspirin, about 180 Daltons. But then we see smaller fragments at 140, 120, around 91, and 42 Daltons that correspond to fragments of aspirin. So unlike liquid chromatography or even gas chromatography, a mass spectrum is characterized by very sharp peaks. So the x-axis indicates the mass to charge ratio and the height of the peak indicates the relative abundance. Now the intensity of a peak in mass spec is not very useful for getting actual concentrations. The intensity is really a measure of an ion's ability to ionize or fly in the mass spectrometer. So predicting and interpreting intensities in mass spec is difficult. It's not like NMR, if some of you know it, and it's not like liquid chromatography or gas chromatography, where a big peak means there's a lot of material. A big peak in mass spec may not mean that there's a lot of that ion. It just means that it ionizes well. So in mass spec, because peaks are narrow, especially in high resolution, we have something called the resolving power or resolution. So the better the resolving power, the more expensive the instrument and the more accurate the mass. So we define it by essentially the mass relative to the mass difference, or M over delta M. So delta M is the difference between two masses that can be separated. And the mass is the actual measurement of the mass or mass to charge we measure. So when you think about resolution, you can think of a peak sort of looking a bit like a triangle.
Delta M is sort of, if you want, the width, and it could be the width at either 50% or at 5% of the height. And so what you're seeing are examples where we're resolving two peaks. Typically humans and machines can resolve peaks at about 50%. It's pretty easy for anyone to resolve peaks at the 5% or 10% separation. So as I mentioned, there are certain types of mass analyzers that have low or high resolution. So an ion trap, linear ion trap or a triple quad typically can only measure masses at one Dalton or half a Dalton resolution. So you will typically see an average mass for a molecule, and they produce this broad, kind of ugly looking peak. If you use a time of flight mass spectrometer or an Orbitrap or an FTMS instrument as an analyzer, you get much higher resolution. You can see things at the levels of two, three and four decimal places. And so in this case, we're seeing a mass. This is actually for a peptide, but even for a small molecule we'd see the same sort of thing, where you're seeing a bunch of peaks, all corresponding to those individual isotopomers. So that's going from a low resolution to a high resolution instrument. And this is just illustrating again the resolving power for a lower resolution instrument and a higher resolution instrument. So the blue broad peak is what you'd see for a triple quad, with a resolving power of about 1,000. And for a QTOF or a modest Orbitrap, you would see something like the one in black, which is a resolving power around 30,000, and you can see those peaks are very, very distinct. And that obviously means you can get a lot of information from very narrow peaks. So mass spectrometers have an ionizer called an ion source. They can be electrospray, ion spray, atmospheric pressure ionization, electron impact, chemical ionization. They all run high vacuum systems to help make sure that there's nothing contaminating the ions, because you're working with small numbers of molecules.
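The M over delta M definition of resolving power discussed above is easy to turn into numbers. This is only a sketch: the function names are mine, and the m/z value of 500 is an arbitrary illustration, while the resolving powers of 1,000 and 30,000 are the two figures quoted in the lecture.

```python
def resolving_power(mz, delta_m):
    """R = M / delta M, where delta_m is the peak width (for example at 50% height)."""
    return mz / delta_m

def smallest_resolvable_difference(mz, power):
    """The same definition rearranged: delta M = M / R."""
    return mz / power

mz = 500.0  # arbitrary example mass-to-charge value
low_res = smallest_resolvable_difference(mz, 1_000)    # triple-quad-like instrument
high_res = smallest_resolvable_difference(mz, 30_000)  # QTOF- or Orbitrap-like instrument
print(f"R = 1,000:  peaks must differ by about {low_res:.3f} Da")
print(f"R = 30,000: peaks must differ by about {high_res:.4f} Da")
```

At the lower resolving power, peaks half a Dalton apart are the limit at this mass, so the isotopomers one Dalton apart only just separate; at the higher one, differences of a couple of hundredths of a Dalton become visible.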
We've talked about how LC and GC send these things into the mass spectrometers, but they still have to be ionized. So in terms of different ionization methods, we can use EI for GCMS or CI for GCMS; these typically have lower mass limits, usually around 700 Daltons is the max. Some are called hard ionization methods, like electron impact, and some are soft ionization methods, like electrospray, or ESI, or MALDI. And ESI is the most common method for ionizing in LCMS, and it's used not only for metabolomics but also proteomics. You can also use MALDI as a way of ionizing samples, and this is used in metabolomic imaging. So electron impact ionization uses a standard powered electron gun that fires electrons at 70 electron volts. So when a set of molecules is fed in from a gas chromatography instrument, the gas flows into this ionizer. An electron gun, not unlike the cathode ray tube in the old TVs, is fired, and it shatters the molecules into fragments, and those fragments typically carry positive charges and are sent into the mass analyzer, usually a single quad mass spec. So it's a very standard approach to ionizing, and it fragments things into very tiny fragments. So the sample is already evaporated, it's in the gas phase, you hit it with these electrons, they shatter the molecule, and as I said it's the one used in GCMS. So if you're fragmenting a molecule like ethanol or methanol, you'll see the methanol parent molecule, but then you'll see things that have lost the hydroxyl group, things that have lost hydrogens. So you'll see a range of masses, including the parent ion. And so this is an example where we're seeing methanol with a parent ion of 32, a loss of a single hydrogen that produces a stable ionized double-bonded form at 31, the sort of pseudo-aldehyde at 29, and then you'll see the methyl group at 15.
So these are the fragments, and in fact, people who are pretty skilled in GCMS have a pretty good idea of how molecules will break up in a predictable way. And so this is how you can actually figure out the chemical structure of small molecules. Now the soft ionizations like MALDI and ESI use different things: MALDI uses a laser, and ESI uses essentially a gold-tipped hollow needle in a high electrical field to ionize things. So you spray things out kind of like from an aerosol can. And it produces a mist or a spray. You have electrodes that surround it. And that actually helps ionize the spray, to add charges. And then things are sent from a relatively high pressure, atmospheric pressure, to something that's very, very low pressure. And that also helps to evaporate the spray down from essentially something that is visible to essentially something that's invisible. So if you could look at it, you'd see these droplets coming out of your aerosol sprayer. The capillary has this high charge, which helps ionize the spray, but as it moves from atmosphere to low pressure, it starts evaporating. So the acetonitrile, methanol, water disappears quickly. And as the droplet, which contains many ions, shrinks, they burst off into tiny single ions. And now you have single ions floating through the mass analyzer, off to be detected. So in electrospray, you usually have to have a fairly volatile buffer; you don't want salts. You have to pump it through a capillary at a relatively low rate of microliters per minute. You apply a really strong voltage to this aerosol nebulizer to aerosolize things, and then as things evaporate, these tiny droplets still carry charges. So you can play around with the voltage to enhance the spraying. And you can also play around with the viscosity of the solvent to enhance the spraying. And these are just sort of showing some conditions depending on which voltage you use.
And then at some point, if you have the right conditions, things start spraying nicely and produce a really nice signal. You can go from micro spray to nano spray. Nano spray is ideal for proteomics; micro spray is ideal for metabolomics. You can get away with a very tiny amount of material, but if you happen to put in salts or detergents, it can really mess up your performance. You can change the mode from positive ions to negative ions. And that depends on the solvent that you're using or mixing with your LC system to produce either positive or negative ions. So after you've ionized something for a mass spectrometer, you now have these tiny ions flowing through; they're molecules or fragments of molecules. And if they just existed in space, we wouldn't be able to see them. So we want to be able to analyze and detect them. And so each mass spectrometer, in addition to an ion source, also has a mass analyzer and a detector. And what most people label a mass spectrometer by is the mass analyzer. So someone has a QTOF or they have an Orbitrap or they have an ion trap. That's how they refer to their mass spectrometers, and that fundamentally is the mass analyzer. So the first mass spectrometers used magnets, and they were called magnetic sector analyzers. They actually give really high resolution, but almost no one uses them anymore. What we now use instead of magnets is electric fields, because we can control them better. And so among the first and most common mass spectrometers that are used in GCMS and in MS-MS are quadrupole analyzers. That's marked with a Q. They have low resolution. They're very robust. They're very fast, and they're much cheaper than most other mass spectrometers. The time of flight mass spectrometers have been around for a few decades. They have a very high resolution. They can also be relatively high throughput.
And sometimes these are coupled to a quadrupole, and so you'll have a QTOF, quadrupole time of flight, or a time of flight time of flight, which is another method. One of the more popular mass spectrometers these days is called an Orbitrap. It's among the highest resolution; it's getting higher than time of flight. You can work with small molecules and big molecules. So you can do proteomics as well. And they actually tend to match or outperform what used to be, or still is, the highest resolution mass spectrometer, called an ion cyclotron resonance or an FTMS. These are incredibly expensive and huge machines. They're relatively slow in terms of their data collection. They are very, very useful for applications requiring very, very high mass resolution. So in terms of their mass accuracy or mass resolving power, we talk about things like parts per million, where we compare the experimental mass against the calculated mass. So with FTMS and Orbitraps we can measure down to one ppm or less. Time of flight mass spectrometers are around three to five ppm. I think there's a mistake here, but generally quadrupoles and ion traps have a resolution of about 100 ppm, although you can tweak them to get higher resolution under certain circumstances. But quads and ion traps are low resolution; TOFs, Orbitraps and FTMS are high resolution, on the order of one ppm accuracy. So when you collect data from an LC-MS or GC-MS experiment, you will get essentially chromatograms that are sort of coupled with not only your LC but also your MS. So you can get a total ion current chromatogram, a base peak chromatogram, or an extracted ion chromatogram. And a total ion current reflects all of the ions from all of the peaks coming from the whole HPLC or GC run. And they're kind of ugly; that's the one shown in red in the left corner. The base peak chromatogram is more appealing, and it's one that essentially shows you more about the compounds that you're detecting.
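Returning briefly to the parts per million figures quoted above, the calculation is just the difference between the measured and the calculated mass, divided by the calculated mass, times one million. The sketch below uses invented m/z values, and the function names are my own.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def mass_window(theoretical_mz, ppm):
    """Lower and upper m/z bounds for a given ppm tolerance."""
    half_width = theoretical_mz * ppm / 1e6
    return theoretical_mz - half_width, theoretical_mz + half_width

# Hypothetical example: calculated m/z 200.0000, measured m/z 200.0006
error = ppm_error(200.0006, 200.0000)
low, high = mass_window(200.0000, 5.0)
print(f"error: {error:.1f} ppm; 5 ppm window: {low:.4f} to {high:.4f}")
```

In this made-up case the measurement is off by about 3 ppm, which would fall inside a time-of-flight style tolerance of 5 ppm but outside a 1 ppm tolerance.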
So we're not using UV to detect things, we're using the mass spectrometer to detect things. And this would be the equivalent of a nicely resolved LC or GC chromatogram. And that's the middle one, that's the blue one. Then the extracted ion chromatogram is one where you just extract a single molecule or a couple of molecules from the TIC or the base peak chromatogram. And this is one where you're just wanting to analyze or identify a specific molecule. Each of these can be electronically extracted from an LC-MS or GC-MS run. So this is, not an extracted ion chromatogram, but a base peak chromatogram from an LC-MS run, where we're seeing individual peaks with individual masses identified. So we're seeing retention times above and masses below for these specific compounds from tomato or Arabidopsis. Now I'm looking at our timer; we're a little over time, and we started about 15 minutes later than I'd hoped, but I'll try and move quickly just so we can get through the rest of the material. So I've talked about mass spectrometry and liquid chromatography and gas chromatography. I'm going to talk about another technique that's used in metabolomics called NMR. And I think there were only one or two of you who indicated you have done NMR. When we first started teaching this course, about half the class was NMR. These days it's usually about 10% of the class that does NMR. But we still talk about it because it is a technique that's still very useful. So NMR uses a giant magnet, a superconducting magnet, and it collects spectra that look like this. They look a little bit like a GC chromatogram. They look like an extracted ion chromatogram or base peak chromatogram. So instead of showing retention time, they show chemical shift, and you have peaks that are narrow and of varying intensity. In NMR, we put a sample under a very strong magnetic field. The magnets are about the size of a refrigerator, and they have the strength to pick up a city bus.
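To show how the three chromatogram types described above relate to the same underlying data, here is a minimal sketch. It assumes each scan is stored as a retention time plus a list of (m/z, intensity) pairs; that layout, the function names and the tiny data set are all invented for illustration and do not correspond to any particular vendor or file format.

```python
def total_ion_current(scans):
    """TIC: sum of all ion intensities in each scan."""
    return [(rt, sum(i for _, i in peaks)) for rt, peaks in scans]

def base_peak_chromatogram(scans):
    """BPC: intensity of the single most intense ion in each scan."""
    return [(rt, max((i for _, i in peaks), default=0.0)) for rt, peaks in scans]

def extracted_ion_chromatogram(scans, target_mz, ppm=10.0):
    """XIC: summed intensity of ions within `ppm` of `target_mz` in each scan."""
    tolerance = target_mz * ppm / 1e6
    return [(rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance))
            for rt, peaks in scans]

# Hypothetical scans: (retention time in minutes, [(m/z, intensity), ...])
scans = [
    (1.0, [(100.0500, 50.0), (200.1000, 10.0)]),
    (1.1, [(100.0502, 400.0), (200.1001, 30.0), (300.2000, 5.0)]),
    (1.2, [(100.0499, 80.0), (200.0999, 900.0)]),
]

tic = total_ion_current(scans)
bpc = base_peak_chromatogram(scans)
xic = extracted_ion_chromatogram(scans, target_mz=200.1000, ppm=10.0)
print(tic, bpc, xic, sep="\n")
```

The extracted ion trace follows only the ion near m/z 200.1 and ignores everything else in each scan, which is why it is the one to use when you are after a specific compound.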
So they're some of the strongest magnets in the world. So if you put a sample, a liquid sample, under a strong magnetic field, it becomes very sensitive to radio frequency radiation. So if you send in a pulse of radio waves, they will be absorbed by the sample. And by measuring the absorption spectrum, just like when you measure a UV absorption spectrum, you'll see absorbed lines at specific frequencies or chemical shifts. So the frequency and chemical shift are the same. And this is an example of the absorption bands for an NMR sample. And those are the peaks that we see. In NMR, it's not a radioactive thing. We are measuring nuclear magnetism. So it's not about fusion and fission, it's just nuclear magnetism. It's non-radioactive, and we're testing or probing the changes in nuclear magnetism. All molecules have nuclei, because all molecules are made of atoms and atoms have nuclei. We use light, but the light is not visible; the light is radio frequency light. And we measure how the nuclei absorb it and change their nuclear spins. You can only detect an NMR signal when you put something in a really strong field. At zero field, there's essentially no absorption happening. Different nuclei, carbon nuclei, hydrogen nuclei, deuterium nuclei, absorb at different frequencies. Those nuclei can have spins. They are tiny little balls, basically, spinning in space, and they spin either right or left, or up or down. So all protons and neutrons have a spin. Protons have a positive charge, and when you spin a charge around, it actually produces a little magnet. So protons, which are in the nucleus, when they spin create a little mini magnet. And either the north pole points up or the north pole points down. And so that's how we know when something has a spin up or a spin down. So in a sample, you're going to have billions of nuclei, trillions of nuclei, with all these protons spinning, some up, some down.
And when we shine some light or send in a radio frequency signal, which is still electromagnetic radiation, it will reorient or cause some of those nuclei to flip. Some will go up and others will stay down. So we go from a low energy state to a high energy state where we have more of these red balls; more nuclei have flipped their spin and gone to a high energy. So I think there's usually some kind of little video here, but I don't think this one works. So we have all these nuclei, they're all spinning, and when they're all oriented, they spin up and then they start spinning back down. And it's a little bit like a whole bunch of bells ringing at different frequencies. So you can think of a carillon, where they've got the bells in a church, and they're ringing; some are big bells, some are small bells, and they have different frequencies. And they're all ringing at the same time. So that's what we're detecting in NMR. A spin flip produces a ringing, and that ringing is what we detect in NMR. And we detect that from many different nuclei corresponding to all the atoms in the molecule. So big bells have low frequencies, little bells have high frequencies. And it's the same with nuclei; some are big, some are small, and they all differ by different frequencies. And so this is actually what we get. And this is what's measured by NMR. It looks like an oscilloscope readout. If someone's speaking or talking, you get the same sort of signal. But this ringing is something that can be converted using something called a Fourier transform. So it converts the free induction decay, the ringing, which is a measurement of change in time, to a signal in frequency. And the dimensions are converted to frequencies. And we see now bands or peaks corresponding to the frequencies. So that's what we actually interpret.
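The bells analogy above can be sketched numerically: simulate two "bells" ringing and decaying at the same time, then use a Fourier transform to pull the two frequencies back out. This is a toy model under my own assumptions (two decaying complex sinusoids, a naive discrete Fourier transform written out in plain Python, and arbitrary frequencies, amplitudes and decay time); real NMR processing software uses a fast Fourier transform plus additional steps such as phasing and baseline correction.

```python
import cmath
import math

def simulate_fid(freqs_hz, amplitudes, t2, n_points, dwell):
    """Free induction decay: decaying complex sinusoids sampled every `dwell` s."""
    fid = []
    for k in range(n_points):
        t = k * dwell
        decay = math.exp(-t / t2)
        fid.append(decay * sum(a * cmath.exp(2j * math.pi * f * t)
                               for f, a in zip(freqs_hz, amplitudes)))
    return fid

def dft_magnitude(signal):
    """Naive O(n^2) discrete Fourier transform, returned as magnitudes."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * m * k / n)
                    for k, x in enumerate(signal)))
            for m in range(n)]

N, DWELL = 256, 0.001            # 256 time points, 1 ms apart
BIN_HZ = 1.0 / (N * DWELL)       # frequency spacing of the resulting spectrum
fid = simulate_fid([50 * BIN_HZ, 100 * BIN_HZ], [1.0, 0.5],
                   t2=0.1, n_points=N, dwell=DWELL)
spectrum = dft_magnitude(fid)

strongest_bin = max(range(N), key=lambda m: spectrum[m])
print(f"strongest peak at {strongest_bin * BIN_HZ:.1f} Hz")
print(f"second peak is {spectrum[100] / spectrum[50]:.2f} times as intense")
```

The time-domain signal is an unreadable superposition, but after the transform the two frequencies appear as separate peaks whose heights reflect the two starting amplitudes.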
So those liquid chromatography-like signals essentially start from a free induction decay, and a mathematical transformation produces the NMR spectrum. With magnets, the bigger the better. Stronger magnets produce higher frequency measurements, and that means greater separation of the signals. So just like a big column or a long column in HPLC separates the signals, a big magnet separates the signals in NMR better. With modern NMR, we take a liquid sample and load it into this big magnet, which could pick up a city bus. It's connected to a radio wave transceiver that sends in the radio waves and detects the radio waves coming back. The ringing is then sent from the transceiver to a computer, which does the Fourier transform to convert it into the spectrum that we see on the computer there. The magnets are big, they're superconducting magnets, they're filled with liquid helium, then they're surrounded by this foil that's an insulator, and then surrounded by liquid nitrogen. So they're kept very cold. And they are kept cold by filling them up every two weeks with liquid nitrogen, and every six months with liquid helium. So they're like giant thermoses, just like a thermos keeps things very cold or very warm. That's what the big can is in an NMR. The magnet is actually fairly small, but the can is needed to keep things so cool. You typically have a hollow electromagnet. Well, once it's charged it actually behaves like a permanent magnet. It's made up of wires that are wrapped around. And then there's a probe, which is used to hold the sample. The sample is dropped in from the top, and the probe at the bottom has the electronics that send in the radio waves and receive the radio waves. So this is what the probe looks like, and inside the probe are a couple of little wires that form what's called a saddle coil, which is essentially a little antenna that sends out the radio waves and receives the radio waves from the sample. 
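The earlier point that stronger magnets spread the signals further apart can be made concrete with a short calculation. The sketch below uses the proton gyromagnetic ratio (about 42.577 MHz per tesla) to get the spectrometer frequency from the field strength, then converts a chemical shift difference in PPM into hertz. The field strengths and the 0.02 PPM gap are illustrative values, not numbers from the lecture.

```python
GAMMA_1H_MHZ_PER_T = 42.577  # proton gyromagnetic ratio / (2*pi), in MHz per tesla

def larmor_mhz(b0_tesla):
    """Proton resonance (spectrometer) frequency in MHz for a given field."""
    return GAMMA_1H_MHZ_PER_T * b0_tesla

def ppm_to_hz(delta_ppm, spectrometer_mhz):
    """A shift difference in PPM expressed in Hz at a given spectrometer frequency."""
    return delta_ppm * spectrometer_mhz

# Two hypothetical peaks that are 0.02 PPM apart, seen at two field strengths.
for b0 in (7.05, 14.1):
    mhz = larmor_mhz(b0)
    gap_hz = ppm_to_hz(0.02, mhz)
    print(f"{b0:5.2f} T -> {mhz:6.1f} MHz, 0.02 PPM gap = {gap_hz:.1f} Hz")
```

The gap in PPM is the same on both instruments, but in hertz it doubles when the field doubles, which is why overlapping peaks on a small magnet can be resolved on a bigger one.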
The sample is dropped in. It looks like a tiny pencil-sized tube, typically there's about 500 microliters in an NMR tube, and it's dropped into the giant magnet. That's where it sits, in this saddle coil that sends the radio waves into the liquid sample and receives them back. When it detects the ringing from the sample, it's converted to a spectrum and this is what we see. We see peaks. And some of them have very symmetric shapes. You'll see doublets or triplets or quartets at different chemical shifts or different frequencies. Chemical shifts are measured in parts per million. At about one PPM there's a triplet, and at about seven and a half PPM there's a doublet. Those splitting patterns are due to what's called spin coupling, and the intensities of the peaks are proportional, almost exactly, to the number of hydrogens. So unlike mass spec, NMR actually allows you to quantify things. The peak intensities tell you something about the quantity of the material. They tell you the number of protons, and they can be used to tell you about the concentration of the molecule as well. So just like mass to charge tells you about a molecule in mass spec, chemical shifts in NMR tell you about a molecule. The different hydrogens exhibit different chemical shifts, different frequencies, and that has to do with the shape and structure of the molecule. So the chemical shift patterns and the coupling patterns tell you a lot about how the molecule is structured. And someone who's trained in chemical synthesis or natural product analysis can often look at an NMR spectrum and in a few minutes figure out exactly what the molecule is. Different chemical groups on molecules, methylene, methine, aldehydic, aromatic, amino, carboxylic protons, will have different chemical shifts. For protons, those can range from zero to about 10 PPM. And so the positions of those shifts and the positions of those peaks can allow you to identify specific groups in a molecule. 
So this is an example of bromoethane. We can see that the methylene group, the CH2 group, shows up as a quartet, and the methyl group shows up as a triplet. The triplet sits at about 2 PPM, whereas the quartet is around 3.8 PPM. And the fact that the quartet, the CH2 group, is at three and a half to four PPM is because of the bromine, which is very electronegative and shifts the proton peaks further to the left, or downfield. The methyl group, which is farther away from the bromine, is more upfield shifted. And then there's another signal from tetramethylsilane or TMS, which is used as a reference to identify where the signals are, and it's always at zero PPM. And the amount of TMS that's put into the sample allows us to actually measure the exact concentration of the bromoethane. And there's another one, ethylbenzene, and we can see again aromatic peaks at around 7 PPM, that's where those aromatic protons are. And then you can see the methylene and methyl groups. In this case, they're not shifted as far downfield because the aromatic group isn't as electronegative as bromine is. And at zero is TMS. NMR spectra, when you collect them, especially for a mixture, will initially be kind of ugly, with peaks pointing up and peaks pointing down. So you have to phase them. Sometimes you'll have to correct the symmetry of the peaks, and this is done by shimming. If you're collecting something with water, you'll have a giant peak in the middle of the spectrum. So you have to suppress that. And then you also have to adjust the position so that everything is referenced to zero PPM. So there's usually a bit of manual fixing that's done with an NMR spectrum. As I mentioned, these are the techniques that help improve the spectrum, and these are usually done manually, but more recently we figured out a way to do these automatically. And so this makes NMR the only technique in metabolomics that can be fully automated. 
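The idea of using a known amount of reference compound to get a concentration can be written as a short calculation. This is a minimal sketch that assumes peak area is proportional to concentration times the number of contributing protons, and that the reference is TMS with 12 equivalent protons. The function name, the peak areas and the 1.0 mM reference concentration are hypothetical values for illustration only.

```python
def concentration_from_reference(area_analyte, n_protons_analyte,
                                 area_ref, n_protons_ref, conc_ref):
    """Analyte concentration from peak areas, normalised per proton.

    Assumes area is proportional to (concentration x number of protons)
    for both the analyte peak and the reference peak.
    """
    per_proton_analyte = area_analyte / n_protons_analyte
    per_proton_ref = area_ref / n_protons_ref
    return conc_ref * per_proton_analyte / per_proton_ref

# Hypothetical integrals: the CH3 triplet of bromoethane (3 protons)
# against a TMS reference peak (12 protons) added at 1.0 mM.
conc_mM = concentration_from_reference(
    area_analyte=6.0, n_protons_analyte=3,
    area_ref=12.0, n_protons_ref=12,
    conc_ref=1.0,
)
print(f"estimated bromoethane concentration: {conc_mM:.2f} mM")
```

Dividing each area by its proton count first is what lets a 3-proton peak be compared fairly with a 12-proton reference peak.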
So NMR produces a spectrum, just like GCMS and LCMS produce BPCs and TICs in mass spec. So it's a lot of peaks, but instead of time on the axis you see frequency or chemical shift on the axis. And that is still often enough to identify individual compounds. Usually multiple peaks are needed to identify the same compound. But NMR, just like LC and GCMS, can be used to identify compounds in mixtures. The sensitivity is different. NMR is the least sensitive, LCMS is the most sensitive, GCMS sits in the middle. And so this is essentially showing the limit of detection or lower detection limit, and then the number of compounds you can typically see or detect. So with NMR you can generally see between 50 and 100 compounds, with GC between 100 and 300 compounds. With LCMS it's not unusual to detect a few thousand features. Unfortunately most of the things that you detect in LCMS we can't identify. So in LCMS we have lots of unknowns, whereas in NMR almost everything we detect is knowable. Some of you mentioned that you do either targeted or untargeted mass spec or metabolomics. Targeted metabolomics methods are often either kit based systems or use reference standards. They allow you to precisely quantify, and they usually use triple quad mass specs or linear ion trap mass specs. And you can quantitatively measure between 100 and maybe 500 compounds. Untargeted uses high resolution mass specs and you use different techniques. You use a lot of clustering and peak detection, feature selection and metabolite annotation. So it can allow you to identify between 200 and 500 compounds, but it can also allow you to discover completely novel metabolites, which targeted metabolomics doesn't allow you to do. So untargeted metabolomics is great for hypothesis generation, whereas targeted metabolomics is great for validating or doing things like biomarker discovery or assessment. So if you compare the two, targeted metabolomics has limited coverage. 
It has limited potential for discovery, but it allows you to quantify things absolutely. It allows you to do automation, which makes it very fast. It's also very standardizable, and that is a real strength. With untargeted metabolomics, you're detecting tens of thousands of features. So there's a lot of data you can generate. There's a great potential for hypothesis generation and for discovering novel metabolites. But untargeted techniques don't allow you to do quantitation. Untargeted is very slow relative to targeted, and it's still a process that needs considerable standardization. Every group, every lab around the world does untargeted metabolomics differently, and so it makes it really hard to exchange and share data and understand what people are doing. So you might detect a bias in my presentations. I certainly prefer targeted metabolomics over untargeted. But with improvements, untargeted metabolomics is slowly getting better and better. So with targeted metabolomics, you'll take your sample. You'll run it through GCMS, LCMS or NMR through a standard workflow. You'll identify and quantify metabolites. Then you'll do data reduction and data analysis on the identified and quantified metabolites, and then you can go straight to your biological interpretation. With untargeted metabolomics, you often have to work with a large number of samples. And you use the large number of samples to help compare and extract the most significant signals. You don't do any identification initially with untargeted metabolomics. Instead you do a lot of your data analysis and reduction and multivariate statistics to select the peaks. From there you reduce the number of peaks that you want to analyze, and from there you do the metabolite identification. And that takes a long time, and it still means that even after metabolite identification, you still haven't got to the next step, which is the interpretation. 
So that's partly why untargeted metabolomics is an intrinsically slower technique than targeted. These are examples of some of the tools that we use in our center, the Metabolomics Innovation Center. We have NMR, DIMS, LCMS, GCMS, CEMS, a whole range. And these are just the types of metabolites that you typically detect with the different techniques. So NMR is better for water soluble compounds. LCMS and DIMS, which is direct injection, are better for hydrophobic ones. There's a whole range of fluids and samples that you can use. NMR is quite flexible, whereas the other platforms are a little less so. As for sample volumes, NMR is not sensitive, so it needs more sample volume, whereas LCMS is quite sensitive and GCMS is sort of in the middle. There are different preparation and run times depending on the instruments, different amounts of data analysis, and very different limits of detection. So NMR is about five micromolar, whereas mass spec, LCMS, is on the order of nanomolar. The number of compounds that you can detect, identify and quantify with these different techniques ranges quite a bit. The LCMS methods are now up to around 500 metabolites. If you compare between the different techniques, take the same sample, run it through NMR, run it through GCMS, run it through DIMS, they only overlap by about 10 to 20%. So these are very orthogonal methods for identifying compounds. And so this is why it's often a good idea to use multiple techniques, multiple platforms, to get a full picture of the metabolites. Choosing one platform kind of restricts what you can do and analyze. So with the latest techniques in NMR, it's possible to identify between 50 and 200 metabolites in a sample. With the better techniques for GCMS, you can get up to about 150 metabolites that you can quantify. You can identify but not quantify up to 200 or 300 metabolites in GCMS. With things like direct injection mass spectrometry, you can identify and semi-quantify about 150 compounds. 
If you integrate DIMS and LCMS with targeted techniques and even with untargeted techniques, you can identify and semi-quantify or fully quantify about 300 to 500 compounds. With lipidomics, some of the best techniques can get up to about 3,000 lipids identified and semi-quantified. And then with phytochemical analyses and the more exotic drug and pesticide residue analysis, often you have to use pure HPLC systems, although mass spec systems are also getting better. I'm looking at my time and I know I'm running a little over and people are probably getting hungry for lunch. I might, I don't know, Rashad, Francis, do you think I should carry on or should we perhaps call it or give a break to everyone now? Who wants to break now? How much time do you have left, David? I think it's about five or 10 minutes. I suggest you finish it then maybe, yeah, maybe a longer or 45 minute break after or something. Sure. Okay, so this is sort of a bit of a detail about untargeted metabolomics. We're going to be talking about it a few times over the course of the day. In untargeted metabolomics, you run many samples and you separate and are looking at thousands of metabolites. A typical untargeted study will take hundreds of samples and you may run things over days or weeks. The problem with running days and weeks of runs is that LC instruments and MS instruments aren't reproducible, so they vary in terms of their separation. They vary in terms of the intensities. And from run to run, we see what are called batch variations. So these changes in retention time, intensity, mass measurements make it difficult to sort of merge or compare these features and to scale them. So what you might get if you're running a bunch of samples on day one, a bunch of samples on day two, a bunch of samples on day three is that you'll get changes in intensity. You can see changes from day one to day two, day two, everything is reduced intensities. And you'll get changes in retention time. 
So if you merge everything from day one, day two and day three, you get that kind of messy look, which is those colored lines above. But if it's the same sample and essentially the same metabolites, you really should only be seeing five peaks, and they should all be nicely aligned and nicely overlapped. And so this is what both batch correction and peak alignment are intended to do in untargeted mass spec. So this is an example with real material, real data. And you can see the different colors correspond to the different LC retention times and the intensities from this base peak chromatogram measurement. And so you can see the overlaps and the variations under the word raw, and then you can see things like COW, XCMS and block shift. These are techniques to do spectral alignment or chromatographic alignment and batch correction. And you can see how things that were, you know, initially two or three very distinct clusters have now aligned into single peaks. So this is one of the critical things that's often done in untargeted mass spectrometry. And it's also done for untargeted NMR. In addition to the alignment, you also have to do scaling. There are cases where the instrument gets weaker and weaker, where the signal gets weaker over time. Sometimes, depending on the injection order, depending on how dirty things are, you'll get high or low intensities. So with intensity changing with injection order or from batch to batch, you also need to do scaling so that the same peak comes out at the same height across runs. And this is why quality control samples or quality control standards are often added in untargeted metabolomics. This allows you to correct all of those peak intensity issues. This is not an issue with targeted metabolomics. So all the things I'm showing here are there for untargeted metabolomics. And it's these corrections that take a fair bit of time. 
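The QC-based scaling idea can be sketched as follows. This is a deliberately simple illustration and not the method used by any specific package: it assumes the same pooled QC sample is injected in every batch, and it rescales each batch, feature by feature, so that the batch's median QC intensity matches the overall median QC intensity. The function name and the small data matrix are hypothetical, and the sketch assumes no QC median is zero.

```python
import numpy as np

def qc_scale(intensities, batches, is_qc):
    """Scale each batch so its QC median matches the overall QC median.

    intensities: (samples x features) array of peak intensities
    batches:     batch label for each sample (e.g. run day)
    is_qc:       True where the sample is a QC injection
    """
    X = np.asarray(intensities, dtype=float)
    batches = np.asarray(batches)
    is_qc = np.asarray(is_qc, dtype=bool)

    target = np.median(X[is_qc], axis=0)           # per-feature QC level overall
    corrected = X.copy()
    for b in np.unique(batches):
        in_batch = batches == b
        batch_qc = np.median(X[in_batch & is_qc], axis=0)
        corrected[in_batch] = X[in_batch] * (target / batch_qc)
    return corrected

# Hypothetical data: the same study sample run on two days, where the
# instrument response on day 2 has dropped to half of day 1.
X = [[100.0, 50.0],   # QC, day 1
     [ 80.0, 60.0],   # study sample, day 1
     [ 50.0, 25.0],   # QC, day 2
     [ 40.0, 30.0]]   # same study sample, day 2
corrected = qc_scale(X, batches=[1, 1, 2, 2], is_qc=[True, False, True, False])
print(corrected)
```

After scaling, the two runs of the same study sample have matching intensities, which is the behaviour the QC samples are there to enforce. Real workflows often also model drift within a batch against injection order, which this sketch does not attempt.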
And if they're not done right, you can get some really improper results. So with liquid chromatography mass spec, you have an LC thing that generates a set of retention times for peaks. And then with mass spec, you can also get masses. And so you can end up with not just a two-dimensional chromatogram with height or intensity and retention time, but you can end up with a three-dimensional picture where you have mass to charge, retention time, and the intensity. So we're looking kind of like a, you know, black is very intense and yellow is not very intense and dark red is more intense. And you can see the plot in a more three-dimensional view in the corner where we're showing retention time, intensity, and mass to charge. And so this is what you typically collect in an untargeted MS study. Now, if that was one sample, if you had a bunch of samples where you had a whole bunch of different retention times, a whole bunch of different intensities, instead of getting the simple thing that I showed, you get this where there's all kinds of peaks. And it seems like maybe your samples are more information rich than they are, but they aren't because you have to do these retention time and mass shift corrections. So what you do and a lot of the techniques that you'll hear about later today and tomorrow as well are performing these batch corrections, performing these retention time adjustments, performing the peak alignments. To reduce the number of signals to allow you to identify the relevant ones, and then to identify the significant ones, and then to annotate those peaks. And that's the main goal of untargeted metabolomics. So you go from spectra, whether it's LCMS or NMR spectra to lists of metabolites. And that's done with either untargeted or targeted metabolomics. And then as we go through the rest of the course, we'll try and go from these lists or annotated collections of metabolites and their concentrations or their relative concentrations to pathways. 
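One way to picture the peak alignment step is as grouping peaks from different runs that agree in mass to charge and retention time within some tolerance. The sketch below is a simple greedy grouping, not the algorithm of any particular tool. The function name, the tolerances (0.01 in m/z, 10 seconds in retention time) and the peak list are hypothetical values picked for the example.

```python
def group_features(peaks, mz_tol=0.01, rt_tol=10.0):
    """Greedily group (sample, mz, rt, intensity) peaks into shared features.

    A peak joins the first existing group whose running mean m/z and
    retention time are both within tolerance; otherwise it starts a new group.
    """
    groups = []
    for sample, mz, rt, intensity in sorted(peaks, key=lambda p: (p[1], p[2])):
        for g in groups:
            if abs(mz - g["mz"]) <= mz_tol and abs(rt - g["rt"]) <= rt_tol:
                g["members"].append((sample, mz, rt, intensity))
                n = len(g["members"])
                g["mz"] += (mz - g["mz"]) / n      # update running mean m/z
                g["rt"] += (rt - g["rt"]) / n      # update running mean RT
                break
        else:
            groups.append({"mz": mz, "rt": rt,
                           "members": [(sample, mz, rt, intensity)]})
    return groups

# Hypothetical peaks: two metabolites measured on three days, with small
# drifts in retention time (seconds) and measured mass from day to day.
peaks = [
    ("day1", 180.063, 120.0, 1000.0),
    ("day2", 180.064, 124.0,  800.0),
    ("day3", 180.062, 117.0,  900.0),
    ("day1", 255.232, 300.0,  500.0),
    ("day2", 255.233, 305.0,  450.0),
    ("day3", 255.231, 298.0,  480.0),
]
groups = group_features(peaks)
for g in groups:
    print(f"m/z {g['mz']:.3f}, RT {g['rt']:.1f} s, seen in {len(g['members'])} runs")
```

Six raw peaks collapse to two features, each seen in all three runs, which is the reduction in signal count described above. Choosing the tolerances is the hard part in practice: too tight and one metabolite splits into several features, too loose and different metabolites merge.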
And going from those lists to pathways means you can interpret the data. And so we'll go through that as well, both today and tomorrow, using tools like MetaboAnalyst and HMDB. And from those lists and pathways, we can also identify markers and biomarkers. And these are used to help even generate models and perform systems biology studies. And these are some of the things that we'll dabble in a little bit towards the end of the course. So there are lots of challenges and we're trying to deal with these. We deal with going from spectra to lists. We worry about data integrity and quality, spectral alignment, spectral normalization. We worry about data reduction and classification. We think about significance, which are the significant metabolites and which ones are the insignificant ones. And we worry about metabolite annotation, identification and quantification. We're going to learn about that today. And then tomorrow and part of today, we'll learn about how to go from those lists to things like pathways and biomarkers. How to do pathway mapping, how to do pathway identification, how to do biological interpretation and how to identify worthwhile markers for moving into practical applications.