Okay, welcome to the workshop. This is the first time that we're trying to do a hybrid workshop, and so there are going to be some technical issues as we deal with working in two places at once, live in both cases but remotely in both cases as well. The idea of a hybrid workshop is that not everyone can travel some of the distances across Canada. Those of you here are from Alberta or Saskatchewan, and the people in Montreal are from Ontario and Quebec. It saves money and time for them to be able to go to these local places, so we're hoping this will be a new model for the bioinformatics workshops. CBW started in 1999. I'm the last person standing from the workshops originally started by Francis Ouellette and myself and several others. Francis retired from CBW about a year or two ago. Michelle has been carrying the torch, and then Nia has joined CBW as of early this year, and we're happy to have her join us. Right now you're running two workshops simultaneously. We're in three locations. So, we can talk about multitasking. This is following a Creative Commons license. You guys can download all of these slides, and hopefully you have. And this workshop is also being recorded. So if you miss things, you'll also be able to get the material shortly after the workshop has ended, with the dialogue and other things. So I'm going to be introducing you to metabolomics. Hopefully everyone can see the slides from back there. I know some of the stuff at the bottom might be cut off, but if you can't see things you can follow along with your own set of slides. We've already gone through the schedule here, so we're in the introduction to metabolomics. I have a fair bit to cover, so I'm going to move fairly quickly, but certainly you can ask questions, and if you can't hear or if there are other issues, raise your hands. We're here to help you. Likewise if there are issues with lighting, if the slides are too dim.
We can also dim the lights, but is it okay for everyone right now? Okay, and then I'm just checking with Montreal: are you guys still able to follow along, and everything's A-okay? Yeah, very clear. So what we're going to do is learn about the basics of metabolomics. We're going to talk about the sizes of different metabolomes, because that's important. We're going to talk about some applications of metabolomics, to give you guys an introduction to why metabolomics is important. We're going to talk about some of the principles of metabolomics technologies: liquid chromatography, gas chromatography, mass spectrometry and NMR spectroscopy. And we're going to talk about the differences between targeted and untargeted metabolomics, and those are important to understand because today we're mostly going to focus on targeted metabolomics. Then tomorrow you'll learn a little bit more about untargeted metabolomics, and you'll also learn tomorrow about how to analyze metabolomic data. Now I know that a lot of people mentioned that caffeine is an important metabolite for them. There's coffee over here if anyone wants some, because I think it's available. So if you need to move around to wake up or just exercise, that's fine. Some of you I know have to leave for certain vital meetings; that's okay too, it's an open classroom. So in terms of metabolomics, we can think of it in terms of this structure, the pyramid of life. The base of the pyramid is the genome. That's your DNA, or any organism's DNA. The DNA codes for proteins. The proteins are there to facilitate chemical reactions. And life is really a series of chemical reactions. The set of metabolites or chemicals that are in organisms is called the metabolome, and the study of the metabolome is called metabolomics.
That intermediate step, the proteins, is the proteome; its study is proteomics, and likewise genomics is the study of the genome. Now as you climb up the pyramid there are different levels of influence. The genome is not typically affected by the environment. It's very stable; in fact we can dig up DNA from mammoths and Neanderthals and it's still largely the same after hundreds of thousands of years. On the other hand, if you guys are still drinking coffee or having your breakfast sandwich, your metabolome is changing even as I speak. So it's influenced by what you eat, what you breathe, what you drink. It changes relatively rapidly over time. There are also physiological differences as you go up the pyramid of life. Your genome is identical whether it's a cell in your stomach, a cell in your eye, a cell in your muscle or a cell in your skin. But the metabolome in your brain, your skin, your muscle is completely different. So the metabolome is exquisitely sensitive to physiology, and that's something that many people forget and don't realize. There's also a range in terms of difficulty of measurement. It's pretty easy to routinely sequence the entire genome now. We can sequence you for less than a couple hundred dollars. That's three billion base pairs. And that's because there are just four bases to worry about. To completely characterize the metabolome in a person or an individual still isn't possible, and it's extremely difficult because there are literally thousands of different chemical species and hundreds of thousands of different chemicals. So the pyramid of life is something to consider. I'll use the same analogy in a few other examples as we talk about metabolomics, but it's important to understand this role of influence. The metabolome is at the interface between the genome and the environment. And so that makes the metabolome uniquely positioned to identify or characterize the phenotype, which is what we do in biology.
So in terms of what metabolomics is, well, you can compare it to genomics, which everyone's heard of. Genomics is a research field that uses high-throughput technologies to identify and characterize the genes in cells, tissues or organisms. Metabolomics has the same definition: a field of life science that uses high-throughput technologies to characterize small molecules or metabolites in cells, tissues and organisms. Metabolites are organic molecules, and the definition even includes inorganic molecules, and they have to be detectable in an organism or a body or tissue. We use a cutoff of about 1,500 daltons. You can go a little higher or a little lower, but that's been a general rule. It includes very small peptides but not proteins. It includes short oligonucleotides but not your genome or RNA. It includes sugars and nucleosides and organic acids and ketones and aldehydes and all those other chemicals that you've learned about or heard about. It covers food components and food additives; it includes toxins and pollutants, herbicides, pesticides, drugs and drug metabolites, all the stuff that's actually in you, or in just about any living organism. It also includes exogenous compounds that the microbes in your gut or on your skin or in your mouth produce, and the concentration threshold is essentially whatever we can detect. So as technologies have improved, the threshold of detection has been dropping steadily, which means there are more and more compounds that we can detect every year. So when we talk about a metabolome, it's a complete collection of metabolites, and that can be for a cell, as in single-cell metabolomics, or for tissues and organs. It could be an entire organism. It can be biofluids. It can be dirt or soil. It can be water from the environment. As I said, it includes those endogenous or internal compounds that our bodies or other organisms produce, and it includes the exogenous man-made molecules.
It also includes transient things, because, as you know if you've learned a little bit about chemistry, there are lots of intermediates as something is transformed, and some of those intermediates have a lifespan of microseconds, but they are part of the metabolome. And as we are appreciating now, it also includes theoretical molecules, ones where we know the molecule has to exist because the starting and end products are known, but the intermediate hasn't yet been characterized. And so this has been part of biology actually for many decades. The metabolome is defined by the technology. There are some sensitive or hypersensitive techniques like mass spectrometry that say the metabolome is large. There are less sensitive techniques like NMR, which generally view the metabolome as somewhat smaller. The fact is the metabolome is always ill-defined. It depends on the organism, because it changes tremendously if you're looking at plants, microbes, or humans. It's constantly evolving. The first edition of the Human Metabolome Database had 2,500 compounds in 2005. It now has 250,000 compounds, and we expect in a couple of years it'll grow to about 5 or 10 million compounds. So it is growing rapidly, and it is essentially changing as we learn more about chemistry and use tools like machine learning. So when we think about metabolomes, we have to think about organisms. If we think about mammals or vertebrates, typically they have about 250,000 compounds as of today. So we have 250,000. A cow has 250,000. A dog, a cat, a snake: it's roughly about the same. We may have a little bit more because we have a little bit more diverse diet. Then there are microbes: microbes in our bodies, microbes in the soil and the air, everywhere. That includes bacteria and viruses and parasites. They can't run away from threats as quickly as mammals or vertebrates, and so they have a few more compounds, and the estimates may be around 350,000.
So there are so many niches that microbes inhabit, from acidic environments to hydrothermal vents, and the metabolome for microbes is still in an early stage of characterization. Plants have probably around 400,000 or 500,000 metabolites, more diverse than mammals, more diverse than microbes, and that's again because plants can't run away from threats, so they use chemical warfare. So a lot of the exotic metabolites, secondary metabolites and natural products of interest in the medicinal world come from plants because of the huge chemical diversity. Now if we go back and look at human metabolomes, because I think we know a bit more about ourselves, as I said there are about 250,000. We can start with endogenous metabolites. There are drugs; there are about 2,600 that are mapped. There are food chemicals, food additives and components in foods, maybe about 70,000 or so. There are drug metabolites: if you are taking drugs, they're converted to metabolites, and those exist in the body. And there are various toxic chemicals that include herbicides and pesticides, and that can also include cosmetic compounds, dyes, things in your clothes or things you put in your hair. All of those represent part of the human metabolome. What I've shown here is the range of concentrations, and we're going from as low as femtomolar to as high as molar. The most concentrated metabolite in your body is urea. It can be found in urine and it can get up to concentrations of hundreds of millimolar, even molar concentrations. But there are obviously steroids and secondary signaling molecules that are in the picomolar range, even the femtomolar range. We keep this information in various databases, and we'll talk about these databases later today, but the human metabolome data is in the HMDB. Information about the drugs which are prescribed to people, or which you can get over the counter, is in a database we developed called DrugBank. Information about food components is in a database called FooDB.
The drug metabolite information is also in DrugBank, and something called the Toxic Exposome Database contains all the information about environmental chemicals. There's a gradient in concentrations here, because at the lowest concentrations you'll hopefully find the metabolites of toxins. If they're too high, then you're in trouble. Same thing with certain drug metabolites. Obviously when we eat food, that's in our body too, but sometimes just temporarily. So the drugs typically are in the nanomolar range, foods and food additives in the nanomolar to micromolar range, and drug metabolites and toxins way down, hopefully, in the picomolar range. But endogenous metabolites, the things that keep you alive, have a tremendous range because they have different roles and purposes. Now there are also sort of theoretical human metabolomes, as I mentioned when I talked about how we're learning to understand some of the intermediates, or new classes of compounds. The most diverse collection of compounds in your body is the lipids, and there are probably hundreds of thousands still not in the Human Metabolome Database. We've also analyzed the current estimate of the number of drugs and drug metabolites, and it is much below what we believe to be true. One of our colleagues, who just arrived, is actually working on that, and hopefully we'll have a new number for the drug metabolites. Foods and food compounds are transformed too: we know about 70,000, but those are transformed in our bodies, and so those will also produce a variety of glucuronides and sulfate derivatives and cleaved products. And then something that is still a little obscure is what we're calling the secondary metabolome, or the secondary endogenous metabolites. These represent metabolites that your enzymes will process, mostly accidentally. And these represent a huge collection of things that we just don't know about.
And these represent the dark matter of the metabolome, and this is a major subject of metabolomic research now and has been for a number of years, and there are several people here in this group who are certainly studying that. So, why are we taking a course in metabolomics? Why is metabolomics interesting, why is it important? There are some interesting facts. When we look at the common clinical diagnostic assays, if you go to the doctor and they take a blood or urine sample, most of what's being done is to look for small molecules: to look for things like creatinine or calcium or urea, or tyrosine and phenylalanine ratios, or HDL and LDL, which are cholesterol lipids. When we think about drugs that save people's lives, almost 90% of the drugs used today are small molecules, and between 50 and 55% of all existing drugs are derived from existing or pre-existing metabolites, mostly plant metabolites, some of them mammalian or microbial metabolites. Almost a third of the identified genetic disorders involve diseases of small molecule metabolism. And when we think about small molecules within living systems, they serve as the cofactors and signaling molecules and the bricks and mortar for not only thousands of proteins but all of the components in the cell. What people are realizing is that metabolites are fundamentally the canaries of the genome, just like the canaries in the coal mine that were used as early warning systems in the 1800s. A single base change in a DNA chromosomal molecule can lead to a 10,000-fold change in metabolite levels. Finding that DNA change is just like finding a needle in a haystack, while measuring that metabolite change is in many cases trivial, which is why metabolites are used as clinical biomarkers all around the world, and will continue to be used for a long time to come. Now, one of the challenges in metabolomics is that it's very time sensitive. I talked about what happens when you eat or drink something.
Well, up there at the top is the response you would see if you measured things every minute or two minutes, and you'd see things going up and down wildly. If anyone's ever had a glucose monitor, and you can talk to Dorsa back there, she's worn one for months, you'll find your glucose level goes up and down all the time. Now if we were to eat a large meal like spaghetti, or the breakfast sandwich you had today, your proteins would change too. Your insulin levels might go up, your ghrelin levels might go up, but only about a half dozen proteins will change, at a much slower rate and with lower amplitudes than what your metabolites will. Likewise, if you had your coffee or ate your cereal this morning, your genome hopefully didn't change. Otherwise you've got some pretty toxic things in there. But this is the point about this time response: the metabolome changes with time, the proteome changes subtly with time, the genome doesn't change. And so there's a temporal issue, but a very appealing part, I guess, is that it's easier to change the metabolome, to fix the metabolome, than it is to fix the genome. The other thing that people often don't realize is that metabolism is fundamentally understood. This is a wall chart that was drawn in the 1960s. It's still used today, and it represents the same thing you'll see on the KEGG metabolic pathway chart. There are hundreds of reactions, hundreds of enzymes, thousands of metabolites. These were all known decades ago. This represents the fundamental metabolism of all cells. So we understand metabolism to a higher degree than almost any other biological process. The other thing to remember is that the metabolome is connected to all the other omes. The metabolome in that pyramid is connected to the proteome, which is connected to the genome. The genome changes the metabolome. The metabolites will bind to your DNA, your DNA will produce proteins that then will produce metabolites, which will then change things.
You know the trp repressor and the lac operon that you might have learned about: all of that is showing how there's a connection between small molecules, the genes and the proteins. It's true in bacteria. It's true in you. So we can think about small molecules like AMP, CMP, GMP and TMP, the nucleoside monophosphates. These are the components of the genome and the transcriptome. Without them we wouldn't have DNA, we wouldn't have RNA. The 20 amino acids are essential for the proteome; without those amino acids we don't have proteins. The lipids and glycolipids give cells their shape and give cells their integrity. The lipids, amino acids and ATP in particular are the source of all cellular energy. If you don't have them, you die. All things die without those critical molecules that fuel our cellular processes. And as I said, there are the cofactors and signaling molecules, the vitamins, the secondary messengers. As it turns out, many, many amino acids form both cofactors and signaling molecules. What we're discovering now is that basically the genome and proteome, which we spend billions of dollars studying every year, largely evolved to simply catalyze the chemistry of small molecules. Life is chemistry, and the proteome and genome are just along for the ride. So understanding the metabolome is fundamental to understanding life. There are lots of applications. As I mentioned, a third of all genetic diseases are metabolic disorders, so genetic disease tests are used. One of the first examples, in fact the most widely used example of metabolomics, is newborn screening. If you're under the age of 30, you probably had a metabolomic test. You just didn't know it because it happened when you were two days old. They take a blood sample from your heel and then they analyze it using metabolomic technologies, looking for acylcarnitines and amino acids to identify whether you have a metabolic disorder. So there are 350 million people who have had a metabolomic test.
There are fewer than 100,000 who've had a genomic test. So when you think about the importance, particularly of metabolomic newborn screening, one million people's lives have been saved or changed because of newborn screening. We use metabolomics in nutritional analysis. We use metabolomics in many clinical tests: for blood analysis and urinalysis and cholesterol testing, and for drug compliance and drug testing, whether it's performance-enhancing drugs or drugs of abuse or even just simply monitoring morphine concentrations. We use MR instruments and CT scans to do chemical imaging. These are forms of metabolomics, or imaging metabolomics. Toxicology testing and clinical trial testing are increasingly being used by drug companies as they move products to the FDA. It's almost becoming a requirement now. Monitoring fermenters, food and beverage testing, which is done every day for all the foods that you eat, nutraceutical analysis, drug phenotyping, water quality and soil quality testing, petrochemical analysis: all use metabolomics applications. It touches every part of your life, but most of you may not be aware of it. So some of you are brand new to metabolomics. How do we do it? Well, this is a simple diagram. Usually we'll take some biological tissue. It could be some microbes, it could be soil, it could be plants, it could be meat, it could be muscle or an organ, and we might extract it. If it's solid, we extract it. Or if the organism has something like blood or urine or cerebrospinal fluid, we use a needle or we ask them to generously donate a sample. Either way, we're trying to get a liquid. So it could be a solid sample or, from a human, a liquid sample. And from there, we do chemical analysis. Those steps, tissue extraction, biofluids and chemical analysis, have been around for 60 years. It's the last step, which is mostly what this course is about, where all of the recent developments have happened. These have occurred in the last 15 to 20 years.
Chemical analysis is done using mass spectrometry, NMR spectroscopy, liquid chromatography, gas chromatography, a variety of techniques. The last step is what has changed metabolomics from being the analysis of a single chemical at a time to hundreds or even thousands of chemicals at a time. Now, let's look at the general coverage, at what's possible, and this is why I talked about the difficulty or completeness. This is fundamentally a weakness of metabolomics. As I say, we can sequence 3 billion bases, 22,000 genes, in each and every one of you for less than $150. We can do proteomics of a tissue or blood sample and get about 8,000 proteins routinely, maybe not quantifying them. And this is the strength of proteomics. But the average metabolomic study only characterizes between 100 and 200 chemicals, often fewer. And we know there are 200,000 chemicals in our bodies. We know there are 50,000 proteins in our body. We know there are 22,000 genes in our body. So in terms of completeness: genome, about 100%; proteome, maybe about 20%; metabolome, less than 1%. And that's because metabolomics is dealing with huge amounts of chemical diversity, millions of compounds, thousands of chemical classes. Whereas in proteomics, it's just 20 different amino acids. And in genomics, it's just four bases. The chemistry is easier, we can multiplex it, we can do lots of tricks with it, and that's why next-generation genome sequencing is so fast and so cheap, and why proteomics continues to get cheaper: again, because the chemical alphabet is quite small. Metabolomics will always be difficult because the alphabet is huge. I mentioned the standard methods we use in metabolomics. We talk about chromatography: ultra performance liquid chromatography or UPLC, and high pressure or high performance liquid chromatography. Capillary electrophoresis is another separation technique. We can couple those separation methods to mass spectrometry.
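As a quick check on the completeness figures quoted above, here is a minimal Python sketch that recomputes them from the round numbers given in the talk (22,000 genes, about 8,000 of roughly 50,000 proteins, and at most about 200 of roughly 200,000 chemicals). The dictionary names and the choice of 200 as the "typical study" value are assumptions made for illustration; the numbers themselves are the speaker's order-of-magnitude estimates, not measured data.

```python
# Round numbers quoted in the lecture (order-of-magnitude estimates).
KNOWN = {
    "genome (genes)": 22_000,
    "proteome (proteins)": 50_000,
    "metabolome (chemicals)": 200_000,
}
# What a routine experiment is said to measure; 200 is the upper end
# of the "100 to 200 chemicals" range (an illustrative choice).
MEASURED = {
    "genome (genes)": 22_000,
    "proteome (proteins)": 8_000,
    "metabolome (chemicals)": 200,
}


def coverage_percent(measured: int, known: int) -> float:
    """Return the percentage of known entities that were measured."""
    return 100.0 * measured / known


coverage = {name: coverage_percent(MEASURED[name], KNOWN[name]) for name in KNOWN}

for name, pct in coverage.items():
    print(f"{name}: {pct:.1f}% coverage")
```

With these inputs the proteome comes out a little under the "about 20%" quoted, and the metabolome well under 1%, which is consistent with the rough figures in the talk.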
So that's liquid chromatography mass spec. We can also couple gas chromatography to mass spec; that's GC-MS. We often have high resolution mass spectrometers like Orbitraps or Fourier transform mass specs. We can have lower resolution mass specs like triple quads. We can use NMR instruments. We can even use crystallography and even electron microscopy to characterize chemical structures now. So we have a wide range of tools. It's not one gene sequencer, and it's not one ELISA assay. Metabolomics uses a lot of technology because the alphabet, the chemical diversity, is so huge. So I'm going to segue into some of the techniques that we use in metabolomics. I'm going to talk about chromatography first. You have either seen it or performed it: this is a separation of molecules and mixtures by running something through a column packed with some kind of resin. You can also do it through paper chromatography or thin layer chromatography. It's simply a way of separating molecules by some adsorption to a matrix. Officially, the definition of chromatography is a separation of mixture components. We take that mixture, which is dissolved in a mobile phase, a solvent that could be water, could be acetonitrile, could be ethanol or a mixture of multiple solvents, and it goes through a stationary phase. The stationary phase could be made up of silica gel or sand, it could be made up of plastic beads, it could be made up of anything. And that essentially leads to the separation of the molecules as they pass through that stationary phase, through differential partitioning between the mobile phase and the stationary phase. There's column chromatography, which is what's primarily used in metabolomics, but if you're a chemist you might use thin layer chromatography. You can use liquids as the mobile phase, and you can use gases as the mobile phase.
The matrix can be something that binds through affinity, or it could be one that interacts through charge, through ion exchange. It could be something that separates based on size, with small molecules coming off more slowly than large molecules. It could be something called reverse phase, where the stationary matrix is hydrophobic, or normal phase, where it's kind of a mixture of hydrophobic and hydrophilic. You can let things separate just through pure gravity, or you can put in pressure and move things faster. The most common method for separating molecules in biochemistry, chemistry and metabolomics is something called high pressure or high performance liquid chromatography. The abbreviation is HPLC. This was developed about 50 years ago, and it uses very high pressures, around 6,000 pounds per square inch, and very small, five micron, pressure-stable particles. So this is very fine sand, if you want to think of it that way. It allows you, if you have a good detection system, to detect and separate compounds at the parts per trillion level. So it's a very powerful separation, and it can be coupled to detectors that let you identify and separate things at very low levels. And depending on the stationary phase, you can separate and detect polar and non-polar compounds, either hydrophilic or hydrophobic compounds. We use the thing called reverse phase, which has sort of a greasy sand inserted in the tube or in the column, for separating non-polar molecules. So you have a non-polar stationary phase, and you usually use something like methanol or acetonitrile, which is a polar solvent, as the mobile phase. Normal phase, which is ancient chromatography and was used for separating non-polar molecules, uses a relatively polar stationary phase and a relatively non-polar organic mobile phase, maybe chloroform. It's not used very much anymore. And then there's another one, which is called HILIC, or hydrophilic interaction liquid chromatography. And this is for separating polar molecules.
That's different from reverse phase. You typically have a polar hydrophilic stationary phase, and then you'll have kind of a mixed non-polar and polar mobile phase. The columns you use in HPLC are relatively thin and narrow. They can be made of glass, they can be made of something called PEEK, which is sort of a plastic, or they can be made of stainless steel. Typically they have to be able to handle high pressures, and so stainless steel is generally preferred. The very thinnest and smallest columns are called analytical columns. You can also use thicker, wider columns, preparative columns. They can range in internal diameter from one millimeter to 50 millimeters, and in length from 20 millimeters to half a meter. So there's tremendous variability. Most HPLC systems that are attached to mass specs are the analytical ones. The natural product chemists typically like to use preparative ones. And generally metabolomics labs that want to identify novel compounds also need to use preparative HPLC. I mentioned that the reverse phase column method is probably the most widely used chromatography method in metabolomics. We use these tiny sand or silica beads. Silica is sort of like ground-up glass. The beads are porous, but they're derivatized with hydrophobic components. We will use in this case C18, meaning there are 18 carbons in the fatty chain that's attached to the surface. So it's essentially a greasy phase that's layered on top of all of the beads. These beads are about five microns. So a C18 column has 18 carbons, and a C4 column means that there's a four-carbon chain attached to the surface of the HPLC beads. You can have diphenyl groups, or you can have other chemicals attached, depending on what you want to separate or the type of column you want to prepare. Again, that surface defines what you will separate and how well you will separate it.
Now the length of the column, as well as, in some cases, how long you run your column separation, improves your resolution. If you have a short column and you've got two compounds that you're trying to separate, typically they won't separate too well. If you have a longer column or a slower run, those two things will separate better. So that's one trick for getting better separation. The other trick that was realized is that if you go to smaller beads, going from five microns down to 1.7 microns, you can get almost the same improvement in separation over the same column length. This is the basis of UPLC, or ultra high pressure liquid chromatography, because to push something through really tiny beads needs a lot more pressure. So the concept of using really small beads to get better separation also required higher pressures. And this is just showing the different size beads. Again, they're relatively porous, which is another mechanism to help sieve or separate molecules, but they are all coated with some kind of organic molecule. So how do you do an HPLC separation? You typically have a column. You typically have a pump; they'll call it a solvent delivery system or a solvent manager. You'll typically have some kind of solvent. That's the mobile phase, and it's usually in some kind of flask or container, usually with some tubes running out of it. You have the pump pushing things in, and then you'll inject your sample. There's an injector. In some cases you'll have an automatic injector; sometimes you do it by hand with a syringe. The sample is put in. The pump is supplying the pressure. The column, of course, is full of these tiny beads, and the sample is pushed through. Then the separated material comes out and is detected, and it can be detected with an ultraviolet or infrared detector, or it can be detected with a mass spectrometer or an NMR spectrometer. So the detector can be anything.
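The trade-off between bead size, column length and pressure described above can be sketched with a few textbook rules of thumb. This is a rough illustration only, and none of it is from the talk itself: it assumes a well-packed column whose plate height is about twice the particle diameter, that resolution scales with the square root of the plate count, and that back pressure at fixed flow and length scales with the inverse square of the particle diameter. The 100 mm column length and the function names are hypothetical choices for the example.

```python
import math


def plate_count(column_length_mm: float, particle_um: float,
                reduced_plate_height: float = 2.0) -> float:
    """Estimate theoretical plates as N = L / H, with H ~ h * d_p.

    reduced_plate_height (h ~ 2) is a rule-of-thumb assumption for a
    well-packed column run near its optimal flow rate.
    """
    plate_height_um = reduced_plate_height * particle_um
    return column_length_mm * 1000.0 / plate_height_um


def relative_resolution(n_new: float, n_ref: float) -> float:
    """Resolution scales with sqrt(N), all else being equal."""
    return math.sqrt(n_new / n_ref)


def relative_pressure(particle_um: float, ref_particle_um: float) -> float:
    """Back pressure at fixed flow and length scales as 1 / d_p**2."""
    return (ref_particle_um / particle_um) ** 2


# Same hypothetical 100 mm column packed with 5 um vs 1.7 um beads.
n_hplc = plate_count(100, 5.0)
n_uplc = plate_count(100, 1.7)
res_gain = relative_resolution(n_uplc, n_hplc)
pressure_gain = relative_pressure(1.7, 5.0)

print(f"plates, 5 um beads:   {n_hplc:.0f}")
print(f"plates, 1.7 um beads: {n_uplc:.0f}")
print(f"resolution gain:      {res_gain:.2f}x")
print(f"pressure increase:    {pressure_gain:.1f}x")
```

Under these assumptions the smaller beads buy a bit under twice the resolution for the same column length, at the cost of several times the back pressure, which is the qualitative point made about UPLC needing higher pressures.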
So as this is happening, what you're doing is monitoring these compounds as they come out over time. That's called a chromatogram. And so you'll see these peaks. Under each of those peaks, there's either one, two or a dozen molecules coming out at a certain elution time. And that's monitored through a computer and through your detector. To get better separation, what people realized is that it helps to have two or more solvents applied in a gradient. So most higher performance, higher quality separation methods use gradient HPLC. There's a solvent A, a solvent B, sometimes a solvent C, with two different pumps and things mixing. You inject your sample, but now this gradient of solvents changes over time, and there's a gradient profile. And it's very much an art. Everyone has their own preference: different solvents, timings, lots of trial and error. But there are some standard chromatographic methods, as Mahi will tell you, that allow you to generally separate most metabolites, most of the time. So this is what a typical HPLC separation looks like. This is one that's been run for 50 minutes, which is unusually long. And you're seeing dozens of peaks, some at the very beginning, some coming out later at the end. The ones that come out later, if this is a C18 column, are more hydrophobic; the ones coming out at the beginning are more hydrophilic. The intensity of the peak is typically related to how much of that material is there. In this case, we're measuring absorbance, so we're measuring UV, maybe around 280 nanometers. And not every peak represents a pure compound. Many of these peaks may have three, four, 10 compounds under them. Now, we've talked about liquid chromatography. There's another technique where the mobile phase isn't water or acetonitrile or methanol. It's a gas, usually helium. And instead of running things at room temperature, you run things in an oven.
And so this is a schematic and a picture of a GC-MS or GC system. So typically, like the HPLC system, you'll have an injector, but instead of a pump pushing liquid, you'll have a gas tank. And you will pass things through a column, and instead of the column being straight and narrow, it's actually a coiled thing, very tiny. And it's in an oven. And then you'll have a detector. And there's various types of detectors. Some will use flame ionization. So that's a little fire depicted there. Others will have a mass spec. And you can see at the bottom the type of separation you'll get through gas chromatography. So in gas chromatography, things have to be vaporized as a gas. It's not as if stuff is coming in as a liquid, or at least not into the column. So it's vaporized into a gas and injected into a column, which is filled with gas. It's pushed through the column by a gas. It can be helium; typically it has to be an inert gas. And the column is very thin, about two millimeters in internal diameter, and very long, typically about 10 meters long. The inside of the column basically looks like a copper tube, very thin, but it has a polymer stationary phase that's been adsorbed to the surface of the metal. Now, not everything vaporizes into a gas, and so there's tricks people have learned where you can chemically modify compounds with a trimethylsilyl group, and it makes them volatile. And that way they can vaporize at much, much lower temperatures. Here's this compound called BSTFA, and you can get trimethylsilyl groups attached to all these hydroxyl groups. You can also do methoximation. And again, this is another compound that will open things up and derivatize them so they're a little more amenable to silylation. So gas chromatography involves some chemistry in advance of running it through a GC system, whereas liquid chromatography almost never involves this derivatization.
So this is just a little bit of an animation illustrating gas chromatography again. We've got this sample, a mixture, and we're pushing it through the column with helium gas. Things that have high affinity stay in the column longer; things that have low affinity come out of the column faster. And you're seeing at the bottom of this animation, as things are being pushed through, the peaks: the blue peak comes out and then the red peak comes out a little later. That's the retention time. And we're seeing a graph of a GC chromatogram. It looks a lot like an LC chromatogram, but separation from gas chromatography is many, many times better than liquid chromatography, far more reproducible, and has a much higher plate count. So if you can, you actually want to use gas chromatography rather than liquid chromatography, but there are limitations. Liquid chromatography allows you to separate very large molecules, you know, up to proteins and peptides. Gas chromatography generally only works for smaller molecules, less than maybe 600 or 700 daltons. So you can't really analyze lipids or glycolipids or peptides, and not everything derivatizes nicely in gas chromatography. But there are huge advantages, and the number of people that run GC-MS far outnumbers the number of people running LC-MS. So this is what a GC column looks like, at least schematically. It's a coiled piece of wire, about 10 meters long. If you look inside it you'll see that there's this piece of wire, and then it's got a fused silica component, and then it has this polyimide coating, and then a stationary phase, which is usually a polysiloxane. That's the orange thing that's inside the tube. So that is the key, the stationary phase that allows you to separate things by gas chromatography. So whether it's liquid chromatography or gas chromatography, we talk about how long things are retained in the column, and that's the separation, which is the time axis.
So we saw here that chromatogram and we saw this retention time. It's labeled RT and it's the time an analyte or molecule takes to pass through a column. So it depends on the compound itself: hydrophobic compounds tend to stay in the column longer, small molecules tend to move through faster, or sometimes slower, depending on the column. It also depends on long or short columns, wide columns, the flow rate of the gas or the liquid, the pressure of the gas or the liquid, the carrier gas or liquid, and the temperature; you can have high-temperature HPLC or very-high-temperature GC. So if you are able to compare your retention time to some other standard, or if you can predict retention time, as Mahi is working on, then that can help you identify a compound, because different compounds have characteristic retention times. You can also calculate something called a retention index, which is used in gas chromatography in particular. That's the retention time normalized to a set of retention times of adjacent molecules. And usually in gas chromatography we use n-alkanes. So that might be butane, pentane, hexane, heptane, octane, all the way up to dodecane. So this is just illustrating how, if you have some kind of standard and you're able to inject your known standard, say it comes off at one point, at 2.85 minutes. If you inject a new sample or mixture and you see a peak appearing exactly at 2.85 minutes using the same chromatographic conditions, you can be fairly confident that it's the same molecule. What's more, if you're using optical detection like UV, the area under the curve tells you how much of that compound is there. So the little area says there's one picogram, the big area says there's 10 picograms. This is key to the measurement of concentrations. And we're going to learn a little bit about that, but you have to understand this whether the measurement is through chromatography with UV or with mass spec.
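To make the retention index idea a bit more concrete, here is a minimal sketch in Python. It assumes the linear (temperature-programmed) form of the index, where a compound eluting exactly with the n-alkane of carbon number n gets the value 100 × n and everything in between is interpolated. The talk does not say which form of the index is used, and the alkane retention times below are made-up illustration values, not data from the slides.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak at retention time `rt` (minutes).

    `alkane_rts` maps n-alkane carbon number -> retention time measured
    under the same chromatographic conditions (hypothetical values here).
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time is outside the alkane ladder")
    # Index of the alkane eluting at or just before the peak.
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    # Interpolate between the two bracketing alkanes.
    return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))


# Made-up ladder: octane, nonane, decane.
alkanes = {8: 4.0, 9: 6.0, 10: 8.0}
print(linear_retention_index(5.0, alkanes))
```

A peak halfway between octane and nonane lands halfway between 800 and 900, and that number stays comparable between runs as long as the alkanes are run under the same conditions as the sample.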
The area under the peak tells you how much is there. So here's this GC chromatogram. It's about 30 minutes instead of the 50 minutes. You'll notice the peaks are much, much narrower, and we're seeing actually many more peaks. And that just highlights how much better gas chromatography is over liquid chromatography. That resolution is measured by the plate count, and gas chromatography is exceptional at that. Now the detectors we stick on the LC or GC can be NMR instruments, they can be mass spectrometers, they can be UV or IR. We're going to focus on the mass spectrometers, and we're going to talk about the NMR instruments as well. So in metabolomics we typically will attach a mass spectrometer to the end of a GC or an LC system. A mass spectrometer is just a million-dollar weigh scale. It's an analytical method to measure the molecular or atomic weight of samples. Mass spectrometers have been around for more than 100 years. They were used to measure the mass or the weight of electrons and then later of atoms and elements, and that's how the periodic table was determined. People finally got the wise idea that maybe you can measure more than just atoms, you can maybe measure molecules. And so that's what they've been doing. So these are typical mass spectrometers. If you guys wanted to take some time off during the breaks, you could also see some of the instruments that we have in TMIC. But as I say, these are million-dollar weigh scales. They weigh very sensitively. That's why they're so expensive. This is a mass spectrometer that's attached to a liquid chromatography system, sort of on the back of that. So it does LC-MS. One concept is that different compounds can be uniquely identified by their mass. So here are different drugs, butorphanol, L-dopa and ethanol, and based on their chemical formulas and structures they have very specific molecular weights. Just like everyone in this room probably has a different weight.
So if we had a list of your weight and your name, and we had a weigh scale over in the corner here, without looking at you, I could just look at the list and I could say, based on your weight, this is who you are. This is the way we would do it in metabolomics and in analytical chemistry. So typically with these million-dollar weigh scales, we can measure the molecular weight to within one part per million. So that means if you had seven digits on your weigh scale, so 100.0000 daltons, that's four decimal places, or 0.0001 percent. So that's usually sufficiently accurate to confirm a molecular formula from mass alone. With large proteins you can generally get to within 0.002 percent, about one dalton for a 40-kilodalton protein. So the unit is not kilograms or pounds, it's daltons or amu, atomic mass units. So when we use mass spectrometry we typically couple it to either a gas chromatography system, so that's GC-MS, or a liquid chromatography system, that's LC-MS. Or we can stick two mass spectrometers together, because they also separate things. That's called tandem mass spec. So GC-MS is for separating volatile compounds, LC-MS is for separating more delicate compounds by HPLC, and MS/MS is for separating compound fragments by either magnetic or electric fields. So masses, especially at the atomic level, are affected by isotopes. It would be like if you stepped on the weigh scale and you saw three different weights simultaneously. Fortunately that doesn't happen for us, but because there are things like carbon-12 and carbon-13, or hydrogen-1 and deuterium, or nitrogen-14 and nitrogen-15, those are all isotopes that differ by one dalton or one amu. So if we were to take a compound, in this case a large, say, lipid that has a monoisotopic mass of 1155.6, that is the mass with the carbon-12, nitrogen-14, oxygen-16 and hydrogen-1 components.
Now, you'll also find another peak at 1156, one dalton more, that probably has either a carbon-13, which has got a 1% abundance, or maybe it has a nitrogen-15 in it. That's one dalton higher; two daltons higher would say it has maybe two carbon-13s, or a carbon-13 and a nitrogen-15. The abundance is dictated by the isotopic abundance of those isotopes. So there's always going to be lower peaks for these isotopic variants; the monoisotopic mass is the highest because that's the most abundant, and these most abundant isotopes represent 95 to 99% of the molecule. Now, if you take the weighted average of those peaks, 1155, 1156, 1157, based on their intensity, you'll come up with an average mass, and that's 1156.3. If we have a low-resolution mass spectrometer, that's what we would measure for this compound. Most of what you'll see is high-resolution mass spec, so you can see the monoisotopic mass, that single peak, but if you're using a low-resolution triple quad, you'll measure the average mass. So, this is the isotopic abundance of the main elements, in this case for chlorobenzene. We can see that hydrogen-1 is 99.9% and deuterium is very rare; carbon-12 is 98.9% and carbon-13 is 1.1%; and then chlorine has two isotopes, chlorine-35 and chlorine-37, in roughly a three-to-one ratio, chlorine-35 at about 76% and chlorine-37 at about 24%. So, if we take the monoisotopic mass of chlorobenzene, it's 112.00797. That's the hydrogen-1, carbon-12, chlorine-35 version, but we'll also have a situation where we can have something that maybe is 113. That represents a carbon-13 instead of a carbon-12, or a deuterium instead of a hydrogen. And that might have an abundance of maybe a few percent. But then we'll see another peak, which is quite high, about 32% of the maximum intensity, and that's the chlorine-37 isotope. So this chlorobenzene will have essentially six detectable peaks in our mass spectrometer, with these masses measured at this high resolution, ranging from 112 to 117 daltons or amu. So how does a mass spec work? It's not a scale where you step on it.
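Before moving on to the instrument itself, the chlorobenzene isotope arithmetic above can be checked with a short Python sketch. The isotope masses and natural abundances in the table are standard reference values that I am supplying as an assumption; they are not read off the slide. The sketch only computes the monoisotopic mass, the abundance-weighted average mass, and the height of the chlorine-37 peak relative to the chlorine-35 peak.

```python
# (mass in Da, natural abundance) for each isotope; standard reference
# values rounded to six decimals, supplied here as an assumption.
ISOTOPES = {
    "C": [(12.000000, 0.9893), (13.003355, 0.0107)],
    "H": [(1.007825, 0.999885), (2.014102, 0.000115)],
    "Cl": [(34.968853, 0.7576), (36.965903, 0.2424)],
}

CHLOROBENZENE = {"C": 6, "H": 5, "Cl": 1}  # C6H5Cl


def monoisotopic_mass(formula):
    """Mass using only the most abundant isotope of every element."""
    return sum(
        count * max(ISOTOPES[element], key=lambda iso: iso[1])[0]
        for element, count in formula.items()
    )


def average_mass(formula):
    """Abundance-weighted mass, what a low-resolution instrument reports."""
    return sum(
        count * sum(mass * abundance for mass, abundance in ISOTOPES[element])
        for element, count in formula.items()
    )


def chlorine_m_plus_2_ratio():
    """Height of the 37Cl peak relative to the 35Cl peak for one chlorine."""
    (_, light), (_, heavy) = ISOTOPES["Cl"]
    return heavy / light


print(round(monoisotopic_mass(CHLOROBENZENE), 5))
print(round(average_mass(CHLOROBENZENE), 3))
print(round(chlorine_m_plus_2_ratio(), 2))
```

The monoisotopic value agrees with the 112.00797 quoted for chlorobenzene, and the chlorine ratio comes out near the "about 32% of the maximum intensity" figure for the peak two daltons higher.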
What you do is you take your sample and you push it through kind of an aerosol spray system and ionizer. So if you have an aerosol spray can, that's not unlike what a mass spectrometer does: it sprays things out. The ionizer has a very high voltage, usually a couple hundred or a thousand volts or more, that's set at the very tip of your aerosol. And it goes into a mass analyzer. An analyzer could be a giant magnet or a set of quadrupoles or an Orbitrap, but it's a way of pushing around ions. And then those ions are sent down a long tube where they're detected on an electronic detector, and the signal is picked up. So a mass spectrometer will produce a mass spectrum, and the x axis is not time. The x axis is m over z, the mass-to-charge ratio. So mass spectrometry is working with charged molecules. And in this case we charge up aspirin. And in this case the aspirin kind of fragmented. And so we're going to see certain peaks. The 180 is the mass-to-charge ratio of aspirin; that's the parent molecule. Then you'll see fragments at 140, 120, around 92 and 43. Those are fragments of aspirin as it passed through or maybe collided with certain gases. So a mass spectrum is usually characterized by very, very sharp peaks, much sharper and narrower than what you get by chromatography. The x axis is the mass-to-charge ratio for a given ion, so it has to be a charged molecule. The peak height is the relative abundance of the ion. It's not particularly reliable for quantitation; it's related to the ionization efficiency. So the peak intensity reflects the ability of the ion to desolvate or fly. And ionization efficiency is a function of the molecule, of the ion, of the matrix, of the day of the week and the phase of the moon. So it's quite variable. It's hard to predict. That's why with mass spectrometry it's hard to quantify. With UV and NMR, it's totally different. It's very easy to quantify. So, as I said, mass spectrometers produce sharp, narrow peaks.
The width of the peak reflects the resolving power of the mass spec instrument: the better the resolution, the better the instrument, the more expensive, the better the accuracy. So the resolving power is defined by the mass, M or m over z, of the molecule divided by that width, the difference between two masses that can be separated, or delta M. So, here's a schematic of a mass spec peak. It looks like a triangle, if you will; there's a width at half height, and there's a width at 5% height. And you can see that at the top there are two peaks that are resolved; everyone can see those two peaks. The ones at the bottom might be a little harder to resolve. And it's certainly a function of how narrow those triangles are. So this is how we talk about resolving between two peaks. If the peaks are separated by one dalton and you can't separate them, then you can't distinguish at one-dalton resolution. So here is a low-resolution mass spectrometer picture. It's a very heavy molecule, in this case it's a peptide actually, and the resolution is 700. And all you can see is a mass at 2847. That's its average mass. Then we go to a higher-resolution mass spectrometer, a time of flight. So the top one is either an ion trap or a triple quad; the time-of-flight instrument here has 6000 resolution. And what you can see are at least seven, maybe even eight peaks. And all of those are separated by about one dalton. So here's low-resolution mass spec versus high-resolution mass spec, triple quad versus time of flight. So here's another example, and again it's a peptide, about 3400 daltons; its monoisotopic mass is 3482.7473. The average is 3484; notice that the precision, or number of decimal places, is lower. And the blue, which is just one giant Gaussian peak, that's a low-resolution triple quad. It would just give us the average mass, and the resolution is about 1000 there. The red has some rough shape to it and is a little narrower; maybe it has a resolution of 3000.
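As a rough illustration of the resolving power definition, R = M / ΔM, the following Python sketch estimates the peak width at a given m/z and asks whether two isotope peaks one dalton apart would show up as separate peaks. The separation criterion used here (peak spacing at least one peak width) is a simplifying assumption on my part, not a rule stated in the talk.

```python
def peak_width(mz, resolving_power):
    """Peak width (delta M) implied by R = M / delta M."""
    return mz / resolving_power


def is_resolved(mz_a, mz_b, resolving_power):
    """Assumed criterion: peaks count as resolved when their spacing
    is at least one peak width at their mean m/z."""
    width = peak_width((mz_a + mz_b) / 2, resolving_power)
    return abs(mz_a - mz_b) >= width


# Two isotope peaks of the ~3482.75 Da peptide, one dalton apart,
# at the four resolutions shown for this example.
for r in (1000, 3000, 10000, 30000):
    print(r, round(peak_width(3482.75, r), 3), is_resolved(3482.75, 3483.75, r))
```

At resolutions of 1000 and 3000 the implied peak width is wider than one dalton, so the isotopes blur into one envelope; at 10,000 and 30,000 the width drops well below one dalton, which is where the individual isotope peaks appear in the figure.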
As we get to a resolution of about 10,000 we're starting to see the individual peaks, and then at 30,000, those are the black peaks, all nicely resolved; we see, you know, eight or nine nicely resolved peaks. Yeah, we have a question from someone in Montreal. The height of the peak is not reliable for quantitation, but does this mean we cannot compare the height between peaks in the same sample, or the same peak in different samples? Technically both. We cannot compare the heights of different peaks within the same sample because there are ionization efficiency differences as things come off. And then likewise between samples, the instrument ionization is variable. As a rule, though, if you have a good stable instrument, the intensity from batch to batch, sample to sample, is pretty good. And so that intensity is relatively reliable. The only way you can get really confident measures is where you put in an isotopically labeled standard, where you know exactly how much has been added, or you've done some careful calibrations through that same run, so that you know what the response of the instrument is. And that way, you know, you've taken what you know, and you compare to that. And then you can say, okay, I trust that intensity, I can integrate. Hopefully that clarifies things. So this is a picture of a mass spectrometer. I'm just illustrating that it's lots of vacuum: a very strong vacuum system, turbo pumps, diffusion pumps, rough pumps, high-vacuum pumps. These are things that always break in mass spectrometers. Then there is an inlet. Usually that's your HPLC system; it could be a GC system, it could be a sample plate. And then there's the ion source or your ionizer, and there's the mass analyzer, which is the separator, and then there's the detector, which is usually a microchannel plate. And then there's a computer that attaches to the whole system. So you've got this big collection of tubes and pipes under high vacuum.
And then there's the inlet, which is the chromatography we've already talked about, but then you try and ionize those molecules; they could be in a volatile form, a gas form, or they could be in a liquid form. And so in the ionization column, there's MALDI, matrix-assisted laser desorption ionization; ESI, electrospray ionization; atmospheric pressure ionization; electron impact ionization; and chemical ionization. These are all methods to convert an uncharged molecule to a charged molecule and send it flying down through the mass analyzer. So, electron ionization or electron impact ionization, EI, is a hard method. It ionizes very small molecules, usually under 1000 daltons. It allows you to determine structures. It's a method that's used in GC-MS to ionize things. There's chemical ionization, which can also be used in GC-MS. And again it's for smaller molecules. It's not as harsh as electron ionization and is used for more delicate molecules. Electrospray ionization and MALDI won Nobel Prizes, actually, and these are soft ionization techniques. They are very gentle. They are ideal for peptides and proteins, but they've also been very good for small molecules that are relatively delicate. And they can measure much larger things, hundreds of thousands of daltons, but they can also measure small things. And so ESI is primarily used in LC-MS for both proteomics and metabolomics, and MALDI is primarily used in imaging. It can be used in proteomics and metabolomics as well. So electron ionization or electron impact ionization, EI, that's gas chromatography. It's a case where you have your little inlet in the back that brings in these gas-phase molecules. And then you have essentially an electron gun, which is sending electrons, and they smash into these molecules at about 70 volts, or 70 electron volts.
So it's almost like a TV tube, the old-style TV tube, sending electrons in. It hits all of the neutral molecules, and they become positively charged, but it also breaks them up into small fragments, and then those positive ions are sent out into the mass analyzer. And again you use electric fields to push things around, to repel or attract the ions so that they can fly out into your vacuum. So, again, it's for gas-phase molecules, GC-MS. You're bombarding them with electrons; you use a tungsten filament, just like the tungsten of the old incandescent light bulbs. The molecule is shattered because the electrons have energies much greater than the bond energies; bond energies are about five electron volts. Electrons with 70 electron volts will break all those bonds. And as I said, it's used for GC-MS. And here's an example, if we take a simple molecule like methanol and shatter it. We will see a molecular weight of 32, which is the molecular weight of methanol, but then we'll see another one that's lost a hydrogen, so it's 31, and we'll see another one at 29. And we're seeing the structures there. So these are some of the structures that are visible when we shatter methanol. So those fragments are reasonably well predicted. And people with a lot of experience can kind of look at an EI-MS spectrum and sort of figure out what the fragments are and figure out what the structure is. Soft ionizations, they're different. So electrospray ionization is sort of like taking an aerosol spray can and putting a big charge in front of it. That's what we're seeing on the right, electrospray ionization. In MALDI we use lasers, usually an ultraviolet laser, and we put our sample into a matrix. Usually it's a chemical that we dry down, so we take our blood sample or tissue sample, and we add cyano-hydroxycinnamic acid. And then we dry it down. The hydroxycinnamic acid absorbs UV wavelengths very strongly, it gets heated up, and it literally blows up.
And as it blows up, it sends the fragments of that blood sample or tissue sample and all its ions into space. It shouldn't work, but it does. And they got a Nobel Prize for it too. So electrospray ionization is the most common method and, as I say, what you're doing is you're spraying tiny, tiny droplets through a little capillary or an aerosol tip, but you have three or four thousand volts at the end of the tip. So don't do this at home. But those droplets, as they come into the high vacuum, start evaporating. And as the droplet evaporates, these charged components, which got their charge from the capillary, start, you know, getting too close together, and the positive charges start repelling and the negative charges start repelling, and so they fly off again, and they evaporate and they fly off again. And so, through the combination of the high vacuum and charge repulsion, eventually you get from a droplet which might be a micron in size to a droplet that contains a single ion. And that's what you're detecting in the mass spectrometer. So typically with electrospray the sample is dissolved in a volatile buffer, it's pumped through a capillary, you apply a voltage, you nebulize or aerosolize things, and then you send things through a high vacuum. You can have nanospray, where you're sending tiny amounts, and that's used in proteomics; you can have microspray, which is used more in metabolomics. And you can use as little as a picomole of material. But if you have salts and detergents it'll mess you up. You can switch between positive ion mode and negative ion mode in electrospray, and that depends on the solvent. And often in metabolomics you'll run both because different molecules will fly or spray out differently. Most metabolomics methods run both positive ion and negative ion mode. So the question was, when you use MALDI, does it change the analyte or the chemical composition of the sample? No; essentially you have to choose the matrix carefully.
If you choose the wrong thing, yes, you get reactions and all kinds of stuff. But if you choose a relatively inert matrix, all it's doing is just absorbing and transferring energy and ionizing. So it's a bit of an art in terms of finding the right matrix and preparing things in the right way, which is why MALDI isn't so widely used in metabolomics. So once you've got your ions, you send them to the mass analyzer. The mass analyzer is where things are weighed, and we can have time-of-flight analyzers, Orbitrap analyzers, quadrupole, ion trap and Fourier transform analyzers. These are all things that will push ions around and calculate how long they're spinning or how long they're flying, and a variety of simple equations to figure out the speed of ions or the deflection of ions is used to help calculate their true mass. So the original mass spectrometers used giant magnets; they're called magnetic sector analyzers, and they're still around, with very high resolution. Quadrupole analyzers, those are used in GC-MS and have low resolution. Time-of-flight analyzers, those are very high resolution instruments, with relatively high throughput. There's the Orbitrap analyzer; this is where things are kind of spun around in circles. They also give very high resolution and have relatively high mass capacity, so they can be used in proteomics and metabolomics. Then the expensive mass analyzers are the Fourier transform ones, or ion cyclotron ones; again, ions spin around for minutes. They have the highest resolution, and they use giant magnets, not unlike the magnets that are used in NMR, so they're extremely expensive. So the most expensive mass specs generally have the highest resolution. So FTMS is 0.1 ppm, Orbitraps about 0.5 ppm, magnetic sector 1 ppm, time of flight about 3 ppm; triple quads shouldn't be three to five ppm, it's typically about 100 ppm. So that's a typo. And then linear ion traps are also around 100 ppm.
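To put those ppm figures into daltons, here is a small Python sketch. The per-instrument accuracies are the ones quoted above; the example m/z of 500 is an arbitrary value chosen for illustration, and the function names are hypothetical.

```python
# Mass accuracies in ppm as quoted in the talk.
ACCURACY_PPM = {
    "FTMS": 0.1,
    "Orbitrap": 0.5,
    "magnetic sector": 1.0,
    "time of flight": 3.0,
    "triple quad": 100.0,
    "linear ion trap": 100.0,
}


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mass_window(mz, ppm):
    """(low, high) m/z range that a given ppm tolerance allows."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


# Tolerance in daltons at an arbitrary example m/z of 500.
for instrument, ppm in ACCURACY_PPM.items():
    low, high = mass_window(500.0, ppm)
    print(f"{instrument:16s} +/- {(high - low) / 2:.5f} Da")
```

At m/z 500, 1 ppm is half a millidalton while 100 ppm is 0.05 Da, which is why only the high-accuracy analyzers narrow a measured mass down to a handful of candidate formulas.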
So these ones, the quadrupoles, are not that accurate in terms of their mass accuracy. So when you run a mass spec, whether it's GC-MS or LC-MS, you're going to have things coming through your chromatogram; you're going to have time and then you're going to have signal intensity. So in this case we're not measuring milli-absorbance units, we're measuring the intensity. And we can have things that are called total ion current chromatograms, base peak chromatograms or extracted ion chromatograms. So the y axis is, if you want, intensity. The x axis is time; that's the elution time. And this is sort of what you normally get, the TIC, and it's not pretty. The BPC is the one that looks like what you would get if you had a UV instrument, but instead you're using mass spec. So it's displaying the most intense peaks, not all of the peaks. And then there's an extracted one, where we've just chosen a couple of analytes. These are things that you can do mathematically; from your data in a mass spectrometer you can get any one of these chromatograms. Here's a base peak chromatogram of, say, tomato and Arabidopsis. And we're seeing the time axis, that's retention time because it's run through an LC, but then on top of each of these peaks we're seeing the masses. So at 2.94 minutes, we have a mass of 191. There's a bunch of peaks there. There's another one that comes out at 12.67 minutes. It has a mass of 443. So instead of a two-dimensional view, we could look at several of the other masses, and so we have a three-dimensional view of both peaks and masses. And this is what we would actually see. So we would see there's a retention time and we see these individual chromatograms, these peaks that are sort of the base peak, but in the back we're going to see a whole bunch of other masses. So this is intensity, retention time and m over z value. And so you can see that the stacked plot could produce a three-dimensional image.
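Since these three chromatograms are just different summaries of the same scans, a short Python sketch may help. The data layout (a list of scans, each with a retention time and a list of m/z and intensity pairs) and all intensity values are made up for illustration; only the 2.94 minute / 191 and 12.67 minute / 443 pairs echo the example in the talk.

```python
# Each scan: (retention time in minutes, [(m/z, intensity), ...]).
# Intensities are made-up illustration values.
scans = [
    (2.94, [(191.0, 900.0), (215.1, 100.0), (443.2, 50.0)]),
    (7.50, [(191.0, 20.0), (300.2, 30.0)]),
    (12.67, [(443.2, 700.0), (465.2, 200.0)]),
]


def total_ion_chromatogram(scans):
    """TIC: sum of all ion intensities in each scan."""
    return [(rt, sum(i for _, i in peaks)) for rt, peaks in scans]


def base_peak_chromatogram(scans):
    """BPC: only the most intense ion in each scan."""
    return [(rt, max((i for _, i in peaks), default=0.0)) for rt, peaks in scans]


def extracted_ion_chromatogram(scans, target_mz, tol=0.5):
    """EIC: intensity within +/- tol of one chosen m/z in each scan."""
    return [
        (rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tol))
        for rt, peaks in scans
    ]


print(total_ion_chromatogram(scans))
print(base_peak_chromatogram(scans))
print(extracted_ion_chromatogram(scans, 443.2))
```

The extracted ion chromatogram for m/z 443.2 is near zero everywhere except around 12.67 minutes, which is the sense in which you can pull any one of these views out of the same three-dimensional data.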
And typically for a single compound, when it goes through a mass spectrometer, it's not just going to be these isotopic variants. There's also going to be some salt adducts. There's also going to be multiply charged species. There might even be some neutral loss species showing up. So it gets a little messy. So for one compound, 10 peaks or 20 peaks is possible. And these are what we call mass spectrometry noise, which makes LC-MS really tough. So we're trying to distinguish these noisy peaks, these salt adducts, these neutral loss species, these multiply charged species, and either merge them and say, okay, this is really just one molecule, or maybe there's three molecules here. So here's an example of a compound that has got a hydrogen ion, and then one with a sodium, a sodium adduct, and it's increased its weight by 22 daltons. That's the addition of sodium. You can see other examples of molecules where sodium has been attached; it's attracted to negatively charged components. And so that's an adduct. And on the right, which is kind of hard to see, is a list of all the possible adducts that you can see for any given molecule. Some have different charges: plus three charge, plus one charge, plus two charge. Some have acetonitrile, potassium, sodium or ammonium, depending on your solvent and depending on how you extracted it. These could all appear for the same molecule, all these different charges, all these different species. So that's some of the challenges that you have to deal with in analyzing mass spec data. So that's a little bit higher, not much. It's about a difference of a factor of two or three. So again the question, for those who couldn't hear, was: what's the difference in this accuracy, since it seemed that the Orbitraps are supposed to have the highest resolution, versus the time-of-flights.
It's subtle differences. I think the key thing, the main differentiation, is between the triple quads, which have low mass resolution and low mass accuracy, and the higher resolution ones like the TOFs and Orbitraps. So those are sort of the two camps, and within those higher resolution instruments there are subtle differences, but not too much to worry about. I'm a little behind on schedule here so I'm going to try and pick up. So what you're dealing with in LC-MS are all these extra peaks, and so you're trying to consolidate your adducts, consolidate the multiply charged species, or in some cases remove them if they're noise, and you're trying to consolidate the fragments and neutral losses, the breakdown products, some of the rearrangements. Likewise these isotope peaks. So for a single molecule there are 10 or 20 different peaks, all showing up, and you have to try and group them or consolidate them. There's noise that shows up; you've got a very sensitive instrument, and in many cases with mass spec you're going to have just arbitrary noise. It could be instrument noise, it could be something that contaminated your column or contaminated your sample. So this is why you often run blanks, sample blanks multiple times or reference blanks, just to make sure that you're not measuring and zeroing in on ghost peaks or noise. So what happens if you take a raw positive ion mode spectrum of some mixture? It's not unusual for people to report 15,000 MS features. You get rid of the adducts to consolidate things, and you might go from 15,000 down to 12,000. Then there are ones that are multiply charged; you know, that means that some of them have two charges or three charges or a single charge. So you're going to try and get everything down to the single charge state, so you get rid of 2000 features. Then you might get rid of some neutral loss peaks and isotope peaks from carbon-13, and noise. So we go from 15,000 very quickly down to 2500 features. That's quite typical. That's how much noise is in a typical LC-MS metabolomics spectrum.
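The adduct consolidation step described above comes down to simple mass arithmetic, which this Python sketch illustrates. The adduct mass shifts are standard values for singly charged positive-mode adducts, rounded to four decimals and supplied by me as an assumption; the neutral mass of 200.0 and the feature list are made up. Multiply charged species, neutral losses and isotope peaks are deliberately left out.

```python
# m/z shift added to the neutral mass M for common singly charged
# positive-mode adducts (standard values, rounded; an assumption here).
ADDUCT_SHIFTS = {
    "[M+H]+": 1.0073,
    "[M+NH4]+": 18.0338,
    "[M+Na]+": 22.9892,
    "[M+K]+": 38.9632,
}


def adduct_mzs(neutral_mass):
    """Expected m/z of each adduct for one neutral molecule."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}


def explain_features(observed_mzs, neutral_mass, tol=0.005):
    """Label every observed m/z that matches an adduct of `neutral_mass`."""
    expected = adduct_mzs(neutral_mass)
    labels = {}
    for mz in observed_mzs:
        for name, expected_mz in expected.items():
            if abs(mz - expected_mz) <= tol:
                labels[mz] = name
    return labels


# Made-up feature list: two of the three peaks belong to one molecule.
features = [201.0073, 222.9892, 150.0]
print(explain_features(features, 200.0))
```

Here the protonated and sodiated peaks differ by about 22 daltons, as in the example in the talk, and both collapse onto a single neutral molecule, so two features become one.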
You have a positive mode and you go to the negative mode. You have generally fewer, but it's still 10,000 features, and you're down to 1500, and the positive and negative modes may overlap as well. So it's not going to be 2500 plus 1500. The net result might be 3000 molecules. Now from these peaks, if you've measured at high resolution, these 2000 to 3000 peaks, you can start determining not only their mass but their molecular formulas. So there are programs that are online or that you can purchase which will do some of this work, so you can take that high-resolution mass value. You can take that mass value here, 525.0808, and we type that in, and this mass formula generator generates possible formulas: C24H15F3N5O4P. That's a formula. There's other ones that have sulfur, and you can sort of say, I don't want to have any fluorines because that's not biological, I don't want to have any sulfurs because I can't see any of that. So it'll essentially reduce you down to a couple of viable formulas and give you the sort of error that you can work with. So if you've got a formula or a reasonably accurate mass, then you can search a database to see if you can match that formula to an existing compound. And so one of the databases we will talk about is the Human Metabolome Database, which has a list of all known metabolites. And so now you can try and match the formula or the accurate mass. Now generally just a formula match or mass match isn't ideal, because you'll often get multiple hits. So if you want to reduce that multiple hit problem you can include retention time data. So something that came off earlier and something that came off late, even though they have the same formula, you should be able to distinguish because of the retention time. Or you can use something called collisional cross section area, or you can fragment that single molecule, that single peak, into further peaks using tandem mass spectrometry.
You can also run an authentic standard and say, I think it's this compound, I'll run it through my mass spec, and do I get exactly the same results? And again that confirms what the molecule is. Tandem mass spectrometry is a way of taking that single peak for that single molecule and fragmenting it further. We use something called collision-induced dissociation, and we separate those fragments on a second mass spec. That's why we write MS/MS: two mass specs. So we can take that ion or that mixture, push it through the collision cell, and get those fragments. And we can do that through either something called data-independent acquisition or data-dependent acquisition. We can do this fragmentation on a triple quad mass spec, a Q-TOF mass spec, or an Orbitrap mass spec. And you can do it in different modes, as I said, with DIA or DDA or a product ion scan. These are all methods that allow you to take that single peak, break it up, and get some more structural information. So in the case of a triple quad mass spec, as I said, there's a product ion scan, which is equivalent to data-dependent acquisition. You have this mixture of things, three compounds that go into the mass spec. You can choose based on the parent ion mass, the bigger one, say, I'm just going to choose that particular peak and then I'm going to fragment it. And I can see all these fragments at the top. So I have a peak, which is the parent ion, and then I have the fragments, the product ions that I measure. And that tells me, based on the mass of the parent ion and the masses of the product ions, what the compound likely is. I can see the square, I can see the triangle, I can see the red circle, and say, oh, based on that, this is what the molecule is. I can also do something called multiple reaction monitoring, which is where I choose that same molecule at the top. But rather than measuring all the fragments,
I'm just going to choose one fragment, one product ion. So I choose a precursor-product ion pair. And that makes it even simpler. But it means I have to have had authentic compounds, and I have to know the retention time, the mass of the original compound, and the mass of the product. You can make tables of those, and you can also have the integrated intensity, and using that information you can actually do quantitative compound identification. So DDA is often used for untargeted metabolomics; MRM or PRM is used for targeted metabolomics. So if you're just measuring, in this case, quercetin, which is a polyphenol, and you run it through just a high-resolution mass spec, you'll get a single peak. Maybe you get a tiny peak there, but usually just a single peak. That's not enough to tell you what the molecule is. You could do a formula search. And I guarantee if you did the formula search, or a molecular weight search, for quercetin, you'll get about half a dozen hits in the HMDB. If you fragment quercetin in a tandem mass spec, a Q-TOF or an Orbitrap, you get all these peaks, and these are all the structures that correspond to those peaks. That tells you a whole lot more about the molecule. So even though you might have found a molecule at a mass of 303.04 daltons and had six hits, no molecule except quercetin is going to give you those fragments. That's how you determine, through tandem mass spec, what a molecule likely is. So the concept of metabolite ID is this: you separate through gas or liquid chromatography, and in an individual peak there might be three or four compounds. You take an MS/MS spectrum for each of those, and then you compare each MS/MS spectrum to a library of known MS/MS spectra. And from there we can see we have certain matches. The blue one matches the one at the top, the red one matches something in the middle, the green one matches something at the bottom.
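One common way to score a match between a measured MS/MS spectrum and a library spectrum is cosine similarity on binned peak intensities. The sketch below assumes spectra are lists of (m/z, intensity) pairs and that a fixed 0.01 Da bin is good enough. The library entries and their peaks are made up for illustration, and real search tools use more careful peak matching and weighting.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (mz, intensity) peaks into integer m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, library):
    """Return (name, score) of the library spectrum most similar to the query."""
    scores = {name: cosine_similarity(query, spec) for name, spec in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical library and query spectrum, for illustration only.
library = {
    "compound_A": [(100.0, 50), (150.0, 100), (200.0, 25)],
    "compound_B": [(100.0, 100), (120.0, 80)],
}
query = [(100.001, 48), (150.002, 100), (200.0, 30)]
print(best_match(query, library))
```

The query shares one peak with compound_B but all three with compound_A, so compound_A wins by a wide margin. That is the same logic as the blue, red, and green matches on the slide.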
Library matching is how we know what the structures are; that's how we identify our compounds. Now there are different levels of metabolite identification: positively identified, that's level one; putatively identified, level two; roughly identified to a compound class, that's level three; and I don't know what it is, that's level four. About 85% of the time with untargeted metabolomics, you get these unknowns. As for the databases that people use, there are things like MoNA, METLIN, mzCloud, NIST, and so on. These have anywhere from 3,000 to maybe 200,000 compounds in them. Not all of them are unique. If you put them all together, we have MS/MS spectra for about 85,000 different compounds. The libraries are really not big enough to do compound matching, which is why we only get about 5 to 10% of compounds identified. Other techniques are being developed that use machine learning to predict mass spectra. One of these is called CFM-ID, which we developed in our lab over the last few years. It allows you to predict quite accurately what these MS/MS spectra are, and to calculate MS/MS spectra for hundreds of thousands of compounds, so people use this technique to identify a lot of these unknowns. I'm going to switch gears. Technically I only have about five minutes left, and I need about 20 minutes to finish here. I'm going to talk about NMR spectroscopy. This is another technique to identify compounds. It uses a strong magnetic field. We send radio waves into a sample that's exposed to the strong magnetic field and we measure the absorption of those radio waves. And that gives us a spectrum. It's not a radioactive method; we're just measuring magnetism. What we're measuring is the absorption of light, actually radio waves, due to changes in nuclear spin orientation. Different nuclei absorb at different energies. And the nuclei that we're looking at are basically protons and how they spin.
Some will spin in the clockwise direction, others in the counterclockwise direction, and that gives them a spin up or a spin down. Because protons have charge, and a spinning charge creates a little magnetic field, it's like these protons are tiny magnets. So when we put these protons, or molecules with protons or hydrogens, in a strong magnetic field, they'll absorb this light wave or radio wave, and it'll cause some of them that were spin down to flip to spin up. They go from a low energy state to a high energy state. It's a bit like having a whole bunch of different bells in a carillon, where the different bells ring at different frequencies. What we're doing when we send in radio waves is ringing all the bells at once. Each will have a different frequency that it resonates at, just like these spins going up and down. It's the same way that bells ring: big bells have lower frequencies, little bells have higher frequencies. And that's what we're measuring through NMR. When you measure them all at once, it's not as if we hear each bell individually; we get kind of a cacophony of bells. The signal is the sum of all these frequencies added together. It doesn't mean much to us, at least to our eyes, but if you do something called a Fourier transform, where you convert that time-dependent signal to frequency space, we get peaks. These represent the three different frequencies and how intense those frequencies are. We go from sound to essentially peaks, and we all understand peaks better. This looks like a chromatogram. It looks like a mass spectrum. This is called an NMR spectrum. So with NMR we take liquid samples, we don't even go through chromatography, we just put them straight into the NMR. Then we put the sample under a high magnetic field, send in radio waves, and collect spectra. NMR instruments are giant magnets.
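The bell analogy can be sketched numerically: add three cosine waves together, then use a discrete Fourier transform to recover the three frequencies and their intensities. This is a toy example with assumed frequencies and amplitudes, written as a plain, slow DFT so it needs only the standard library. A real NMR signal also decays over time, which is ignored here.

```python
import cmath
import math


def simulate_bells(components, n=128):
    """Sum of cosine waves; components is a list of (cycles per window, amplitude)."""
    return [
        sum(amp * math.cos(2 * math.pi * freq * t / n) for freq, amp in components)
        for t in range(n)
    ]


def dft_magnitudes(signal):
    """Naive discrete Fourier transform, scaled so each value is an amplitude."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(2 * abs(total) / n)
    return magnitudes


# Three hypothetical "bells": a big one (low frequency, loud) to a small one.
bells = [(5, 1.0), (12, 0.5), (20, 0.25)]
signal = simulate_bells(bells)      # the cacophony in the time domain
spectrum = dft_magnitudes(signal)   # peaks in the frequency domain

peaks = [(k, round(m, 3)) for k, m in enumerate(spectrum) if m > 0.1]
print(peaks)  # peaks at frequency bins 5, 12 and 20
```

The time-domain list looks like noise to the eye, but the transformed list has exactly three non-zero entries, one per frequency, with heights equal to the original amplitudes.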
Again, if people want to see an NMR at the coffee break, Mark can show you what an NMR instrument looks like. It's a 700 megahertz instrument, a giant thermos with a superconducting magnet; the magnet could pick up a city bus. And in that giant magnet you put a tiny pencil-thin tube, which contains about half a millilitre of liquid, beside a radio transmitter. So that radio transmitter sits inside that giant magnet. And then we blast away and we do our Fourier transform. And this is the spectrum that we would collect. We can see some sharp peaks, just like a GC peak or an MS peak, but they have patterns. They have things called chemical shifts. Their intensities actually mean something, because the integrated intensity tells you how many hydrogen atoms there are in the molecule. The splitting pattern tells you about things called couplings, and the chemical shifts tell you about the characteristic groups in the molecule. Chemical shifts are key to NMR. They reflect atom-specific or functional-group-specific frequencies. There's a unique chemical shift fingerprint for almost every molecule in the world, and the shifts are affected by neighboring atoms, bonds, or chemical groups. So people have created chemical shift tables. You can look for aromatic groups, which resonate around 8 ppm. You can have methyl groups on saturated carbons, which resonate around one or two parts per million. Those are called chemical shifts. So someone can take a table like this and the spectrum that they measure and actually figure out the structure of molecules quite accurately by NMR. You can't do that with mass spec, or not easily. So here's an NMR spectrum of a compound, bromoethane, and you can see the peaks. The B peaks, that's a CH3; it has a chemical shift around 1.8 ppm, and it's split into three peaks. That's a coupling, because it's neighboring a CH2. The A peaks, around 3.6 ppm, that's a CH2; it's split into four peaks, because it's adjacent to a CH3 group.
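The splitting in the bromoethane example follows the first-order "n + 1" rule: a signal with n equivalent neighboring hydrogens splits into n + 1 lines, with relative intensities given by the binomial coefficients. A minimal sketch, assuming simple first-order coupling; the function name and the small dictionary are illustrative.

```python
from math import comb


def multiplet(n_neighbors):
    """Relative line intensities for a signal with n equivalent neighboring hydrogens."""
    return [comb(n_neighbors, k) for k in range(n_neighbors + 1)]


# Bromoethane, CH3-CH2-Br: each group is split by the hydrogens on the other.
bromoethane = {
    "CH3": {"hydrogens": 3, "neighbors": 2},  # next to the CH2
    "CH2": {"hydrogens": 2, "neighbors": 3},  # next to the CH3
}

for group, info in bromoethane.items():
    lines = multiplet(info["neighbors"])
    print(group, "integrates to", info["hydrogens"], "H and splits into", len(lines), "lines", lines)
```

This gives a 1:2:1 triplet for the CH3 and a 1:3:3:1 quartet for the CH2, matching the three and four peaks on the slide.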
These splittings are consequences of couplings. To get into the details, if you've never seen NMR spectra, would probably take a day or more to explain. But these are all related to the structure of the molecule: the intensity of the peaks, the number of hydrogens, the splitting pattern. All of that is sufficient for someone with a bit of background in NMR to determine the exact structure of a molecule. Now if you've got a pure molecule, it's easy. If you've got a mixture, it's a pain. And these are some examples of mixtures where you've got all kinds of NMR peaks overlapping with different intensities. Now there are different techniques in metabolomics, these different approaches. Some are less sensitive, like NMR, and are likely to detect maybe 50 to 100 compounds. Highly sensitive methods like mass spec can detect thousands of compounds, but for a lot of those compounds we just don't know what they are, whereas in NMR we know almost every single compound. So there's a trade-off. If you want sensitivity, many of the things you see are unknown. If you give up sensitivity, most of the things are known. Sometimes things are exciting if they're unknown; sometimes things are informative if they're among the knowns. So think about the technologies: what works, what's useful, what informs. And this is, I think, a really useful graph. Now in metabolomics, there are two routes. There's targeted metabolomics and untargeted metabolomics. Today we're going to focus mostly on targeted, but I'm supposed to give you a bit of background on untargeted so you know what we're going to do tomorrow. So targeted metabolomics measures between 10 and 1,500 pre-selected molecules. It's not necessarily designed for discovery, because you already know what you're measuring, but it's focused on quantification. That's key. And because it's focused on a pre-selected set, you can automate it, and we'll show you how it's automatable, and it can be very standardized.
With untargeted, we're measuring thousands of peaks, like the 15,000 peaks that I talked about. You can discover some pretty new compounds, but that's not very common. You can't do quantification with untargeted methods. It's not automatable, and it's not standardized. That's a problem. I'll explain that more later. So in targeted metabolomics, you take your samples, you run them through your spectrometer, whatever it is, and you identify peaks. Then you run it through MetaboAnalyst to do data reduction, and then you get some interpretation. So samples to compounds to data analysis. In untargeted, you take your samples, collect all your spectra, and often you'll do some follow-on data reduction to sort of say, which peaks should I get rid of, which peaks should I keep. And it's only after you've done your data reduction that you do your metabolite identification, because you're trying to get rid of all these peaks, and you just want to focus on 10 or 20 important peaks. So it's a different process. Typically in untargeted metabolomics, you're not trying to identify all the peaks, and you only end up with 10 or 20 positively identified peaks. And typically in untargeted metabolomics, you run hundreds of samples over multiple days. You find the LC is not very reproducible. There are things called batch variations: changes in retention time, changes in intensity. Mass spectrometers also have issues. So how do you compare? How do you scale? This is what you might see for different runs, different batches: you stack them all together, and you see this kind of mess. If everything were perfect, you would get this thing at the bottom, fully reproducible intensities, fully reproducible retention times. So that's where a lot of the processing is done. There are various methods to do chromatographic alignment, where you get all of those peaks to match up together.
And then you also have to do some scaling, because sometimes, as we saw here, things are high one day, on day two they're all low, and on day three they're higher again. So you have to use scaling methods to make sure the intensities are consistent. And as I said, this is what you'd get. This is retention time data. That's sort of the base peak chromatogram, but this is showing retention time, mass to charge, and intensity in three dimensions. In some cases these things are real peaks. Some of them have to be merged. If you haven't done your alignment, you get this total mess where you think there are thousands more peaks than there are, and that's because of the problems with your retention time and your scaling. So you have to make sure that's all cleaned up. You do your matching, you do your alignment, do your batch correction, and you reduce it down to a smaller set. From there you reduce it even further, like getting rid of the adducts, getting rid of the neutral losses, getting rid of the other artifacts and the isotope peaks. And from this to this you end up with that. So from 75 peaks to 20 peaks, you end up with nine peaks, or from 15,000 peaks to 2,000. That's the general process of untargeted LC-MS data processing. You have to do peak matching across samples, retention time drift correction, and realignment, and you do this iteratively. For example, you group things by their retention time, intensity, and mass to charge. You can look at this a little more closely during the break, but you change, adjust, change, adjust, and eventually you get a final table. So from those three samples there were in fact five different peaks, with five different retention times, but each of those was found in different samples. Like the one at 126.79 and another one at 126.7905: a different molecule, even though the mass is the same, because it has a totally different retention time. So that's how you know that they're different.
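The peak matching step can be sketched as a simple grouping: two peaks from different samples count as the same feature only if both their m/z and their retention time agree within a tolerance. The code below is a minimal, non-iterative sketch. The tolerances, retention times, and intensities are hypothetical, and tools such as XCMS or MZmine use more sophisticated alignment than this.

```python
def group_features(peaks, mz_tol=0.005, rt_tol=0.2):
    """Group (sample, mz, rt, intensity) peaks into features across samples.

    A peak joins an existing group when its m/z and retention time are both
    within tolerance of the group's running means; otherwise it starts a
    new group. Returns a list of dicts with mean mz, mean rt, and the
    intensity observed in each sample.
    """
    groups = []
    for sample, mz, rt, intensity in sorted(peaks, key=lambda p: (p[1], p[2])):
        for group in groups:
            if abs(group["mz"] - mz) <= mz_tol and abs(group["rt"] - rt) <= rt_tol:
                group["members"].append((mz, rt))
                group["intensities"][sample] = intensity
                # Update the running means used for the next comparison.
                group["mz"] = sum(m for m, _ in group["members"]) / len(group["members"])
                group["rt"] = sum(r for _, r in group["members"]) / len(group["members"])
                break
        else:
            groups.append(
                {"mz": mz, "rt": rt, "members": [(mz, rt)], "intensities": {sample: intensity}}
            )
    return groups


# Hypothetical peak list: nearly identical m/z, two very different retention times (minutes).
peaks = [
    ("sample1", 126.7900, 2.10, 1000),
    ("sample2", 126.7902, 2.15, 900),
    ("sample3", 126.7905, 7.80, 500),
    ("sample1", 126.7904, 7.85, 450),
]
for feature in group_features(peaks):
    print(round(feature["mz"], 4), round(feature["rt"], 2), feature["intensities"])
```

Even though all four peaks have essentially the same mass, they end up as two rows in the final table because the retention times differ by several minutes.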
So the feature simplification process is the biggest challenge in untargeted metabolomics, and there are different tools like XCMS, MetaboAnalystR, which you guys will learn, MS-DIAL, or MZmine 3. These are all tools that are used in untargeted metabolomics to do all that feature reduction to simplify things. And even from those peaks you then simplify even further to get down to 20 compounds, because you're only interested in the ones that are different, as opposed to all the peaks you can identify. So whether it's targeted metabolomics or untargeted metabolomics, the whole idea is to go from spectra to lists of compounds. Some of those lists may have 10, some may have 100, some can get to 1,400 compounds. And from those lists, what we want to do is go to biology. So we're going to learn today about how to go from spectra to lists. And a little bit today, and mostly tomorrow, we're going to learn about how to go from lists to biology. So now we can go on our coffee break.