So what we're going to dive into now is metabolite identification and quantification. This is also called metabolite annotation, and it's probably the biggest bottleneck in metabolomics, at least on the informatics side. So again, every slide set begins with our learning objectives. We're going to look at the three main technologies: NMR, GC-MS, and LC-MS. And in all cases, we're going to look at what I call spectral deconvolution. You have this complex collection of peaks, some of which are overlapping; they're separated by retention time or retention index, or they're separated by chemical shift. We're going to show you the principles of how these deconvolution steps are done, and how individual compounds are identified and then quantified for each of those technologies. Then we're going to close off with some of the different types of mass spectral database searches, some of the types of mass spec databases, and NMR databases as well. So you've seen this slide before. This is the concept of metabolite annotation: you have a bunch of spectra, or a single spectrum, and what you're trying to do is put names to every peak, compound names, and intensities or concentrations. In this case, a list of concentrations is ultimately what we would like. It can be done through either targeted or untargeted metabolomics; this is still the final objective. That's the point. Now, when you compare metabolomics to some of the other fields like genomics or proteomics: historically with genomics we had databases like GenBank and tools like BLAST, which allowed us to take sequence data and instantly identify which gene we're working with, or which related set of genes.
With transcriptomics, we could also go to online databases, which would convert intensities and probe identifiers into the genes we're talking about. So there were online tools for that. In the field of proteomics, there are programs, tools, and web servers like Mascot, which allow you to take lists of peaks, enter them, press go, and get information about which peptides and which proteins are present, and then, based on their intensities, some indication of their relative concentrations. Historically with metabolomics, you could come up with your chromatogram or your NMR spectrum or your mass spectrum, and there really wasn't a place to go. People even today manually go through peaks and peak sets, and match and fit and look things up in handmade tables to identify them. So this historically was a problem, but as we're going to try and show you today, and as some of you obviously know, there are now tools, resources, databases, and web servers that allow you to do this. Now the other part of metabolomics is something I've hinted at before, which is the difference between the known compounds and the unknown compounds. The terminology has evolved over the years into what we call known unknowns and unknown unknowns. This comes from a quote from Donald Rumsfeld, I think in 2001, when he was talking about Afghanistan and al-Qaeda, and people were asking how to deal with all this stuff. He said: there are known unknowns, that is to say we know there are some things we do not know; but there are also unknown unknowns, the ones we don't know that we don't know. And this represents, essentially, the dark matter of mass spectrometry and metabolomics. There are lots of things we just don't know, because we just don't know them.
So in the case of metabolite identification, we might have a spectrum from blood, or urine, or some other well-characterized biofluid. For the vast majority of the peaks we're going to look at, at least with NMR or GC-MS, those peaks are in some existing database. Those compounds are known; it's just that we, at the time, don't know what they are, and our challenge is to identify them. But when we move to LC-MS, with its 20,000 or 30,000 features, we will only know a fraction, anywhere from 2% to 5%. The other 95% to 98% are the unknown unknowns. They aren't in any database, no one has characterized them, there's no structure; it's a complete guess. It's the dark matter. It's the unknown unknowns. For these unknown unknowns, truly novel compounds, you have to use a technique called computer-aided structure elucidation, or CASE. Has anyone heard of that before? For the known unknowns, what we're going to be using is essentially chemical libraries and spectral libraries, and we're going to use spectral deconvolution. That's what we're going to focus on today: the known unknowns, characterizing things for which some information already exists; we just have to map them. So spectral deconvolution is the technique we're going to use. It can be done for NMR, for GC-MS, LC-MS, and MS/MS data. To some extent it's part of what we use in targeted metabolomics, although it's also what's done at the last step of untargeted metabolomics. It's matching peaks from your spectra or your chromatogram to a set of known peaks from pure compounds in a pre-compiled database. That means someone has had to spend a lot of time and money buying all these compounds and collecting their NMR spectra, their GC-MS spectra, their retention times and retention indices, their LC-MS spectra, and their MS/MS spectra, on a whole bunch of different instruments.
For NMR, it might be at 400, 500, 600, 700, 800, or 900 MHz, or 1 GHz. For MS, it might be on a QTOF, a QTRAP, an ion trap, an FT-ICR, an Orbitrap. All of those have to be collected at different collision energies: at 10, 20, and 40 electron volts. It's a huge amount of work, and there's only a very small number of groups who've actually been doing it, though to some extent it has now become more of a collective enterprise. There are also companies that do it, and they sell these libraries at fairly hefty prices. So we're going to look at NMR first. As I said, or as I've learned, most of you haven't done NMR, but this is the concept, and it's relatively simple. Remember that in metabolomics we're dealing with mixtures. The top spectrum you're seeing here, in blue, is the mixture. You can see, I don't know, a dozen different peaks; some big ones and some little ones, doublets and singlets and triplets. And we know there have to be a bunch of compounds in there. Now, in red, green, and purple, we have the pure compound spectra. There's compound A, let's say methionine; compound B, leucine; and compound C, adenosine, or something like that. Those are the different compounds, and we have their pure spectra. Now do the thought experiment: if you add these spectra together, the purple, the green, and the red, you can see that they will sum up to produce the blue. There's the red singlet on the far right, your far right. There's the purple triplet that's added. Then there are the two doublets from compounds A and B, the green and the red, which add up to produce a giant doublet. And then there's the red triplet. These things all add up together. So we can see how the pure spectra of three different compounds add up to produce the mixture. What we have to do in deconvolution is the reverse problem.
We have to take the mixture and decide which three, out of maybe 100 or 300 possible compounds, would constitute this mixture. So there's the forward problem and the reverse problem. The forward one is easy; the reverse one is harder. There are tools where people have collected these single reference spectra, compiled them, and created interactive software. One comes from a software company called Chenomx. It's quite widely used: you take your spectrum of a mixture, say from urine or blood, and it allows you to interactively fit individual pure reference spectra to your mixture, and it produces a list of the compounds and their concentrations. The method itself is fairly manually intensive, but it's used by hundreds of people around the world, and it's been used for more than a decade. You take your NMR spectrum, you transform it, you phase it, you remove the water signal, you correct the baseline, you reference the chemical shifts, you normalize the peaks. All of this has to be done manually. Then you fit the spectrum, and it's done through guess and check. You look at a peak and say, hmm, I think that looks like alanine. So you click on alanine, it pops up the alanine spectrum, you shift it around, and say, oh, that was wrong; it must have been betaine. So you pop up betaine, shift it around; oh, it looks like a nice fit, good. So you have to have an idea of what sort of compounds are there. You have to have been trained a fair bit. And it takes about half an hour to an hour for a person to fit a spectrum. If you've done it a fair bit, you can actually get quite good at this, but it takes training. Years ago, this used to be part of our regular course: we would have people take the afternoon and try to fit spectra. But we found it was taking too long.
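The forward and reverse problems described above can be sketched numerically. This is a toy illustration, assuming made-up Lorentzian peak shapes and invented compound spectra; real tools like Chenomx or Bayesil also have to handle chemical-shift drift, line-width variation, and noise, which a plain least-squares fit does not.

```python
import numpy as np

# Toy NMR deconvolution: the mixture is a weighted sum of pure reference
# spectra, and the "reverse problem" is recovering the weights (concentrations).
axis = np.linspace(0, 10, 400)          # pretend chemical-shift axis (ppm)

def lorentzian(center, width=0.05):
    """A simple Lorentzian line at the given chemical shift."""
    return width**2 / ((axis - center)**2 + width**2)

# Invented pure reference spectra for three hypothetical compounds
ref_a = lorentzian(1.5) + lorentzian(3.8)   # "compound A"
ref_b = lorentzian(1.6) + lorentzian(2.2)   # "compound B"
ref_c = lorentzian(7.9)                     # "compound C"
library = np.column_stack([ref_a, ref_b, ref_c])

# Forward problem (easy): sum the pure spectra with known concentrations
true_conc = np.array([2.0, 0.5, 1.0])
mixture = library @ true_conc

# Reverse problem (harder): fit the library to the mixture by least squares
fitted_conc, *_ = np.linalg.lstsq(library, mixture, rcond=None)
print(np.round(fitted_conc, 3))   # recovers [2.0, 0.5, 1.0] in this noiseless toy
```

With real data the fit must also be constrained (concentrations can't be negative) and scored against overlap, which is where the guess-and-check and the probabilistic models come in.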
And we obviously found that a lot of people made lots of mistakes because they didn't have the training. So since then, we've tried to develop other methods to make it faster and easier, and other companies and groups have done the same thing. Bruker has a manual method to help with spectral deconvolution, and Bruker has also developed NMR hardware with special software that will fit the spectra of juices. So you can collect NMR spectra of juice, or of wine or beer, and these will automatically identify the compounds, match the peaks, and do all the spectral deconvolution. However, to get the juice screener and wine screener, you have to have half a million dollars in your back pocket, and most people don't. So are there free alternatives? Yes, there are. Imperial College, one of the pioneers of metabolomics through Jeremy Nicholson's group, and members of that group have developed a program called BATMAN, which does spectral deconvolution. And we've developed one in Edmonton called Bayesil, which also does automatic deconvolution. So with NMR, it is now possible to automate this, and automation is about 30 to 60 times faster. Tests have been done on the precision and recall, and it's very, very consistent; around 95%. Basically, it means you can upload lots and lots of spectra, click go, and go home; the next morning, all your results are there. Or you can take a long coffee break, and your results are there. Now, when you have a computer doing it, it's consistent and entirely reproducible. But if there's some bias, if you've collected the spectra incorrectly, or if you've added things... yes? Does the system keep learning; the more you add, the more it confirms, or is it fixed? It's fixed. There was a learning period when it was originally developed, but yes, now it's fixed. So the advantage of using computers is that they can work forever and never get tired. But if there's some bias, they can't pick that up.
And so they'll reproducibly generate the wrong result, which can be a problem. But it's also the case that, because it doesn't get tired, and because it's able to look for signals that are often ignored by humans, either deliberately or inadvertently, it can do things that sometimes even humans can't. So BATMAN is now an open project, I think on GitHub or SourceForge, so you can actually download the program and give it a whirl. It's very slow; it's not 30 to 60 times faster than humans. But it doesn't get tired, which is an advantage. Bayesil is fast, and it's been converted to a web server. It used machine learning to get accurate: it uses what are called hidden Markov models, or probabilistic graphical models, to help with the deconvolution. It does essentially what the Chenomx software does, except automatically. So it does a sort of guess-and-check, performing fitting and pattern finding. In order for it to work, it needs to know what the biofluid is. If you don't tell it that this is blood or cerebrospinal fluid, it gets completely lost. It's sort of like taking a person, blindfolding them, and saying, go find your way home. So it needs some knowledge. Question? No. There's enough similarity between the serum or plasma of sheep and cows and mice and rats, so no, it won't do that. Right now it's specific to a certain set of biofluids: cerebrospinal fluid, serum, plasma, and, I don't know, is it saliva? I can't remember. Some work is being done to expand the libraries to other types of samples, and it somewhat depends on how complex the fluid is. It doesn't work for urine; it fails completely, because urine is just too complex and the HMMs just die. Now, the other thing it does, and this was really critical, concerns the preprocessing, because manipulating NMR spectra is something of an art.
If we have 30 people in this room and we say, okay, all of you try to phase, reference, remove water, and baseline correct, there are going to be 30 different spectra. They're all going to look subtly different; some will look very, very similar, but some will look really different. And if you then try fitting those spectra, you get different answers. By automating all of these steps, 30 people are going to get 30 identical spectra, which means the fit will also be identical for everyone. So this is how complicated it can get. The top example is a spectrum of 90 compounds; the bottom one is 150 different compounds. There are peaks on top of peaks on top of peaks, and you can see how things get deconvoluted. Well-trained humans can do this by hand, but most of you wouldn't want to get that training, and this is the advantage of having a computer do that sort of spectral deconvolution. Here's an example of cerebrospinal fluid that was done by the manual method: the red is the fitted spectrum done using Chenomx, and the black is the actual spectrum. Below that is one that's been fit with Bayesil, and the red basically matches the black. The difference is that one took 45 minutes and the other took between three and five minutes. Now, today we're going to give Bayesil a try, and right now our TA and Jeff are madly working at communicating with Edmonton to make sure this will work. There are two versions. There's a public version of Bayesil, which has its own website, bayesil.ca. And then there's the private version, at tmic.bayesil.ca, which requires a login; we'll show you that later. But before you go there, I'm going to ask everyone to close the lids on their computers, because I think you need to concentrate on this stuff now.
So we'll fill you in on this, but the whole idea is to make it trivially simple. There's going to be some data that, once you've logged in, you can download. You can also use the public version and its examples. All it is, is point, click, wait, and then start viewing the results. That's metabolomics made easy: you don't have to learn any special skills, you just have to be able to use your mouse. And what this is, is a form of targeted metabolomics that's fully automatic. This is ideally what you'd like to be able to do with everything in metabolomics: the software is written so it has the libraries, you can get essentially complete coverage, every compound identified, everything quantified, everything annotated. Unfortunately, we still have a long way to go with other aspects of metabolomics, but this is an example of where it needs to go. So if you've uploaded things and started it, the first thing it'll do is show you the spectrum. This is what an initial spectrum might look like; this one is badly out of phase. After it's done the Fourier transform, it'll do the phasing, so that all the peaks point up in the right direction. Then it'll do the water removal, the baseline correction, and the chemical shift referencing. After about 30 seconds, it goes from what was at the top, which is unusable, to what's at the bottom, which is usable. And it'll do this exactly the same way every single time; it's all automatic. Then, over the next two to three minutes, it takes that and does the spectral deconvolution. You can see a faint blue against the black: the blue is the fitted spectrum, the black is the actual spectrum. And so everything now is matched.
So, cool, you've got a spectral fit, but really the more useful point is this: this is your annotated spectrum. Every compound in the spectrum has been identified, and an absolute concentration has been generated. That's automatic metabolomics, and as I said, this is where we should be going in the field. The nice thing about NMR is that because it's not a particularly sensitive technique, you can do this. What's that? Ah, the confidence score. The confidence score is an indication based essentially on how many peaks are being fit and how complex the spectrum is. Acetate, or acetic acid, is usually characterized by a single peak; if that peak is well isolated, the confidence score goes up. If there are other peaks nearby that partly overlap it, the confidence score goes down. If a compound has lots of peaks, like glucose, which has 20 or 30, then that also increases the confidence score. That's right. So earlier we had the question: does it do bacterial culture? No. It's configured right now to do serum, plasma, and cerebrospinal fluid, and some modifications are being done to handle things like fecal water and saliva. As I said before, it doesn't work for urine. Right now its library has about 100 compounds, a little over 110, I think. Compounds outside that library it won't identify, and that's essentially why it's restricted to these biofluids. We know everything, at least by NMR, that's in serum and plasma and CSF; everything that NMR can detect has been characterized. So in that sense it's a solved problem. And that's again because with NMR we can only detect things down to about 5 micromolar. Even for the inborn errors of metabolism, where you'll get some bizarre, rare compounds, those again are known and they're detectable. Occasionally, if someone has overdosed on some bizarre drug, you won't have any idea, but that's really not what this was intended for.
What we're trying to do, and I think this is important, so people don't miss the point: quantitative or targeted metabolomics is not about identifying new compounds. It's about identifying changes in concentration, and there's a tremendous amount of information in understanding those differences in concentration. Metabolically, everyone in this room is pretty much the same. Everyone's going to have alanine in their blood, everyone's going to have methionine, everyone's going to have ATP; otherwise you're an alien. What distinguishes us is that there are going to be different concentrations of those compounds. Some people will have higher levels and some lower, and some of that reflects your health, some of it your age, your gender, and a bunch of other things. So if you just think, or do the computation: if a compound can be high, normal, or below normal, that's three states, just from concentrations. If you have 100 different compounds that you're measuring, you can distinguish 3 to the power of 100 different states. That's a big number. And that's essentially, I think, the central power of quantitative metabolomics: you can come up with billions, trillions of different patterns, and therefore potentially distinguish billions or trillions of different states or conditions. Question: in the Human Metabolome Database, you can find the biological concentration of any metabolite, but there you see they give a number of ranges, and they aren't in the same range; they're scattered all over. How can we understand what the normal range is? Yeah, part of it is, I think, the challenge of different groups at different times having collected data with different technologies. More and more there are standard ranges, and at least with the Human Metabolome Database we've tried to get reference ranges for about 300 to 500 commonly measured metabolites.
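The "three states per metabolite" count above is easy to check: with low/normal/high for each of 100 measured compounds, the number of distinguishable concentration patterns is 3 to the power of 100.

```python
# Three states per metabolite (low / normal / high) across 100 measured
# compounds gives 3**100 distinguishable concentration patterns.
n_compounds = 100
n_states = 3 ** n_compounds
print(n_states)               # about 5.15e47, a 48-digit number
```

Even a much smaller panel gives an astronomically large pattern space, which is the point being made about the discriminating power of concentrations alone.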
Some of this has been gathered through text mining and also from our own studies in TMIC. We won't have uniformly agreeing ranges for every metabolite from every study. Some of that is because people have made mistakes, and some because the studies used different populations that weren't clearly distinguished. There are distinctions for age and gender, and we've tried to identify those, so you'll see that in HMDB. There are also cases for different diseases, and those ranges are given too. It's a lot of hard work, and it's something we're constantly refining. So there are limits. What you're hopefully going to use today handles multiple spectra; that's what the TMIC Bayesil version does. The public one only does a single spectrum at a time: upload a spectrum, wait a few minutes, upload another one, wait a few minutes, and so on. Lots of people do that. But the batch version is hopefully the one we'll try today, so you can get a flavor for it. Okay, so that's NMR. Question? For NMR? Can we go back? Yeah. So when you prepare an NMR sample, you have to add a reference. DSS is the reference compound you have to add. You have to buffer it to pH 7, because sometimes people submit things that are very acidic or very basic, so again, this is buffered and set to a standard pH. And then you also have to add a compound for phasing, which has a chemical shift up around 10 or 11 parts per million. What we found is that this is the recipe people were required to follow, and no one read the recipe. People were throwing all kinds of stuff in. They didn't adjust the pH. They put in the wrong reference compound. They were collecting at 700 megahertz even though we said collect at 600. And then they said it didn't work. Well, it's not going to. So... That's right, yeah. It's not about how you're collecting your blood or your CSF; it's more of a sample prep, or post-sample prep, issue if you want to call it that. Okay. Yes.
I have a quick question about data compatibility: do all of the NMR machines output data in a similar format, or are they all different? Essentially, they all output it the same way, yes. But NMR machines operate at different frequencies, with higher and higher resolution, so the spectrum will look different, and the software has to know which frequency you're working at. In terms of format, any NMR format that comes off your machine is compatible. That's right, yeah. Anyway, the reason we're making a big deal out of NMR... do you have a question? That's the public version. The private one, if you want to call it that, the TMIC one, handles 500, 600, 700, and I think 800 is on the way, and it does a little bit more. It takes a long time to build out the reference libraries. You have to have a lab that has a 500 and a 600 and a 700 and an 800, and that's like $6 million worth of equipment, so it's expensive to build out. The other point I was going to make is that the reason we're talking about NMR, even though most of you don't do it, is that we wanted to use it as an exemplar, because this, I would hope, maybe three years, maybe five years from now, is how you could do LC-MS: go to a web server, just upload your spectrum and walk away, and out comes the annotated list; not just 200 compounds, but several thousand. Later, when we look at untargeted metabolomics, and some of you were here last night as we were installing XCMS and other things, you'll see it is not point and click. It's a lot of scripting, coding, waiting, deconvoluting, going back again, checking. It's very, very manually intensive. But I think as a field matures, things become increasingly automated, and NMR-based metabolomics is the most mature form of metabolomics right now. Okay, so GC-MS. This is the next most mature version. In GC-MS we have a total ion chromatogram, or TIC, and we'll have peaks.
Any individual peak might contain two, three, or four compounds, or it could be a pure compound. Within those peaks you're going to have very characteristic spectra: electron impact, or electron ionization, spectra. And whether it's the blue or the red or the green, you're going to compare those EI-MS spectra to what's in a spectral library, which may have a hundred, a thousand, or ten thousand entries. What you're trying to do is match the spectra you collected against that library. So in the case of GC-MS, we use electron impact ionization. As an example, with methanol, here's the characteristic fingerprint; this is a methanol reference spectrum. And because these are highly reproducible, there's a standard electron ionization energy, 70 electron volts; everything in GC-MS is very standardized. That means it's possible to compare, not unlike with NMR, which is also very standardized. In GC-MS spectra you'll see these fragments. Occasionally you'll see the molecular ion, and occasionally you'll also see adducts, but in most cases it's just the fragments and not the molecular ion. In many cases we're working with analytes that need to be derivatized. So you have TMS (trimethylsilyl), TBDMS, and in some cases methoxime derivatization. These modify alcohols or aldehydes or amines. The result is that you are not looking at the original compound in GC-MS; you're looking at a derivatized one. And the derivatized compound could have one TMS, two TMSs, three TMSs, four TMSs. So one compound could give six signals with six different retention indices, six different mass spectra, six different masses. TBDMS derivatization does the same thing. What's nice, though, is that you can look at the mass increments, because it's a known derivative: you're just going to look to see whether it's 72 daltons or 144 daltons higher than what you would expect for the pure, underivatized compound.
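That mass-increment check can be sketched as a small helper. This is a hypothetical function with nominal, illustrative masses; the key fact from the text is that each TMS group replaces an active hydrogen and adds roughly 72 Da.

```python
# Hypothetical helper: infer the number of TMS groups on a derivatized analyte
# from its mass shift relative to the underivatized parent compound.
TMS_SHIFT = 72.0   # nominal Da added per trimethylsilyl group

def tms_count(derivatized_mass, parent_mass, tol_per_group=0.5):
    """Return the TMS count consistent with the observed mass shift,
    or None if the shift is not a clean multiple of ~72 Da."""
    shift = derivatized_mass - parent_mass
    n = round(shift / TMS_SHIFT)
    if n >= 1 and abs(shift - n * TMS_SHIFT) <= tol_per_group * n:
        return n
    return None

# Alanine has a nominal mass of 89 Da; a peak at 233 Da suggests 2 TMS groups
print(tms_count(233.0, 89.0))   # 2
```

The same idea works for TBDMS, just with a different nominal shift per group.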
So GC-MS is good for working with amino acids, really good for organic acids, pretty good for sugars, really good for short-chain fatty acids, and basically for molecules with a molecular weight of less than maybe 500 or 600 daltons. I mentioned before that GC has very high resolution, very high plate counts. Its EI-MS is very standardized, so the EI spectra are comparable across reference sources around the world. Most people use a combination of AMDIS, which we'll talk about later, and the National Institute of Standards and Technology, or NIST, database. I think NIST 17 is coming out, but the NIST 14 MS database is pretty widely used, and it's huge: hundreds of thousands of EI spectra for about 240,000 compounds. Now, that's a bit of an exaggeration, because many compounds have TMS and TBDMS derivatives, so it's the same compound, just derivatized many different ways. The number of truly unique compounds is about 20,000 to 25,000, I think. They also have ion trap MS, and they have QTOF and triple quadrupole MS/MS data, which is also pretty cool, except they only produce consensus spectra rather than spectra at individual collision energies. They have retention indices for about 80,000 compounds. So it's a vast resource. There are some things that aren't great about it, but it's still pretty amazing, and it's quite affordable. There's software that allows you to search through the database and do matches. And then there's another piece of software that you get with NIST, called the Automated Mass Spectral Deconvolution and Identification System, or AMDIS. This has been around for years, decades even. It basically does peak identification: it looks for background noise, it identifies peaks relative to the noise, it generates a model spectrum, and then it does the spectral deconvolution, using the NIST library.
So, as I say, this has been around for, I don't know, 30 years or more. Conceptually it's not unlike Bayesil, except the spectra are a little simpler in some respects, because you've got the separation. Whereas in NMR you don't separate, there's no chromatography, with GC you've already done a separation, so you've got somewhat simpler spectra to work with. To match your spectra, it uses something similar to what Bayesil uses to assess a fit; for GC-MS they call it the match factor. It measures the query against the database: basically, if I've got five peaks here and five peaks there, I'm going to slide them over each other and see how well they match. The matching is done through a dot product, if you've ever done linear algebra: dotting two vectors together and normalizing by their intensities. They multiply the match factor by a thousand to scale things. So basically, if a match factor is about 600 or 650 and above, up to a maximum of a thousand, that's a good hit. You look at these match factor scores to say, yeah, I've got something. Now, you're never going to get a perfect match, never a thousand, even if you put in the pure compound, because there are just variations in the instrument. But you could get 990 or something like that, and that again is a very good indication that these are identical compounds. If you're doing GC-MS, you do have to do some calibration and normalization, just like with NMR, where we had to add DSS and do phasing. In GC-MS, before you start, you have to run a set of alkane standards, about 8 or 9 of them, starting from octane (C8) up to hexadecane (C16) or C18. These are a calibration standard. Also, when you do GC-MS, you need to have a blank sample.
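The dot-product idea behind the match factor can be sketched like this. The spectra below are invented, and real NIST matching also applies m/z-dependent intensity weighting, so this shows only the bare concept of a cosine similarity scaled to 1000.

```python
import math

# Toy match factor: cosine similarity between two stick spectra, scaled to 1000.
def match_factor(query, reference):
    """query and reference are dicts mapping m/z -> intensity."""
    mzs = set(query) | set(reference)
    dot = sum(query.get(mz, 0.0) * reference.get(mz, 0.0) for mz in mzs)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    return round(1000 * dot / (norm_q * norm_r))

# Invented spectra: a library entry and a query that closely resembles it
library_entry = {55: 120, 72: 999, 144: 310, 218: 45}
query         = {55: 100, 72: 950, 144: 280, 150: 20}

print(match_factor(query, library_entry))          # a high score, well above 650
print(match_factor(library_entry, library_entry))  # identical spectra give 1000
```

Only a spectrum matched against itself reaches 1000; instrument-to-instrument variation keeps real matches a bit below that, which is why 990 is already an excellent hit.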
That means running your solvent and, usually, the derivatization agents (the TMS or TBDMS or methoxime reagents, whatever you've added), because this contaminant background is going to show up in your authentic sample too. So you have your calibration standard, your blank sample, and then your sample or samples of interest: your blood, your urine, whatever you want to analyze. And these you run under the same elution conditions, the same temperature gradient, as your blank. This is what you collect with your external standards; there are, I don't know, 8, 9, 10 peaks there. This alkane mixture allows you to calculate retention indices. Remember, you had retention times; now you convert them to retention indices, which normalizes everything. That's your calibration file, C-A-L, and it recalculates the retention times using the alkanes. Then you're going to search the NIST database for matches, including with retention indices. And with the blank spectrum, there's also a way of getting rid of the extra peaks, the TMS peaks, the methoxime peaks. Yes? Does it normalize for changes in the oven temperature program, since your retention index is going to change? Yeah, so you want to use the same protocol when you run the alkanes as when you run the blank and the actual sample. If you have a programmed heating ramp that goes up in stages, that heating program has to be identical for all three, including your alkane standards. With that, yes, it'll calibrate and normalize properly. So with the AMDIS software, here we are creating the calibration file; AMDIS lets you do that. Here's your loaded spectrum. Then you normalize using, again, a pull-down menu. We're not going to run AMDIS today; this is just for your information.
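The alkane-ladder calibration described above can be sketched as linear interpolation between the bracketing n-alkanes, with retention index defined as 100 times the carbon number. The retention times below are invented; under temperature programming, linear interpolation in time is the usual approximation.

```python
# Toy retention-index calibration from an alkane ladder:
# (carbon number, retention time in minutes), invented values
alkanes = [(8, 2.1), (10, 4.0), (12, 6.2), (14, 8.5), (16, 10.9)]

def retention_index(rt):
    """Interpolate linearly between the bracketing n-alkanes; RI = 100 * C."""
    for (c_lo, t_lo), (c_hi, t_hi) in zip(alkanes, alkanes[1:]):
        if t_lo <= rt <= t_hi:
            frac = (rt - t_lo) / (t_hi - t_lo)
            return 100 * (c_lo + (c_hi - c_lo) * frac)
    raise ValueError("retention time falls outside the alkane ladder")

print(retention_index(5.1))   # halfway between C10 and C12, so 1100.0
```

Because the index is defined relative to the alkanes run under the same temperature program, it stays comparable across runs and instruments in a way raw retention times do not.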
So after AMDIS has done that calibration, then you can start doing your database searches. You'll move your window over a certain peak; here we've moved over one that's at about, I don't know, 11 or 12 minutes. And under that peak, that's the chromatogram peak, we're going to see some masses. There's one that's, it's hard for me to see here, look, 73 and 172, and, I don't know, maybe 140. Those are marked with red, yellow, and blue, and these correspond to the masses from the EI-MS spectrum. So you're going to have your GC peak list and you're going to have your EI-MS spectrum for each of these peaks. You can zoom in a little more, and then you can actually see how well this particular peak, we've zoomed in a little bit more, matches to the NIST spectrum, or the NIST database. In this case, we have a match factor of 840, or 84% as they've calibrated it. And we've now got a match of 73 and 144; the red and the blue are the two best matches, and they match to the reference spectrum, which is shown below this one, and you can see it's almost identical. So in this case, this peak, a single compound, is valine. We could have also checked the retention index, and we would have found that the retention index for valine is also very, very close. So we would have had two or three pieces of information, the mass spec with the 73 and the 144, the retention index, all saying this is valine. So in this regard, we've been able to identify it, and if we measure the area under the curve, we can at least partially quantify how much is there. So AMDIS and the NIST database allow you to manually go through a GC-MS spectrum and identify compounds. So if you started this morning with one spectrum, sometime this afternoon you would finish annotating that one. And if you had a thousand spectra, that's about a thousand days of work. So there are alternatives.
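The "area under the curve" used for quantification is just a numerical integral of the intensity trace across the peak. A minimal sketch with trapezoidal integration; the time and intensity values below are hypothetical, and the resulting area is only a relative measure unless it is calibrated against standards of known concentration.

```python
def peak_area(times, intensities):
    """Trapezoidal area under a chromatographic peak.
    Gives relative quantification only, unless calibrated."""
    return sum((t1 - t0) * (i0 + i1) / 2
               for t0, t1, i0, i1 in zip(times, times[1:],
                                         intensities, intensities[1:]))

# Hypothetical ion trace across a peak eluting around 11-11.5 min
t = [11.0, 11.1, 11.2, 11.3, 11.4, 11.5]
i = [0, 400, 1000, 900, 300, 0]
print(peak_area(t, i))  # 260.0 (arbitrary area units)
```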
I'm going to introduce you to GC-AutoFit, which is something that can automate this. There are other tools outside of AMDIS: AnalyzerPro, ChromaTOF. These were compared about ten years ago. There are different databases: NIST 14 is the latest one; there's NIST 11, NIST 08; they come out every three years, so NIST 17 should be coming up. There's the Golm database, and then Oliver Fiehn's group has created libraries distributed by Leco and Agilent. The Golm database is an open access database. They have different purposes: Golm is focused primarily on plants; the Fiehn library is more varied, and it uses a different calibration standard than what most people use. So as I said, just like with Bayesil, we wanted to be able to teach people how to do GC, and so we developed this tool to sort of automate it. It's called GC-AutoFit. It needs three spectra: your blank, your alkane standards, and then your sample spectrum. It does automatic spectral alignment; it does peak identification, peak picking, peak integration. It takes most of the normal types of files. It's pretty fast, and it can measure up to about 100 different compounds in a sample. It's best at analyzing urine; so unlike NMR, which chokes on urine, this one does really well with urine. It does okay with blood, and saliva, and CSF. But if you don't follow the protocol with the derivatization and other things, then it won't work. And this is something that I think we really have to emphasize: when you standardize the protocols, things really work; if you don't follow the protocol, it doesn't work. It's like following a recipe: if you leave out the egg and the sugar, the cake won't look or taste very good. So this is the website, and these are the types of files that you can upload; we'll go through this in the lab anyway, so it's probably easier to read it there. There are different freeware tools to do conversion if you happen to have something that's not compatible with what GC-AutoFit accepts.
It's pretty simple point and click. You can have zip files, or you can upload individual files. So you upload your spectra. Again, just like with Bayesil, it needs to know whether it's looking at blood or urine or whatever biofluid, because it has its own spectral library for each. You can also upload your own internal library if you have prepared it in the standard way. You also need an internal standard to help with the quantification; that internal standard, whether it's cholesterol or succinate, has to be added to the sample. And you can choose which biofluid. I think you guys are going to be doing urine this time. Is that correct, Jeff? Yeah, it's urine. So select the fluid, select the library. It'll kick out a couple of sample spectra just to make sure that you haven't uploaded something that's empty; so you can see that there's a file there, and you can see that the alkane standard looks decent. And then it'll run through and annotate your spectra. It's not going to do the fitting that Bayesil does; it'll write over top of each of the peaks what the compound is. And similar to Bayesil, it'll indicate the name of the compound, it'll give you a concentration, and I think it gives you a match factor score; you can see there's 780, 760. It'll identify the ions that were picked out as being the key ones to help with the identification. So it looks very similar in terms of output to Bayesil; in this case it's your annotated list. Now that's the web view. You can also download the results as a CSV file or a text file, and then you can also just view the spectra. So that's GC-MS with GC-AutoFit; that's another example of automation, a little closer to how LC-MS is handled. But in this case it's defined: it doesn't work for bacterial extracts, because the library hasn't been built for that. And again, it is critical to follow the protocol. The protocols, both for Bayesil and for GC-AutoFit, were designed to be consensus protocols, ones that just about everyone uses.
So it's not very specific to very rare reagents or really strange processes. A quick question: can you say a bit more about that roughly 60% on one of the previous slides; would that be similar? Yeah, whether it's a match factor of about 600 or 60%, that's sort of the minimum that people will say is worth considering. And it's a fuzzy cutoff, it's not a hard one. If you've got a great retention index match and only 700 on the spectral match, then you're probably more confident. If you've got a lousy retention index match, then even at 700 it's probably not the compound. So there are pieces of evidence, which are highlighted here in this slide, which is the Metabolomics Standards Initiative, or MSI, for metabolite identification. There's level one, level two, level three, level four. Level one is where you've actually positively identified the compound: you have the authentic compound, you put it in, and you get exactly the same spectrum, exactly the same retention time or index, or the same NMR spectrum, and you say, this is it. Most people unfortunately don't have chemical libraries, and most people never do this, which is a real problem, because this is in fact technically the only way you can report compound identities. The next one, level two, is what is formally called a putatively identified compound. That's when you get the mass spectrum, EI-MS or MS/MS, and the retention time or the retention index: you're getting a match factor of 800, you're getting a retention index or retention time match, or you've done a Bayesil fit and all the spectra look perfect, a perfect match across many, many spectra. That's considered level two, although in NMR, I think many people are saying that would actually be a level one. Then there's level three, which is essentially saying, I've got a lipid and it's a PC(32:6).
It's a class: phosphatidylcholine 32:6 is also a class, because it's saying there's either a 14 and an 18 or a 16 and a 16 chain, and you don't know whether it's at the sn-1 or the sn-2 position, so that's only saying, I've got an approximate identity. And then the vast majority are the unknown compounds: say I've got a peak, I've got a retention index, and I've got a mass, or I've got a bunch of chemical shifts, but I don't know what it is. That's still an annotation, and it falls into level four, but obviously you haven't identified anything. So this was originally designed for MS; as I say, it's not really compatible with NMR, because if you have a perfect match with intensities and chemical shifts by NMR, that is positively identified, and that has long been used as the gold standard for chemical characterization. So for LC-MS, the method for identification is basically very similar, or identical if you want. You have your chromatogram, you have your spectrum, and if you have MS/MS spectra then you're matching to MS/MS. Many people unfortunately only match to the parent ion mass and say, I've got something at 276.167 and this is my compound; that's wrong and basically should be avoided. Unfortunately 90% of people doing metabolomics do it this way, and we want to try and get you away from doing that. You want to be able to do spectral matching with MS/MS if possible, or MS/MS plus retention time. Identifying a compound purely by its molecular weight or purely by its chemical formula is extremely risky and technically falls more into the level three standard of identification. So there are a variety of tools for doing MS/MS: there are commercial tools from all the major manufacturers, Agilent, Bruker, Thermo, Waters, SCIEX, etc. Then there's a bunch of free options: XCMS, XCMS Online, MZmine, several others. We're going to be looking at XCMS, and Anama has also told you guys to set up an XCMS Online account as well. I will explain XCMS a little more during the lab that will follow after lunch.
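A small sketch of why parent-ion-mass-only matching is so risky: even at tight ppm tolerance on a high-resolution instrument, structural isomers share exactly the same monoisotopic mass and cannot be distinguished. The candidate list below is illustrative (leucine, isoleucine, and norleucine are genuine isomers, all C6H13NO2).

```python
def mass_matches(query_mz, candidates, ppm=5):
    """Return all candidates whose monoisotopic mass falls within
    a ppm tolerance of the query mass."""
    delta = query_mz * ppm / 1e6
    lo, hi = query_mz - delta, query_mz + delta
    return [name for name, m in candidates.items() if lo <= m <= hi]

# Three real structural isomers sharing formula C6H13NO2:
candidates = {
    "leucine":    131.09463,
    "isoleucine": 131.09463,
    "norleucine": 131.09463,
}
print(mass_matches(131.0946, candidates, ppm=5))  # all three match
```

Against a database like PubChem, the same window would return hundreds of candidates, which is why MS/MS spectral matching plus retention time is needed to get past level three.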
So the reason why we're using XCMS is because it was basically the first open source, open access tool for MS spectral processing in metabolomics. It does peak picking, it does peak matching, it does retention time alignment, which are all critical. You can get it as a server, that's XCMS Online, or as a standalone, which I think is installed on most of your machines. It handles many different formats, and it is linked to a database called METLIN, which is widely used. So with liquid chromatography mass spectrometry, what you typically do is collect many spectra; this is primarily untargeted mass spectrometry, so it's different than Bayesil, it's different than GC-AutoFit; this is where you're collecting lots of spectra for lots of samples. And with LC-MS there's always a tremendous amount of drift in how the LC performs, so something that had a retention time of 28 minutes at the beginning of the day will have a retention time of 32 minutes at the end of the day, and in between it may have gone higher or lower; it's moving all over. So what you have to do is not only work with retention time, you have to identify the peaks and align things. This is where the extracted ion chromatograms are selected; there are some non-linear alignment methods, so once you pick those out and align them, then you can do some more formal peak picking. Then the MS data will give you, if you've got a high resolution instrument, an accurate mass, and from there you can extract the MS/MS if you've acquired it; if you haven't, then XCMS will still allow you to identify compounds by the mass match alone, but it's not recommended. Peak detection is hard with mass spectra; LC-MS peak detection algorithms are always evolving, and XCMS uses a relatively old one which at the time was very good, but there are better ones now that have come out. It will fit things according to a function to maximize the peak intensity and reduce the nearby neighbors, to give you a central peak that you'll work with. It does a nice job of peak
alignment and retention time alignment. The top spectrum is an example of just the natural drift that you get; you don't get this drift with NMR, you don't get this drift with GC, so it is an issue for LC, but when you do this you can see how everything matches up quite nicely. And when they started comparing XCMS, I guess this was 10 years ago, it had the highest precision and recall in terms of identifying and picking out peaks. Subsequently it's not the best, there are other programs that are doing better, but it's still pretty good. So I think the peak detection that you're talking about here is the matchedFilter method in XCMS, and they have since, like years later, introduced the centWave method; would you still consider that state of the art, or do you use something else? It's probably pretty close to state of the art, but there are other issues in terms of peak detection and peak alignment that are still a problem, and whether it's centWave or other types of filters, it's not just that alone; there are a lot of other things that need to be done to actually get good peaks and to be certain of those peaks. So there's still a continuing problem with XCMS of too many false positives, and there are other programs that have reduced that; the specific details of what they do I don't know, but XCMS is a pretty robust program, so we're not saying don't use it, it's just that you are going to get slightly more false positives than with some of the other ones. For high resolution data, is it correct to rather use centWave than matchedFilter? Well, that's probably the default now, so you probably wouldn't be using matchedFilter anymore, or you'd have to go deep into the program settings if you wanted to use it. So this is just taking you through what you would do with XCMS Online. You're going to learn how to use XCMS offline, so I'm not going to spend too much time on it, but it is a useful server. It's not something you want to have 30 people doing at the
same time; in fact, we would be blacklisted very quickly and they'd shut us down. It's also the case that XCMS Online is not really fast; you have to expect to wait sometimes a couple of hours, or try to run it at midnight or at low-usage times. So you can access it, but you also have to have an account, so it's not fully open access; but once you've got an account, which I think many of you now have, you can create a job. There are different types of jobs: multi-group analysis, pairwise analysis, multiple studies; so you can pick and choose. If you've chosen, in this case, a pairwise job, which is probably the most common metabolomics type of experiment, then you can upload your two data sets, one for cases, one for controls. Again, it's relatively user friendly: once you've defined your panels, then you can start uploading your spectra, and again there are different options; I'm not going to go into those details because it's probably slightly different now anyway. So you've uploaded your spectra; because you're uploading hundreds of megabytes, that usually takes a while, especially if you've got a slow connection. What's wonderful about XCMS is that it has enormous capacity for handling all kinds of different instrument types, a long, long list. We had one which was just an HPLC with a single quad; almost no one does that anymore, I think the data itself was like 15 years old, but it was compatible, so we were able to do it. Once you've told it what sort of format it is, then you just submit and wait a while, a long time, too long for this course actually; but eventually it'll tell you where you are in the queue, and then you can wait for an email notification. Once you've got your notification, you can download the results, including your peak list tables, and this will give you your mass peak identifiers, retention times, and intensities. Note that this is not quantitative; these are relative intensities. So yes, if the intensity is a thousand for one and the intensity
is a hundred for the other, probably the one with a thousand is more abundant; but remember, with mass spectrometry that's essentially a measure of how well an ion flies, it's not necessarily a pure measure of its abundance. On the other hand, with NMR it matches exactly with concentration, and if you've done all of the appropriate quantitation calibrations and standards with GC-AutoFit, then yes, you can also do that there. So with untargeted metabolomics you're simply doing relative quantification, relative measures. As I said, XCMS has an issue with false positives, intensities are only relative, and unless you link to METLIN you aren't going to get the peaks identified with compound names; that's where you need the METLIN database. So that's a quick run-through of XCMS Online. We're not going to do it today, although you're free to do it in the evening or at home; it is, I think, a widely used program. So with NMR we talked about how it was better for certain things, GC-MS better for certain things; LC-MS is really good for lipids, this is where almost all lipidomics is done, and you can do fatty acids, bases, and amino acids; it's best for more hydrophobic molecules; even if you use HILIC, it's still a better technique for the hydrophobic ones. Really, if you want to do high quality identification, you need MS/MS data, tandem MS data, you need retention time, you need internal standards. If for some reason you can't do MS/MS, or you're just desperate to get information, or it's a case where you're looking at a more defined standard set and mixture and you just need to be able to say which one is which, maybe you don't need the MS spectral matching, so you can do mass matching. Mass matching is, as I say, strongly discouraged, but it is possible; in a case where you don't have any other resource, you can go and search against ChEBI, against PubChem, against ChemSpider, or against the Human Metabolome Database. So the largest database in the world for
chemistry is PubChem, and I don't know how many of you have used PubChem; how many people have played around with it? Not that many, okay. Anyway, you can do a mass search: you can choose to search a mass range, in this case we've chosen from 89 to 89.1 Daltons, and here's the list of 400-odd compounds that match. So that's a mass range search. You can do the same thing with ChEBI, Chemical Entities of Biological Interest, and you can do that for HMDB. Now, that's not something you normally do; generally you want to be able to do more advanced MS searching, you want to be able to look at actual spectra, upload those spectra, and see if they match to anything. Now I'm going to emphasize something about these different databases, whether it's ChEBI or PubChem, ChemSpider, HMDB, and I'll emphasize this again later. PubChem is the largest database, and HMDB is probably the smallest: PubChem has, I think, 85 million compounds, ChemSpider around 50 million, ChEBI has about 60,000, HMDB has about 42,000 compounds. 99.5% of the compounds in PubChem are irrelevant to metabolomics; 99.5% of the compounds have never left the lab, so they are not in humans, they're not in animals, they're not in plants, they're not in microbes, they're not in the environment. So if you're going to do a search against PubChem or ChemSpider, you have a 99.5% chance of getting a false positive. Unfortunately, a lot of people in the metabolomics community do that, and they publish basically junk, claiming that based on mass matches they found something which cannot physically be in the environment or in an animal or in a human. The same sort of thing happens when they say, okay, I'm not going to use PubChem or ChemSpider, I'm going to use ChEBI. So ChEBI is a collection of metabolites that are biologically interesting: it has really interesting mushroom metabolites, it has really interesting fruit fly metabolites, it has really interesting phytochemical or sponge metabolites. Those are things you're not
going to find in humans or in mice; you have to know where these things come from. So again, you can get a nice match to a mass, or even to a chemical formula, in ChEBI, but if you don't worry about the origin or provenance, you'll also get garbage. From the perspective of metabolomics, really the best thing is to go to databases that are specific to an organism. So if you're studying humans or other mammals, including mice, the Human Metabolome Database is probably a good one; that would cover basically what you would expect to find in mammals. If you want to look at bacteria, go to a bacterial database; if you're looking at yeast, go to a yeast database; if you're looking at plants, go to a plant database. This is also a problem with METLIN, because it has amalgamated all kinds of different databases from many different organisms. It's a similar problem with LipidMaps, where they've merged lipids from insects, plants, and humans; again, you're not going to find insect lipids in humans, or plant lipids in insects, and so on. So we'll go into the more advanced mass searches. You can do molecular weight searches, but you can also do parent ion searches, you can look for positive mode adducts, you can put in peak lists; these are really intended for high-end mass spectral searching. Most people are familiar with METLIN, and we'll talk about that later; we'll also talk about MassBank. I'm going to talk a little bit about CFM-ID, which I don't think most of you have heard of. This is a tool that was developed a couple of years ago to actually predict mass spectra: you can take a compound, any compound, and it will predict the EI-MS or ESI-MS spectra, the tandem mass spectra, and it does a pretty good job. It'll predict the intensities and locations, it'll analyze the fragment ions and annotate them. It also allows you to do compound identification from uploaded MS/MS spectra, so you can type in all of your peaks and peak intensities, and it will look through its own library of mass spectra and
some of them, in the newer version, will actually be real spectra, but it also has a lot of the predicted spectra from HMDB, so that's 42,000 you can use, and from KEGG, which covers all the other organisms, plants and microbes; so there's a total of something around 60,000 or so compounds that it can match against. If you wanted to do that sort of search, you can click between these: here's my spectrum, what is the compound? You can upload it, here's your list of peaks, and you choose, in the case of an MS/MS spectrum, 10, 20, or 40 electron volts. Then it'll run through and generate a match, and it'll rank things; the blue peaks are the observed, the red are the database, and then it'll indicate sort of a match factor score and the ranking. Now, on the issue of doing tandem, or not even tandem: you're going to have lots of peaks, and these peaks come from adducts, they come from neutral losses, and they come from multiple charging events. These peaks essentially represent noise; they're confounders for MS analysis. So if you've got 30,000 or 20,000 features, most of those features are exactly these things, the neutral loss species, the salt adducts, and you want to try and reduce that. What do we mean by adduct formation? Remember, with a mass spectrometer we charge things: we'll get a positive or a negative charge, and that's the parent ion, so here's one at 961 Daltons. But this one also has a sodium adduct, and in fact the sodium adduct is much more intense; there are many examples where a sodium or a lithium or a potassium adduct dominates. You can also see the isotopomers, the 3 or 4 extra peaks that come from carbon-13 or deuterium. These are examples of adducts. Here's another example; this is a higher resolution mass spec, a TOF, where we're taking, in this case, a real spectrum. We've taken the total ion chromatogram, or the base peak chromatogram, we've got the extracted ion chromatogram at the bottom, and from there we're seeing the peaks, and we can see
the adduct, the sodium adduct; we can see the parent ion peak; and then we can also see, I think, two sodiums minus a hydrogen. So when you're working with biological samples, you've got sodium floating around, you've got potassium floating around, you have buffers that you might have prepared; all of these have ions, and in this case it could be formate or acetate as well; these are some of the adducts that will appear. So a given compound could have 10 or 15 other peaks, all arising simply from the interaction of ions, sodium, chloride, formate, with your parent ion, some of them with multiple charges, some with single charges. This is one reason why you see so many peaks in a mass spectrum. The Fiehn lab has compiled a large list of adducts and made that public; it also allows you to calculate some of those masses. In mass spectrometry we also get this phenomenon called neutral loss, where you lose the equivalent of a water or the equivalent of a sugar. Those pieces don't show up themselves because they're neutral fragments, but you will see the charged fragments that remain after the loss of the water or the loss of the sugar, and again, these are essentially noise peaks. So handling or predicting adducts is an important thing, and many commercial programs actually support that; there are some freeware resources, MZedDB and HMDB, that do it; METLIN and MZedDB also handle ion pairs and multiply charged species, and METLIN also does a nice job of handling neutral loss species. So when you're given a mass spectrum from an untargeted study, you're going to have literally tens of thousands of features, or peaks, and what you need to do is identify and remove all those false positives: the adducts, the multiply charged species; you want to consolidate the neutral losses, the breakdown and rearrangement products; you also want to get rid of all those isotope peaks, because they are essentially the same compound, just isotopomers, and
in many cases you also have to deal with the blank noise, which comes from sample blanks; just like with GC-MS, when you run something through a column, stuff comes off that wasn't part of your sample, it's just stuff. And there are different ways of getting rid of those noise peaks: things that only show up periodically, or things that never dilute no matter how much water or solvent you put through. So these are various approaches that help you get rid of them. Let's say you start with a positive ion mode spectrum and you see 15,000 features; that's usually what people report to impress everyone, they say, my sample gave me 15,000 features. But once you get rid of the adducts, that 15,000 drops to 12,000; once you remove the multiply charged species, that drops from 12,000 to 10,000; once you remove the neutral loss fragments, that drops from 10,000 to 8,000; once you remove the isotope features, that drops from 8,000 to 3,000; and once you remove the noise, you're down to maybe 2,500 real peaks. So a drop by a factor of 5 or 6 or 7 is not unusual. Generally, when you do it in negative ion mode, there are fewer peaks, even though you might again get 10,000 features. So if you did both positive and negative ion mode, you'd probably get maybe 4,000 useful consolidated peaks, and those are probably real compounds, of which about 200 to 500 are probably identifiable. And the tools that do all of this, as I said, there are some freeware programs, or at least partial solutions: MZmine, MetFusion, MAGMa, and XCMS are the ones that do that, along with commercial tools. Now, I've been discouraging this, but I'll bring it up, which is the excitement over high resolution mass spectrometry, particularly with Orbitraps, which give you 1 ppm or less, FT-MS, which gives you 1 ppm or even better, and TOF instruments, which are now down to 1 to 2 ppm. If you can measure the mass of a compound to 5 or 6 significant digits, that's really useful, and there are some examples of how that can help with the identification process. This is where you can take the
mass and convert it to a chemical formula. There are some servers, like MZedDB, which is maintained at Aberystwyth, where you can type in your molecular weight, your highly accurate mass, and indicate both the charge and the types of elements you expect; in the case of living systems it's basically just carbon, hydrogen, oxygen, nitrogen, sulfur, and phosphorus. That restriction immediately reduces the types of compounds and feasible formulas that you're going to end up with. So if you've got a formula, then you can be somewhat more specific than with just a mass; you can search various databases, including PubChem and HMDB and others, by formula. But you can refine the formula even further, beyond just the mass, with things like the isotopic abundance. Remember when we saw that pattern with chlorine-37 and chlorine-35, how you saw a big peak and then a little peak and then another big peak? That's very characteristic of a chlorine. There are also chemical bonding restrictions that say you can't have a chemical that's C28H2; it just doesn't happen. So using rules like the Lewis and Senior rules for what is a feasible chemical formula and a feasible composition, you can come up with essentially a collection of viable formulas. This was put together a number of years ago by Oliver Fiehn's group, and they called it the Seven Golden Rules, which has subsequently been incorporated into just about every commercial program that does formula generation or formula filtering. You can download Seven Golden Rules, it's an Excel or Visual Basic program, and it allows you to take essentially a raw mass spectrum and narrow things down. You can also use commercial tools; this is a Bruker formula filter program, and it says, here are the possible formula sets, our input mass is 525 point something something, and you can see it's giving you about a half dozen feasible formulas based on the data that you've put into it. I'm not sure if this one actually is taking any
isotopic abundance into account or not. Yeah, okay, so this has been ranking by mSigma, which essentially does include the isotope pattern, so the top one is probably a very good match. Now remember, this is only a formula, and it's working only from the single high resolution mass, so this is treading on very thin ice: you're not using the tandem MS spectra, you're not using retention time, you're not using an authentic standard. So ideally, if you say, I think this is the compound, then you should go and pull out the authentic standard, or the possible authentic standards, and see if you can get a match. Still, it shrinks things down. If you think of all the possible formulas for compounds made of carbon, hydrogen, nitrogen, sulfur, oxygen, and phosphorus, there are 8 billion possible molecular formulas with different elemental compositions; but if you use the Seven Golden Rules, that shrinks from 8 billion down to 600 million. And then if you look at the formulas that would match in PubChem, again not necessarily the best database, but from 85 million compounds, when you restrict it, that only leaves you with about 700,000 formulas; and then if you restrict to natural products, from that set you're suddenly reduced down to probably just 50,000, which means that in many cases you're only dealing with a few possible candidates. Now, as you increase the size of a molecule, the number of possible chemicals also increases; this is the general trend: as you go from 200 Daltons to 300 Daltons, you can see the number of possible molecular formulas climbs almost linearly from about 20 to about 80. So if you have a really big molecule, you could be looking at many, many possibilities. This was also published with the Seven Golden Rules, where they compared how better resolution improves the likelihood of the first match being the exact formula. So low resolution data means you're going to have lots of possible formulas; if you can measure things down
to 1 ppm or 0.1 ppm, even up to 7 or 800 Daltons, you've reduced it quite significantly; and then if you can use the isotopomer distribution, the isotopic distribution, you can cut things down even more. So using mass and isotopic abundance together is a powerful way of limiting the likely candidates; it's not perfect, but it certainly puts a powerful constraint on what the possible chemicals could be. Now, as I say, this is the point I wanted to bring up again: when you use these large chemical databases like PubChem or METLIN or NIST, no one really tries to worry about whether things are natural products or metabolites; they're simply chemical libraries. So they mix non-metabolites with metabolites, plant with insect with microbial, drugs, buffer reagents, everything. This is leading to a growing trend, particularly with people using METLIN, of getting essentially silly hits that don't correspond to the biology of the organism. So if you know the source organism, if you can constrain things, then it is better to use an organism-specific database: if you know you're working with mammalian systems, use HMDB; if you're working with drugs, work with DrugBank; if you're working with plants, use KNApSAcK; or if you're working with E.
coli or other microbes use microbial specific databases now in the last two minutes here I'll just wrap up but we're talking about quantification so with LCMS we haven't talked about it, with GCMS and NMR we've talked about quantification but most LCMS and many GCMS metabolite studies don't use absolute quantification to do absolute quantification you have to do isotopic dilution you have to use isotopic standards deuterium, carbon-13 or nitrogen-15 standards those ideally have to be the same compound so N15 alanine to characterize alanine or they have to be very close to that compound you also use a technique called single reaction monitoring or multiple reaction monitoring when you're doing absolute quantification by MS to make sure that you're getting the exact compound so this is done usually with triple quadruple or linear ion trap and instruments so multiple reaction monitoring means you have tandem mass spectrometers and you will have a Q1 and I'll quote the Q3 because the Q2 is used for the collision signatures you're looking for a particular mass a particular filter and a particular product ion and so those are your signatures to positively identify the compound and then you use the isotopic dilution method to actually help calibrate and figure out the quantity so it's possible to do quantitation on your own if you've gotten a collection of standards but you can increasingly now go to kit systems biocrities is one example of a company that produces kits you can also get, I think systems from Shimadzu and I think a few others now where they allow you to quantify and do targeted absolute quantification of metabolites so the P180 kit is a common one and you can get measures of sphingoma islands and Pocetata colines and LysoPCs and now it's up to all 20 amino acids a bunch of biogenic amines and about 40 acylcarnitines and so it's targeted 86 compounds are there about so it's not going to be as expensive or as thorough or complete as say an untargeted analysis 
but this is targeted, quantitative metabolomics.

Can you customize it? No, you can't customize it: it's a kit that is set, and the same metabolites are measured all the time. They have other kits, though; they have a bile acid kit, they have a steroid kit. You can go to other vendors, like I guess Shimadzu, which will have a kit that's specific to their instrument. In our lab we make our own kits, and so we customize our assays as we wish, but we've had to spend a lot of money acquiring standards. Tammy has built a number of those kits, so if you want to know how to do that, she can tell you, but it's a lot of work. The benefit is that it becomes very automated, and that's largely what they do with the Biocrates kits, and that's the appeal. Just like Bayesil is automatic, just like GC-AutoFit is automatic, these Biocrates kits are automatic: load it up, go home, and the results are on your computer the next day. So that's very appealing.

And in the case of these MS kits, you can measure down to nanomolar all the way up to millimolar, so it's an enormous range of concentrations you can detect. It's much more sensitive than NMR and much more sensitive than GC-MS as a rule. So it's quite appealing, and a lot of people are now using these targeted kits in metabolomics: LC-MS both for targeted and for untargeted work. But the vast majority of people still do untargeted metabolomics via LC-MS, which is challenging, as you'll find out later this afternoon.
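The isotope-dilution idea described above can be sketched in a few lines. This is only an illustration with made-up peak areas and a hypothetical calibration, not any kit's actual software: you spike the same known amount of a labelled internal standard (say, 15N-alanine) into calibration standards and samples, fit the area ratio against known concentrations, and read the sample concentration back off that line.

```python
# Minimal isotope-dilution quantification sketch (hypothetical numbers).
# Calibration standards contain known analyte concentrations, each spiked
# with the same amount of an isotope-labelled internal standard (IS).

def fit_response_factor(calib_concs, calib_ratios):
    # Least-squares slope through the origin:
    #   (analyte area / IS area) = RF * concentration
    num = sum(c * r for c, r in zip(calib_concs, calib_ratios))
    den = sum(c * c for c in calib_concs)
    return num / den

def quantify(analyte_area, is_area, response_factor):
    # Convert the measured area ratio back into a concentration.
    return (analyte_area / is_area) / response_factor

# Calibration: 1, 5, 10 uM alanine, each with a fixed 15N-alanine spike
rf = fit_response_factor([1.0, 5.0, 10.0], [0.21, 1.05, 2.10])

# Unknown sample: MRM peak areas for the analyte and the IS
conc = quantify(analyte_area=0.84e6, is_area=2.0e6, response_factor=rf)
print(conc)  # 2.0 (uM)
```

Because the internal standard is (nearly) the same molecule, it ionizes and elutes the same way, so matrix effects cancel out of the ratio; that is why isotope dilution is the reference method for absolute quantification.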
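Going back to the formula-matching discussion earlier in this section, the core of the mass-accuracy filter is simple to sketch. The candidate list below is hypothetical (only phenylalanine's exact monoisotopic mass is real), and the function names are mine, not from any database's software:

```python
# Sketch of filtering candidate formulas by mass tolerance (ppm).

def ppm_error(measured, exact):
    # Mass error in parts per million relative to the exact mass.
    return (measured - exact) / exact * 1e6

def filter_candidates(measured_mass, candidates, tol_ppm):
    # Keep only formulas whose exact mass lies within +/- tol_ppm.
    return [name for name, mass in candidates
            if abs(ppm_error(measured_mass, mass)) <= tol_ppm]

candidates = [
    ("C9H11NO2 (phenylalanine)", 165.0790),
    ("hypothetical formula B",   165.0700),
    ("hypothetical formula C",   165.0500),
]

# A 1 ppm-class instrument narrows the list far more than a low-resolution one:
tight = filter_candidates(165.0791, candidates, tol_ppm=5)
loose = filter_candidates(165.0791, candidates, tol_ppm=60)
print(tight)  # only phenylalanine survives at 5 ppm
print(loose)  # formula B (about 55 ppm off) comes back at 60 ppm
```

This is why better mass accuracy collapses the candidate list so quickly: the tolerance window shrinks linearly with ppm, and adding the isotope-pattern match on top of it cuts the survivors down further still.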