So, I introduced you guys a little bit to NMR, and there's three or four of you who have had some experience with NMR spectroscopy. We're going to have plenty of opportunity to talk about mass spectrometry in the rest of today and part of tomorrow, but I am actually an NMR spectroscopist, so I have an affinity for NMR, and we're going to talk a little bit about software for metabolite identification and quantification. I'm going to give you a little bit of an introduction to spectral deconvolution, then I'm going to dive into this NMR tutorial, and then you guys are going to have a chance to work with the software. Now, I don't know if we had any opportunity for this while people were settling in, but how many people were able to successfully download the Chenomx demo? How many were not able to? Maybe that's the better question. Okay, good.

All right, so we're going to talk about spectral deconvolution, and this is actually a general truth about a lot of mass spec or NMR or any kind of compound identification process in metabolomics. We're using NMR as a model, but it's actually the same sort of concept that's used and reused in GC-MS, LC-MS/MS and other things. So we're going to learn about deconvolution, and we're going to learn about compound identification and compound quantification. These are the fundamental pieces of modern metabolomics today, and as I say, the model we're going to use is an NMR-based one, partly because it's a little easier to learn, and because we can get some software that's fairly user friendly. And we're going to use this to try and actually work with a real biofluid spectrum.

So I showed you this picture before, though I think the annotation was missing, but this is just to reiterate the idea that there's quantitative or targeted metabolomics and then there's the non-targeted approach.
The quantitative one is the one we're going to emphasize in this course, and one that is, I'd say, more and more frequently done in the community, whereas the chemometric method is the historical method. It was one that was used and still can be, and in fact it still represents the majority of applications, where it's more of a pattern analysis. The quantitative one means you identify things, so all the peaks have labels, and you quantify things, so all the areas under the peaks are measured.

Metabolomics is a bit of a laggard when we compare ourselves to other omics techniques. Genomics has been able to make many of its great strides because of a certain program called BLAST and a certain database called GenBank. So if you sequence something, you can use BLAST to find matching genes and therefore actually annotate it. We can use RNA-seq or other methods, same sort of thing. We can not only annotate but also quantify, so we can get information about transcript abundance. All of that information is at our fingertips; it's on the web. Some of you have done a little bit of proteomics or have heard about it, and we can do the same thing there, whether it's 2D gels or mass spec methods. We can use a program called Mascot or other tools like that, and through that we can get information about protein identities. We can measure how much is there, or at least relative abundance. And so again, these are tools that are out there and they're public.

So in metabolomics, if someone uploads a chromatogram, an HPLC chromatogram, a GC-MS chromatogram, what do you do with it? For the most part people try and take out their ruler and measure peaks, and cut out peaks and weigh them, and do the integration the old way. There just weren't historically a lot of tools for identifying and quantifying things. And that's been the thing that has sort of held us back. It's what we're going to talk about in this course. And I think what we'll do first off is talk about metabolite identification.
So in metabolomics we talk about knowns and unknowns. But then among the unknowns there are the known unknowns and the unknown unknowns. This comes out of a phrase that the former Secretary of Defense, Donald Rumsfeld, used in, I think it was 2001 or something like that. It goes: there are known unknowns, that is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones that we don't know that we don't know. And certainly, after he said that, he got fired.

But the issue here is that we are in a situation in metabolomics where we are able to detect things. There are peaks there. But the peaks correspond, initially, to something we don't know. Still, those compounds are in some data bank somewhere, and the fact that they're in a data bank means that we will eventually find out what they are. In some cases they may not be in a data bank but they're described in a book or in a journal, and if we do enough digging we'll maybe find out what they are. Those are the known unknowns.

Then there are the ones that no one has ever described before, that no one has ever characterized: truly novel compounds. In some cases it's an exciting discovery. And in those cases you actually have to spend quite a bit of time to figure out what they are and what they look like. It's not unusual to spend two to three years to figure out what an unknown unknown is. I only know of a couple of people who've actually truly discovered and characterized a completely novel compound, and I know a lot of chemists and a lot of people in the metabolomics field.

So in most cases what we're dealing with is identifying known unknowns. The compound's been identified. Someone's described it. It's sitting out there for us to rediscover. In many cases we're identifying hundreds of those. So that's what spectral deconvolution is for: it's to identify the known unknowns. The idea is then to work with a spectrum. It could be NMR, could be GC-MS, could be LC-MS/MS, whatever.
You identify the compounds in that spectrum by looking them up in a pre-compiled database of pure compounds. So that's kind of like BLAST and GenBank: someone has pre-compiled all of the gene sequences for us, and we use BLAST to search and match. That identification of known unknowns is often called targeted or quantitative profiling. What I've shown on the right there is really just a spectrum. It's an NMR spectrum that we have deconvoluted, and below that is a list of the compounds and their concentrations. So deconvolution is sort of depicted here. Sure.

Because I'm getting some doubts about the targeted. Now in proteomics you have SRMs. You really do want to take it. Otherwise you look at the whole of the compounds, but you're not really doing quantitation. I mean, you do a spectrum combination. It's not real in the sense that I thought that targeted was supposed to be the SRMs.

For mass spec it is SRMs or MRMs, and it's working with isotopic standards, and it is fully quantitatively characterizing them. In NMR we use the spectral deconvolution approach. For GC-MS it's not MRMs, but it's the spectral deconvolution approach. For MS/MS, maybe not so much for quantitation, but for identification it's spectral deconvolution, or spectral matching if you want. So you can slice it different ways if you want, in terms of targeted and non-targeted, quantitative versus non-quantitative. Targeted metabolomics largely means identifying, and sometimes it's a case of classes of compounds, which is dictated by the chromatography or methodology you use. In the case of NMR, strictly speaking, it is non-targeted metabolomics: you're measuring everything that's there that's detectable. But it is quantitative metabolomics because you can quantify what's there, and therefore you are identifying and quantifying at the same time. That's right. So that's why I prefer just to use the terms quantitative metabolomics and non-quantitative metabolomics. Because targeted, yeah, that was a term developed for the LC-MS community.
It means nothing in the NMR community. And it doesn't mean a whole lot in the GC-MS community either. Whereas with quantitative and non-quantitative, the non-quantitative one is basically saying: I don't know what I've measured, and I don't know how much I've measured, but I do know that they're different. Quantitative metabolomics is saying: I know what I've measured and how much, and therefore I can say what's different. So my preference is to use quantitative and non-quantitative. But the community still uses targeted and non-targeted.

Okay, so there's some confusion in the terminology. Yeah. Because I know that untargeted means I'm measuring everything. Technically it is. Yeah. And that's sort of what it is. But it's strictly LC-MS. Is that LC-MS? Yeah. It was developed by people in the LC-MS community, but it really doesn't apply to all the other techniques in metabolomics. Okay. So that's why I say that the community has to get with it. We have to use, I think, the terms quantitative and non-quantitative.

Okay. Do you distinguish between absolute quantification and relative quantification? We do. I do. And again, absolute quantification is much better than relative quantification. The reason is that relative quantification is defined by your reference standard. If your reference standard is not available to everyone, then you can't be consistent. Whereas absolute quantification gives the same value regardless of which country, which lab, which city you're in. We all use micromolar. It's a universally defined unit, and it's one that people can universally measure. It's also the standard that's used in clinical work, where we make it as quantitative as possible. There are a few relative quantitation assays in clinical chemistry, but those are on their way out, I think. So absolute quantitation is your preferred route.

So back to the point about deconvolution.
So this is the example here, where the blue one is a mixture. It's a simple mixture, but it's a mixture. And this is an example of an NMR spectrum. We could pretend it's also a GC-MS or an LC-MS trace, but it's a bunch of peaks. And in this case, as I mentioned before, NMR is a little different than, say, mass spec, although not entirely: typically, you'll see multiple peaks corresponding to a single compound. So in red, that's compound A, green is compound B, and purple is compound C. These are their NMR spectra. You can see that if you add one of each one, so one millimolar, one millimolar, one millimolar, you will end up with exactly the top spectrum. You can see that this contributes to that peak, B3 contributes to that. So it's just adding them together. What we try and do in spectral deconvolution is the reverse. We take the mixture and try and figure out what the components were. So it's the reverse of the problem of saying, oh, here are my three pure compounds, how would they sum to produce a mixture? It's a little harder problem, and it's one that takes a bit of work.

There are two software packages for doing this in NMR. One is produced by Chenomx, and another one, called AMIX, is produced by Bruker. What we're going to use today is the one from Chenomx. Part of the reason is that it's actually a company that's in Canada, whereas Bruker is in Germany. It also specifically works with two types of NMR spectrometers, whereas the Bruker one only works with one. It's quite user-friendly. It's been around longer than the Bruker one, and it has a pretty large library of reference spectra. That makes it useful not just for analyzing, say, blood or urine; you could also analyze plant extracts and microbial extracts and a bunch of other things. And so this is sort of the breakdown of the different compounds that they have in their library.
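The "add the pure spectra to get the mixture, then run it in reverse" idea described above can be sketched numerically. This is only a minimal illustration, not how the Chenomx or Bruker software works internally: the three compounds, their peak positions and the concentrations are made up, the line shapes are idealized, and ordinary least squares (NumPy's `lstsq`) stands in for whatever fitting a real package uses.

```python
import numpy as np

def lorentzian(ppm, center, width=0.01):
    """Unit-height Lorentzian line centred at `center` (ppm)."""
    half = width / 2.0
    return half**2 / ((ppm - center) ** 2 + half**2)

def reference_spectrum(ppm, peaks):
    """Sum of Lorentzians; `peaks` is a list of (center_ppm, relative_height)."""
    return sum(height * lorentzian(ppm, center) for center, height in peaks)

ppm = np.linspace(0.0, 5.0, 5001)

# Hypothetical peak lists for three made-up compounds A, B and C.
library = {
    "A": [(1.20, 1.0), (3.50, 0.5)],
    "B": [(1.22, 0.8), (2.40, 1.0), (4.10, 0.3)],
    "C": [(0.90, 1.0), (3.52, 0.7)],
}
names = list(library)

# One column per compound: its spectrum at unit concentration.
R = np.column_stack([reference_spectrum(ppm, library[n]) for n in names])

# Forward problem: the mixture is just the weighted sum of the pure spectra.
true_conc = np.array([1.0, 2.5, 0.4])
mixture = R @ true_conc

# Reverse problem (deconvolution): recover the weights from the mixture.
est, *_ = np.linalg.lstsq(R, mixture, rcond=None)
estimated = dict(zip(names, est))
print(estimated)
```

With noise, overlapping peaks and small chemical shift changes the reverse problem gets harder, which is why the interactive fitting in the tutorial exists.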
And what they've done is they've collected these reference spectra at 400, 500, 600, 700 and 800 MHz. These are measures of the strength of the NMR magnet. So I think everyone has downloaded the software and it's sitting on their laptops, hopefully. And I think you have like a 30-day license, is that right?

So the software itself is divided into two parts. There's a processor part and a profiler part. And I'm just going to step through this so you guys can... Maybe you guys have already played with it. If you haven't, then this is to sort of get you tuned in so that we'll have a bit of time. The processor part is to fix the spectra. Remember I mentioned that NMR spectra, the first moment that you read them, are all kind of warped. It's just the way that radio receivers work. And so they need a little bit of massaging or fixing, and that's what the processor part does. The profiler one is sort of the meat of the thing, and this is where you do the deconvolution. You do it manually, but it guides you along. It helps you along.

So I'll just sort of talk about the processor. If you pop up the screen, it has a characteristic layout. It has a sidebar; it's a Windows view, but it has something that allows you to track your history, what you've been doing. It has standard pull-down menus at the top. It has a viewer where you can look at the spectrum. It has a little thumbnail so you can start navigating: sometimes you'll be zooming or expanding, so the thumbnail allows you to see where this stuff is. And then there's sort of a status bar.

So if you were to do this, and you're not supposed to do this while I'm talking, you're going to do this after, but the idea is to use this to help remind you what you're going to do. You launch the processor. And then, just like you're opening up a Microsoft Word document or PowerPoint, you open a file. So you go to File, Open. There'll be a file there, I hope, with an NMR spectrum of some kind for you to work on.
It'll load and say, you know, is this what you think you have? So 500 megahertz, and so on. Once you've confirmed those spectral parameters, you're going to phase the spectrum. This is a case where we have things that are in phase and out of phase, just like when you've got oscillations that are in phase and out of phase. In spectra, an in-phase spectrum has a nice peak. An out-of-phase spectrum is one that looks like a warped or distorted peak, or the derivative of a peak. So you're going to phase it so those distortions disappear and it's not warped. Everything is looking vertical. Everything is above zero.

You're also going to define a reference point that's called a zero point reference. It's like your ruler. And it's called DSS. This is a chemical shift reference. Some people use TSP, which is another type of chemical shift reference. TMS is another chemical shift reference. These are silylated compounds that are added to the NMR sample. And this allows you to say this is where the zero point is.

Then, an NMR spectrum measures hydrogen. There's a lot of hydrogen in water, so there's a huge water signal in NMR. It's not informative. We know there's water around everywhere, so we might as well just say it's not there. So we delete the water signal.

Then, after deleting the water signal, after finding our zero point reference and after phasing things, things are still going to be a little warped, a little bent. So we're going to do what's called a baseline correction, so things are completely flat at the baseline. It just makes for a very smooth, nice, appealing spectrum. So it's almost like doing Photoshop, if you've done that. You know, you've taken a picture and you want to remove red eye. You want to put things into focus that are out of focus. You want to remove the tree that was hanging over someone's head. You can do that. This is sort of what we do with spectral massaging with this processor.
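Two of the steps just described, setting the zero point on the DSS peak and deleting the water region, are simple enough to sketch on a synthetic spectrum. Everything here is an assumption made for illustration: the peak positions, widths and heights are invented, the search window for DSS and the 4.6 to 5.1 ppm water window are arbitrary choices, and the function names are hypothetical rather than anything from the Chenomx processor.

```python
import numpy as np

def lorentzian(ppm, center, width, height):
    """Lorentzian line with the given centre, full width and height."""
    half = width / 2.0
    return height * half**2 / ((ppm - center) ** 2 + half**2)

# Synthetic "raw" spectrum: DSS slightly off zero, one metabolite, huge water peak.
ppm = np.linspace(-1.0, 10.0, 11001)
raw = (
    lorentzian(ppm, 0.03, 0.01, 1.0)       # DSS, not yet at exactly 0 ppm
    + lorentzian(ppm, 1.50, 0.01, 0.5)     # some metabolite
    + lorentzian(ppm, 4.90, 0.02, 1000.0)  # water, far taller than everything else
)

def reference_to_dss(ppm, intensity, window=(-0.5, 0.5)):
    """Shift the axis so the tallest peak inside `window` sits at 0 ppm."""
    in_window = (ppm >= window[0]) & (ppm <= window[1])
    dss_position = ppm[in_window][np.argmax(intensity[in_window])]
    return ppm - dss_position

def delete_region(ppm, intensity, low, high):
    """Zero out the intensities between `low` and `high` ppm."""
    cleaned = intensity.copy()
    cleaned[(ppm >= low) & (ppm <= high)] = 0.0
    return cleaned

ppm_ref = reference_to_dss(ppm, raw)             # DSS now defines 0 ppm
cleaned = delete_region(ppm_ref, raw, 4.6, 5.1)  # water window is an assumption
```

After referencing, the synthetic water peak sits near 4.87 ppm, and once its region is zeroed the remaining peaks are no longer dwarfed when you zoom out.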
And then the last thing you're going to do is what's called reference deconvolution, which is basically making all the peaks look similar to each other. Sometimes they're a little wide. Sometimes they're a little narrow. This just makes sure that all the peaks are basically the same width, the same shape. Again, it's like putting things back into focus. It's not quite, but yeah, you could say it's a bit like that. Yeah, it just makes it a little more fittable.

So these are just some screenshots where you're loading the spectrum. You should be able to see something that looks like an FID. I think rmlplasma.fid is the name. I hope we've got that loaded somewhere; maybe it's in the wiki. Okay. Okay, so it'll be in your wiki. You'll be able to load that file. It's a zip file. Then it'll just check that it has read some of these things. It'll say, did you use DSS? Yes, you did. What concentration did you use? 500 micromolar. Does it have a pH indicator? Yes. Sometimes that's imidazole, which is a chemical that can be added just to check what the pH is. Do you want to have auto phase correction? Yes. So these are all sort of checked off for you, mostly.

And so as I say, once that's done, and it's not much more difficult than opening a Microsoft Word file, this is what you'll see. What you really want to see is all of the peaks looking like a standard Gaussian peak, but you can see that they're warped. And that's called a phasing problem. It's just the electronics of the instrument. So it's not unfixable, and every NMR instrument has this mechanism for phasing. So you've seen this thing, and this is our starting spectrum. You can see it's out of phase. You can also see the left side is higher than the right side. You can see that there's a couple of giant peaks in the middle. Those are some that we want to get rid of. We can see a peak which is near zero, but not exactly at zero. That's actually our DSS reference.
That's going to allow us to put things in the right frame. And so what you're doing in this processor function is you're referencing, phasing, getting rid of the water, then doing the reference deconvolution, and then making sure that the left side is about the same height as the right side, which is the baseline correction. So you're photoshopping your spectrum, and it's making it much more readable.

There are different options for that. There's the phase correction in green. There's the baseline correction in red. There's a shim correction. There's a water deletion or region deletion. So those are the three or four that you need to use, and you can choose them. The first one we do is the phasing. So you choose Phase Spectrum. That's your option. There it is. You can see things that are a little warped or bent. And so you can slide these things left and right to shift things up. Things that were pointing down, you can bring them up. Things that were tilted, you can straighten out. So it's just sort of sliding things around. Again, it's a slide bar; anyone can play around with that.

But every time it's going to get a slightly different treatment, just like every photo you might take will be slightly out of focus or the red eye will be slightly worse. So each person has to modify it. Yes. This is a good point, and this is one of the central issues with NMR-based metabolomics. People who are reasonably skilled will produce essentially the same looking spectrum. It's just like most of us can look at a painting and, if it's looking crooked, most of us can get it mostly straight. Maybe not quite perfect, but if you've done enough crooked painting fixes, you'll get better. So it is a visual thing, but it's true, it's not going to be perfect. Between the 17 or 18 people in this room, everyone's going to be slightly different. But that's only going to lead to potentially about a 1% error in terms of your quantitation. So that's within what we sort of expect.
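What the phasing slider does can be sketched with a single idealized peak. This is a minimal sketch under stated assumptions: one Lorentzian line, a zero-order (frequency-independent) phase error of 40 degrees chosen arbitrarily, and a crude automatic criterion (pick the angle that maximizes the summed real part) standing in for the by-eye adjustment described above. It is not the algorithm behind any real auto phase correction.

```python
import numpy as np

# One idealized peak; x is the frequency offset in units of the half linewidth.
x = np.linspace(-50.0, 50.0, 2001)
absorption = 1.0 / (1.0 + x**2)   # the nice, upright, in-phase peak shape
dispersion = x / (1.0 + x**2)     # the derivative-like, out-of-phase shape
in_phase = absorption - 1j * dispersion

# A mis-set receiver phase mixes the two shapes together in the real part.
phase_error_deg = 40.0  # arbitrary value for the illustration
observed = in_phase * np.exp(1j * np.deg2rad(phase_error_deg))

def apply_phase(spectrum, degrees):
    """Rotate a complex spectrum by -degrees (the 'slider' position)."""
    return spectrum * np.exp(-1j * np.deg2rad(degrees))

# Try every whole-degree slider position and keep the one whose real part
# has the largest total area, i.e. the most upright peak.
candidates = np.arange(-180, 181)
areas = [apply_phase(observed, d).real.sum() for d in candidates]
best_deg = int(candidates[int(np.argmax(areas))])
corrected = apply_phase(observed, best_deg)

print(best_deg)  # slider position chosen by the search
```

In a real spectrum there is usually a frequency-dependent (first-order) term as well, which is one reason two people dragging the sliders end up with slightly different results.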
So the phasing, as I say, just has to do with how the signals are collected in NMR and how the receivers are configured: it's a collection of real and imaginary signals. So this is, if you've ever heard of real and imaginary numbers, cosine x plus i sine x. This is the way the electronics people like to think of things. Some things are in phase, some things are out of phase. You can see the one on the left, under Re at the top, is something in phase. And the imaginary component, that's the out-of-phase component, and that's what you're trying to eliminate in your phasing. If you're not an NMR person, it probably means nothing, but this is what's happening.

So you've phased it, it's looking good. Now you're going to remove the water peak. In serum or in urine or any extract from a plant or microbial sample, it's about 110 molar in water hydrogens, whereas all the metabolites you're measuring are at about 100 micromolar. So that's a million times more water than any of the other metabolites. So it's kind of a big elephant in the room, and what you do is you remove the elephant in the room. In your spectrum, if you zoom out, all you see is water. There's nothing else. Once you get rid of it, now you see all the other stuff. There's about 50 compounds there. So it's a standard trick, it's been used for decades, and this is a way of removing the water peak. Does the peak always look the same? It's pretty much at the same position, at 4.87 parts per million. Yeah. And it's a very broad peak.

Now the other part was that usually the left side is a little higher than the right side, or the right side is a little higher than the left side. Fixing this is called baseline correction. So it's just making sure that there is a flat line. You can see that this is the noise. It's just to make sure that it is completely horizontal. And again, the electronics sometimes mean that it's not. You can see, if you zoom in, this is a little lower. That kind of works here. It's a little higher.
So what it's done is it's put in a spline curve. And this is a fixing method that's been used, again, for decades, just to make sure that everything is flat. It does this itself. You just have to look at it to make sure it's okay. You can tweak it manually if you want to smooth it out. And again, this is something that people are actually very good at, but computers aren't. So that's partly why it's done in this visual way. As I say, ask a computer to straighten a crooked picture and it'll probably make it more crooked. Ask a human, and we can usually straighten out the picture pretty well.

The next thing we do is this reference deconvolution. It's not absolutely essential, but it's often used. This is to take your best looking peak, which is usually your DSS peak, the one at 0.0 ppm, and to sort of convolve or transfer that same shape to all the other peaks in the rest of the spectrum. So this is your DSS peak, and it has here an almost perfect Lorentzian shape. It's perfectly shimmed. It's perfectly smooth. And you want every peak in your spectrum to look like that. Obviously not the same height, because each one is a different height, but every peak should look like a variation of that. And so this is sort of what you're doing. Some of your peaks, way at one end or way at the other end, will have a little bit of a bend to them. Some of them will be slightly warped. This fixes that warping, so they all look the same. Obviously they're not going to be the same height, but they have the same shape.

So once you've done those four processing steps, you now have a really nice spectrum. Your picture has been photoshopped very nicely, and everyone will say you're a professional photographer. That's what you're trying to do with this. So that's the processor. In a perfect world, everyone will have processed their spectrum to look basically nearly identical. And so it's a reference point. But there will be some differences.
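The baseline correction step from the processor can be sketched as "fit a smooth curve through the peak-free parts and subtract it". Note the substitution: the software described above uses a spline, whereas this sketch fits a low-order polynomial with `numpy.polyfit` to keep it short. The spectrum, the tilt of the baseline and the rule for deciding which points are peak-free (more than 0.3 ppm from a known peak) are all invented for the example.

```python
import numpy as np

def lorentzian(ppm, center, width=0.01):
    """Unit-height Lorentzian line centred at `center` (ppm)."""
    half = width / 2.0
    return half**2 / ((ppm - center) ** 2 + half**2)

ppm = np.linspace(0.0, 5.0, 5001)
peak_centers = [1.0, 2.0, 3.5]  # hypothetical peak positions

# Synthetic spectrum: three peaks sitting on a tilted, slightly curved baseline.
peaks = sum(lorentzian(ppm, c) for c in peak_centers)
true_baseline = 0.20 - 0.05 * ppm + 0.004 * ppm**2
observed = peaks + true_baseline

# Points far from every peak are treated as baseline only (an assumption;
# in the tutorial this judgement is made by eye).
signal_free = np.all([np.abs(ppm - c) > 0.3 for c in peak_centers], axis=0)

# Fit a quadratic through the peak-free points and subtract it everywhere.
coeffs = np.polyfit(ppm[signal_free], observed[signal_free], deg=2)
fitted_baseline = np.polyval(coeffs, ppm)
corrected = observed - fitted_baseline
```

After the subtraction the left and right ends of the spectrum sit at essentially the same height and the peak heights are unchanged, which is the "flat line" the step is meant to produce.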
And Carolina brought this up. So between the 17 of you, there will probably be a little bit of difference between each one. Anyways, at this stage you're ready to profile. You're ready to do the spectral deconvolution. So you can now pull up your spectrum, but you pull it up through the profiler. The profiler has a similar layout, same sort of thing. It has a spectrum view, the big one there, just like this. It has a thumbnail just like the other one. It has a sidebar view just like the processor. But it has a couple of other things that are different. It has a table, which lists the current compounds and their concentrations. And then it has a navigation tool called the cluster navigator. That lets you move from peak to peak, or group of peaks to group of peaks, up and down the spectrum, left to right or whatever. And it gives you numbers. In this one the clusters are at 8.0, 4.0, 3.5, 3.4, 2.4 and 0.9. It's a fairly complicated spectrum, and so you can click on these things and move from left to right, back to the middle again, to see how these things are looking or where they are.

So for the profiler, just like with the processor, you launch the program, open the spectrum that you've just cleaned up, and then indicate that you're ready to start profiling the compounds, working on a 500 megahertz instrument. The first thing to do is to start with your reference compound, DSS. This is a compound that's added to every NMR sample. It gives you your zero-point reference. So you're going to try and fit DSS first. That's good practice, and it also makes sure that everything is working. Then once you've finished with the DSS, you can start fitting some of the other compounds. Once you've fitted or deconvoluted the spectrum with the other compounds, you can export this Excel spreadsheet, which is your list of compounds and their concentrations.
So you've gone from a raw spectrum that looks like a crazy bunch of peaks to a list of compounds and their concentrations. A skilled person can do this in about 15 minutes. Most of you will take 2 hours.

So here it is. You've launched the profiler. The spectrum, instead of being black, is green, and it may have identified or estimated some approximate concentrations. But that's what the profiler looks like. You're going to choose a library. Because you've done it at 500 megahertz, you're going to choose the 500 megahertz library. That means 450 compounds have been collected on a 500 megahertz instrument, each one individually, each one at different pHs. This actually took about 8 years for the company to collect all of these samples, and it cost them lots and lots of money. But that's the library and that's the one you can use.

So the first one is profiling DSS. That's the reference compound. Rosa? There should be TSP. There wouldn't be TMS, but there should be TSP in the library. Okay, you didn't find it? Okay. It might be listed under a full name, tetramethylsilyl- or trimethylsilylpropionic acid or something like that. So you might want to check if it's listed under a different name. But the ones that you should be looking at should have DSS, or the ones that are in this particular example.

Okay, so to profile DSS, you go to 0. That's where the first peak is. It turns out there are four clusters of DSS peaks: one at 0, one at 0.6, one at 1.8 and one at 2.9. And you want to make sure that you can fit each of these. So you can see where this green line is, but the dashed line is where you want it to sort of be, and so you can use your mouse and move the peak up so that it fits under the actual peak. These are examples as you're fitting. You can see the green line in this case is the subtraction. It's what's left over. So you can see that for this DSS peak we haven't quite fitted fully, so we can push it up a little bit.
We're now looking at the cluster at 0.6 ppm, and you can see that it's a fairly noisy black line. There's always noise. But you can see the green line itself looks pretty much flat all the way. Here's the cluster at 2.9, and you can see the blue and the purple. The green line is what's left over, and then you can see this other fit here, where the green line also looks pretty much flat. So this is just simply done by moving things left and right, up or down, until the peaks line up and the green line, what's left over, looks flat. Oh, that's a subtraction line? That's a subtraction line. Yeah. If it looks like there are still peaks... Yeah. Then you have to either move it up or shift it to the left or right. So that's, as I say, just done by clicking and dragging. You're dragging the purple thing around, up or down, expanding it, increasing it. So you'll fit those DSS clusters, and at that stage DSS is fit and you'll have the actual concentration of DSS, and it should be somewhere around, say, 490 to 510 micromolar. Which you should know beforehand, because that's what you told it. Exactly. That's what you've added. But it'll vary a little bit, partly because no one's perfect at weighing and there's some water retention and other things.

So the next thing you could do is look at another compound. Now normally when people are doing this, they actually have a list written beside them of what they're supposed to find. In serum, we do know what you're supposed to find in blood, and there's about 50 to 55 compounds that are detectable by NMR. And so out of the 450 you can go to this sublist that has been published; our lab has published it. I'm not sure if Camtasia died on me or not. Okay, so we can now go to acetate, and we can fit that. We can go to another compound further down the alphabet, alanine. We know acetate and alanine are supposed to be in blood. And again we'll sort of shift and drag things.
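The click-and-drag fitting described above (slide the reference left or right, push it up or down, stop when the subtraction line is flat) can be imitated with a small search. This is a sketch under assumptions, not the Chenomx fitting routine: a single hypothetical DSS-like line shape defined at 1 micromolar, a noise-free synthetic spectrum at 500 micromolar with a 0.003 ppm offset, a grid of candidate shifts, and a closed-form least-squares height for each shift.

```python
import numpy as np

ppm = np.linspace(-0.2, 0.2, 4001)

def dss_template(ppm, shift=0.0, width=0.002):
    """Hypothetical reference line shape at 1 micromolar, moved by `shift` ppm."""
    half = width / 2.0
    return half**2 / ((ppm - shift) ** 2 + half**2)

# Synthetic "observed" cluster: 500 micromolar, sitting slightly off position.
true_conc_uM = 500.0
true_shift = 0.003
observed = true_conc_uM * dss_template(ppm, shift=true_shift)

best = None
for shift in np.round(np.arange(-10, 11) * 0.001, 3):  # drag left / right
    template = dss_template(ppm, shift=shift)
    # Best height for this position (drag up / down), by least squares.
    scale = float(template @ observed / (template @ template))
    residual = observed - scale * template  # the "subtraction line"
    score = float(np.sum(residual**2))      # flatter residual = lower score
    if best is None or score < best["score"]:
        best = {"shift": float(shift), "conc_uM": scale,
                "score": score, "residual": residual}

print(best["shift"], best["conc_uM"])
```

Because the template is defined at 1 micromolar, the fitted height reads directly as a concentration; a flat residual is the numerical version of the green line looking flat.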
There is an auto-fit function in the Chenomx software. So in a perfect world you shouldn't even have to go through this process. You should just sort of say go, walk away, and then come back and it's done, and that's kind of what the auto-fit does. I haven't used it. I think this is the first version of the software that's actually had this. So it can do it for a few of them. And if you guys want to play around, this one sort of illustrates the auto-fit, so it's fit automatically. Okay, and it'll sort of wander around and make sure things fit. So if you guys use this and can figure out how to use it reasonably well, you should be able to go through this exercise a little bit faster than in past courses. Yes. So this is what the auto-fit did, and you can see the green line is pretty much flat, so it zeroed in on it.

So to completely characterize these things you're going to have to repeat this process about 45 to 50 times for different compounds. We don't expect you to finish this. It's just to have the experience of trying to do this deconvolution, seeing some of the challenges that are involved with it, and then to appreciate what we'll show you after lunch, which is some new software that does the whole thing automatically. Once you've done all of the fitting, you can export the data into an Excel spreadsheet and that's your list. Now obviously if you're doing a real metabolomics study you would do this 30, 40, 50, 60 times, once for each of the different samples, and you'd have lists of concentrations. So at this stage I think you guys have... how much time do we have left?