Okay, morning everyone. I hope you guys had a good night's sleep last night. The first module is about using R and XCMS for processing LC-MS spectra. This is going to be a lecture that you follow along with, so it's also a practical lab. We're going to have lots of time, so I'm going to slow down a bit, and if you have questions, stop me and ask. So these are the standard slides. Today we have two hours, 8.13 to 10.13, so we really have lots of time. Just take it easy. I can slow down and we will understand each command line by line.
Again, this is R, the R logo, and XCMS. People tend to use this beautiful graph. It's usually generated after we align all the spectra and generate a deviation plot. Today we are going to try to generate a similar one, but not with this density. But we are going to have one graph like this.
Yesterday they mentioned targeted metabolomics, or quantitative metabolomics. We also tried it a bit using Chenomx, where we manually adjusted and identified. But for LC, the other main field is MS-based, and a lot of MS-based work is untargeted. LC-MS is quite popular. People usually start with untargeted analysis, try to get the important peaks, and after identifying the important peaks go on to downstream analysis and validation. So today we are going to use one of the most widely used tools, called XCMS. I'm not sure how many of you have heard about XCMS. MetaboAnalyst actually uses XCMS online to support very simple spectral processing. But if you really want to use the full power of XCMS, you need to learn it, because it's so flexible and a lot of parameters are not exposed through MetaboAnalyst or the other tools. Using XCMS on your own laptop, you can find out what the best parameters are. Then you can upload to MetaboAnalyst or the other tool, called XCMS Online. So you know it and you upload it.
So it's very important to try it first on your own, then start using some online tools. Okay, today we are going to learn how to use that.
This is an overview of the whole untargeted metabolomics flow chart for MS-based work. We collect our spectra, and the most standard formats are NetCDF or mzXML. There are a lot of other proprietary formats, but usually these are converted to a standard format. Once the data is in a standard format we can do the data processing. So that's a requirement: we need to have our data in this kind of format. Then we load it into XCMS, and in this lab we are going to learn how to process the spectra using XCMS. After we process the spectra we get a peak list table. This table is really suitable for statistical analysis. We can directly upload it to MetaboAnalyst, which is covered in the next module. And for the very significant peaks we can do some potential identification. Yesterday we learned something using HMDB. Today there is also METLIN, and some pathway analysis. If we identify the compounds with their names, we can also do some functional analysis. That was partially tried yesterday in the integrated assignment. But today we are going to focus on processing the peaks, and the next module uses MetaboAnalyst to do the statistical analysis.
This is a very brief mention of R and Bioconductor. We did have an R review session before this workshop, and we only saw four people, but I guess you guys already did your homework. So I'll just briefly mention: R is a statistical language. It's free and open source, and it's quite unique. It's really worth learning. Even if you are not comfortable with the command line, it's worth the time to learn a bit, and gradually you'll find it's quite useful, especially if you're doing metabolomics using R and XCMS like today. But a lot of people are also doing other omics, like microarrays and RNA sequencing analysis.
R and Bioconductor have a lot of packages to help you do that, so learning R is really worth it. The limitation, as Michelle mentioned, is that at the start you'll probably feel a little bit intimidated because of the command line and typing and so on. But if you have strong motivation, you can overcome it. Once you get past the first stage you'll find it very useful. Now this is a little bit of history in some slides. I will upload these slides for you. I added some slides just to try to motivate you guys to learn. The R commands used in the script will be the same.
So this is a little history of XCMS. The first official release was in 2006. Then they added new algorithms to process high-resolution data, and the centWave algorithm came in 2009, 2010. There have been several major updates since then. There's another one here, XCMS Online, so they finally made it online. I'm going to discuss XCMS Online in the last few slides. XCMS is the most widely used tool for untargeted metabolomics. That's why we chose to learn it.
Again, about XCMS: it is free, open source, powerful and flexible. Metabolomics uses a wide variety of instruments, and if you use a GUI interface, sometimes you just cannot process your data because it doesn't give you the parameters. Using the command line you can really do a lot of adaptation to your own specific configuration. XCMS is also high throughput. Unlike Chenomx yesterday, where you were doing spectra one by one, with XCMS you can put all your files into a folder and it will process them one by one and do the alignment. So it's really fast. It's batch processing. And the algorithms are very good: it has very good peak detection, deconvolution and alignment.
This is a critical comparison of all the popular MS processing tools, including some commercial ones, that was published several years ago. You can see here there's P and M.
P is proteomics, M is metabolomics. Today we will focus on metabolomics. As you can see, XCMS, in the two green ones for metabolomics data, has the highest average precision and recall. So it's really amazing. And if you pay attention to the time, it's almost one of the lowest compared to other tools, including commercial ones. It's so fast. That really makes XCMS a favorite tool for a lot of people like me. I really like to use this tool because it's fast and does batch processing, and it's also flexible.
So this is an overview of what XCMS tries to do. First we have our spectra ready, and these are multiple samples. Then we use XCMS to load the samples and XCMS will try to extract the peaks. These peaks, you can see, are kind of noisy. The next step is to do the alignment. XCMS uses a nonlinear alignment. It selects some internal standards, just optimal endogenous standards. Basically it uses the peaks inside the data, because they found that's more reliable. You don't need to tell it which compounds to use. It will select them by itself, determine which are best, and do the alignment.
Yeah. So it will try to determine what's the baseline and what are the real features. Once it detects that, it will select the real features. This process is iterative. At the beginning it makes a best guess. But once it has processed multiple samples and realizes that at this position there is more likely to be a real feature, it will use a stricter threshold to detect it. Okay, I'm just looking hard at this one. So it has several rounds of selection. Finally it gets all the peaks aligned and sees whether a feature appears consistently across different samples. If it only appears in a few samples, it's more likely to be an artifact or noise. There are several tricks we are going to discuss later. You can see this is a nonlinear alignment, and after alignment you can see the peaks are more aligned.
Here it's noisy, and here it's aligned. After alignment we can export the aligned peaks, which become a table. The values will be the peak intensities, and the mass and retention time serve as the identifiers. This can be really useful for the t-test that is built into XCMS. We are going to briefly mention it. But we can also upload the table later on to MetaboAnalyst.
Is the other one done between different samples? Yes. One single raw data file is one sample. So you process each sample first to detect the peaks, and then process all the samples. Then you take all the samples and try to align them between the different samples. Yeah. Yes.
XCMS recently also added support for MS/MS. We're not going to cover this, but just so you know, it does have this feature.
These are some very simple notes about the commands and the symbols. If you use a pound sign, it means the start of a comment. Everything after it is just for us to read and understand, and it's going to be ignored by R. And this arrow sign here, this is the start of an R command. You are not going to enter the arrow. You just enter what comes after it. This is the basic format for R: this is your output data, this is the assignment, this is the name of the function you're going to call, and within the brackets is your input data. If you've done some programming with MATLAB or Java or any other language, it's more or less the same. What is unique here is the arrow. The assignment is not an equal sign, it's an arrow. That's quite specific to R. At any time, if you're not quite sure what a function is doing or which parameters it accepts, you can put a question mark before the function name, and that will open a help page about the function. If you still don't find that informative, Google the function. Usually there's a lot of information about it.
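The notation described above can be tried out with a few lines of base R. This is only a small illustration: the values and the names `x` and `y` are made up for the example and are not taken from the lab script.

```r
# Everything after a pound sign is a comment; R ignores it.
# The ">" shown at the start of a console line is the prompt. Do not type it.

x <- c(3, 1, 2)   # "<-" is the assignment arrow: the result on the right goes into x
y <- sort(x)      # general form: output <- functionName(input)
print(y)

?sort             # a question mark before a function name opens its help page
```

The same pattern, output on the left, arrow, function name, input in brackets, is what every XCMS command in this lab follows.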
For help with R there are just so many resources, so you're never alone. For today, Michelle already mentioned that we need to install the latest R. I guess you guys already did. We need the XCMS package installed. And we're going to use a test data set called faahKO. I also realized that we need a package called multtest. I'm not sure Michelle mentioned that. In case you didn't install it, you can just do the same thing: install multtest, because one command will require this package, and it's probably not installed by default. So let's have everybody install multtest.
Go to your script and copy this line. You see this line, source. First copy this line. Then you use the second line, biocLite multtest. You just need to do these commands. First you have this source. The source just tells R where to look, and nothing is downloaded yet. Then the second command, biocLite multtest, is what really downloads the tool. If you're not sure, just reinstall. You don't need to reinstall XCMS, just install this one package.
If you reinstall, it just overwrites your previous one. Can you move the mouse? Oh, yeah. Which command are you following? I'm just doing this one. So just follow these two commands. This is the source. I went back and installed from scratch. And this command.
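For reference, the two installation commands described here can be written out as follows. This is a sketch rather than the workshop script itself: the `source` and `biocLite` lines are the Bioconductor installer of that period and are shown as comments because Bioconductor has since retired it. The `BiocManager` lines are the current replacement and are an addition, not something from the lecture.

```r
# The two commands described in the lecture (the installer used at that time):
#   source("http://bioconductor.org/biocLite.R")   # tells R where to find the installer
#   biocLite("multtest")                           # downloads and installs the package

# Current Bioconductor releases use BiocManager instead:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("multtest")
```

Running the install again on a machine that already has the package simply overwrites the existing copy.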
For some reason it wasn't there, and I had to go back and install XCMS and faahKO. This is a bigger problem over here. So listen, if you want to install this, you need to remove this command. I'll just tell you these two commands. Okay. Just these two commands. Again, to reiterate, can you put a red sticker up when you're having difficulty? Has everybody got XCMS and multtest? Yes. Okay, ready. So I hope you guys can follow and get multtest. Who didn't get multtest installed? Okay, good.
Now this is an overview of what we're going to do. These are the basic commands and what each one is doing at each step. First we get our raw spectral data. Then step one is the peak detection. This command is xcmsSet, and it does the peak detection. You don't need to follow along now, this is just the overview. After detecting peaks we're going to try to do the alignment. The first step is to match the peaks across different samples, and the command is called group. Once we group, we have a better idea of where these peaks are located, and we can do the retention time correction. Then we redo the peak detection, so this is kind of iterative. After that we try to fill the peaks because, as I mentioned, once we know the data better we can use a different cutoff to focus on the regions of interest. After we fill the peaks and align them, we can export to MetaboAnalyst to do the statistical analysis.
But we're going to stop here, because that is the next module. So all we need to do is these first three steps. XCMS also has a built-in t-test, which is diffreport. We can briefly try it and see what XCMS can do with this simple t-test. From here we can also visualize the peaks, because with visualization we can see whether they are artifacts or real peaks. So this is also important.
So these are the commands. We are going to discuss them one by one, and we're not going to execute them all at once. But it's just several commands. First we get where the files are located. In this case we use the downloaded files in the library, but in reality we would have our own data. After we give the data path, we're going to pick peaks, do the peak alignment, do the retention time correction, and try to regroup and fill the peaks. Finally we are going to save the result as a table, which we are going to use for the next lab, or for your own statistics, in SPSS or whatever else you use. So this is the overview.
Now we are talking about input. The input is really the most important step. For XCMS, as I mentioned, we need to use CDF. People may call it NetCDF, but a lot of the time you get files ending in .cdf. After you get your CDF files, you put them all in one folder and give it some name. Here you basically just specify the path to your files. For example, your slash my spectra: pretend your spectra are located here, and you just give that here. This is what you see if you really want to look inside a CDF file. It's designed for high-density information. So this is the format, but we shouldn't bother with this. For people interested in parsing it, we have really nice parsers.
So this is just for your information if you really want to know.
The second step is peak detection. For mass peaks we have three dimensions. The m/z value is one dimension, the second dimension is time, and the third dimension is intensity. So the peaks have m/z values and retention times. Usually we identify peaks by combining both, m/z and retention time, and what we are going to compare is the intensity. This is how the data is usually organized. XCMS tries to detect the peaks along the m/z dimension, across different retention times. So this is a slice: it chops the spectrum at each m/z and tries to detect peaks.
This is our first step, and we can do the peak detection file by file. So let's do this. You load the library, xcms, which you already did. The second one is to load the file path. Copy and paste. Okay, I'll just put it up. Now this path holds where the folder is located. Now we want to see all the files. These are our real CDF files. If you're interested you can go to this location and see what the files look like. If you want to analyze your own data, you should organize all your CDF files within a folder and give that to XCMS.
Yes, we have 12 samples in two groups. Yeah, that's the library. Exactly, we're using the faahKO data. If you have your own data, replace that CDF path with your own and specify where it is, so XCMS will read from there rather than here. Next, this is the one where we're going to pick peaks from each of the samples in the current library. Copy and paste.
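The loading and peak-picking steps just described can be sketched as below. This assumes the classic `xcmsSet` interface of the `xcms` package and the `faahKO` data package are both installed. The object names `cdfpath`, `cdffiles` and `xs` are choices made for this sketch and may differ from the workshop script.

```r
library(xcms)

# Folder holding the example CDF files shipped with the faahKO data package.
# For your own data, replace this with the path to your folder of CDF files.
cdfpath <- system.file("cdf", package = "faahKO")

# List every file in that folder (and its subfolders) with its full path.
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
print(cdffiles)

# Peak detection on every file, using the default settings.
xs <- xcmsSet(cdffiles)

# Printing the object gives a short summary of what was detected.
xs
```

The files sit in two subfolders, one per sample group, which is how the sample classes are picked up without naming them by hand.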
XCMS currently does the peak detection along the m/z dimension. It outputs two numbers separated by a colon. The first is the mass. The second is the number of peaks detected so far, not the retention time. So let's say at 600 m/z there are 8147 in total. This is really fast. If you use other tools it will probably take much longer. We can see we have 12 samples in total, confirmed. And we can see how many peaks were detected when each one finished: 800, 1,700, 500. So on average, say, about 800 peaks. This just gives you a basic idea of how many peaks we have for each sample. That's what you can read from here.
Michelle. Yeah. Did you run library xcms? That's a warning. The mzR warning can be ignored. I tested it, and it's fine. I guess in the next two weeks they will synchronize. It's just that we use the most recent R, and usually you get such problems then. So that's fine.
By default there are several parameters. The defaults work most of the time. For example, scan range: if you know that the start of the spectra is not good, you can scan just up to, say, 200 or 300, and you can specify that here. And this one is the full width at half maximum, 30 seconds. If you really know what you're doing, you can specify different parameters. You need to experiment with it. The defaults work most of the time, but if you don't get good results, you probably need to tweak a little bit. There are a lot of suggestions based on the instrument you use and the type of column, so you can try different ones, but again, there are no hard rules.
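As a rough sketch of what tweaking these parameters looks like, the call below passes a narrower peak width and a scan range to `xcmsSet`. The particular values (20 seconds, scans 1 to 300) and the names `testfiles` and `xs_test` are illustrative assumptions, not recommendations from the lecture, and how `scanrange` is handled may differ between `xcms` versions.

```r
library(xcms)

cdffiles <- list.files(system.file("cdf", package = "faahKO"),
                       recursive = TRUE, full.names = TRUE)

# Two files are enough to try out settings quickly.
testfiles <- cdffiles[1:2]

# fwhm: expected peak width (full width at half maximum) in seconds; 30 is the default.
# scanrange: restrict detection to a range of scans, here scans 1 to 300.
xs_test <- xcmsSet(testfiles, fwhm = 20, scanrange = c(1, 300))
xs_test
```

Comparing the printed summary against a run with the defaults is a quick way to see whether a change makes any difference for a given data set.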
It's kind of trial and error. Yes. Yeah. In the last step we're going to cover visualizing some peaks. You really want to show that a peak is a real peak, with high intensity and high confidence, so visualization is important. Some people suggest using two different orthogonal tools to double check whether these peaks are also detected. LC-MS spectra are quite noisy, so visualization is always a second check.
Another important parameter is centWave, which is for higher-resolution spectra. You can try this if you want. For us this is not the case, but just so you know, if you have very high-resolution spectra, centWave will probably give you many more features.
After we detect all the peaks, we try to align the peaks based on their retention time, and at the same time correct the retention time, so their locations are also shifted. Then we can compare the same peaks, and we know they are consistent across different samples. This is just the overview of before alignment and after alignment. So again, peak alignment is across different samples, and at the same time we correct the retention time drift. We can do it now. Bring up your console, and we do the peak alignment. We do group. Copy. Can you follow? Okay.
So this output is basically how many groups, being printed here. After we do the alignment we do a retention time correction. Oh, these are the groups detected. Basically, we align the peaks, and the peaks within the same range we just put together. That is what a group means.
Within these groups, each peak is identified by both retention time and m/z.
So that number, what is that? Oh, this is the number of groups. It's just a summary statistic, just output for you to see. The real data we are going to view in the next slide.
So I'm going to say, like, 262 is close to 250 and 325 goes to 300. So is it kind of binning things when it does peak detection? Yeah, right, it's more or less like that, but internally it's more advanced than that. By default it tries to bin peaks that are close to each other. But it has an internal reference, a kind of standard, using peaks from some metabolites. Inside, it uses a nonlinear alignment. I'm not going to explain the details, but the idea is the same: it bins peaks close to each other into a group. How it shifts the peaks is determined internally. For each compound, each peak, it calculates the optimal ones.
This is how many groups we get. So we have about 600. Where's the 600? We're going to visualize it later, and this is just a summary. But why are you saying 600? Oh, just roughly. This is the number of groups detected. Okay. At this moment all the information is inside this object. Remember that everything from the right is moving to the variable now. All the data is stored in that variable on the left, xs. So this is only a summary. Actually we have all the retention times and m/z values stored inside here. We are going to view it as a table and we are going to plot it. So this is just a summary.
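A minimal sketch of the grouping step, assuming the classic `xcms` interface and the `faahKO` example data. The object names `xs` and `xsg` are assumptions for this sketch. The `groups()` accessor is used here only to show that the full group table sits inside the object behind the printed summary.

```r
library(xcms)

cdffiles <- list.files(system.file("cdf", package = "faahKO"),
                       recursive = TRUE, full.names = TRUE)
xs <- xcmsSet(cdffiles)

# Match peaks across samples. The result on the right is stored in xsg.
xsg <- group(xs)

# Short summary, including the number of peak groups.
xsg

# One row per group: median m/z, median retention time, and their ranges.
head(groups(xsg))
```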
So, we are going to see the real data.
Now, after doing the alignment, we do a retention time correction. Basically this just relabels the peaks according to their group median value. We put everything in the same group, and we want to change the label within the group so it has the same consistent m/z and retention time. Then when we compare, we know what we are referring to. Within each group the values are very close, but not the same, and when we want to compare them we want a consistent label, so we can have the same row, the same column. And you're going to get a summary like this.
For example, if we choose zero as the reference, some samples are shifted to the positive side, basically delayed more, and some samples are faster. Sometimes a run is slow, sometimes faster. Each sample is plotted separately. We have 12 samples, and each one is shifted differently. So it adjusts each sample with a different adjustment factor. Below there's a density plot, which basically shows the correction, the corrected values compared to before. So this is a summary of the drift for these samples. So far everything looks fine. But if in certain cases you see samples far away from the main group, where the drift is quite different, it probably means something is wrong with the instrument or something. So far they all seem more or less close, and it seems normal.
Yeah, this is for each spectrum. It plots, based on the alignment, how much each has been adjusted or shifted. If a sample is shifted a lot, it is adjusted more. For some samples there was really not much adjustment. For some samples it tried to increase, for some it tried to decrease.
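The retention time correction and its deviation plot can be sketched as follows, assuming the classic `xcms` interface. The arguments `family = "symmetric"` and `plottype = "mdevden"` follow the `faahKO` example commonly shown with `xcms` and may not match the workshop script exactly. The name `xsg2` is chosen here so the uncorrected object is kept for comparison.

```r
library(xcms)

cdffiles <- list.files(system.file("cdf", package = "faahKO"),
                       recursive = TRUE, full.names = TRUE)
xsg <- group(xcmsSet(cdffiles))

# Retention time correction. plottype = "mdevden" draws the deviation of each
# sample over retention time, with a density panel underneath.
xsg2 <- retcor(xsg, family = "symmetric", plottype = "mdevden")
```

Samples whose curves sit far from the rest in the deviation plot are the ones worth a second look.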
The final goal is to make them consistent so we can compare them. Are there more questions about this graph? This is probably the main graph. It's the graph I was referring to at the beginning. If we have thousands of spectra and we plot them, we see a very densely packed drift plot. No, it's not offered by XCMS. For a plot like this, okay, I see your question: it's actually all your data points. You plot a range, and in this case I think it's m/z. You order them and plot them, and you can generate a plot like this. It's not offered in XCMS by default, although it's easy to do if you know R. You just get all the data points and plot them. If you order them, this curve forms naturally. It's no magic. It's actually a simple command: you need to have the data, you sort it according to m/z, you plot it, and you get a graph like this. If you know a bit of R, it's not difficult to do. So far everybody follows? Okay, good.
Now, after doing the alignment and correcting for retention time, we have some knowledge of our data. We know what the spectra look like and where the real peaks are located, so we can iterate and try to improve, to see whether we can detect more peaks, because we know in which ranges the real peaks are located and which ranges are probably just noise. So we can redo the whole process: we scan the raw spectra again and see if we can improve. Again the command is simple, and we can do it now.
Okay, one step before that. Before we move to fillPeaks, we can redo the grouping, because once we've done the retention time correction we can try to regroup. Here's a parameter, and I put it here just for illustration. When we group peaks, there is a default.
It is 30, as I mentioned, but we can give it a tighter range if we know better. In some cases that changes the result, and sometimes it does not. This one is bw, the bandwidth. Before it was 30, but we give it 10 and see if it changes. In this case, it doesn't change. So it is a parameter you don't have to use, and it depends on each case. In certain spectra you do see that you can detect more groups. This follows on from your question: you group the peaks with different bins, but once we align them, the groups can be tighter, so you can give them a tighter range. In this case we still get the same, so we really have very neat groups. That's one parameter you can try. The purpose is to show you what parameters are available. That one sometimes changes the result, but here we can see it's more or less the same, the same numbers.
Now we do fillPeaks, as I just mentioned. When we do fillPeaks, it tries to detect new features and writes them down. You can see some warnings here, and you can look at what's inside the warnings. They tell you something is out of range, because we are trying to align different spectra. Some spectra have a longer retention time range and some shorter, so they're not totally the same, and it complains. We can just ignore them.
So what did you do here? This is fillPeaks. After fillPeaks you get warnings, and in this case you may want to see what the warnings tell you. They basically tell you that it tried to fill the peaks in certain areas but there is no value there. So basically you can ignore them, because some spectra are longer and some are shorter. That's unavoidable.
No group information? First do this one, and then do this one. At this moment, you're right. I think I showed that: you group it, then fill the peaks.
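The regrouping and peak-filling steps can be sketched as below, assuming the classic `xcms` interface. Two points in the comments come from the `xcms` documentation rather than the lecture: retention time correction discards the existing grouping (which is one way to run into a "no group information" error), and `bw` is documented as the bandwidth of the smoothing kernel along retention time, in seconds, with a default of 30. Reusing the name `xsg` at each step is an assumption for this sketch.

```r
library(xcms)

cdffiles <- list.files(system.file("cdf", package = "faahKO"),
                       recursive = TRUE, full.names = TRUE)
xsg <- group(xcmsSet(cdffiles))
xsg <- retcor(xsg, family = "symmetric")

# Retention time correction discards the grouping, so group again.
# bw = 10 is tighter than the default of 30.
xsg <- group(xsg, bw = 10)

# Go back to the raw data and integrate signal where a sample
# has no detected peak in a group.
xsg <- fillPeaks(xsg)

# Show any warnings raised by the last command.
warnings()
```

Because each step overwrites `xsg`, the earlier state is no longer available afterwards. Assigning to a new name at any step keeps the previous object around for comparison.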
You tried to go through this one, but you had already overwritten it, so the grouping you made is gone. It has been overwritten, and you cannot go back to it unless you saved it under a different name. So what you're trying to do is not supported. You are already past that step. That's not on the screen. Okay.
bw is just the range you want to specify for the grouping when you try to align the peaks. Try the help command: help group. If you really want to find out what each parameter means, run this command and you get the help page, which tells you what the function is doing and which parameters are available. Select which one you want. You see, it is group. You just use the question mark.
So if we do group, is the method going to be called group? If you use group and type in a method, it's group dot density. This is an advanced one. I don't want to talk about it much, because it's a very special feature of how the object decides which method to call. They are all connected by different dots. This one is density, but if you call group and don't give it anything, you call the first one.
The density method is the most commonly used. You can call group.density directly if you really know what you're doing, but otherwise the system will give you the best one. So that's it. If you want to be able to go back and see the difference, don't overwrite xsg. Give the result a new name, so you don't overwrite it. Oh, okay. Because it's changed, it's gone, right? Yes, it's changed.
So we aligned the peaks, we grouped them and we filled the peaks. We saw some warnings, but we can simply ignore them because they're unavoidable in this case. Now we look at the result. People keep asking what the data looks like, what we got. Internally it is organized as different lists of things, but we always want to see it as a table. So after fillPeaks we get this long list of peaks with all the information. We have mz, mzmin and mzmax: mz is by default the median or mean value, and mzmin and mzmax are the lowest and highest within the group. It's the same for the retention time, with its min and max, and then the peak intensity, the area. That is what we try to compare. In order to compare, we want to have the same m/z and retention time, so we align and change the label to this one. But within a sample there is actually a certain range. That's why we want to do the retention time correction and relabel with the same number within the group.
So now we can really see what's inside. We want to see the top 10. Follow me, typing like this, and you see the peaks. We use peaks: peaks is the function name, and following the function name is your input data. In this case it's xs, which you processed before. And we want to see the first 10 rows.
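Looking at the first rows of the peak list can be sketched like this, assuming the classic `xcms` interface and that `xs` is the name given to the peak-detection result.

```r
library(xcms)

cdffiles <- list.files(system.file("cdf", package = "faahKO"),
                       recursive = TRUE, full.names = TRUE)
xs <- xcmsSet(cdffiles)

# Column names of the peak list: mz, mzmin, mzmax, rt, rtmin, rtmax, into, ...
colnames(peaks(xs))

# peaks() is the function, xs is the input; [1:10, ] keeps the first 10 rows.
peaks(xs)[1:10, ]
```

The `sample` column in this list says which file each detected peak came from.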
If you really want to see all the peaks, for example you want all of them in an object called mypeaks, you can save them to a new object. If you do this, mypeaks will hold all your peaks. This is just an assignment to a new object, in case you're really interested in the peaks. But these peaks, as I mentioned, are not organized into a table suitable for statistical analysis. It is just for you to see them. This is for sample one, you can see. It is a really long list: for each sample, which peaks. It's several thousand rows long. So if everybody is okay, you can get your peaks and see what's inside that object.
At this step, we can either use the t-test built into XCMS to do some analysis, or we can save a table that we can upload to MetaboAnalyst, which we're going to cover next. So I'm going to show you how to do the internal t-statistics and find some important peaks. It is in the script, lower down. No, I was asking about the last part you did, the peaks. Oh, yeah. Sorry, peaks is just for looking at the data. It's not needed for the next step. Okay. You can write it down.
So at this moment, xsg contains all the information we need. We can either save it as a table, which is going to be suitable for MetaboAnalyst, or do a t-test. For clarity, we just follow what's in the script, so we first save it as a table. Copy and paste this command. Again, this one is groupval. You get all these group values. Previously, the peaks were not really compatible with statistical analysis. We need group values, and this is the command to get them. Yes. So there are several parameters. The intensity option, into, means the original intensity. The other one is the median retention time. Use the median retention time, and use into to specify the original intensity. If you have doubts, search the help for groupval. There are several other options, but this is the one related to our work.
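The difference between the long peak list and a table of group values can be sketched in base R. This is a toy reshaping on made-up numbers, not what groupval does internally; the feature labels and sample names are assumptions for illustration.

```r
# Toy long-format peak list: one row per detected peak (values made up).
# 'feature' is a hypothetical label for an aligned peak group (m/z / retention time).
long <- data.frame(
  feature = c("300.2/3390", "300.2/3390", "300.2/3390", "496.2/3384", "496.2/3384"),
  sample  = c("ko15", "ko16", "wt15", "ko15", "wt15"),
  into    = c(1200, 1500, 300, 800, 950)
)

# Reshape to a feature-by-sample table: rows are peaks, columns are samples.
wide <- tapply(long$into, list(long$feature, long$sample), sum)
wide[is.na(wide)] <- 0   # no peak detected in that sample
wide
```

A table of this shape, one row per peak and one column per sample, is what statistical tools expect, which is why the long list on its own is only useful for inspection.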
Now we have all these group values inside the data, and we can see what's inside. This will be 2, 4. This will be the data table we are going to work with. In this table the columns are the different samples: KO15, KO16 and so on, and the wild type. So it's 12 samples, one in each column. Each row is a peak, identified by its mass and median retention time, and the intensity is the original intensity, the peak area. So this is the table. What we're interested in is, for each of these peaks, comparing the intensities. Some of them are probably higher or lower in different groups. So this is the peak area we try to compare.
When you talk about the original intensity, is that going back to the original raw spectrum intensity, before it was aligned? Yeah, the raw original intensity; there is also a kind of peak height. For each peak, people think about what the best representation is. You always want the peak to represent the concentration. So it could be the peak height, the original intensity, or a normalized one; there are different calculations depending on what you consider the best representation of the concentration. So the original that you're talking about, original intensity, just means you're choosing not to do any transformations? Exactly. Yeah. It already has this information. When you go to groupval, there are several options already built in. You just pick this one. There's also a maximum peak intensity. So there are several options, and you can see some of the values are very different. For me, the original is more comfortable, unless you have a strong reason to use the other values. So you can look at the groupval help and see the other options, and what other people think about it.
So now we have our data. And this data is suitable for statistical analysis and visualization, because it's organized as peaks and samples. So this is always the unit.
So at this step we want to save it as a table so we can upload it to MetaboAnalyst or other tools. So this is the command. But before we go to that command, let's look at what this object contains. Okay? XCMS has its own kind of object. When the files are read in, it reads all the group labels, basically the KO and wild type. So if you look at this, you see the labels are knockout and wild type. We try to incorporate this into the table so we can tell the statistical software which group each sample belongs to. Is it in labels? Yeah, I'm going to talk about that in a moment. Labels means you compare two groups. One group is control and one is disease, right? So far these are just samples. Which two groups do you want to compare? You need to let the system know. This group row does that.
Now we insert the group labels into our dataset. Okay. It's getting longer. If we run the same command, we can see we have the actual label on top. It's the group: knockout first, and then wild type. So this is the label, because when you upload the table, the tool really expects you to say which group each sample comes from. Let's do this again. Let's save it, and I will tell you more. It is easier to look at the data in Excel. Okay. We save it to the current folder, and we give it the name mypeaktable.csv. csv means comma-separated values, so each value is separated by a comma. So we save it. I want to see where I am: I'm in my current folder, and I have mypeaktable here. You can open it in Excel and see what it looks like. This group label, you can also enter manually. You don't need to use a command if you know which group each sample is from. You can really add it to this table by hand.
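Adding a label row and saving the table can be sketched in base R as follows. The table values, the labels typed by hand and the use of a temporary folder are assumptions for illustration; in the session the labels came from the XCMS object and the file went to the working directory.

```r
# Toy feature table: rows are peaks, columns are samples (values made up).
dat <- matrix(c(1200, 1500,  300,  250,
                 800,    0,  950, 1010),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("300.2/3390", "496.2/3384"),
                              c("ko15", "ko16", "wt15", "wt16")))

# Class label for each sample, in the same order as the columns (typed by hand here).
grp <- c("KO", "KO", "WT", "WT")

# Put the labels on top as an extra row; the whole matrix becomes character.
out <- rbind(group = grp, dat)

# Write to a CSV file (a temporary folder here, to keep the sketch self-contained).
f <- file.path(tempdir(), "mypeaktable.csv")
write.csv(out, file = f)

getwd()                              # where am I?
check <- read.csv(f, row.names = 1)  # read it back to check the layout
check
```

Typing the label row by hand in a spreadsheet gives the same file; the command is only a convenience.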
You need to tell which sample is from control and which from treatment. Yes, you can insert it manually: open the file in Excel and insert that row. So that's the goal. Any questions so far?
So again, this is the table as viewed in Excel. It's on the slide. This is what you get, and this is the one for statistical analysis. The first row is the label for the different samples, and each of the other rows is a peak, named by mz and retention time. At this point it is suitable for a lot of statistical analysis. You've got some zeros in there. Yeah. And each of the columns is a sample. Yeah. We can visualize the peak; that's what we are going to discuss. Some of them really have no peak there, and we are confident it's zero.
So, okay. As I mentioned, XCMS also has a built-in function called diffreport. It uses a t-test. We can do some very simple analysis here; the more advanced analysis is for MetaboAnalyst. So we can do a simple analysis, see the significant peaks, and visualize them. This is what we do next. So copy and paste this command, diffreport. The result is a table sorted by p-value, and we can look at the top four. Okay. Yeah, because it is already loaded. When you installed, you loaded it before. Okay. So this is fine, it is already in memory. If it's not there, it will complain.
So the table shows us the top four lines, ranked by p-value. Here are the p-values, so these are the most significant peaks, and as you go down you can see the p-value increase. For each peak there is a t-statistic and a fold change, and the mz and retention time. Here is the number of peaks detected. In this case it's 12, so this peak shows up in all 12 samples. And here, the peak shows up in only seven samples.
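The idea of a report sorted by p-value can be sketched in base R with a per-peak t-test. This is not the diffreport code; it is a toy example on simulated intensities, and the column names tstat, pvalue, fold and npeaks are chosen here only to echo the columns described above.

```r
set.seed(42)
# Toy intensity table: 5 peaks (rows) x 12 samples (columns), values simulated.
grp <- factor(rep(c("KO", "WT"), each = 6))
dat <- matrix(rnorm(5 * 12, mean = 1000, sd = 100), nrow = 5,
              dimnames = list(paste0("peak", 1:5), paste0(grp, 1:6)))
dat["peak3", grp == "KO"] <- dat["peak3", grp == "KO"] + 800   # one real difference

# Two-sample t-test per peak (Welch test, the t.test default).
res <- t(apply(dat, 1, function(x) {
  tt <- t.test(x[grp == "KO"], x[grp == "WT"])
  c(tstat  = unname(tt$statistic),
    pvalue = tt$p.value,
    fold   = mean(x[grp == "KO"]) / mean(x[grp == "WT"]),
    npeaks = sum(x > 0))             # samples with a non-zero value
}))

# Sort by p-value and look at the top four rows.
res <- res[order(res[, "pvalue"]), ]
head(res, 4)
```

In this toy table the peak that was shifted on purpose should come out on top, which is the pattern to look for in the real report.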
So in five samples, that means they're going to have a value of zero. The p-value is in the first... it's too long. Here is the p-value. Yeah. And this really summarizes the peaks. This one is 12, with 6 and 6. And this one is interesting: we only see this peak in seven samples, the majority in the knockout, and in the wild type we don't have it. So it could be knockout specific. This peak is very interesting to see, and we're going to visualize it. We'll select it and visualize it later. Just from the numbers, we can already see some interesting things here.
Okay. In order to plot these peaks, we need to get the groups. So you run groups on this one, and we look at what's inside. For each peak group it summarizes how many samples it showed up in and which classes they are in. Here it's knockout 3 and wild type 5. Why are we doing this? Because we want to plot the extracted ion chromatogram. So here's the command we're going to use. We copy and paste. What this command means is: we want a retention time of more than 2,600 but less than 2,700, and the peak has to show up in at least eight samples. You could also say equals 12, so it has to show up in all 12. So here is the range you are selecting: the retention time range you want to focus on. Let's just do this first. Then we can select some other peaks based on what we saw previously and see what the main difference is. So we got these groups and we see what's inside. These are all the peak indices. All of these peaks show up in at least eight samples.
So why is this output different? Yeah. I probably did one more round of correction, didn't I? Okay, you got... oh, yeah. In between the two runs I probably did the correction a little differently.
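The selection just described, a retention time window plus a minimum number of samples, can be sketched in base R on a made-up summary table. The object name gt and the column names mzmed, rtmed and npeaks are assumptions here; check the column names of your own groups table before reusing the conditions.

```r
# Toy stand-in for the group summary table (values made up).
gt <- cbind(mzmed  = c(300.2, 301.2, 496.2, 508.2, 326.2),
            rtmed  = c(3390.3, 2610.5, 2655.0, 2688.1, 2701.4),
            npeaks = c(12, 7, 12, 9, 12),
            KO     = c(6, 6, 6, 4, 6),
            WT     = c(6, 1, 6, 5, 6))

# Rows with 2600 < rtmed < 2700 that were found in at least 8 samples.
idx <- which(gt[, "rtmed"] > 2600 & gt[, "rtmed"] < 2700 & gt[, "npeaks"] >= 8)
idx
gt[idx, , drop = FALSE]

# Stricter: the peak has to be present in all 12 samples.
idx_all <- which(gt[, "rtmed"] > 2600 & gt[, "rtmed"] < 2700 & gt[, "npeaks"] == 12)
idx_all
```

The result is a vector of row indices, which is why two people with slightly different processing can get different index numbers for the same peak.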
But let's visualize it. The goal is to visualize the data, so we are visualizing these peaks across the different groups. For example, here I specify one, because we have this group index. We know we have about 12 groups of peaks, and we just want to see the first one. So now we extract the ion chromatogram. This should show up, I think. Sorry, I haven't plotted yet. So we extract it again. Now we copy and paste the second command. Now we see this peak across the different samples. The peak is aligned: you see the center is always aligned nicely. And it is colored according to the different groups.
Let me explain this. This is a basic R command. How do you color your graphics according to class labels? What you do is take your labels and convert them to numeric. Let's see the numbers. What do you get? The numbers one and two. Basically one means black and two means red. So you color them differently. This is a very basic R coloring scheme. I assume you guys are fine with this one.
See, this one is already there; because we use XCMS, it's already added. Oh, yeah, I see. A question about the legend. Yeah, I'm not sure. Legend... just do help legend, okay? You can add a legend here. That's an easy way to figure out how to add a legend. What do you mean? The legend on the plot. Sorry, the function name is legend. You just need to specify where it goes and what you put there. I can't do it off the top of my head; I also need to look at the syntax. I'm the same as you guys. So I assume it is legend, and where you locate it. So I'm not going to use it.
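The coloring trick and a basic legend can be sketched in base R. The traces below are made-up Gaussian curves standing in for an extracted ion chromatogram; only the way the class labels become colors is the point.

```r
grp  <- factor(rep(c("KO", "WT"), each = 6))   # class label per sample
cols <- as.numeric(grp)                        # KO becomes 1, WT becomes 2
cols

# Toy chromatogram traces: one Gaussian-shaped peak per sample (made-up data).
rt     <- seq(3350, 3430, by = 1)
height <- ifelse(grp == "KO", 1000, 300) + seq(0, 110, by = 10)
traces <- sapply(height, function(h) h * exp(-(rt - 3390)^2 / (2 * 8^2)))

# One line per sample, colored by class; color 1 is black and color 2 is red.
matplot(rt, traces, type = "l", lty = 1, col = cols,
        xlab = "Retention time (s)", ylab = "Intensity",
        main = "Toy extracted ion chromatogram")

# Position keyword first, then the labels, then the matching colors.
legend("topright", legend = levels(grp), fill = 1:2)
```

Other position keywords such as "top" or "bottomright" work the same way; the help page for legend lists them.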
So, yeah, let's see what happens, because you asked me something I don't have in my mind. Let's see what the error is now. It says something is missing, with no default. Yeah. Let's see. See? I added a legend. See? Legend, top. Okay. Secondly, I gave it the labels, KO and wild type. And the fill is one or two. As I showed here, one is black and two is red. You can also specify the bottom right. So you can add a legend. Okay. But again, this is very basic. When you set it to the right, it's here. There are a lot of fancy, advanced options. Search for it and you always find some good suggestions. Because I'm doing a lot of different things, sometimes I just don't remember. You just need to spend a little time working with R, you know.
What I meant was the plot command. I'm surprised it doesn't take the legend as a parameter you just pass in. That's a very good suggestion. I think the defaults it has are the x label, the y label and the title, but I'm not sure a legend is included. Yeah, plot is so complicated, so you just use legend. The other one is par, the graphical parameters. It's a long list of things you can customize to get whatever you want. But it's so flexible it becomes almost useless if you search the help for plot; there are so many options. I totally agree, that's a very good point.
Okay, now we do another one. We are just exploring our data. We saw some peaks that are significant based on the simple t-test, and we really want to see how they differ, because in the plot I showed you before, some samples are high and some are low, and it's not really so different between the two groups. Now we want to see whether one of these top peaks is really different. And we select based on retention time again.
We have this first one at 3391. Oh, okay. No, no. Let's look at this range: the min is 3382, the max is 3396. Okay, 3382, and 3397. Now we go back to the previous command we issued. 3397, I assume I remember it correctly, and 3382. So, because we saw that this peak is significant, we want to select it and visualize it. Okay. Now we get a different group index. You changed the value, so the value is 3382. Yeah. Yours will be slightly different, but when we plot, we are selecting the same thing, so the plot should not be different. So now we get the extracted ion chromatogram, again for the first one; or you can try the second one, just use a different index. Now we plot it. Where is the plot function? Okay, here it is. Go back and plot it. Do you see the difference? Now this peak is really clear, and you see it mainly in the black, which I think is the knockout. It's very low in the other group, but it's still there. This peak is detected in all 12 samples, so these peaks are real. In one group it is significantly higher, and in the other it is lower. This really gives you confidence that these peaks are more likely to be real peaks rather than artifacts.
So here's the command I used earlier. Yeah. Is it lower? I think the range I selected is too high. I probably selected something that internally belongs to the same group, so even if you got a slightly different index, when you plot you see the same graph, right? Okay. Any questions so far? That's the main reason we want to use XCMS: we can interactively interrogate the data. We really want to see the peaks we found interesting. If you use MetaboAnalyst, or even XCMS Online, it doesn't allow you to do that; basically the peaks are whatever they are, and you just get the table. So that's why we want to learn this.
So aren't the peaks supposed to be perfectly aligned there? Oh yeah, they should be.
But I think sometimes it doesn't do a perfect job. That's the ideal. So in the best scenario, XCMS should overlay all the peaks together, right? Yes, position-wise, but the heights will be different. So the previous one looks nicer, right? Yeah, it depends.
Your third criterion was that the number of peaks has to be greater than eight. Yeah. You can change that. It can show up in at least eight, or it has to show up in all 12. Because you know you have 12 samples, you may want it to appear in all 12. So is a zero still a peak? No. Zero is no peak. You've got some, and that's informative. Yeah.
Okay. We'll do another one, following what you suggested. I select the first one, and the first one shows up here. You see 12? So it shows up in all of them. This other one shows up in seven; in five, it's not there. So we'll see what it looks like. In the wild type, only one was detected. I assume the other five are all zero, so there are no peaks there. Let's visualize it. How do we visualize this? This is the same retention time, different mass. Okay. Can we select directly on this one? This one is unique, so let's try it. We want to select where the median retention time equals this value, because this one is different from the first one. This is the median retention time. When I select here, you can immediately see they are the same, so it's hard to tell which one you are selecting. Let's try this one. This one seems unique. We're just experimenting, okay?
Sorry, I don't quite get it. Yeah. Because you see here, you get the same values. Ah. So retention time alone sometimes cannot tell the difference between two peaks, and that's probably why, as I said, they look not aligned. So here you see the retention time. The other dimension is the mass, and the mz is slightly different between these.
So we can select peaks based on mass and retention time, or based on just one of them. Now I think the reason is that they belong to different peaks, so they are not aligned. Even with the same retention time, you can see there are at least two different peaks here. They have the same retention time, but they are two different peaks, right? This is 300, 300 or what? Yeah. So they're not perfectly aligned because in this range there are multiple peaks, and they're all real, and they're not supposed to be aligned. So to concentrate on that one peak, you would use the median retention time and the mass? Yeah, you can do that. Basically, now that we see this, we can refine our query. We want the median retention time and the median mass to equal this one. We can do all of this. Let's try it. I just need to remember the parameter names. Now we do the mz. We could try to use equals, but this is kind of dangerous, because R displays rounded values. The stored value has more digits, and if what we type here is not exact, nothing matches. So a range is better, right?
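Why an exact match is risky and a range is safer can be shown with a small base R sketch. The table and its values are made up, and the column names mzmed and rtmed are assumptions; the comparison logic is the point.

```r
# Toy group table (made-up values). The stored m/z has more digits than
# what is usually shown on screen.
gt <- cbind(mzmed = c(300.2001437, 300.1998112, 496.2004550),
            rtmed = c(3390.324, 3390.324, 3384.012))

print(gt[1, "mzmed"], digits = 5)           # displayed with few digits
exact <- which(gt[, "mzmed"] == 300.2)      # exact match: finds nothing
exact

# Retention time alone is not unique here: two groups share it.
by_rt <- which(gt[, "rtmed"] > 3390 & gt[, "rtmed"] < 3391)
by_rt

# A small m/z window plus the retention time window picks a single group.
both <- which(gt[, "mzmed"] > 300.2000 & gt[, "mzmed"] < 300.2005 &
              gt[, "rtmed"] > 3390 & gt[, "rtmed"] < 3391)
both
```

If the combined query still returns more than one row, narrow the windows a little at a time rather than switching back to an exact match.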
So let's go back and use this one. Copy that. Now we have this one, and we have mzmin and mzmax. We want to find mzmax here, and this is the table. This time we just try to use the mz values we saw for that second peak, and we hope to find it. If what we find is not unique, we can further add the retention time. We just add more conditions and hope to get a unique peak, because we only looked at the top four, and potentially some other peak has the exact same mass. If they are not properly aligned, that means it's contaminated. We get about five, a bit higher. I'm not sure about 12. Let's see what's here. Oh, right, mz. They do have all these columns. Okay. I thought you wanted a range instead of 1,949. So with that criterion we didn't get a hit. That's interesting. Let's see. Oh, we do get one this time, just one. Okay. I just relaxed the criterion, because before it was very stringent and we got no hits. So each time you just lower or raise the threshold and see what you get. Then again you extract it and plot it. Yeah.
So once we see which peaks are important, and we know their mass range and retention time range, we can query the whole peak table based on those ranges and visualize the peak. Here I showed you how to use the mz range, from min to max. And you can keep adding conditions with and: the groups table, gt, with retention time min greater than, say, 2000, and less than some upper value. I'm basically giving random numbers, but the point is that you can narrow down your range and make sure you select the right peak. So far, even if you combine retention time and mz, you may still have duplicates. In that case it's really hard to tell, but hopefully by combining all of them you can get this peak and visualize it. Actually you can see it: it is going to set the range of
your plot, and you see it. Any questions so far? Yeah. So, that's good. You can copy and paste: you copy this image and you can put the images in there one by one. You can do that. This is the basic, intuitive way, right? You copy this image, and if you open a new presentation, you can paste it. Now you see it here, right? You can do it this way more easily. But if you really want to learn the programming way, you can always do it, and I can show you some basic commands for that. The title? Yes. This is a very basic one; I didn't cover it because I assumed you knew. You set the title equal to your name.
So now, if you want to plot two side by side: par, mfrow, c(1, 2). Okay, I'm not going to explain yet; just plot, then I'll tell you what it means. First you close this one, and then you do this. This is a graphical parameter. Then you plot, a random plot: one, two, three. Oh, sorry, mfrow. Let's try this. This one seems okay. Good, thanks. I'm going to redo it. So mfrow, okay, then you plot. Now you get them side by side. That is one row. If you want two rows, vertically aligned, you can do this, I think. Let's close it and try again. Sorry. Yeah, you see, it's vertical. And you can really have a matrix: if you give this as two by two, you have four plots, one, two, three. So with R you can do almost anything. You just need to spend time learning these low-level commands, the coloring, and telling it where to put labels. And Google it, really. Any questions?
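The side-by-side, stacked and two-by-two layouts can be sketched in base R with the mfrow graphical parameter. The plots themselves are arbitrary; the variable names op and layout_now are only used here to save and inspect the settings.

```r
x <- 1:10

op <- par(mfrow = c(1, 2))   # 1 row, 2 columns: side by side
plot(x, x,   main = "Left")
plot(x, x^2, main = "Right")

par(mfrow = c(2, 1))         # 2 rows, 1 column: stacked vertically
plot(x, x,   main = "Top")
plot(x, x^2, main = "Bottom")

par(mfrow = c(2, 2))         # 2 x 2 grid, filled row by row
for (i in 1:4) plot(x, x^i, main = paste("Plot", i))

layout_now <- par("mfrow")   # the layout currently in effect
par(op)                      # restore the previous settings
```

Restoring the old settings with par(op), or closing the plot window as was done in the session, keeps the next plot from landing in a leftover grid.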