Good afternoon everyone. Hopefully you'll all stay awake; it's the afternoon and people get sleepy, so there's coffee if you need it. This afternoon is our lab on MetaboAnalyst. First I'll go over what we're going to do in the lab, and then you can follow along with your tutorial. One comment I want to make is that MetaboAnalyst is very comprehensive; it has been developed over the past ten years and has almost a thousand different functions. This morning we only touched on some specific points that are commonly used but somewhat more advanced, so we didn't cover the very basics like t-tests, but all the basic statistics are there, documented in the tutorial and available through the various interfaces. So if you think some basic statistic you need is not available in MetaboAnalyst and you'd have to ask somebody for help, that's probably not the case; it's all there. Let's just go through the tutorial, and then you can play around. If you have your own data, I'm more than happy to discuss with you what the best strategies are and what makes more sense in your situation.

We already talked about the basic concepts of omics data analysis, and MetaboAnalyst is actually built around those concepts. It starts from data input, which is basically an integrity check: whether your data is properly labeled, whether you have enough replicates, and how many missing values there are. It's important to get through this. After that you do quality checking, basically outlier detection, looking for anything abnormal that deserves further attention; then normalization; and finally the statistics.

Here's a typical experimental design for metabolomics. You usually need a lot of biological replicates; the minimum in MetaboAnalyst is three, I believe. In metabolomics I don't think this is a big issue, people usually have at least five. But if you have just three, permutation tests are probably not going to work well, because you will quickly exhaust all the possible label combinations. There are also technical replicates, especially in untargeted metabolomics, where you will have technical variation; usually people run triplicates and use the average to reduce it. MetaboAnalyst doesn't handle technical triplicates, because it was originally designed for targeted data, so if you run triplicates you should average them yourself. Unless there's a strong reason and many people ask for it, I'm probably not adding extra support for averaging technical replicates; so far it's not there.

We've already talked about targeted versus untargeted. The idea is that for targeted data you spend a lot of effort on identification and quantification, then upload your data and do all the statistical and functional analysis. If you're doing untargeted, most likely you will mainly do statistics, because functional analysis mostly relies on metabolite IDs; if you don't have metabolite IDs, it's very hard to say anything about function. The latest version has a module based on mummichog: if you have a mass peak list, it can try to infer which pathways are most likely impacted. That's one effort to address functional analysis for untargeted metabolomics. So here is a comparison of all the workflows we're going to cover.
You can see that all of them share the integrity check. The main difference is that in targeted analysis, identification happens early, at the second step, while in untargeted it's usually last: you process the features, get the final significant features, and then do identification. For untargeted data, compound identification happens outside MetaboAnalyst, because at the moment it does not link to any spectral database.

Here is the data integrity check. This mainly matters for untargeted metabolomics; targeted data is usually fine because you manually curated it. Untargeted data, unfortunately, especially the LC-MS peaks like the ones you got from XCMS Online yesterday, still contains a lot of noise; sometimes multiple peaks point to the same compound. If you run mummichog in MetaboAnalyst, you can see that one particular compound can have multiple hits, which actually increases your confidence that it's really there; but the noise level in untargeted data is much higher, so it's better to remove what you can, based on your equipment and what you believe, before uploading to MetaboAnalyst.

Next is peak alignment for LC-MS peaks. If you've already aligned with XCMS, you can just upload the peak intensity table. If you haven't, you can save the picked peaks, one file per sample, upload them, and MetaboAnalyst will align the peaks just like XCMS Online.

There's also binning. If you're doing untargeted NMR, a lot of software, for example Chenomx, supports binning. All you need to do is perform the binning and specify the bin width; 0.04 ppm is what people usually use. You basically chop the ppm range into many small features, and then you analyze these features just like peaks. This is for NMR-based untargeted metabolomics. (I'll show a small sketch of this binning idea at the end of this part.)

After all this comes data normalization and scaling, which is very important. Take urine, for example: urine is a common biofluid, but it can have very different dilution factors. If you drink a lot of water, your metabolite concentrations will be very low. When you compare samples with different dilutions, you should think about how to adjust for that. In this example, one sample is very concentrated and one very dilute, but the overall pattern is there; if you normalize them, they will look much more similar than the raw comparison suggests. The right normalization and scaling really depends on your data, the type of data, and the experimental conditions; it differs case by case, and you can do a better job by thinking about these situations. People want this addressed automatically, but that's very hard; often only after discussion do we find which option makes the most sense. You need to spend time on this. If you spent three months or a year applying for funding and doing the experiment, then when you finally get to data analysis, wanting to spend only minimal time just doesn't make sense. Think more, try different ways, and read the literature to see what works best. This is basically one of the original reasons we designed MetaboAnalyst: to help people do a standard process of normalization followed by statistical analysis.
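As promised, here is a minimal base-R sketch of that equal-width binning step. The vector names `ppm` and `intensity` are hypothetical stand-ins for a 1D NMR spectrum; this is an illustration of the idea, not MetaboAnalyst's actual implementation.

```r
# Sketch: equal-width spectral binning, 0.04 ppm bins.
bin_spectrum <- function(ppm, intensity, width = 0.04) {
  breaks <- seq(min(ppm), max(ppm) + width, by = width)
  bins   <- cut(ppm, breaks, include.lowest = TRUE)
  tapply(intensity, bins, sum)   # summed intensity per bin = one "feature"
}

# Toy spectrum: noise plus one sharp peak near 3 ppm
ppm       <- seq(0, 10, by = 0.001)
intensity <- abs(rnorm(length(ppm))) + dnorm(ppm, mean = 3, sd = 0.02) * 5
binned    <- bin_spectrum(ppm, intensity)
length(binned)   # roughly 250 features at 0.04 ppm width
```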
So far it contains about 12 or 13 different normalization methods. As I mentioned, we may add more, depending on user requests and whether we feel it's worth it; a lot of requests are very special cases, and when we put them into the current workflow they sometimes don't fit the overall scheme. But we already have a lot, and the number of possible combinations is already very high.

After that comes QC, outlier removal, and data reduction. This is another thing you should consider, but as I mentioned, if you do this you need a good justification. Also, don't remove features based on a supervised method: if you run a t-test or a PLS-DA, find something that isn't working well, and then remove it, you basically increase the chance of overfitting your data, and you'll get conclusions that seem too good. The reality is that clinical samples contain a lot of noise; that's just common. We'll show in the tutorial what an outlier looks like in an extreme case. A lot of the time it's not so obvious, and it's up to the specific design of your experiment to judge whether something looks like what you expected.

MetaboAnalyst has been around for about ten years. The current version is version 4, which we published just two or three weeks ago. It's very comprehensive: it started from targeted data, gradually added more untargeted support, and now it's moving toward integration with other omics; there's a clear trend. We also spent a lot of effort making the interface more interactive and intuitive, to support all users.

A brief history: MetaboAnalyst is much earlier than XCMS Online. XCMS Online was published in 2012; we were there in 2009, and version 1 actually had XCMS built in, which we used for spectral processing. We just didn't have powerful computing infrastructure, so gradually we dropped it, and now if you want to do raw spectral processing, we point you to XCMS Online, because they have very powerful computers. Through the years we kept adding more functions beyond statistics. In version 3 we moved to the cloud, with significant performance improvements; when we looked at the user trend, it suddenly increased about ten times, from hundreds to thousands. Version 4 keeps adding more support for integration. We also added a package called MetaboAnalystR, because some advanced users want to use MetaboAnalyst's features in their own pipelines, for more analysis and report generation. That's all fine; we're very happy to see users of the tool contribute and develop more pipelines. Beyond MetaboAnalystR, the graphical interface is flexible, but it's never as flexible as programming yourself; if you can do that, you have much more freedom.

Now I'll go through the main tutorials, which you'll explore afterwards. The overall picture of MetaboAnalyst is that it mainly accepts lists: metabolite names or features, concentration tables, peak lists, peak intensity tables, and NMR spectral bins. If you download and install MetaboAnalyst locally, you can also upload raw spectra. We don't support raw spectra on the public server,
because if some people upload raw spectra, nobody else can get their analyses through; so we disabled it on the public server, at least for this tutorial. If you upload targeted data, we do compound name mapping, mainly to the HMDB database; all the names need to be mapped. If you upload peaks, we do peak alignment; if they're already aligned, that's fine too. Then the integrity check is the same central point. If everything's fine, we do data normalization. If your data has a lot of noise, you can do missing value estimation or data filtering, and for biomarker analysis you can compute ratios of features. So there are some optional steps depending on your data, but if your data is reasonably good quality, you can go straight to a normalization like the log transform and all the way to the statistical analysis, which is basically what this morning's course was about: PCA, PLS-DA, heatmap clustering, and so on.

We didn't discuss enrichment analysis yet: basically, which metabolite sets are enriched, and which pathways are affected. There's also power analysis: if you're planning a metabolomics experiment and want to know how many samples to collect, you really need a pilot study, and based on the pilot data you can calculate how many samples your study requires. Another very popular module is biomarker analysis, especially among people doing clinical studies who really want to find biomarkers. I want to emphasize that you should always know your data first: statistical analysis always comes first, and once you understand the data, you do biomarker analysis. Another, slightly more advanced, module is time-series and two-factor analysis. If your data contains a time series, you should try this module, because it specifically considers ordering and tries to detect responses across time; the two-factor analysis you can basically think of as two-way ANOVA.

Here's a new feature: MS Peaks to Pathways. If you're familiar with XCMS Online, there it's called mummichog. This is almost the same, but with more parameters to tweak and more libraries to select, and the interactive results on the KEGG pathways are, I'd say, unbeatable compared to XCMS Online; we have a much better presentation of the mummichog results. We also have biomarker meta-analysis. Think of metaXCMS from yesterday: this works in a similar way, but is statistically more advanced. You have your study, say healthy versus disease, and another study somewhere else, also healthy versus disease; if they used similar platforms and conditions, you can combine them and see whether the same biomarkers show up, which increases the power and the robustness of your biomarkers. That's meta-analysis.

The last part is the Network Explorer. Network Explorer is basically a tool where you provide compounds, genes, or a disease, and then explore what we know about that disease, or about the compounds or genes, in relation to metabolism. This is all based on the latest version of HMDB. HMDB is very comprehensive, but clicking through the database takes time. If you drop your list into the Network Explorer, you can actually see the links: this compound connects to these
genes, and to this pathway; sometimes multiple pathways point to the same compounds. It visually summarizes a lot of data, and from there you can develop your hypotheses and go to the particular pages in HMDB to find out more. Network Explorer essentially condenses a lot of the HMDB knowledge base into a network viewer you can interact with, to help you find interesting hypotheses.

That said, here are all the modules put together on one page. Click anywhere and it will start the corresponding module, and all of them share the same flow: processing, normalization, analysis, and interpretation. I'll quickly walk you through this; I'm not sure whether you want to follow along now, but you can also do it later, because I'll probably mention some points you don't want to miss.

You can use your own data, but we also have built-in example data. The first example dataset has four groups, based on cow rumen, I believe. It's simple: four groups, each fed a different concentration of grain, and we look at the changes at the metabolite level. You can download that data and upload it. The data here is a concentration table from targeted metabolomics. Below that, you can see that for untargeted data you can upload a peak list generated from your machine's specific software, or spectral bins from NMR. Raw spectra are disabled; I just don't want the server to crash during the demo.

If we click here, this is the statistics module, which is where you should always start, and you upload your data. One thing you need to make sure of is that samples are in the rows and compounds in the columns. Unlike gene expression data, where samples are almost always in the columns (with around 22,000 genes the table is so wide it's always stored the long way), metabolomics data can be transposed either way, so just make sure your data is oriented like that. For targeted data, the values are always concentrations, so you select that, choose the orientation, upload the file, and submit. Alternatively, you don't need to upload at all, just click here, because this dataset is built in. This data is from cow rumen, collected by NMR, and the hypothesis is that a high-grain feed introduces stress to the cows, so it's not healthy, and you want to find the optimal concentration: not too high, not too low. Because targeted metabolomics data is usually high quality, you basically click submit and see that most likely there are no missing values and everything is fine, so you don't need missing value imputation. If you have only a small portion of missing values, just accept the default: a very small value will replace them in your data. That's all fine.

Now, how do you normalize this data? First is sample normalization. What's the best way? It's always case by case. Sometimes you can choose a particular reference sample; that's probabilistic quotient normalization (PQN), if you've heard of it, which people consider very useful for metabolomics. And you can extend that idea beyond a single reference sample.
If you don't have one particular sample that serves as the standard, you can use a reference group instead, pooling the average of all samples from that group. That's basically what we choose here. You have four groups; group 0 is your reference, the one with zero percent grain, and then 15, 30, and 45. So we choose group 0 and use its average as the reference, and everything is normalized against it. This seems reasonable; if you have a better alternative, that's fine too, it's not that this is the one correct choice. On top of that, we choose auto-scaling. You can see what the result looks like: if you click below, there's a visualization of before and after. At this point the data has been transformed to a matrix, with samples in rows and variables in columns.

The first step, as we said, is row-wise sample normalization, which really makes the samples comparable. This matters especially for urine. Blood is more or less stable, homeostasis controls it very well, but urine, other biofluids, and tissue extractions really need thought: if you put in more tissue weight, or different volumes, that can cause variation, and you should address that variation, because it's not biological variation. That's the first part. Column-wise normalization basically tries to make all the features more normally distributed; this is mainly the log transform and similar methods. Scaling is then applied across the features. We commonly use log transformation, auto-scaling, Pareto scaling, and range scaling; as I mentioned about that paper, MetaboAnalyst supports most of the methods that seem useful, we put them all there. My recommendation: try auto-scaling and Pareto scaling, but log transformation should be your first try, because most of the time we find that metabolite concentrations in blood are log-normal, so after the log they become very close to normal. It's the preferred method.

After that, you can see that the data became more normal-looking, which seems fine. One point: there's no guarantee that if your features look normal individually, the data will be multivariate normal; there's no practical way to test that here. But if the individual features look normal, the overall data will most likely be better than non-normal. At least your t-tests are going to be more robust, and for PCA I think it also makes more sense, because you reduce the variance of the very high values and make the features more comparable. A lot of the time you can also run your analysis on the raw data with no normalization and compare back and forth; very often you'll see that normalization gives more biologically meaningful results. MetaboAnalyst is interactive precisely so you can change a parameter, go back, visualize, and see which is better. People want us to hard-code a single solution that works all the time, and we just cannot do it, because there is no single best way for all types of data. I think that's fine in research: you want a certain freedom, and you have certain hypotheses you want to test. Once everything becomes fixed, it becomes less useful.
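To make that normalization chain concrete, here is a toy base-R sketch: PQN against a reference group, then log transform, then auto-scaling. The matrix `X`, the factor `group`, and all the numbers are hypothetical; this illustrates the idea, not MetaboAnalyst's exact code.

```r
# PQN-style normalization against a reference group:
# divide each sample by the median quotient versus the reference spectrum.
pqn_by_group <- function(X, group, ref = levels(group)[1]) {
  ref_spec <- colMeans(X[group == ref, , drop = FALSE])  # average reference sample
  t(apply(X, 1, function(s) s / median(s / ref_spec, na.rm = TRUE)))
}

set.seed(1)
X     <- matrix(rlnorm(20 * 50), nrow = 20)              # 20 samples x 50 features
group <- factor(rep(c("0%", "15%", "30%", "45%"), each = 5))

Xn <- pqn_by_group(X, group, ref = "0%")  # row-wise (sample) normalization
Xl <- log2(Xn)                            # log transform: log-normal -> ~normal
Xs <- scale(Xl)                           # auto-scale: mean 0, sd 1 per feature
```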
After normalization, you need to check whether there are outliers or anything abnormal. This is important especially for untargeted or very high-throughput, large-scale projects, which often have strong batch effects and outliers; it's really, really important. In my experience, if you have three months to work with the data, probably at least half of that time goes into understanding the data and the design, finding the right normalization, and confirming the data is actually good quality, and the other half into the actual statistical analysis. Quality checking and processing takes most of your time, which is not that fun, but it's critical. You shouldn't jump straight to analysis and biological conclusions before you're comfortable with your data; that's too dangerous.

Here we use the example data, which is fine, there's no outlier; so for the tutorial we artificially create one. The tutorial will show you: if you skip normalization and do certain things, you'll get an outlier that really stands out. In this case, if you run PCA, you can see one particular sample standing out here while the majority cluster there. If you see a pattern like this, it's clearly an outlier; something must have happened with this sample. You can also see what it looks like in the heatmap, which is very intuitive: the contributions of this sample are all very, very high, which is biologically almost impossible. So something happened to this sample; you need to check with the technician who conducted the experiment, and it's better to exclude it. I don't think normalization would fix a sample like this sufficiently. Sometimes it's not that obvious, especially when samples are closer together and you're not sure, so you go back and forth; you don't want to exclude too many samples. But a case like this is very clear.

If you do find such samples, can you remove them within MetaboAnalyst? Yes. If you click the navigation bar and go to processing, there's a data editor. In data editing you can remove samples, remove particular features, or remove particular groups. If you have groups one to four and you just want to compare two groups, say for a ROC curve, you can do it; some statistics only work for two groups, so you may have to exclude the others. I say exclude rather than remove, because you can add them back later. You can find the features and samples and exclude them without having to log out, edit your file, and re-upload, which takes time; you can do it all inside MetaboAnalyst.
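Going back to the outlier check for a moment, here is a minimal base-R sketch of spotting a gross outlier in the PCA scores. The matrix `Xs` and the corruption of sample 7 are artificial, exactly in the spirit of the demo above.

```r
# Sketch: spotting a gross outlier via PCA scores.
set.seed(2)
Xs <- matrix(rnorm(20 * 50), nrow = 20)  # pretend: normalized, scaled data
Xs[7, ] <- Xs[7, ] + 6                   # artificially corrupt one sample

pc <- prcomp(Xs)                         # data already scaled upstream
plot(pc$x[, 1], pc$x[, 2], xlab = "PC1", ylab = "PC2")
text(pc$x[, 1], pc$x[, 2], labels = seq_len(nrow(Xs)), pos = 3)

# A crude flag: samples whose PC1 score sits far from everyone else.
z <- abs(scale(pc$x[, 1]))
which(z > 3)   # candidates to check with your technician, then exclude
```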
The other step is data filtering. If you upload targeted metabolomics data, you usually don't see this page, because targeted data has much less noise; if you upload spectral bins or a peak list, you will see it. This page makes a lot of people uncomfortable: why should I reduce my data, why should I filter it? They spent a lot of effort collecting, say, eight thousand peaks, and they want to analyze them exactly as they are, with no further filtering. The point is that good data filtering actually increases your chance of finding significant features that are more meaningful. The reasoning is that every omics experiment assumes that only a minority, say 5 or 20 percent, of features change, and the majority stay the same. In omics analysis you're not comparing an apple to a banana; you're comparing an apple to a slightly changed apple, and in that changed part you find the biological processes, the particular pathways, that differ. So most features are supposed to stay constant. If everybody agrees with that assumption, then features that don't change across your whole dataset should be removed. If you don't remove them, one direct consequence shows up in the multiple-testing adjustment, which depends on the number of tests: if you test eight thousand features and apply FDR, the penalty is very heavy, and a p-value has to be extremely small to stay significant after adjustment. If you filter the data, say from eight thousand to five thousand, your significant peaks or metabolites become more significant after adjustment, and you can have more confidence in the list of features and the downstream pathway results. So data filtering is strongly recommended for untargeted metabolomics, because of the strong noise, and because the assumption really holds: as long as you believe that only a few pathways, say 20 percent of features, change while the majority don't, you should use the filter. By default it removes something like 5, 10, or 20 percent, and it's adjustable. (There's a small sketch of this filtering-and-FDR effect right after this part.)

The variance filter removes features that don't change. The other filter is based on abundance: very low-abundance features cannot be measured reliably, so sometimes we want to remove them. And yes, it's the same idea as in peak picking: if your peak-picking software already applied an abundance filter, you can skip the abundance filter here and just use the variance filter. The last one is low repeatability: certain metabolites simply cannot be measured reliably, and you can assess this with QC samples or technical replicates. If you have technical triplicates and you see certain peaks vary wildly even between technical replicates, you know that kind of compound is not amenable to your particular platform; any biological conclusion based on that feature is unreliable, and you should remove it. But for this filter you need QC samples, or technical replicates, where you can see it very clearly.
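Here is the promised toy base-R sketch of the multiple-testing argument: dropping near-constant features shrinks the testing burden, so real signals are more likely to survive FDR. All objects and numbers are made up for illustration.

```r
# 20 samples x 8000 features, with 40 genuinely shifted features.
set.seed(3)
X <- matrix(rnorm(20 * 8000), nrow = 20)
group <- factor(rep(c("A", "B"), each = 10))
X[group == "B", 1:40] <- X[group == "B", 1:40] + 1.5

pvals <- apply(X, 2, function(f) t.test(f ~ group)$p.value)
sum(p.adjust(pvals, "fdr") < 0.05)        # hits when all 8000 are tested

# Keep only the most variable features (top 60% by IQR), then re-adjust.
keep <- rank(-apply(X, 2, IQR)) <= 0.6 * ncol(X)
sum(p.adjust(pvals[keep], "fdr") < 0.05)  # typically more hits survive FDR
```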
After data reduction comes statistical analysis. Regardless of your purpose, this is a very generic approach: filter the data, then get an overview and see which patterns look interesting or important. Some common tasks: identifying important features, which is basically t-tests and ANOVA; identifying interesting patterns, with things like heatmaps and PCA; identifying differences between phenotypes, which can be more guided, where the t-test still fits; and classification, which is more supervised, where PLS-DA belongs. Here we'll look at ANOVA, PCA, PLS-DA, and clustering. The reason I'm talking about ANOVA is that the data we're showing has four groups; if you had two groups, you'd see similar things with the t-test. With four groups, the fold change and t-test options are disabled, and if you want to use them, you have to go to the data editor and exclude two groups; once only two groups remain, fold change and t-test become enabled. So here you only see ANOVA and the methods that are amenable to multiple groups.

For ANOVA: we have four groups, 0, 15, 30, and 45 percent grain, and we try to identify the metabolites that differ between all the groups, or just between some of them. If we run ANOVA, we see the adjusted p-values, you can set the adjusted p-value cutoff, and there's post-hoc analysis. Has anybody heard of post-hoc analysis? ANOVA basically tests, across multiple groups, whether there is a significant difference between any two groups; if any pair differs, the feature shows up as significant. So one thing you really wonder is: which two groups are actually different? Usually you run ANOVA and then follow up with pairwise tests, essentially pairwise t-tests, to see which pairs differ significantly.

The result shows up like this: each dot is a metabolite, plotted as minus log10 of the p-value, so the smaller the p-value, the higher the dot. You can click on a dot to see a summary. If you click on 3-PP, for example, you see this summary box plot, and you can see the concentration decreasing across the different grain concentrations. If you want more detail by group, all the compounds are listed here; if you click one, you see the graphical summary, and you can see it increasing, along with the F-value, p-value, minus log10 p, and FDR. And here is the post-hoc analysis, showing which pairwise comparisons are significant. For the first compound, all the pairwise comparisons are significant; but for the second one, galactose, only a few pairs are. You can see that every group against the control is significant, but among the 15, 30, and 45 groups themselves, the three comparisons are not. It's very clear in this view. [Question from the audience:] Are you going to use the same p-value?
No, no, let me explain what you're asking. The first statistical test, the ANOVA, is based on the FDR-adjusted p-value, I believe. The post-hoc test, Fisher's LSD, is based on the raw p-value; we don't adjust there, because once you've done the adjustment and selected significant features, and then run another round of tests, there is no well-defined procedure for a further multiple-testing adjustment like FDR on top of that. On the other hand, you do want to see which groups differ. So here we just use the raw 0.05 as the cutoff; it's mainly for you to see which two groups are different. Statistically, I don't see a clean way to adjust in this situation, doing the omics-wide adjustment first and then a post hoc test; I don't think the statistics should be over-thought at this stage, it's just for hypothesis generation.

So what's next? If we want to compare which compounds are most similar, we can look at the pairwise compound correlations. If you do this, you get a correlation heatmap, an overall view of which compounds are correlated with which. If you want the correlation values themselves, you can download them, and the correlation p-values are down here; you can regenerate them here. This basically gives an overview of the relationships. Sometimes people want to compute correlations at different time points, or in different studies, and compare them. If you want to do that, I suggest you fix the color scale to minus one to one, so the heatmaps are comparable: in one study the range might be, say, minus 0.5 to 0.5, but if you force the full range, then when you upload another study, the colors will be comparable. If you have questions, raise your hand. This is a very straightforward heatmap showing which compounds correlate with which; sometimes you clearly see certain compounds belonging to the same pathway, or certain peaks belonging to the same compound.

Another note: if you want an image that looks nice for publication, click the download icon here, choose the format, PNG, SVG, or PDF, set the DPI, and keep the default size, basically the current size; it will be high resolution, and you can download it.

So we've seen how your compound concentrations relate to control versus disease. Another thing you may want is the internal clustering: some compounds tend to group together, which seems interesting, and you can explicitly search for such correlations.
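Before moving on, here is a short base-R sketch of the ANOVA-plus-post-hoc flow from a moment ago: an omnibus test, then pairwise comparisons with raw p-values, as discussed. The data is a toy stand-in for one metabolite across the four grain groups.

```r
# One metabolite, four groups of five samples each.
set.seed(4)
conc  <- c(rnorm(5, 10), rnorm(5, 10), rnorm(5, 12), rnorm(5, 14))
grain <- factor(rep(c("0%", "15%", "30%", "45%"), each = 5))

fit <- aov(conc ~ grain)
summary(fit)                   # omnibus F test: do ANY two groups differ?

# Post-hoc: pairwise t-tests with a pooled SD (essentially Fisher's LSD),
# reported as *raw* p-values, for hypothesis generation only.
pairwise.t.test(conc, grain, p.adjust.method = "none")
```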
This feature we call the Pattern Hunter. It's mainly for multiple groups, especially time series, where you want to see changes that increase or decrease with time. You can see here that some patterns increase, some change one way, some another, and people tend to have a lot of theories behind them; if you find compounds behaving like this, it will probably help reinforce your hypothesis. If you click Pattern Hunter, you can find the features correlated with a particular feature: which compounds are most associated with this one, everything positively correlated, everything negatively correlated. Or you can give a specification like 1-2-3-4, meaning you want things that increase steadily across the groups. So if the pattern you're after is compound concentration increasing linearly with the grain concentration in the feed, basically with your class labels, that's what a strong positive correlation means here: glucose and these compounds increase with higher grain content in the feed, and in contrast 3-PP and isobutyrate decrease, which is a negative correlation. This feature just helps you look for such things explicitly. You can of course design your own patterns: 1-4-3-2, for instance, basically means you're looking for a sudden increase followed by a decrease. You can specify patterns here to find whatever shape you want.

Next, PCA. We already discussed PCA; it basically shows the overall separation. At the beginning you should look at the overview: the pairwise plots of, say, the top five PCs, not just PC1 versus PC2. Sometimes PC1 and PC2 don't naturally give you the best separation. On some platforms I've found that PC1 actually captures the batch effect or platform-specific variation, and PC2 versus PC3, or PC1 versus PC3, gives the good separation. So don't always focus on PC1 and PC2; go to the overview and see where the separation patterns are. After that you can say: PC1 versus PC2 is actually best, or sometimes PC1 versus PC3, and the score plot then focuses on that pair. Start from the overview, and after the score plot, go to the loading plot to see which features drive the separation. Here you can see some separation across the diagonal; it's not a clean separation, but you can see a trend. One thing about colors: if you go to processing, there's a color palette editor, so if you don't like the default colors you can switch to whatever you like. Now look at the loading plot along the same direction as the separation: the same diagonal here, and these two features, and these two, probably drive the separation along that direction. Click on them to see what they are: 3-PP and some of these compounds, which actually confirms the earlier observations. [Question about VIP:] No, VIP is a different concept. VIP is not the loading; VIP borrows the loadings and uses a weighting to adjust them, so the VIP plot has its own specific design. This is the loading plot, and you can click this table icon and get all the loading values.
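Two quick base-R sketches of the ideas above: the Pattern Hunter as a plain correlation against a template, and scanning beyond PC1/PC2 before reading the loadings. Everything here is toy data with hypothetical names.

```r
set.seed(5)
X <- matrix(rnorm(20 * 50), nrow = 20)                    # samples x features
grain <- factor(rep(c("0%", "15%", "30%", "45%"), each = 5))

# (1) Pattern Hunter, template "1-2-3-4": linear increase across groups.
template <- as.numeric(grain)                             # 1,1,...,2,...,3,...,4
r <- apply(X, 2, cor, y = template, method = "spearman")
head(sort(r, decreasing = TRUE))                          # most positively correlated
head(sort(r))                                             # most negatively correlated

# (2) PCA: check several PC pairs before settling on one.
pc <- prcomp(X, scale. = TRUE)
pairs(pc$x[, 1:5], col = as.integer(grain))               # overview of top 5 PCs
# Suppose PC1 vs PC3 separates best; see which features drive PC3.
head(sort(abs(pc$rotation[, 3]), decreasing = TRUE))
```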
You can sort the loadings in either direction, or by absolute value. Here there are two values, x and y; with one value you can set a simple cutoff, but with two values a hard cutoff is difficult, so the best approach is basically to look at the plot and circle the interesting ones; in the end, the plot is already helping you do this. It's more a visual selection of the loadings: with one number it's easy, with two it's not, and I don't think there's a hard-coded rule for what's best. There's also a 3D view you can rotate, which people find very intuitive, plus a 3D loading plot; play around with those.

Now PLS-DA. You've done PCA and seen some separation, and you want to see whether PLS-DA gives a better one. PLS-DA has the same tab layout as PCA, because we want to give you a consistent view. The main difference between PCA and PLS-DA is that PCA tries to capture the maximum variance in your data, your X, while, as we said this morning, PLS-DA tries to capture the covariance: not the total variance of X, but only the portion of it that co-varies with Y. That covariance is only a small part of your data's total variance. So we have the score plot; we have Q² and R², which people ask about, all computed by cross-validation; and we have the VIP scores. This is all standard, and VIP cutoffs of 1 or 1.2 are typical, depending on the situation. Here you can see a good separation and the explained variance. Sometimes you'll see that component 1 explains less variance than component 2, and people send me emails asking why. The answer is right below, in the more advanced information: PLS-DA optimizes the covariance, while the percentage in the plot is the explained variance of X. Most of the time component 1 will show the most variance, but occasionally it doesn't, and that's still normal, because the components are optimized for covariance rather than variance. Just read the note there; I believe there's also a link to a detailed explanation, so if you run into data like this, look into it.

Now, the evaluation of the PLS-DA model. You have the same overview, 2D, 3D, and loading tabs as PCA, and then cross-validation. We do leave-one-out cross-validation (LOOCV) over up to five components, and the cross-validation helps select how many components to include for the best performance. Here you can see that, say, three components perform best, and they'll be highlighted; if you include too many components, you can overfit. Q² and R² are both plotted; if you click the table, you see the exact values, and clicking gives you a graphical summary.
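As a side note on how a cross-validated Q2 is computed: Q2 = 1 - PRESS/TSS, where PRESS is the prediction error on held-out samples. Here is a minimal base-R illustration using a plain linear model instead of PLS (an assumption made to keep the sketch dependency-free); it also shows why Q2 can go below zero.

```r
# LOOCV Q2: a model predicting held-out samples worse than the plain
# mean gives PRESS > TSS, hence Q2 < 0.
q2_loocv <- function(x, y) {
  press <- sum(sapply(seq_along(y), function(i) {
    fit <- lm(y ~ x, data = data.frame(x = x[-i], y = y[-i]))
    (y[i] - predict(fit, newdata = data.frame(x = x[i])))^2
  }))
  1 - press / sum((y - mean(y))^2)
}

set.seed(6)
x <- rnorm(18)
q2_loocv(x, y = 2 * x + rnorm(18, sd = 0.5))  # real signal: Q2 near 1
q2_loocv(x, y = rnorm(18))                    # pure noise: Q2 often < 0
```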
And again, people will ask: sometimes my Q² is negative. Q² is not actually an R²; a real R² shouldn't be negative. The calculation of Q² is given there, and it can indeed be negative; when that happens, it really means your model is not a good model. For how to interpret it in detail, just read the note; I tried to put it as plainly as possible, but there's a proper statistical interpretation behind it. It's not common, but it does happen with PLS-DA.

Then there's the VIP plot. You want to decide where to cut off, and sometimes there's a natural break; if you choose zero, you keep everything. On the side there's a mini heatmap that really summarizes the changes for each feature, from high to low or low to high; here, for example, this feature is actually highest in the 45 percent group. It's interesting how much information is packed in there.

Now, permutation. People ask about it: the permutation test here was developed mainly for PLS-DA; we don't do permutation for everything. Most statistics still work after proper normalization; the distributional assumptions hold and the p-values are valid. But PLS-DA in particular tends to overfit, and people have found that permutation gives a more robust evaluation. In this case we have four groups, we permute the labels, and the criterion is the between-group distance: basically, we calculate the separation distance between groups and ask whether randomly shuffled labels ever give the same degree of separation. Here they don't: across all the permutations, the original data gives the best separation, so the p-value is very small. So if you do PLS-DA, don't just look at the 2D plot, call it a good model, and try to publish; always go through cross-validation and permutation and make sure everything looks fine. [Question from the audience.] Yes, PLS-DA is very susceptible to overfitting; even cross-validation doesn't fully address it. If your permutation is not significant, the first thing to check is how many samples you have. Eighteen per group? That's probably OK. Then try other, complementary approaches: what it looks like in PCA, what the performance is with, say, random forest or SVM; cross-check your result with other methods. With a decent number of samples, like 18 per group, the other methods can give you some confidence, because PLS-DA is notoriously prone to overfitting. I should also say that permutation has its own limitations: with some specific data structures you get ridiculous results, like always 0.99, and the permutation just doesn't work well on that kind of data. It's not entirely clear to me why certain cases behave like that, but I do see it happen. So, what I'm saying is: if the permutation doesn't reach significance but the observed value is close to the significant region, still try the other methods, because PLS-DA itself has limitations; but if the whole permutation distribution sits around 0.99, there must be something specific to your data.
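Here is a toy base-R sketch of a label-permutation test in that spirit: compare a between/within-group separation statistic on the real labels against shuffled labels. The statistic and all objects are illustrative stand-ins, not MetaboAnalyst's exact implementation.

```r
# Between-group / within-group sum of squares as a separation statistic.
bw_ratio <- function(X, g) {
  centers <- apply(X, 2, tapply, g, mean)            # group means (groups x features)
  grand   <- colMeans(X)
  between <- sum(table(g) * rowSums(sweep(centers, 2, grand)^2))
  within  <- sum(sapply(levels(g), function(k)
    sum(sweep(X[g == k, , drop = FALSE], 2, centers[k, ])^2)))
  between / within
}

set.seed(7)
X <- matrix(rnorm(20 * 50), nrow = 20)
X[11:20, 1:10] <- X[11:20, 1:10] + 2                 # a real group difference
g <- factor(rep(c("A", "B"), each = 10))

obs  <- bw_ratio(X, g)
perm <- replicate(999, bw_ratio(X, sample(g)))       # shuffled labels
mean(c(obs, perm) >= obs)                            # empirical p-value
```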
Next, the heatmap. I think heatmaps should be very familiar to all of you. A heatmap really helps you see your data in an intuitive way, just like your Excel table, but clustered and colored: you don't need to read the actual numbers, you see the colors, zoom out to spot the overall patterns, and when you find something, zoom in to see which compounds the pattern comes from. For example, you can ask: what changes between the low concentrations, 0 and 15 percent? What increases, what decreases? Which compounds change only in one particular group? So don't treat the heatmap as just a pretty figure for your publication; actually spend time with it. What I find hard to understand is when people don't really engage with it. Make sure you spend enough time to understand the power of the heatmap: for omics analysis, PCA and the heatmap really save you a lot of time, by showing the overall patterns and letting you see your actual data.

The heatmap probably has the longest list of parameters of any view, just because people want all the different combinations, and the list eventually grew this long. The latest option we added: sometimes people don't want the heatmap per sample, they want a group average. We have four groups here, and if you choose to show only the group averages, it merges all the samples of each group into one column. In that case some patterns show up more clearly; you can see this one change from here to here if you choose that parameter. It saves space and often gives a clearer view. And you can change the colors and all the other options. So I think for the statistics module you really need to spend at least 45 minutes just playing around and getting a feel for it, rather than taking everything at face value.

Finally, when you think you're done, you need to save your results: click the download button, and all your graphs and results will be generated here. Part of it is the R command history, and the other part is the general report. Remember to click this button; the report is not generated by default, only when you click. Then you get the analysis report, and you download the zip file with the report inside. Let me see; no questions so far, so we'll go to the next part.

Here I want to say a little more about MetaboAnalystR and reproducible analysis. If you click to generate the results, you get a PDF report. We spent a lot of time, I think several months, creating and refining all the instructions, embedding all the graphs, and putting all the commands in there, along with the time and version stamps. With all of this available, you can basically trace back what you did and which parameters you chose, and if you're familiar with R, you can use MetaboAnalystR to reproduce the identical result.
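Going back to the group-average heatmap option mentioned above, here is a small base-R sketch: collapse replicates to group means before drawing, so the overall pattern is easier to see. The data and names are toy stand-ins.

```r
# Collapse 20 samples to 4 group means, then draw a heatmap.
set.seed(8)
X <- matrix(rnorm(20 * 30), nrow = 20,
            dimnames = list(NULL, paste0("met", 1:30)))
grain <- factor(rep(c("0%", "15%", "30%", "45%"), each = 5))

Xavg <- apply(X, 2, tapply, grain, mean)   # 4 groups x 30 metabolites
heatmap(t(Xavg), scale = "row",            # metabolites as rows
        Colv = NA,                         # keep group order 0% -> 45%
        xlab = "grain group", ylab = "metabolite")
```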
[Question from the audience.] Yes, the thing is, if PCA is doing a good job, you don't really need PLS-DA; why would you, if PCA already separates? And for example, I see a lot of situations like six samples, three versus three, with a seemingly good separation in PCA, where you want to run PLS-DA and you just can't validate it: with six samples there aren't enough permutations, you can't get a meaningful result from such a small sample. Let me keep going; we can discuss your samples later.

If you play around with MetaboAnalyst, on the right-hand side, if you go to full screen, there's the command history. It's real time: every button you click, a command shows up here. This is all based on R, so if you install MetaboAnalystR, a tool on GitHub that will also be on the R repository, you can download and install it, copy and paste these commands, run the same thing, and get identical results. Sometimes you can do more than that: some parameters are not exposed in the web interface, but you can change them here. You can also use this for batch analysis; it's the best way to rerun the whole analysis again and again without going to the website and clicking every button. We did this because a lot of people want to use MetaboAnalyst in a pipeline, in an automatic fashion, and that's better done this way. MetaboAnalyst's main strength is real-time interactive analysis; if you want automation, just do it within R using MetaboAnalystR, with no button clicking and no web viewing, if that's not what you want.

So we've covered statistical analysis, which is the most widely used module and should be your first stop. The next topic is enrichment analysis, which is mainly for targeted metabolomics. The key motivation comes from gene set enrichment analysis: you have a list of genes, how do you find the pathways? You perform enrichment, or over-representation, analysis. The whole point is that having one or two compounds from a pathway doesn't really suggest the whole pathway is affected: a compound can be involved in multiple pathways, so why this one? One compound changing doesn't represent the whole pathway changing. If you want to claim a pathway has changed, you need more compounds from it, which makes this a more statistically stringent test. To do this, you need to define a library, where each entry is a metabolite set. The most obvious sets are metabolic pathways, but there are also disease-associated metabolite sets. For example, in diabetes, five metabolites might be consistently up- or down-regulated; that becomes a signature of the disease. Individually they're not strong enough, but if all five are up-regulated, you say: this looks like the diabetes-associated signature. We have a cachexia-associated metabolite signature, cancer signatures such as prostate cancer and colon cancer, and so on. This is basically how you test whether signatures associated with certain diseases are enriched in your data. These sets are different from pathways; they're not mechanistic, just an organization of knowledge. And finally, there are
location-based sets: for example, if all my compounds seem to be enriched in the mitochondria, that gives you a hypothesis, probably something related to energy production. So there are many ways to organize metabolite sets to make them more biologically meaningful. Pathways are the best understood, but all of these approaches give you more than a t-test or ANOVA alone, because they provide biological context and prompt you to read more about the disease: why is the signature in my data similar to diabetes? Then you think further, and perhaps certain pathways or molecular interactions share mechanisms. This is closer to biology than to statistics. One caveat: it's mainly based on human data, because most of this knowledge comes from the Human Metabolome Database (HMDB).

So we have over-representation analysis (ORA), single sample profiling (SSP), and quantitative enrichment analysis (QEA): three modules, which we'll discuss shortly. The most valuable part is the metabolite set libraries: for blood we have 344 sets; for CSF, cerebrospinal fluid, 166; for urine, 284. Many of them are disease-associated, curated from the literature, and some are location-based, for lipids, tissues, or mitochondria. Pathway-based sets number 148. There are also SNP-associated metabolite sets, the largest collection; some come from population studies and some, I think, from computational prediction. Basically, certain SNPs are associated with downstream metabolites being up- or down-regulated; even where the mechanism from the SNP to the metabolite is unknown, the link is there. And there are drug-related pathways, four hundred something.

So what's the input? For over-representation analysis, it's just a list of compounds, perhaps a cluster from your heatmap that behaves similarly, or everything from your t-test below your cutoff, say 0.05. SSP takes a list, but with concentrations: it compares your measured concentrations against normal ranges reported in the literature, to say whether a particular concentration is higher or lower than the standard values. QEA, quantitative enrichment analysis, takes a whole table of metabolite concentrations and does the enrichment analysis without any cutoff: it works on the whole data, essentially ranked from high to low, rather than a pre-selected list. QEA is very useful when you analyze your data with t-tests or whatever and find no significant features: it doesn't matter, because QEA doesn't select significant compounds first, it does everything in one go, with no cutoff involved. The power of this is that it can detect subtle but consistent changes in your data. If you do have lots of significant compounds, you can upload that list to ORA and draw conclusions; but in many cases you don't, because the changes are very subtle, and then QEA can help.
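To preview the statistics behind ORA, here is a toy base-R sketch using the hypergeometric test; all four numbers are made up for illustration.

```r
# Over-representation of one metabolite set, hypergeometric test.
N <- 1500  # compounds in the whole reference library ("universe")
K <- 20    # compounds belonging to this metabolite set
n <- 60    # significant compounds you uploaded (that mapped)
k <- 4     # of those, how many fall in this set

expected        <- n * K / N      # = 0.8 hits expected by chance
fold_enrichment <- k / expected   # "expected about 1, found 4" -> 5x

# P(X >= k) when drawing n compounds without replacement
p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
p
# In practice you compute p for every set in the library,
# then FDR-adjust the whole vector of p-values.
```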
So, as discussed, the inputs are: a list of compounds; compound names plus concentrations, for single sample profiling; or a whole concentration table. Let's go through them one by one. The first is a list of compounds. After upload, MetaboAnalyst gives you a name mapping: everything has to be standardized against the library, and if something is wrong, say a typo, it helps you find and correct it. Then you select a library, and that really depends on where your data comes from: if your data is from blood, you probably want the blood sets. Some libraries are pathway-associated and sample-independent; the location-based ones are tied to blood, CSF, urine, and so on, so select the tissue or biofluid matching your sample. These include the SNP-associated predicted metabolite sets; there's a lot here, basically built from publications over the last ten years, and it keeps growing.

The results show up as a network and as a bar plot. The bar plot is the old-fashioned way: the more significant the p-value, the redder the color, and you also get the fold enrichment. Fold enrichment means: I expected to find only, say, one compound from this pathway by chance, but we actually found four, so the fold enrichment is four. Fold enrichment is here and the p-value is there, and this ranks your enriched metabolite sets. One thing I should mention about the network figure: a lot of pathways and metabolite sets overlap; some metabolites are involved in several sets. If more than 25 percent of the metabolites are shared between two sets, they're connected by an edge. So this network view also shows you the connections and similarities between the metabolite sets: you can see that the top two or three sets actually share these three compounds, and they're all linked together here. You can click any node to see which compounds actually matched; here it's two compounds, glycine and pyruvic acid. And since SMPDB was introduced yesterday, you know how to click through to SMPDB and explore further from there.

Now, single sample profiling. The motivation is basically how a physician analyzes a single patient: how do you compare against the standard? There's a reference, and you check whether the patient is too high or too low, say glucose for diabetes. Same thing here. Where does the standard come from? Not SMPDB but HMDB, which has a large collection of curated values reporting the normal ranges for particular compounds; these are all curated within MetaboAnalyst. When you upload, your values are compared against all the reported ranges, and we are very, very conservative: a value has to be higher than all reported ranges to be marked high, or lower than all of them to be marked low. For example, here you upload a list of compounds measured in urine, with concentrations normalized by creatinine. You can see your data against the normally reported values; they pulled about ten studies, this is the normal range, and your value here is already very high. So either the measurement is wrong, or it's something you need to think about.
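Here is a tiny base-R sketch of that conservative flagging rule; the `reported` table of literature ranges is entirely hypothetical.

```r
# Flag a measured concentration only if it is above every reported upper
# limit, or below every reported lower limit (the conservative rule).
reported <- data.frame(lo = c(30, 45, 38), hi = c(90, 110, 100))  # toy ranges, uM

flag_conc <- function(value, reported) {
  if (value > max(reported$hi)) "High"
  else if (value < min(reported$lo)) "Low"
  else "Normal"
}

flag_conc(150, reported)  # "High"   -- above all reported ranges
flag_conc(60,  reported)  # "Normal" -- inside at least one range
```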
Now you can go to single sample profiling. The motivation is how physicians analyze a single patient: you compare against a standard from a textbook and check whether you are too high or too low, like glucose for diabetes. Same idea here. Where does the standard come from? Not SMPDB but HMDB, which contains a large collection of literature reports of the normal range for particular compounds, all curated within MetaboAnalyst. When you upload, your values are compared with the reported ranges, and we are very, very conservative: you have to be higher than all reported values to be marked High, or lower than all of them to be marked Low. For example, here you upload a list of compounds with their concentrations; these are urine concentrations, normalized by creatinine. You can see your data next to the normally reported values, about ten studies, each with a normal range; your value sits up here, probably already very high, so either the measurement is wrong or it is something you need to think about. Likewise, you are only flagged low if you fall below all the normal ranges. This part has expanded quite a lot over recent versions: last year's version only had five or six studies, now we have ten, and you can see the new studies still fall in the same range.

The last module is QEA, the quantitative enrichment analysis. Here we use the example data, but you can upload your own as long as it fits the format. This is a cachexia patient dataset; you go through the same process as before, and finally you see the result: what changed, with each entry clickable. You can see quite a few pathways come out significant, which is quite unusual, but data is data. Click one and you see the changes associated with the cachexia phenotype: a lot of these compounds are very high in the cachexia patients, and they matched this pathway, so it is highly positively correlated. Keep in mind there is no cutoff: you just upload and normalize your data directly, and it reports what seems to be significantly enriched.

Pathway analysis is very close to enrichment analysis. The differences are that pathways have structure, and that pathway analysis supports more organisms: enrichment analysis is mainly human, while pathway analysis covers about 21 model organisms. So far it is mainly based on KEGG, and we will probably add SMPDB support in two or three weeks. You upload data just as in enrichment analysis; let us take the example data, the same dataset, and apply auto-scaling. The whole purpose is to show you the interface; we are not discussing why we do this or that, because it is really case by case. The pathway side has something slightly new: here you pick the organism, and here you pick the test. There is a specific option called GlobalTest or GlobalAncova, which tests all the compounds of a particular pathway together for association with your phenotype label or outcome. It does everything at once rather than testing individual significant metabolites, so it is cutoff-free. These methods were originally developed for gene set enrichment analysis and are well established; if they are not clear, you can Google them or read the FAQs, though I warn you the statistics are very dense, so I will not discuss them too much.

Relatively new is the topology analysis. Pathways have structure, so certain positions are more important than others. Here are the two commonly used notions, hubs and bottlenecks. A hub is like the party people: it has a lot of connections. A bottleneck is a more strategic point: if you break it, the whole network falls into three pieces, so it is considered structurally, or topologically, more important. Hubs correspond to degree: a high degree means many direct connections. Bottlenecks correspond to betweenness: this blue node has the highest betweenness. So nodes can be scored by degree and by betweenness.
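As a rough illustration of degree versus betweenness, here is a toy graph of my own (not a real pathway); networkx computes both centralities directly:

```python
import networkx as nx

# Toy "pathway" graph: two clusters joined through a single bridge node.
G = nx.Graph([
    ("A", "B"), ("A", "C"), ("A", "D"),   # A is a hub (high degree)
    ("D", "E"),                           # E sits on the bridge between clusters
    ("E", "F"), ("F", "G"), ("F", "H"),
])

degree = nx.degree_centrality(G)             # hubs: many direct neighbours
betweenness = nx.betweenness_centrality(G)   # bottlenecks: on many shortest paths

for node in G:
    print(f"{node}: degree={degree[node]:.2f}, betweenness={betweenness[node]:.2f}")
```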
We then try to combine the two: if the compounds being mapped, or being changed, sit at important locations, we reckon they are more likely to affect the pathway. Putting this together, with that test selection you can see which pathway stands out with the strongest p-value and the highest pathway impact. Click a point and you get the pathway figure, where you can see that many of the matches are at very important locations, right in the center. If the affected compounds sit mainly downstream, say at the very last node, it is hard to argue that the overall pathway is affected; but if a highly connected node is hit, the whole pathway is likely to be impacted. That is the underlying thinking about why location matters. So pathway impact is based on topology, and the other axis is based on p-values: you are always looking for things with strong p-values whose matched compounds also sit at important locations within the pathway.

Besides the graphical summary we give you a more detailed summary, like here, showing exactly which compounds matched. Click here and you can see this pathway matched nine compounds: it is a big pathway, but nine were matched, and all nine matches are shown in red. You can also see the expected number, something like two or three, versus the nine you found, so it is already significant. Again, once you are done, click Download and Generate Report, and all your results are saved with the figures embedded.
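As a rough sketch of the topology-weighted impact idea, here is one plausible simplification (not necessarily MetaboAnalyst's exact formula): score a pathway by the share of total node importance carried by the matched compounds.

```python
import networkx as nx

# Reusing a toy pathway graph; the matched compounds are hypothetical.
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"),
              ("D", "E"), ("E", "F"), ("F", "G"), ("F", "H")])
matched = {"A", "E"}   # compounds from your data that hit this pathway

centrality = nx.betweenness_centrality(G)
total = sum(centrality.values())

# Impact = importance of the matched nodes relative to the whole pathway.
impact = sum(centrality[n] for n in matched) / total if total else 0.0
print(f"pathway impact ~ {impact:.2f}")
```

Under this scoring, hitting central nodes pushes the impact toward 1, while hitting only terminal, downstream nodes keeps it near 0, which is exactly the intuition described above.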
The other part is biomarker analysis. I know a lot of you have already asked me how to find biomarkers, so we are going to cover this too. Biomarkers are very important for the translation of metabolomics, and biomarker analysis involves different thinking from plain statistical analysis. But before you do biomarker analysis, I strongly suggest you do the statistical analysis first, because when a biomarker goes to translation or application you need real strength, which means a large number of samples. If your sample is very small, the best approach is just exploratory statistical analysis. With fewer than 10 samples per group, whatever performance number a biomarker analysis gives you is very hard to believe will hold in real life. In my experience you need something like 50 samples, five-zero, well balanced, to do a good biomarker analysis; below that it is not very meaningful, because whatever number you get, is it going to apply to the real population? Not likely. So the larger the cohort the better, and with a small number of samples, always focus on exploratory statistical analysis.

Traditionally, biomarker analysis is univariate: one compound at a time, with an ROC curve, like glucose for the diagnosis of diabetes, or blood pressure for hypertension. The whole focus is the ROC curve, but how to generate it is almost an art: you can use a single biomarker or multiple biomarkers, use multivariate statistics, or sometimes manually build a model with some regression algorithm. Many physicians and clinicians have their own ways of doing it, and that is all fine if you work the traditional way. But if you are using omics data, you want to move toward larger cohorts, and if you want to report the results or apply for big funding, it is better to follow a more standardized practice: it makes a huge difference, and if the performance is not that promising you need to be realistic to justify it.

For biomarker analysis we have two example datasets, and you can choose the first one. Of course you can use your own data, but make sure your label has exactly two groups, yes/no or 0/1: only two groups are supported. Here we have 90 patients sampled at three months of pregnancy, 45 pre-eclampsia and 45 normal, and we try to find biomarkers to predict pre-eclampsia early. Ninety is already a good number; it usually gives a confident, smooth ROC curve, so this is a very nice dataset. We proceed as normal: click through, do a log transformation, check that normalization before and after looks nice, then do the multivariate ROC curve analysis with a linear SVM. I am not going to explain SVM in detail, that is impossible here, but SVM is the support vector machine: widely used, very good performance most of the time, very robust, and much less susceptible to overfitting. In my experience SVM and random forest are much superior to PLS-DA; the only downsides are that they are hard to explain and have no very good visualization support, which bothers people who are uncomfortable with black-box algorithms. On the other hand, you do not have to be an expert to publish with them, because they are already well accepted in the field and known to be robust.

Now view the ROC curves. The key point is how metabolomics is actually applied: you are not going to measure all the compounds every time; you select a handful of biomarkers, like three, five, or ten; you are not going to measure more than 100 simultaneously in a clinical setting. So the tool automatically selects the top 2, top 3, top 5, and top 10 features and builds multiple models, and you can see their ROC curves all look decent, some very good, close to 0.99, some slightly lower; this plot shows all the models' ROC curves together. You can also pick a single model: click here and choose the fourth model, which uses ten features, basically ten compounds, and include the 95 percent confidence interval. In this plot the confidence band runs from 0.9 to 1, and the AUC is 0.98-something. The confidence intervals are computed by bootstrapping: randomly sub-sample, redo the analysis, and repeat. This is standard practice.
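As an illustration of the AUC-with-bootstrap idea, here is a sketch using scikit-learn on synthetic scores (my own toy data, not the server's code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic two-group data: 45 controls vs 45 cases, one "biomarker" score.
y = np.r_[np.zeros(45), np.ones(45)].astype(int)
scores = np.r_[rng.normal(0.0, 1.0, 45), rng.normal(1.5, 1.0, 45)]

auc = roc_auc_score(y, scores)

# Bootstrap the AUC: resample patients with replacement and recompute.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y), len(y))
    if len(set(y[idx])) < 2:       # need both classes in the resample
        continue
    boot.append(roc_auc_score(y[idx], scores[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUC = {auc:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```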
You can then look at the selected features and how each compound differs between groups. I will also briefly mention the classical univariate ROC: you look at individual compounds and how they differ between control and disease, and you can draw an ROC curve for each one; that is close to how traditional clinicians work, but combining multiple compounds is more powerful. The last thing I should mention is that you can build a model manually: click here, you get a table of features, and if your gut feeling says certain ones will work, select the top one, the second one, whatever you like, and build your own model. That is all allowed, and you can explore it all here. The only caveat with this last approach is that the performance will most likely be over-optimistic: when you select features, the tool has already shown you information, so you will probably get very good performance, and you need to validate such a model on new data. We understand the risk of getting over-excited and over-optimistic, but on the other hand, remember: if you validate on a new cohort and still get good performance, you are fine.

Now, power analysis. As I mentioned, power analysis estimates the number of samples you need to detect an effect, assuming the effect is actually there. This is quite challenging, especially when you do not have your own data yet. What is power? If the effect is really there, how good are you at detecting it? For example, a power of 0.8 in a clinical trial means the study has an 80 percent chance of ending up with a statistically significant treatment effect if there really is an important difference between the treatments. So that is the question: how powerful is my study, and how many samples do I need?

Usually, if you apply for funding, people ask for a power of about 80 percent or higher; if it is too low, the study is not that promising. What affects your power? First, sample size: the larger the sample, the more power. Second, the significance criterion: the more lenient you are, the more power you have. If I use a p-value cutoff of 0.2 and you use 0.05, who has more power? I do, because I am more likely to declare something significant, being more lenient in judging significance. The third factor is effect size, which depends on what you are studying. In drug treatment or toxicology studies the effect size is very strong: you treat certain cells and they die, an effect so dramatic it is easily detected. But in human studies of things like nutrition, the effects are very, very small, so you need a very large sample size to detect them; dietary effects are very hard to detect. You really need to balance the design against the biological problem you are dealing with.
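To illustrate the sample-size relationship, here is a generic two-sample t-test power calculation via statsmodels (a plain illustration, not the tool's FDR-aware method; the effect size is a hypothetical Cohen's d from a pilot study):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Hypothetical input: a moderate effect size (Cohen's d = 0.5) from pilot data.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"~{n_per_group:.0f} samples per group for power 0.8 at alpha 0.05")

# A stricter cutoff lowers the power for the same n, so more samples are needed.
n_strict = analysis.solve_power(effect_size=0.5, alpha=0.01, power=0.8)
print(f"~{n_strict:.0f} samples per group at alpha 0.01")
```

With these inputs the answer comes out around 64 per group at alpha 0.05 and around 95 at alpha 0.01, matching the intuition that stricter cutoffs demand more samples.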
As for the significance criterion, you probably cannot change it much, because each field has its own accepted cutoffs; some fields need to be slightly more stringent than others. So how do we get the values? You do not know the effect size in advance, but you can estimate it from pilot data. Nowadays metabolomics has repositories like MetaboLights and Metabolomics Workbench, so you can try to find a dataset close to what you want to study and upload it to MetaboAnalyst as a pilot study to estimate the number of samples. To summarize: effect size is estimated from a pilot study; the significance criterion is basically an FDR of 0.01, 0.05, and so on; sample size is what we are going to compute; and power is what you want. The things you need are the data and an idea of what cutoff you will use to call significance.

After uploading your data and going through the steps, you get a plot of samples per group against power. If you want a power of 0.8, look where the curve crosses 0.8: here it is about 60 samples. A lot of the time you then plan for, say, 70 or 80 samples, because some people can drop out; if you plan exactly 60 and suddenly lose a few, you do not have enough, so you want a buffer. You can also see that the power changes with your cutoff: with 0.1 you get this power curve, and with 0.01 the curve sits lower, because the more stringent you are, the more samples you need to reach the same level; to get there with 0.01 you probably need 100 samples. Sometimes you cannot even reach 0.8: even out here the curve is still below it. The maximum shown is 1000, and usually by 800 to 1000 samples most models are saturated.

Let me see what else we have. Basically I have just covered the most commonly used modules: statistics, functional analysis, and biomarker analysis. There are so many other things, some 700 to 800 functions, that we just cannot cover: clustering, which you should explore; classification, such as random forest, which we only touched briefly; time-series analysis; two-factor analysis; pathway analysis; Peaks to Pathways, where users paste their peak lists and try to get pathways; the network explorer; biomarker meta-analysis. They all have similar interfaces but require different inputs. If you want to explore, it is all in your protocols; follow them, and if you have issues just raise your hand and I will help you, I am definitely happy to. If you do not have time to finish, send me an email after you get home, and I and the teaching assistants will be able to help you.

For time-series analysis you can see there is two-way ANOVA and some other methods, including a specific method to detect changes across time points. And this is predicting pathway activity from an LC-MS peak list: this is mummichog. You upload the peaks, click Submit, go through the process, and you get all the potential pathways and potential hits.
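The core trick in mummichog-style analysis is matching m/z peaks to candidate compounds within a mass tolerance. Here is a toy sketch of just that matching step (hypothetical peaks, a single [M+H]+ adduct, and an assumed 5 ppm tolerance; real tools consider many adducts and isotopes):

```python
# Monoisotopic masses in Da (standard values; list is illustrative only).
compounds = {
    "glucose":      180.0634,
    "pyruvic acid":  88.0160,
    "glycine":       75.0320,
}

PROTON = 1.007276    # proton mass, for the [M+H]+ adduct
PPM_TOL = 5.0        # assumed instrument tolerance

peaks = [181.0705, 89.0235, 120.0423]   # observed m/z values (made up)

for mz in peaks:
    for name, mass in compounds.items():
        expected = mass + PROTON        # expected m/z for [M+H]+
        ppm = abs(mz - expected) / expected * 1e6
        if ppm <= PPM_TOL:
            print(f"peak {mz} -> {name} ([M+H]+, {ppm:.1f} ppm)")
```

Because one peak can match several compounds this way, the algorithm then scores whole pathways rather than trusting any single annotation.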
You can download them, and you can also visualize them within the pathway: the pathways are shown in a KEGG-style interactive map, and you can double-click one to see which compounds were matched, which adducts were used, and what the other possible matches are, to help you refine your results.