Okay, very good. Okay, welcome back. And I'll just say, this is what everyone waits for. This is the big moment: your introduction to MetaboAnalyst. Usual slides here. To introduce you to MetaboAnalyst, I think what I'll do is just cover a couple of general issues with respect to analyzing or working with metabolomic data. Some of this should be reasonably familiar to you, some of it's a bit of a review, but it's a gentle introduction to what you'll be doing with MetaboAnalyst. There's a standard workflow, and there are issues of data checking, detection of outliers, quality control, scaling, and normalization. And then we'll dive into MetaboAnalyst.

So this is a slide we saw before, and it's just to remind you that in metabolomics we often have group studies where we will have 30, 40, 50 individuals, whether animals, plants, or humans. Those are what we call biological replicates. The normals are supposed to be biological replicates, and the abnormals are supposed to be biological replicates. The intent is to have some measure of biological diversity, and it should follow something like a normal distribution, which is why the replicates are usually 30, 40, 50, or 60 animals, plants, or humans.

And then we do this technical replicate thing. Not enough people do, though, and this is one of the major issues with metabolomics: many people don't do technical replicates, and many people don't do enough quality control to assess both the performance of their instrument and their sample preparation. We're all pretty good about getting biological replicates. We're all pretty bad at doing the technical replicates. So this is a reminder that you should consider that in your experimental design. You can have duplicates. In many cases people don't do two of everything, but they'll have quality control samples, one every five or ten runs. Sometimes they'll re-run a sub-sample just to make sure things are consistent. Sometimes they'll pool samples and create a single reference sample that they'll throw in to make sure that things are performing properly and reproducibly. So it's not like microarrays, where for many years people would do three or four of the same thing. But in the case of mass spectrometry, yes, it's not a bad idea to do duplicates.

Two routes to metabolomics. Again, this is another slide we've talked about, but there are essentially two different workflows depending on what route you've chosen. One we've been trying to emphasize is this idea of quantitative, targeted metabolomics. The other one, which is still widespread, and some of you may still do this, is the larger-scale chemometric one. The workflow in the chemometric one is first to collect your data and check it; that's the data integrity check. If you've got a blank sample, a blank run, or null data, that's not good. If you've got scaling problems, or things that have been lost at the tail end of the run, that's not good. So you're checking for that. The usual route is to then perform some kind of spectral alignment; you learned about that in XCMS. Then some aspect of binning, although more and more people are just simply doing peak identification; computers are so much faster that you don't need to do binning anymore. Then there's data normalization, which is to make the data look Gaussian, and there's also a scaling step that may be done. There may also be some quality control efforts; those may come at the very beginning, or at later stages.
Then outlier detection and removal. And then you do the data reduction analysis, which is the PCA or PLS-DA. It's after that that you've presumably identified which features, which bins, which peaks are most symptomatic of the changes. So this is where you'll be looking at your scores and loadings plots and trying to figure out which of those peaks are important. Then you'll go back and look at the peaks, and stare at them and stare at them and stare at them, and hopefully you'll be able to figure out what they are.

The targeted approach is a little different; it has the same sort of components, but in a different order. The data integrity check is done to make sure you've got the right data and that it's correct. Then you go immediately to compound identification and quantification. That's what you did with the Chenomx exercise, and what we tried to do in some aspects of the GC-MS examples and things like that. Once you have your list of compounds and their concentrations, then you do the normalization. You'll see how it's done in MetaboAnalyst, but this is to try to generate Gaussian distributions. You also do some scaling to deal with dilution effects. After that you also check whether there are any outliers, whether there are some strange-looking data sets. And at that point, you can do your data reduction analysis and produce your PLS-DA and PCA plots, or your ROC curves, or anything else you want to do. So as you can see, there's a different order, but largely similar components. These components are part of MetaboAnalyst, but they should be part of every metabolomic analysis. So I'm going to talk about each of these a little bit, some of which we've already discussed.

The data integrity issue is a particular problem with LC-MS; it is also a problem with GC-MS. There are lots of fake peaks in all of the data. We learned about adducts. We learned about neutral loss issues. In GC-MS, there are the extra derivatization products, and that's why you have to run blanks. There are isotope peaks, and you have to do de-isotoping. There are inadvertent breakdown products, and you have to deal with those. Those are usually dealt with in some cases by software that we've mentioned, and in some cases by the instruments you're using. This issue of false positives is not a problem with NMR. That's one of the reasons why we chose it as an example; it was just a little cleaner for you, but since most people don't do NMR, it was done largely as an example.

The way you can help sort these things out is by using those technical replicates. What I've illustrated here is some real-life data where a contaminant pops up. You've got two samples where the same peak is there, and then another sample where some other thing just sort of blows up. And the question is: is it a real peak or not? That's sometimes difficult to sort out. In some cases, it's a matter of checking to see if you've done something differently in the course of the run. This is why running duplicates or replicates usually helps you sort these spurious peaks out.

We've worked with XCMS, and you've learned about aspects of alignment. This is particularly useful for MS, for LC-MS; it's also particularly useful for GC-MS. That's because there is always drift in how LC or even GC columns perform. This is just an example of what's done, and these are some tools that you've become familiar with or heard about.
Binning is another thing to talk about. It's slowly going away. You can bin NMR spectra, you can bin GC-MS spectra, you can bin LC-MS spectra. It's typically done by people doing chemometric analysis. These days, because computers are much faster and disk drives are much bigger, bins are getting smaller and smaller to the point that bins are just simply peaks, and so it's just peak picking. So what is binning? It's a way of dividing a spectrum, in this case an NMR spectrum, into, say, 14 chunks of half-PPM or one-PPM bins. A bin will then contain several peaks and usually an integrated area, so you just have the area under the curve and the position of the bin. As I said, that was when computers were small and weak. Now we have much more powerful systems, so binning has largely just become peak picking: you mark down all the peaks and the areas under each peak. (There's a little code sketch of binning below.)

Normalization and scaling: you can see on the right side where it's basically an identical spectrum, but one is about three times bigger than the other. You want to deal with those things so that you're comparing apples to apples. This is a dilution issue, which is common with urine, but it can also be a problem with sample preparation, or with solid samples that have too much water, so the signal is weaker than you want. So you can scale, and there are scaling methods we'll talk about. The scaling can be based, in some cases, on internal standards. In urine we use an internal standard called creatinine, but we can also normalize to specific volume, specific gravity, or total organic content. There's also a technique called probabilistic quotient normalization. The integrated area is another way to help normalize or scale. Each of these requires a little bit of knowledge about your system or sample. Particularly with cell samples, and in some cases fecal or microbial samples, you have to consider these quite seriously, because it makes sample comparison difficult if you can't normalize properly.

Is it common practice to normalize to some reference concentration in the sample? Yeah, for urine the standard, the clinical standard, is to normalize to creatinine. There are better methods, I think. People are finding that normalizing to the total organics is better, if you can measure it by NMR or by other assays where you can measure all the components; that's a more robust method. In the case of, say, fecal water, this is a problem; there's nothing quite like creatinine there. In that case you're dealing with a wet mass of cells or material, and you have to sort of post hoc measure that. But if you can normalize to the total mass of the extracted tissue or whatever material you're working with, that makes it reasonably consistent.

So again, this is just more about the scaling. Some of the other normalization options include the log transformation. Terms like log transformation, auto-scaling, Pareto scaling, probabilistic quotient scaling, and range scaling are all available through MetaboAnalyst, where they're explained in a little more detail. They're ones that you typically choose by trial and error: some work, some don't, depending on the data, and having something interactive and visual like MetaboAnalyst helps a lot.

Data filtering: this is again something you did when you were working with Chenomx. You removed, say, the water peak; that's one kind of filtering. Noise filtering is done with GC-MS; it's part of AMDIS.
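To make the binning idea concrete, here is a minimal sketch in Python. The spectrum, the bin width, and the function name are illustrative assumptions, not MetaboAnalyst's internals:

```python
import numpy as np

def bin_spectrum(ppm, intensity, bin_width=0.5):
    """Divide a 1D spectrum into fixed-width ppm bins and
    integrate (sum) the intensity that falls inside each bin."""
    edges = np.arange(ppm.min(), ppm.max() + bin_width, bin_width)
    idx = np.digitize(ppm, edges)           # which bin each point falls into
    areas = np.array([intensity[idx == i].sum() for i in range(1, len(edges))])
    centers = (edges[:-1] + edges[1:]) / 2  # bin positions
    return centers, areas

# A made-up 0-10 ppm spectrum cut into 0.5 ppm bins (20 bins):
ppm = np.linspace(0, 10, 5000)
intensity = np.random.rand(5000)
centers, areas = bin_spectrum(ppm, intensity)
```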
The removal of outliers and the removal of false positives are done with some of the local software that you get with LC-MS or GC-MS instruments, or with some of the freeware that we've looked at. For data reduction, we spent a good chunk of time last lecture going through PCA, PLS-DA, and clustering, which is all part of the process of figuring out what you've just seen and why it's there. So that's a quick overview of the workflow and some of the issues, and all of these workflow components are part of MetaboAnalyst. So that's what we're going to talk about.

When it was developed in 2009, it was quite novel, and the idea of actually doing metabolomic analysis online was pretty unique. It was designed to handle all the types of data that you would typically have, LC-MS, GC-MS, NMR; to do all the univariate and multivariate testing; to help generate useful plots; and then to link that to the biology: pathways, metabolite set enrichment, other clustering approaches. All of those were built in, and it's progressively been added to. This is the work almost entirely of your TA, Jeff. It's now in its second version, and he's working on version 3, which will hopefully be ready for next year: faster, better, even more capable than before. But we're going to work with version 2 right now. It's proven to be very popular, and that's one reason why we might find it a little slower than we'd ideally want. We have, what, 300 users at any given moment using MetaboAnalyst.

It's broken down into four general steps. There's a data preprocessing step, that's sort of your cleanup; the data normalization slash scaling, the second step; the data analysis, which is sort of the fun part; and the data annotation, which is also critical. That involves some aspects of metabolite identification, but also creating useful plots and generating reports that are useful.

This is a flow diagram that's a little more detailed. Hopefully you can read some of the text; it's very small. For data input it can take raw spectra; it can take peak lists; or, if you want, binned lists, spectral bins; or, in the case of targeted quantitative metabolomics, concentration tables. The one we'll work with, or ideally would work with, is the concentration table. However, if you've got raw data you can do spectral processing, peak processing, and noise filtering. You can also do what's called missing value estimation: for things that are below the limit of detection, you want to have some smallish value, which helps generate a more Gaussian distribution. You can then normalize or scale by rows or by columns, and it depends on whether you've made your rows your samples and your columns your metabolites, or the other way around, but each one is possible. After that you have a whole bunch of choices: you can do metabolite set enrichment analysis, metabolite pathway analysis, time series analysis, or the classical multivariate analysis. So those are your four options, and all of those will give you graphs and pictures and tables and reports. Additionally, you can do some quality checking, data QC assessments. It also supports peak matching, peak searching, compound conversion, and pathway mapping as well. So lots of options; it's a fairly complicated piece of software.
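Since the concentration table is the format we'll mostly use, a made-up sketch of what such an input file might look like is shown below, with samples in rows, a class label, and one column per metabolite. The names, labels, and numbers are all invented; check the data formats page on the site for the exact header conventions:

```
Sample,Label,Citrate,Glucose,Lactate,Alanine
cow_01,0%,112.4,2310.0,450.2,88.1
cow_02,0%,98.7,2105.5,471.9,79.4
cow_03,15%,140.2,1980.3,512.6,95.0
cow_04,30%,151.8,1744.9,533.1,102.7
```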
So we're going to look at four components. We're going to do raw data processing with MetaboAnalyst. We're going to do data reduction with MetaboAnalyst. We're going to later move on to functional enrichment analysis, using what's called metabolite set enrichment analysis, MSEA, and then pathway analysis with MetPA. Both MSEA and MetPA are now part of MetaboAnalyst. There are separate websites for them still, I guess; they were originally produced as separate tools, but now they've been brought into MetaboAnalyst, and that has strengthened them quite a bit.

So if you go to MetaboAnalyst, www.metaboanalyst.ca, you will see a page that looks a lot like this, and you can click here to start. There's a little thing up in the corner that will open it up. You can also look at the data: there are data sets, and information on data formats, to choose from. Some of them you can download directly; some of them you can upload. We have both options. In this case, if you clicked on data formats, you can see four or five different data sets that you can download to your hard drive. These represent binned data, compound concentration data, peak intensity tables, and time series. These are all real data sets. Some of them are also peak lists, and those are much larger, so they're zipped files. All of these can be used, and they're intended to help demonstrate what the tool can do. Alternatively, if you don't want to download the data, you can just go to some of the examples and upload the data directly, because it already sits on the MetaboAnalyst server.

So we'll jump into part one, which is the data processing. You're going to try to convert your raw data into data tables for doing statistical analysis: rows and columns with samples and metabolite values. For the targeted analysis, which is what we're trying to emphasize in this course and what is becoming increasingly the norm for metabolomics in general, we'll look just at the concentration tables. You're free to look at and use some of the other ones, which are typical of untargeted methods: spectral bins, peak lists, raw spectra. That would be some of your XCMS data. As I said, you could have downloaded the files, or you can just upload, which saves you one or two clicks. So we have a couple of data types that you can choose from. If you downloaded the data, you can upload it from your disk, or you can go straight to "try our test data": if you scroll down, you just click on the button and it will instantly load the data.

For this example, and you're going to be working on your own, we're going to be using the metabolite concentration data of 30-some rumen samples, measured by NMR, from dairy cows fed different proportions of cereal grains: 0%, 15%, 30%, and 45%. I don't think any of you are doing agricultural research; fine. And most of you do mass spec; fine. The important thing here is that this is real data, and because it's from cows it's publicly available, so we're not dealing with any clinical issues. And it comes as a concentration table. You could have done this with mass spec, with GC-MS, with HPLC; it's just a targeted analysis. So this is the data that you will try.

Yes, Karolina? Say that again, what's the difference between the... Okay, so the peak intensity table is really just having the bins shrunk down so narrow that you're just binning individual peaks.
Bins are usually somewhat wider and may include three or four peaks, or even part of a peak. And as I said, back in 2009 binning was common. Now it's 2014, and binning is just about extinct. But it's still there, it's still an example.

[A student asks whether binning means dividing the data into frames with a specific mass or ppm range and a specific bin size.] Yes, you could do that. If you want, you can call it data framing; you're just cutting the data up into different frames. It was an older technique, and a lot of software was developed for it in the past. It was done because computers five or six years ago were not able to handle the amount of data processing. The other reason binning is associated with NMR is that an NMR spectrum covers a short ppm range and is very crowded, whereas mass spectra are mostly empty, so if you bin them you get a lot of bins with no data in them. So for mass spec binning is less useful, but for NMR it made sense. [A student asks which applies to the XCMS data sets used earlier.] You can use both, and it runs with both. If you use bins, or if you put in a peak intensity table, it also runs; there's no problem. That's right. The NMR people and the mass spec people have somewhat different cultures here: if you're a mass spec person, a column holding a zero means there's nothing there, and it's always just one peak. But I think Rose is right in that more and more people are just using peak intensities, so to some extent the cultures are converging: everyone's just using peak intensity tables, as in mass spec.

So if you've uploaded the data, and hopefully you'll go through this shortly, the data will be processed: it'll look at it, identify how things have been grouped, flag any problems, and if you're happy with the report, you can just skip ahead. If you're not happy, you can do what's called missing value imputation, which is to create numbers, so they're synthetic, but reasonable, based on the data you've already collected. If you've got missing data, it's sort of like having a nice line but missing three points while the rest of the data continue: you can usually see where those three points would go, and filling them in seems reasonable. The reason you do data imputation is to help keep a normal distribution. Your missing data is either going to be "not a number", and computers don't like dealing with not-a-number, or you want to make sure the numbers aren't all zero, which creates a bimodal distribution: a whole bunch of zeros, and then the rest of your data. Bimodal data is not good; statistical packages don't like that. So that's what this option for dealing with missing data is for. (There's a small sketch of this kind of imputation below.)

Are there any best practices for replacing zero values? Yeah, one half, one fifth, it sort of varies, but it should be somewhat lower than the lower limit of detection, and not zero. And some of it is something you may know about your instrument. In NMR, we know typically that somewhere between a half and a third of the lower limit of detection is fine. For mass spec it may differ, and it depends on the instrument too, but it should be a real number and it should not be zero.
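As a rough illustration of that kind of missing value handling, here is a small pandas sketch that replaces zeros and missing entries with half of each metabolite's smallest observed value, one common rule of thumb mentioned above. The function is hypothetical, not MetaboAnalyst's actual code:

```python
import numpy as np
import pandas as pd

def impute_half_min(df):
    """Replace NaNs and zeros with half the minimum observed
    value of each column (metabolite), column by column."""
    out = df.replace(0, np.nan)
    for col in out.columns:
        min_obs = out[col].min(skipna=True)      # smallest real measurement
        out[col] = out[col].fillna(min_obs / 2)  # below detection, but not zero
    return out

# df: samples in rows, metabolites in columns
df = pd.DataFrame({"citrate": [1.2, 0.0, 3.4], "lactate": [np.nan, 2.0, 2.5]})
print(impute_half_min(df))
```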
Can you select a minimum value yourself? In MetaboAnalyst, yes, you can choose a minimum value. Yeah, that's something you can put into the data. It will, yes.

So if everything's gone through, and in this example everything should go through so you don't have to do a whole lot of extra work, you're now going to do the scaling and/or normalization. In this case the samples, the rumen fluid from the different cows, are in your rows, and the compounds are listed in columns. There are, I don't know, 50 or 60 different compounds and maybe about 39 sample rows, so it's roughly a 39-by-60 matrix. You're going to look at the rows first. You could do no normalization or no scaling. You could normalize by the median or the sum. You could use some reference sample, which might be some standard that's been added, or some completely distinct sample; in this one we're just pooling the average, so we create a synthetic sample that allows us to reference against it. We could have normalized to a reference feature, creatinine, if this were urine, or done a specific normalization to a dry weight if we were dealing with cells. So that was the row-wise part, more the scaling-for-dilution issue. (There's a small sketch of these row-wise options below.)

Then, for the columns, these are the metabolites. More often with mass spec, but you can also have it with NMR, you will have skewed distributions: some values clustering around one spot, and then another bunch that are way out there. So you could do log scaling, or, in this case, auto-scaling is the one that's selected, but there's also Pareto. All of these are transformations that will hopefully generate a more normal-looking curve. What's that? Yeah, the log base doesn't really matter; log base 2 is fine. Frankly, the best way to choose is to look. I still think that's the best way, but there are some mathematical tests out there, people have asked about them, like a Q-Q plot. But especially Jeff, who's had to deal with this for five years, day and night, his view is that visual inspection is your best and surest way. That's right, the program will do it for you. That's why people like it: it allows you to do all of this very quickly.
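As a rough picture of those row-wise (per-sample) options, here is a small pandas sketch of sum, median, and a simplified probabilistic quotient normalization. The function and variable names are mine, and real PQN implementations involve more steps:

```python
import pandas as pd

def normalize_rows(df, method="sum"):
    """Row-wise (per-sample) normalization to correct dilution effects.
    df: samples in rows, metabolites in columns."""
    if method == "sum":
        factor = df.sum(axis=1)             # total signal per sample
    elif method == "median":
        factor = df.median(axis=1)          # median signal per sample
    elif method == "pqn":
        ref = df.median(axis=0)             # pooled "average" reference sample
        factor = (df / ref).median(axis=1)  # most probable dilution quotient
    return df.div(factor, axis=0)
```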
So, just again: the data is uploaded; we have the samples in rows, that's the cows, cow 1, cow 2, up to cow 39; the compounds are in columns; and we've done the row-wise and column-wise normalization. You could do both together. The row-wise one, in this case on the samples, is to make them comparable to each other: we want to make sure we're not dealing with dilution effects. You could have had 38 cows that are fine, but then someone spilled one sample and added a lot of water, so now it's 8 times more dilute; this step will deal with that issue fairly well. The column-wise one is the one that's trying to make your data, the metabolite distributions, look normal, and that's pretty important, because that's what lets us do multivariate and univariate statistics. That's the log, Pareto, range, and auto-scaling.

This is the most useful visual, and this is why it's so important to look at it. Here we've implemented auto-scaling. Before auto-scaling, on the left, you can see that this is a highly skewed, almost extreme-value distribution: in the plot, a whole bunch of things almost seem like they're zero, and then a whole bunch of other metabolites range over a huge spread. This is mirrored in the box plot showing all the metabolites with their names: a few have concentrations that are huge. They skew your distributions, and your statistics, even the Mann-Whitney U test, don't work really well with that. So what do you do? We could have done a log transformation, but here we just did the auto-scaling, and boom, all these things are now shifted nicely: the boxes are all about the same width, and if you plot it out, that's about as nice a Gaussian curve as we've got. You have now changed your metabolite concentration data to a normal distribution, and now you can do good statistics with it.
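For reference, the column-wise transformations just shown (log, auto-scaling, Pareto) can be sketched like this; a generic illustration, not MetaboAnalyst's exact implementation:

```python
import numpy as np

def transform_columns(x, method="auto"):
    """Column-wise (per-metabolite) transforms that pull skewed
    concentration distributions toward something more Gaussian.
    x: 2D array, samples in rows, metabolites in columns."""
    if method == "log":
        return np.log2(x + 1e-9)             # small offset guards against log(0)
    centered = x - x.mean(axis=0)
    if method == "auto":                     # mean 0, standard deviation 1
        return centered / x.std(axis=0)
    if method == "pareto":                   # gentler: divide by sqrt of the SD
        return centered / np.sqrt(x.std(axis=0))
    return centered                          # plain mean-centering
```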
Does the scaling cause any issues for the interpretations that you draw from the analysis? Nope. You'll have good p-values, you'll have good sets of metabolites to work with; it won't change your interpretation.

Okay, so it's also at this stage that you can deal with outliers. This scaling process, and looking at some of these data, in some cases going a few steps further, allows you to identify some of the problems in your data. It's important to look at this, and to do it early on, because you don't want to have to find it later. So one thing is to look for outliers: mistyped entries, unusual things that happened in sample preparation. In the case of samples where you've been doing binning or peak lists, sometimes you'll have to do noise reduction; that's another step that can be done.

You can actually find outliers using PCA, or through heat maps or clustering. This is an example where you've gone all the way and done the full analysis: the green group is separated pretty well from the red group, but here is one sample way out of there, not even close to the green or close to the red. That's one where something's gone wrong: there's a scaling problem, a normalization problem, a typing problem. You want to look at that, and that's how the data analysis process actually allows you to identify things. Another one is the heat map, where you're comparing healthy to not-so-healthy, and we're seeing this one sample which has this dark red marking all the way across. Something went wrong: this sample is too dilute, all the numbers' decimal places were shifted by 3, or something like that. You can identify these from the graphs, look at the specific identifier number and everything else, and go straight back into your data and remove that outlier or fix it. You can do it within MetaboAnalyst, or you can go to your Excel spreadsheet if you want to; either way. (There's a small code sketch of flagging outliers from PCA scores further below.)

There is also a data filtering or noise reduction component. We're not going to go into that too much, but it is typical when you have peak lists only, and you can choose which variables are going to be filtered, how many are going to be removed, what the percentage is. Usually, when you can identify noisy features, they are typically the things at the lowest intensities that you're not really certain about, and those can be cleared out by, in this case, different choices. We're not going to have to do this with the data set we're working with, but these are some of the options in MetaboAnalyst.

So we've gone through these cleanup phases, and it's important to do that. Once you're able to do the cleanup, you can do the fun stuff, which is identifying the neat features, the patterns, the differences between phenotypes, doing your predictions and classification, and publishing your paper in Nature or Science. At this stage I've given you a half-hour introduction; now we'll spend about 15 to 20 minutes going through what I just did with the sample data set in the last few minutes, and you can use your slides to guide you through the next steps. So you're going to take the cow data set that I've just mentioned; you can use the slides and the description that I just went through; you can upload the data, take the sample data (you can choose another data set if you want, but this was designed specifically for the cow rumen data); and you can do some ANOVA analysis. Again, it's very simple: point and click, point and click, and things will be done. You should be able to get similar kinds of graphs and look at individual compounds to see how things are different.
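Going back to the outlier hunting described above, here is a minimal sketch of how you might flag outliers from a PCA scores plot programmatically; a generic illustration using scikit-learn, with an arbitrary 3-standard-deviation cutoff:

```python
import numpy as np
from sklearn.decomposition import PCA

def flag_outliers(x, sample_ids, n_sd=3.0):
    """Project the (already scaled) data onto the first two principal
    components and flag samples far from the center of the scores plot."""
    scores = PCA(n_components=2).fit_transform(x)
    z = scores / scores.std(axis=0)        # standardize each component
    dist = np.sqrt((z ** 2).sum(axis=1))   # distance from the origin
    return [sid for sid, d in zip(sample_ids, dist) if d > n_sd]
```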
And then we're also going to try to answer some questions. Work together: if you're sitting beside someone, just talk to each other; this is just a good way to learn, and it's not like you're being graded. If you're all alone, you can join up with another group and shift over, and if you're completely stumped, you can ask Jeff. Again, I've highlighted on the left where you can click on things and what should happen after you click on them, so just use your printouts to help guide you. You can also save some of these images; we're not going to print them here, but you can at least save some high-resolution images. More questions, more examples. You're going to go all the way through, just following things along, to... which slide number is this? It's about slide 61, so it's page 31. Stop at the words "metabolite set enrichment analysis". This should take you about 20 minutes, so we'll give you 20 minutes to work together, take the samples, ask some questions, talk to each other. If it takes longer, fine; this is time to play around with some of the data. You may not have finished as much as you wanted, but I just want to make sure that people also have an opportunity to look at some of the other parts of MetaboAnalyst.

So one aspect, which is actually a fairly popular part, is called metabolite set enrichment analysis. Originally it was a standalone software web server; it's now been integrated into MetaboAnalyst version 2, so you can go either to MetaboAnalyst or to the MSEA website. It's modeled after gene set enrichment analysis, which is certainly very popular in microarray and RNA-seq analysis. You can do a couple of different kinds of analyses: over-representation analysis, single sample profiling, and quantitative enrichment analysis. Part of what gene set enrichment analysis, and therefore metabolite set enrichment analysis, requires is that you have pathway data sets, disease sets, predefined associated metabolite sets, to pull this stuff out. This is how people, at least in microarrays, associate gene patterns with a particular pathway or disease, and this is what we're trying to do with MSEA. It tries to group things into biologically meaningful groups: if you see citrate and aconitate and, what are some of the other ones, malate all enriched, then you can say, okay, this is the TCA cycle, or this is relevant to that. In other cases, if you see substantial changes in a certain class of lipids, sphingolipids, you might find that that's associated with Tay-Sachs disease, and that's another one that's listed as a biologically meaningful set. Right now, MSEA is intended to support human metabolomic data, because that's where we collected most of the data, although it shouldn't be that different for other mammals, to a degree.

You can work with different types of input data. You can have just a whole bunch of metabolite names, so you don't have to have their concentrations; that's the over-representation analysis. As I said, if I gave you citrate, aconitate, malate, isocitrate, it's just a bunch of names, but it allows you to say, oh, that's probably associated with the TCA cycle.
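Under the hood, over-representation analysis of this kind typically boils down to a hypergeometric test. Here is a minimal sketch; all the set sizes are invented purely for illustration:

```python
from scipy.stats import hypergeom

# Hypothetical numbers: of 2000 metabolites in the reference library,
# 20 belong to the TCA-cycle set; our experiment flagged 50 significant
# metabolites, 6 of which fall in that set.
total, in_set, hits, hits_in_set = 2000, 20, 50, 6

# P(seeing hits_in_set or more set members by chance alone)
p_value = hypergeom.sf(hits_in_set - 1, total, in_set, hits)
print(f"ORA enrichment p-value: {p_value:.2e}")
```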
You can also have a list of metabolite names plus the concentration data from a single sample. Say someone's come into a clinic saying "I'm not feeling well"; you run a metabolomic test, measure a whole bunch of concentrations, and you see that their phenyllactate levels are eight times higher than average. This option compares that data to the average, to the norm, which is what's been collected in the HMDB over a number of years: average concentrations for blood or urine or whatever else. The third one is the more typical case, where, just like what we've done with the cows, except in this case it would be humans, we have a whole bunch of metabolite names and a whole bunch of concentrations for multiple patients; that's called QEA. So those are the three options: over-representation analysis on the left, single sample profiling in the middle, and quantitative enrichment analysis, the classical MSEA or GSEA, on the right. All of them compare against certain databases and then output something useful.

For this example, because it's specifically for human data, we want you to not use the cow data but to use the metabolite concentrations from 77 urine samples from cancer patients. About half the patients developed what's called cancer cachexia; the other half also had cancer but did not develop cachexia. Cachexia is muscle wasting: if you've ever known someone who's had cancer, often in the late stages they get very, very thin and quite weak. This is actually probably the number one killer in cancer, and we still have no idea why the condition develops. Interestingly, there is a very strong metabolic signature for it, and it's seen in urine, so you can actually predict the people who will develop cachexia and those who won't. So I'm going to leave it here, and you can use a few minutes to go through... well, actually, I'll maybe race through, because I'll give you these two options. You can play around with the gene set enrichment, or rather metabolite set enrichment, analysis, and there's a bunch of steps you can take that show you some outputs. Or, if you're not particularly interested in that, you can go to the next one, which is called metabolite pathway analysis.

In this case we're going beyond simple metabolite set enrichment: we're looking at pathways, and the structures of pathways. And in this case it's not just about humans; we can look at about 15 different model organisms. In this one we can upload, well, we can still use the cancer data, just like we did for the cancer cachexia, and we can do classical normalization, but it will look at a whole bunch of pathway analyses. Some of the pathway libraries are from plants, some are from prokaryotes, some are from birds, fish, and mammals. In this case we'd be choosing humans, but if you're looking at data from birds, you could potentially use the pathway sets and pathway libraries there. What this does is network topology analysis, and this is something that's become important in pathways, where we think about hubs and nodes and bottlenecks, and we can measure things called degree centrality and betweenness centrality. Just as there are very important genes or proteins that are hubs in networks, the same actually happens with metabolites, and MetPA allows you to identify and quantify these things. It uses KEGG pathways, as opposed to the SMPDB pathways; it has some very cool graphics; and it allows you to identify which compounds have very high impact. They're plotted out here in terms of their importance for the pathway, whether they're hubs of a pathway or not. And then, again, there are some questions that you could answer.
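To make degree and betweenness centrality concrete, here is a toy sketch using the networkx library. The little pathway graph is invented purely for illustration, not a real KEGG pathway:

```python
import networkx as nx

# Toy metabolite network: nodes are compounds, edges are reactions.
g = nx.Graph([("citrate", "aconitate"), ("aconitate", "isocitrate"),
              ("isocitrate", "alpha-KG"), ("alpha-KG", "glutamate"),
              ("alpha-KG", "succinate")])

# Degree centrality: how connected a node is (hubs)
print(nx.degree_centrality(g))
# Betweenness centrality: how often a node lies on shortest paths (bottlenecks)
print(nx.betweenness_centrality(g))
```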
In this case we don't have time to cover a lot of the other things. There are different clustering methods, different tools for classification. Some people are interested in doing temporal analysis, and this is something that's offered through MetaboAnalyst; we haven't looked at some of the data quality checks either, and people have brought that up. These are some examples of what you can do with the time series analysis, which used to be a separate tool called MetATT: it allows you to look at things over days, hours, weeks, and how they've changed over time. And then there's some data quality checking where you can look at different cohorts. This is an interesting one where three batches of samples were collected on days one, two, and three, and the purple samples were collected on the fourth day, and everything was shifted up. This is a case where the samples evidently had been left on a counter overnight and not frozen. So these are tools that allow you to look at your data, see whether there are problems, and then correct them. In this case it is statistically correctable: you wouldn't have to redo your experiment, you just adjust the mean and bring it down so you can start doing some decent comparisons (there's a small sketch of that below). So in the remaining minutes, play around, try it. You can choose one or the other, or you can carry on with what you were doing before with the original analysis. At the same time, if you're thirsty or need a bit of a break, do so now. We'll start the last module at 3:30.
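As a final aside, the kind of statistical correction just mentioned, removing a constant per-batch shift, can be sketched in a few lines. This assumes samples in rows plus a label saying which day each sample was run; it's a generic sketch, not MetaboAnalyst's actual procedure:

```python
import pandas as pd

def center_batches(df, batch_labels):
    """Remove a constant per-batch offset (e.g., a shifted fourth day)
    by subtracting each batch's mean and restoring the grand mean."""
    grand_mean = df.mean()
    centered = df.groupby(batch_labels).transform(lambda b: b - b.mean())
    return centered + grand_mean
```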