Okay, I think we'll get started, and as people drift in, let them know they missed the most important slide today. And there it is.

This session is a bit of a mix: there'll be a lecture of maybe half an hour, and then we're going to give you an hour or so to work on MetaboAnalyst, so it's a lecture slash lab. I'll turn back on at some point about fifteen minutes before the end of the session, which I guess will be around quarter after three, and then we'll cover a couple of other items.

You have the slides for this lecture, but you also have a couple of tutorials. In particular, there is a tutorial immediately after section, or module, seven. This module, along with the slides, has a series of questions as well as a sort of walk-through, so you can use the slides or you can use the tutorial, which is published in Nature Protocols, to take you through some of the data. Likewise, you prepared a data set earlier this morning, and you can use that data set if you want. There are also other data sets available through the MetaboAnalyst website that you can work through. The tutorial is with your binders, and the one I'm using as an example uses an example data set that's right on the website.

What we're trying to do here is get you familiar with the standard workflow for metabolomic data analysis. It's generic: many of the same techniques and technologies produce similar kinds of data sets with similar kinds of issues, but obviously every experiment is a little different. We'll try to go through things like data integrity checking, outlier detection, quality control, normalization, scaling and transformation, and we'll do this with an online tool called MetaboAnalyst, which is something Jeff wrote when he was just a young student.

As I said, we've seen this slide and we'll see a few others, but typically, whether it's metabolomics or any other omics experiment, we'll have controls and cases, or treatment A and treatment B, and we'll have populations where we're collecting samples or measuring things. Those are largely called biological replicates. They're not perfectly replicated, but they are the samples that allow us to get good statistics. In many cases we'll also do technical replicates, and this is an important part of almost any study. We talked about that yesterday, and I'll mention it a bit again today, but in many cases, especially with LC-MS, people will do multiple runs of the same sample. Sometimes they'll put in quality-control checks, which are a little different from technical replicates. The technical replicates are a way of measuring the performance of your instrument, whereas the biological replicates are really a way of measuring the variation of your biological system. We know there's a fair bit of variability in human, and even rat and mouse, biological systems, and a fair bit of variation in plant and microbial systems. That's important, because that gives you your population statistics. Technical replicates are really intended to help you address issues of systematic error, as well as issues of noise and false positives and things like that, which typically arise particularly in MS-type experiments.

We're looking here at two routes to metabolomics, again as a protocol or workflow. It diverges a little depending on whether you're doing a chemometric or untargeted method versus a targeted method, but there is also a fair bit of convergence at some point.
What we're going to give you this time, at least with the example I'll be using in these slides, is an example of quantitative or targeted metabolomics. The data you had for your XCMS work would be what you'd use for an untargeted or chemometric method.

The workflow with the chemometric or untargeted methods is: a data integrity check; the spectral or chromatographic alignment and binning, which you've already done; data normalization; quality control and outlier removal; and then data reduction and analysis. Once you've whittled everything down to perhaps a few dozen peaks or features, you get into compound identification. With targeted methods you start off with the data integrity check to make sure things are okay, but then you go straight into compound identification and quantification. With that data list you move to normalization, quality control and outlier removal, and then you follow on with the data reduction and data analysis. So it's a slightly different sequence, but very similar steps. In the case of the XCMS data you'd typically be starting about here; with the data sets that I'll be giving you, you'll also be starting about here.

In terms of data integrity, I've talked about this a lot, particularly with MS. There are lots and lots of false positives, and we talked about how to eliminate those, I think in module three. There are the adducts and derivatization products, isotopomers, breakdown products, neutral-loss issues, ionization issues, multiply charged species. Suffice to say this is not a problem with NMR, but obviously NMR is not as sensitive, so it's sort of a compromise. These are some tools and links that help identify or deal with some of the adducts; I gave you other links before, but these would be examples for handling noise or shifts or other issues of data integrity and quality.

We've just spent a big chunk of time going through spectral alignment; that's what we saw XCMS used for, particularly with GC- and LC-MS. In NMR there are tools, offered I think through Bruker, for doing the spectral alignment, although it's not that important: again, things don't vary that much except in cases of pH differences. Some use warping algorithms, others use essentially time-shift or peak-shifting algorithms, and we've seen some examples that do that sort of spectral alignment.

Again, typically with LC- and GC-MS data, if you're doing non-targeted work there's quite a bit of advance prep work, cleaning and aligning, and it's quite onerous. Because there are so many features, people typically try to reduce those features very early on. With the targeted approach, or quantitative metabolomics, things are typically the opposite: you're already focused on the compound identification, and you don't generally have to worry about issues of alignment, noise and other things.

You can also do things like binning; this can be done with mass spec GC-MS data, and there's a minimal sketch of the idea below.
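To make that concrete, here is a minimal sketch of what binning does, written in Python with numpy. The function name and the 0.04 ppm default are just illustrative conventions of this sketch, not anything prescribed by a particular tool.

```python
import numpy as np

def bin_spectrum(ppm, intensity, bin_width=0.04):
    """Sum spectral intensities into fixed-width bins.

    ppm       : 1-D array of chemical shifts (the same idea works for
                m/z or retention-time axes; only the width convention changes)
    intensity : 1-D array, same length as ppm
    bin_width : bin size; 0.04 ppm is a common historical choice for NMR
    """
    edges = np.arange(ppm.min(), ppm.max() + bin_width, bin_width)
    # np.histogram with weights sums the intensity that falls in each bin
    binned, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, binned
```

The point is that small peak shifts between samples stay inside the same bin, which is exactly the consistency trick described next.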
Binning is more frequently done with NMR. It's a trick to deal with peak shifts in a fairly consistent way. These days computers are so fast that binning is slowly disappearing and people are simply measuring the individual data points, but if you look at the literature there are examples even today where people use binning; it's just that the bins get really, really tiny, almost to the point where they're handling individual data points. Binning is a way of reducing the amount of data, but disk space is cheap these days, so we don't tend to worry about that much.

Then there's normalizing and scaling. Sometimes you have to deal with dilutions; the best example is probably urine, but there are many other cases, with cell extracts or lipid extracts or other types of tissue extracts, where protocols will have differing values of recovery, and you scale things so that you're able to match them. In fact, the two spectra here are of the same sample, just diluted by a factor of about three or four. So you can scale, or I'll use the word normalize with the alternate meaning, to match things, to correct for that systematic dilution difference or whatever is leading to that systematic change. Normalizing to the integrated area of all organic peaks is often done with urine; there's the probabilistic quotient method; people also use internal standards, which is primarily done in NMR and is probably the preferred method overall; and we can also normalize or scale according to sample-specific weights or volumes. Each of these depends on the conventions of the groups that typically analyze these things; there's a sort of consensus that many groups converge on, depending on whether they're plant biologists or clinicians or people who study urine, tissue or microbes.

Another approach is to scale to features. Here's a spectrum where we just see one giant feature, and scaling to it can, in some cases, help manage or deal with certain types of analyte outliers.

Going back to scaling, and back into normalization: we can also transform the concentrations to give things a normal distribution, and this gets back to the log transformation. There are also things like auto scaling and Pareto scaling, which can likewise help give you a better normal distribution, and range scaling. So some of these things are to deal with systematic errors, as I say, and others are essentially to make things look more normally distributed so that t-tests and ANOVA tests and other things can be applied.

Then there's filtering, removing things like solvent peaks in the case of NMR, or noise and false positives coming out of blanks in the case of MS. Typically you need some level of justification; we can't arbitrarily remove peaks. So that's a filtering step, and we've talked about that particularly for GC- and LC-MS. Outlier removal is another one we've talked a little bit about.

Then, in terms of data reduction: formally you can think of it as trying to reduce the data so it's more understandable, but you can also think of it as dimensional reduction, which is what we do in PCA; a short sketch of that follows.
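Since PCA keeps coming up in this workflow, here is a hedged sketch of dimensional reduction using scikit-learn. This is a generic illustration, not MetaboAnalyst's internal code.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_dimensions(X, n_components=2):
    """Project a samples x metabolites matrix onto its first few
    principal components: hundreds of columns down to two or three.

    Returns the scores (one point per sample, ready for a scores plot)
    and the fraction of total variance each component explains.
    """
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X)   # PCA mean-centers internally
    return scores, pca.explained_variance_ratio_
```

In practice you would run this only after the normalization and scaling steps described above, since PCA is driven by variance and will otherwise just follow the most concentrated metabolites.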
PCA takes hundreds of dimensions down to a few, so that's also data reduction, and it allows us to handle things more consistently. Clustering is another approach that helps with data reduction and data analysis. All of these steps are components of the workflow, and as I say, they change depending on whether you're doing targeted or untargeted metabolomics.

To help with the workflow, and because those steps often involve combinations of statistics and large tables, combined with the fact that you can't do it all in Excel, we developed MetaboAnalyst, and it's now in version 2.0. Jeff got the idea some years ago, seeing how much most of us struggled; he also got himself very familiar with R and with web-based design. MetaboAnalyst is actually used by thousands of people now; we've had to scale up servers multiple times just to deal with the workload. Today we may yet break it, if we've got 22 people hitting it at the same time. We'll see; Jeff is sweating over there.

It is designed not just for one type of platform: it's for LC-MS, it's for GC-MS, and it's also for NMR-based metabolomics. It supports both workflows, untargeted and targeted, and it supports both types of data analysis, univariate and multivariate, so you can do t-tests and ANOVA, PCA and PLS-DA. I don't think it does the Mann-Whitney U test, so we'll see if we can convince Jeff to add that to MetaboAnalyst. What it does do is identify significantly altered metabolites, and it produces a lot of colorful plots, like the ones you saw earlier today with XCMS. Jeff spent a good deal of time developing very detailed explanations and summaries. The idea was: just upload your data, press a button, and then your paper is written for you. It's not quite that way, but it's getting close. It also links things to pathways, and this is the connection as we go from lists to pathways, and then from pathways to biology and to understanding things. That's what MetaboAnalyst is intended to do.

So there's a workflow and it guides you through it. Okay, everyone heard that, so don't use Chrome for accessing MetaboAnalyst; apparently it depends on the phase of the moon.

Right now there are three or four steps. You start with the data preprocessing step. Then you do what we'll generically call normalization, which includes both scaling and transforming the data so that it looks normally distributed. Then there's the data analysis, which is the PCA and PLS-DA stuff we've talked about, and then the annotation, which gets into identifying metabolites if you're doing untargeted work, but also lets you do pathway analysis and other things that are probably more biologically interesting.

This is a very detailed description of the flow-through and the types of support that MetaboAnalyst offers. I'm not going to go through it in detail; you have this slide. It just really emphasizes the breadth that's there. I don't know if this covers 2.0 or if it's still from 1.0, but there's a lot packed into the system and a lot that's supported. You'll see that there's metabolite set enrichment: if you've ever heard of gene set enrichment analysis from the microarray world, this is picked up from that.
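For the curious, the core of an over-representation style enrichment test is a one-line hypergeometric calculation. This sketch uses scipy and is only meant to show the idea; it is not MSEA's exact implementation, and the parameter names are my own.

```python
from scipy.stats import hypergeom

def ora_pvalue(n_total, n_in_set, n_hits, n_hits_in_set):
    """Over-representation p-value for one metabolite set.

    n_total       : metabolites in the reference metabolome
    n_in_set      : metabolites belonging to the pathway/set
    n_hits        : significant metabolites in your experiment
    n_hits_in_set : how many of those fall inside the set

    Returns P(X >= n_hits_in_set) under the hypergeometric null,
    i.e. the chance of seeing that much overlap by random draws.
    """
    return hypergeom.sf(n_hits_in_set - 1, n_total, n_in_set, n_hits)

# Example: 8 of 12 significant metabolites land in a 30-member pathway
# drawn from a 1000-metabolite reference set.
print(ora_pvalue(1000, 30, 12, 8))
```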
Metabolite set enrichment has actually, on its own, become quite popular in metabolomics. There's time series analysis, which again is quite unique to MetaboAnalyst. It has an image center that people actually use a lot to produce high-resolution plots and images for papers and things like that. It has all the tools for the data processing, data normalization and integrity checking. There's the functional interpretation, which is pathway analysis and enrichment; the statistical work that most of you are interested in; and a whole bunch of other utilities for peak identification, name conversion, lipid analysis, batch effect and temporal drift studies, comparisons and so on.

We're going to go through four steps here, although I'll only take you through the first one or two, and then we'll leave you to follow along using either the slides on your computers or the tutorial at the back of this section, which is actually quite detailed and quite popular with people.

So, go to MetaboAnalyst; I think it's linked somewhere on the website, or you can just type the name MetaboAnalyst into Google and it'll take you there. What you'll find is that there's a menu: you can click through here, and there are lots of tutorials, resources, information about data formats, and an overview. To start the program, and this is probably the most challenging part of MetaboAnalyst, there's this relatively small region that you have to click to start. Most people on the website sort of expect that it's already started, but you actually have to start up MetaboAnalyst by clicking at the top.

If you look at some of the data sets, by clicking on the data formats there are a bunch of data sets you can download as zipped files. Some are NMR, some are mass spec; some are binned, some are processed, some are time series. There's a whole bunch of data sets you can access; you can download them and keep them on your computer, but you can also access a couple of example data sets right online. It talks about CSV files, which you're probably getting familiar with; those are the comma-separated files we were saving from Excel this morning.

So let's start in, as I say, and look at the initial steps of data processing. The first thing you want to do is convert your raw data into some kind of data matrix, which again is something you did this morning, where you have samples and then peaks, or samples and concentrations of metabolites, each associated with a value. Those could be concentration tables, which is my own preference in metabolomics; as I tried to press upon you, quantitative or targeted metabolomics generally gives you the richest data. With the untargeted set that you generated today you'll get a peak list. You can also use spectral bins, and you can also use the raw spectra, which obviously need a little bit of processing.

Once you've pressed start, you end up with this view here, and you can choose to upload your data; if you scroll down, I believe there are other example data sets you could just load. But if you've already downloaded or saved your data from the XCMS work we did this morning, or if you want to use the NMR data or whatever else is here, you have to say what type it is: is it a concentration table, spectral bins, or peak intensities? A rough sketch of what such a table looks like once loaded is below.
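As a rough illustration of what such a data matrix looks like once loaded, here is a sketch in Python with pandas. The file name and the "Label" column are hypothetical; the layout (sample IDs, a group label, then one column per metabolite) follows the general shape described above.

```python
import pandas as pd

# Hypothetical CSV: first column sample IDs, a "Label" column with the
# group (e.g. the grain levels 0/15/30/45 in the rumen example), and the
# remaining columns metabolite concentrations.
df = pd.read_csv("rumen_concentrations.csv", index_col=0)
groups = df.pop("Label")            # group membership per sample
X = df.to_numpy(dtype=float)        # samples x metabolites matrix
print(X.shape)
print(groups.value_counts())
```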
You also have to specify the format: are the samples in rows or in columns? People switch them, and you can't tell unless someone tells you. Then you can select the file from your computer, it uploads, and you can submit. You can submit CSV files or you can submit zip files, which is a fairly common way of doing this.

What you'll see here is a sort of control panel. It takes you through the workflow: load your data, process your data, do your statistics on your data, and then try to interpret things. If this is untargeted work, this would help you a lot, at some point, to figure out what the interesting peaks are once you've found them. This part is more for when you've already figured out what those peaks are and know what those compounds are: you can start doing things like pathway analysis and metabolite set enrichment analysis and other things that are biologically interesting.

As I said, you could upload files you've already saved, or if you scroll down a little further you'll find these sets of files. These have all been tested and processed, you can see what came out of them, and you can link to the publications to understand a little more about them. They cover all the types of formats and files that MetaboAnalyst can handle: LC, GC, NMR and so on.

Now, here's something a little out there, since I don't think anyone here is doing a lot of agricultural research. Here we're taking a set of dairy cattle and feeding them grain. Most cows don't like grain, but we do this to help increase milk production. The issue is that a lot of cows get very sick when you increase the amount of grain they get, so that's of real interest: grain causes stomach upset in cows, really seriously. The idea here was to look at ruminal fluid; cows have multi-chambered stomachs, and they're sort of walking fermenters. We looked at this using quantitative NMR metabolomics. That's the data set I'll step you through, and it's one you could use if you just want to walk through.

The first thing we do is upload that data. It's a set of concentration tables; the metabolites have been identified, and we're looking at different states: there are four populations, if you like, at 0, 15, 30 and 45.

What's done here, after we've uploaded the data, is essentially a data check. It's looking at how many samples there are; there are four groups, it found the samples were not paired, there are 47 variables, and everything's numeric, so it's just reporting back that everything's in order. About three percent of the data entries had zero values, which can sometimes be a problem in metabolomics, and there can also be missing values, where someone made an entry mistake or didn't put something in. Typically, for missing values and zero values, what's done, just so the statistics don't get messed up, is to replace them with a very small value, usually chosen as about the lowest observed value divided by a factor of five or ten. That still allows you to work with normally distributed data, and it's safe to assume those values are below the limit of detection; a small sketch of that rule follows. You can also impute values.
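Here is a minimal sketch of that small-value replacement, assuming the "column minimum divided by five" rule the way I just described it. The exact divisor is a convention, and MetaboAnalyst's own rule may differ in detail.

```python
import numpy as np

def replace_missing(X, divisor=5.0):
    """Replace zeros/NaNs with a small positive value, on the assumption
    that they fall below the limit of detection.

    Each missing entry becomes the smallest observed positive value in
    its column (metabolite) divided by `divisor`.
    """
    X = np.asarray(X, dtype=float).copy()
    X[X == 0] = np.nan                    # treat zeros like missing values
    for j in range(X.shape[1]):
        col = X[:, j]
        floor = np.nanmin(col)            # lowest detected value
        col[np.isnan(col)] = floor / divisor
    return X
```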
Imputing means inferring or calculating values, if you want to do a little better in how you handle the missing values; or you can just take those low replacement values and carry on, and usually that's fine.

The next step is normalization, or scaling. As I say, normalization means many things, and it's unfortunate that we use the term. We can normalize the rows, and in this case the rows are the samples, while the columns are the metabolites. We can choose not to do any normalization, and part of that decision is maybe looking a little at your data. We can normalize by a median or by a reference sample, we can normalize by a reference feature, or we can do some sample-specific normalization. So there are lots of choices. In this particular case we chose a pooled average from a sample group, and that's usually a safe one.

The next question is how we normalize the metabolite concentrations. In many cases, particularly with LC-MS, metabolite concentrations are quite distorted; they can also be somewhat distorted or skewed in NMR. Generally either log transformation or auto scaling works well. In this case we chose auto scaling, because we found it worked better than log and better than Pareto scaling. But these are options, and the intent here is not to just click something and not think about it: you're going to see a result shortly, and you want to be able to assess that result and whether it's worthwhile.

So the data, as it comes in, has the samples in rows and the compounds, peaks or bins in the columns, and these are the different normalization routines, row-wise and column-wise. The idea with the row-wise ones, dealing with the samples, is to make the samples comparable: if there were dilution effects, say with urine, this would help scale things. More formally, I guess I would call that scaling rather than normalization. Column normalization is to try to give things a normal distribution; you need normal distributions to do standard statistics, and there are several methods that MetaboAnalyst offers. In this case we've chosen auto scaling.

And this is the thing you want to see. Here is your distribution: these are your metabolites, there are about 40 here, and most of them are pretty small, in the tens of micromolar, but then there are a few of them, acetate and butyrate, which are really, really high. The net effect, in terms of a concentration distribution, is a classic exponential or extreme-value distribution. This is not useful for statistics: if you just used this without transforming or normalizing, you would get awful results. So what you do, in this case, is use the auto-scaling function.
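Here are hedged sketches of the main options just described: a probabilistic-quotient style row normalization (after Dieterle et al., 2006) and the log, auto and Pareto column scalings. These are generic illustrations, not MetaboAnalyst's own code, and they assume missing values were already replaced (see the earlier sketch), so there are no zeros.

```python
import numpy as np

# --- row-wise (sample) normalization: correct dilution differences ---
def pqn_normalize(X, reference=None):
    """Probabilistic quotient normalization (Dieterle et al., 2006).

    Divides each sample by the median ratio between its features and a
    reference profile (the median spectrum here, or a pooled-group
    average), which estimates that sample's overall dilution factor."""
    if reference is None:
        reference = np.median(X, axis=0)
    quotients = X / reference
    dilution = np.median(quotients, axis=1, keepdims=True)
    return X / dilution

# --- column-wise scaling: pull metabolites toward a normal shape ---
def log_transform(X, floor=1e-6):
    """Log transform; the small floor guards against zeros."""
    return np.log10(X + floor)

def auto_scale(X):
    """Mean-center each metabolite and divide by its standard deviation,
    so acetate and butyrate no longer dominate the small ones."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Divide by the square root of the SD: a compromise that shrinks,
    but does not fully flatten, the high-concentration metabolites."""
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))
```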
Auto scaling converts that really badly skewed distribution into what almost looks like a perfect bell curve. You can see that acetate has been nicely normalized and butyrate has been nicely normalized, and we now have a distribution which is normal. Now, we could have gone back and said: we tried auto scaling and it didn't work well, let's try Pareto, and maybe that looks better, or we could try the log scaling. The idea is that it can be interactive, and you can't really know in advance, which is why this function is particularly important. It's particularly key, in any kind of data analysis, to make sure your data is normally distributed. We've now got the metabolites normally distributed; the same thing could have been done with genes or proteins, it's the same requirement.

Now we can also deal with some quality control. This is usually done by visual inspection; you can't really write a routine for it. You're looking for peaks or concentrations or entries that are just way out there, and these may be typographical errors. If you can justify it, you can remove them or fix them. In other cases that quality control can also help with noise reduction, getting rid of some of the spectral bins. It's a qualitative thing, and that's what quality control is about: you can't really define exactly the way to do it, you just have to do it by eye, and I think you have to do it honestly, where you're not just selectively picking the best data.

Here's what an example of an outlier looks like. You can sometimes see an outlier in your scores plot, something that's way out there, and you can see an outlier in a heat map. As I say, this either indicates a typographical error or something very bizarre that you probably want to sort out; if you can't figure it out, it's just going to cause you no end of trouble. So in this case you remove the value, and justify to yourself, or to your supervisor, why it should be taken out. You can actually go into the data editor in MetaboAnalyst, look at your data tables, and decide if a particular outlier is worth removing.

Yeah, well, unfortunately, I think that's basically the way it is; I don't think there is a formal algorithm we can use. Every data set is different, and some will be quite different depending on the circumstances of how the samples were collected, the instruments that were used, and how the data sets have been merged. There are probably more formal approaches: you could systematically calculate each of the normalization steps, calculate a fit to a normal curve, and come out with some quality metric; there's a rough sketch of one such check after this passage. But the reason we tend to do this interactively is that a formal search would take too long on a web server, so it's sort of a compromise. In fact, there are a couple of places you'll see here where, to keep it a functional web server, we've had to compromise.

So we were looking at the quality check and outlier removal, and then there's this noise reduction. Sometimes this is a case of dealing with very small values, or values that just seem to be constant, which can happen in some cases. I think right now we still have this limit of 5,000 points coming in.
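As promised just above, here is one informal but more systematic check you could run yourself: a robust z-score on the PCA scores. This is my sketch of a common rule of thumb, not a method MetaboAnalyst itself applies; the 3.5 cutoff is conventional.

```python
import numpy as np

def flag_outlier_samples(scores, threshold=3.5):
    """Flag samples whose PCA scores sit far from the rest.

    scores    : samples x 2 array of PC1/PC2 scores
    threshold : robust z-score cutoff (3.5 is a common rule of thumb)

    Uses the median and MAD rather than mean/SD, so the outlier itself
    does not inflate the spread estimate. Returns row indices to inspect
    by hand -- not to delete automatically.
    """
    med = np.median(scores, axis=0)
    mad = np.median(np.abs(scores - med), axis=0)
    robust_z = 0.6745 * (scores - med) / mad
    return np.where(np.any(np.abs(robust_z) > threshold, axis=1))[0]
```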
I don't know if that 5,000-point limit is still necessary, but it's again about trying to get rid of arguably low-value noise, which contributes little to the analysis but which unfortunately increases the computation time a lot. If we had everything running locally, it's something we might not worry about as much, but because it's a server, it is something we have to worry about. A number of groups around the world have actually downloaded MetaboAnalyst and run it as a local system for their whole laboratory, in fact many of the larger national centers, I think in the Netherlands and Australia; I'm not sure if people in the States are using it that way yet. Having a local version obviously makes things a lot faster.

So there are ways of selecting or removing noise: things that have very low variance, things that are very, very low in intensity, and there are options you can select to do that noise reduction. Again, it's sort of interactive; it's up to you. In the case of NMR data or quantitative data you don't have to worry about this; in the case of untargeted data it typically is an issue.

Then there's the data reduction. Okay, now you've cleaned up your data: you've filtered the noise, you've normalized it and scaled it, you've spruced it up and packaged it, so it's ready to go into things like PCA, PLS-DA and all the subsequent analyses.

This is where we're going to launch you on your own, taking you through some of these slides: here's the data set you can analyze, and here's how you can tweak it, in this case the data from the cattle. The next step is to find those important compounds and see if we can interpret them in a biological way; most of you are not experts in cattle metabolism, so maybe this will give you some insights into it. We're going to look at different phenotypes and different ways of classifying. Instead of the cattle data, you could start using the XCMS data, which we think was a knockout study on fatty acid amide hydrolase. We don't know what the biology is on that one, but there are some normals and some knockouts, so find out what was unusual about it.

What I'd like you to do, and some of you came in after I started, is to use the slides to take you through the rest of MetaboAnalyst. So start up MetaboAnalyst, go to the web, and use the slides. There's a series of questions asked in the slides; they give you different options about what you can try, some example output that you should be able to reproduce, some views that you should be able to try, and then some questions specifically for the cattle data to try to answer. You can write those answers in your book. You can work together or work alone; we're not prohibiting anything in that regard.

You can try the hierarchical clustering, you can generate different images, and again there are some more questions looking at patterns of change and trends, then correlations and more questions. There are some PCA plots you should be able to generate. In the loading plots, we can see that the things up in the upper corners or lower corners are the ones most responsible for those kinds of separation. You can go from 2D to 3D, with some more questions to answer. You can plot scores, and there are lots of evaluations of the Q2 statistics.
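Those Q2-style evaluations are essentially asking whether the separation is real, so here is a hedged sketch of the kind of label-permutation test used to judge a PLS-DA separation, using scikit-learn. The scoring choice here (R² of the predicted class values on the true labels) is one of several reasonable conventions and is an assumption of this sketch.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def plsda_permutation_test(X, y, n_perm=1000, n_components=2, seed=0):
    """Permutation test for a two-class PLS-DA separation (a sketch).

    X : samples x features matrix; y : 0/1 class labels (numpy array).
    Fits PLS with the true labels, then with shuffled labels, and asks
    how often a random labelling separates as well as the real one.
    """
    rng = np.random.default_rng(seed)

    def fit_score(labels):
        pls = PLSRegression(n_components=n_components)
        pls.fit(X, labels)
        return pls.score(X, labels)      # R^2 against the given labels

    observed = fit_score(y)
    null = np.array([fit_score(rng.permutation(y)) for _ in range(n_perm)])
    # p-value: fraction of permutations doing at least as well
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p
```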
You can look at the variable importance plot, or VIP plot, identifying the most important metabolites, their weightings and what they're contributing. This is a particularly useful plot that frequently shows up in metabolomic studies. You can test the performance and the quality of your PLS-DA plot; this one is quite an exceptional separation in terms of significance through permutation tests, like the one sketched above. And you can look at and answer some more questions. I'm just skipping through this, but these are things you can do with the sample data; you can also choose not to use it and work with some data of your own.

So you have the next hour, I guess, to plug away on this. You could download a file: this is the cow data, and there's another human data set. You can click under there; this is under data formats. That's one way, putting the data locally on your computer. Alternatively, you can just start the package: in the upload section you can scroll down, say "try our test data", choose, here's the rumen one, click on that one and submit. So there are different ways of getting or using the data. You don't have to use the cow data; you can use the cancer data or whichever one you like.

As I say, you're on your own at this stage; do what you'd like, and if you have a preference, whether it's a physiology, a biology or a platform that you prefer, we'll take you there. We're going to give you another, I don't know, 25 minutes or whatever; I'm kind of losing track of time. We're going from 1:30 to 3:00, so we're going to give you until about 2:45 and then we'll get into the pathway stuff. What you should try to do is see if you can step along to about slide 82 in module seven, and at that stage you can say you're ready for the next phase. Again, you can work with the slides, but as I said there's a very detailed Nature Protocols tutorial, somewhat similar to this with a lot more detail, just at the back of module seven. So get to here; that's your last slide, 41. If you go through the tutorial, or the next 40 slides, it'll tell you what to do: upload this, try this; just follow that slide set along. This is your task, starting from there.

One note: the wireless here is unstable, so sessions can keep timing out and you have to go back in. Actually, when you're using MetaboAnalyst in your real work, use a hard-wired connection; people shouldn't rely on wireless, it's just unstable. Also, when you're reading tutorials, your screen will sometimes shut off to save power, and that sometimes drops the wireless connection. On the earlier question about normalization: it's better when you want to compare one sample with another, or across multiple studies; you want to fix the values, otherwise you don't have real values to compare.

So how are people progressing? Are you getting towards page 38, 39, 40 and 41, or are people still on page 20? Again, feel free to work together.
You don't have to do everything independently. Okay, I think what we're going to do, it's now quarter to three, is go into about five or ten minutes on the metabolic pathway analysis, which is part of MetaboAnalyst 2. We'll still give you five or ten minutes at the end of this session, but you can also work through coffee if you want to look at some of these other things or to finish what we were doing in MetaboAnalyst. Michelle? Okay.

So flip to page 42. If you raced along, you would have gone through some of the aspects where we were looking at this dairy cattle data, and you also might have looked a little at the metabolite set enrichment analysis, or MSEA. Now we're going to go straight into the metabolic pathway analysis, or MetPA. MetPA was actually written as an independent module and published separately some years ago, but it's now been integrated into MetaboAnalyst 2. As I said, we've gone from spectra to peaks to compound lists, and from compound lists to some data analysis, and now we have to go to pathways. MetPA was originally written to help enhance MSEA. Yes, Michelle; it's actually still going, I think. That's great. There's a good amount of blank space here, but anyway, it allows you to look at pathway structures and also supports pathway visualization. It works on 15 model organisms, including humans, fruit flies and Arabidopsis, I think, and a few others. A lot of that is based on data from KEGG.

You're familiar with this page, and what you were originally clicking on was statistical analysis. If you click on enrichment analysis, that's MSEA; and if you now go to pathway analysis, this is what you'll see. In this example, or what we'd like you to do after my little spiel here, is to click on the example data. In this case it's data from a set of cancer patients who had what's called cachexia. This is a muscle-wasting phenomenon that happens in about half of all cancer patients, and the idea is: are there ways of identifying or predicting which individuals will develop cachexia? Because this is an important prognostic indicator. The individuals have lung and colon cancer, and as I said, about half were going to experience muscle wasting. At the time the samples were taken, we didn't know whether they were going to develop cachexia or not, so this is essentially trying to look at what the processes leading to it are, and what processes might allow you to predict it.

Just like we did before, we do normalization. In this case a row-wise normalization is not necessary, but there is auto scaling to try to get normalized data. Then we jump down from the statistics, or data uploads, to setting the parameters and the pathways. In this case we're choosing which pathway library: these are humans, so there's no point choosing chickens or zebrafish, so we click on humans. It's now doing what's called network topology analysis. We have a reference metabolome, and we're looking at both pathway enrichment analysis and pathway topology analysis, and there are several choices. These are explained a little in the paper on MetPA; I won't go into a lot of detail, and you can ask Jeff about some of the specifics. But this is relevant not only for metabolism; it also comes up in protein networks, and a lot of the theory and concepts were developed there.
These ideas also emerged through web and internet traffic analysis and other things. So there are hubs and there are bottlenecks. Hubs are often considered to be the most important proteins, genes or metabolites; they're highly connected, just like hubs and spokes in a wheel. The pink nodes here are hubs. Then there are things that lead into potentially other areas, and those are called bottlenecks; the blue node here is identified as a bottleneck. In the field of graph theory there are measures of this hub-ness and bottleneck-ness, called degree centrality and betweenness centrality: a hub has a high degree centrality, and a bottleneck has a high betweenness centrality. There's a small sketch of both measures after this section.

One of the things you can do is plot out these pathways, and this uses a very simplified view of the pathways drawn out. This is KEGG, and these are KEGG identifiers; in this case what we're seeing is the pathway for glycine and serine metabolism. What we're seeing here is a plot of the importance, or impact, of each pathway against a negative log P, the equivalent of a P value: 0.01, 0.05, 0.005; the more significant, the greater the negative log. So this node up here, at the extreme end, seems to be the most important one with respect to the cachexia. Clicking on that node, which right now just shows up as a point, pops up this visualization of glycine, serine and threonine metabolism. It suggests, as I say, that based on its position, pathway impact and log P, this is the most important pathway. Clicking on any of these other ones up here, and you'd have to do this yourself, will identify other pathways of interest.

Yes; Jeff can probably give you, yeah, a measure of both the centrality and the impact. What you're also seeing is that, of the metabolites that were significantly perturbed, many of them are in this one pathway: you've got one, two, three, four, five, six, seven, eight that are significant. There's a coloring scheme, which also indicates how important they are. And given that NMR is only doing a light sampling of metabolism, the fact that you're getting so many is quite unusual; that's why this pathway is getting this sort of off-the-chart impact as well as off-the-chart P value. So that's pointing you toward certain aspects of catabolism or anabolism.

But I think it's also important to remember, and I brought this up before, that the pathways we're seeing are just the catabolic and anabolic pathways. Many metabolites, particularly the branched-chain amino acids and a lot of the charged amino acids, have vital roles in signaling, in signal transduction, in hormonal regulation, homeostasis and osmolyte sensing. There are many pathways that are not in KEGG, and many processes that are not captured. So this first pass gives you some good indication, and since cachexia is a catabolic process, it obviously suggests there's something going wrong in this pathway.

We can zoom in through the nodes that are isolated here on the control panels. You can click on things and actually see the distributions as box plots, in this case serine, which is elevated in cachexia patients but diminished in control patients. Each of the nodes, or compound IDs, is highlighted. Obviously it would be nice, and I don't know if there's a way, to convert the KEGG view to actual compound names rather than KEGG compound IDs.
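To make hub-ness and bottleneck-ness concrete, here is a toy sketch with networkx. The node names are made up and the graph is deliberately artificial: two tight clusters joined through one bridge node, so that the top hub and the top bottleneck come out as different nodes.

```python
import networkx as nx

# Toy network: two fully connected clusters joined through node E.
G = nx.Graph()
G.add_edges_from(nx.complete_graph(["A", "B", "C", "D"]).edges)
G.add_edges_from(nx.complete_graph(["F", "G", "H", "I"]).edges)
G.add_edges_from([("A", "E"), ("E", "F")])   # E bridges the clusters

hubs = nx.degree_centrality(G)               # hub-ness: how connected
bottlenecks = nx.betweenness_centrality(G)   # bottleneck-ness: how many
                                             # shortest paths pass through

print(max(hubs, key=hubs.get))               # 'A' (or 'F'): most edges
print(max(bottlenecks, key=bottlenecks.get)) # 'E': every cross-cluster
                                             # path runs through it
```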
But again, since Jeff is here, if you've got any requests, fire them at him; he'll be happy to change things. And then here's a quick question: these are results that you can see further on by clicking appropriately or scrolling down, and they give you some of the actual numeric values. The one that was furthest up in the corner was glycine and serine, but then we can see branched-chain amino acid metabolism, there's sulfur and methionine metabolism, then valine and the branched-chain amino acids again, so these show up twice. You can also see arginine and proline being involved. You can go down the list; at some point, some of these things are perhaps only mildly significant. Jeff has produced a number of statistics that you can look at with respect to false discovery rates and measures of impact. There are pathway diagrams that are much more detailed, hyperlinked both to KEGG and also to the Small Molecule Pathway Database that I talked to you about before. One thing that we probably need to do, and I don't know if Rupa emailed you about this, is a little bit of cleanup with respect to some of the KEGG pathways: in principle, at least for humans, there shouldn't be a KEGG pathway where there isn't an SMPDB pathway.

So I'd like you to try that. You still have about five or ten minutes left, and you can play around with it. But MetaboAnalyst has a lot of things, and in fact Jeff just informed me that it does have the Mann-Whitney U test; he's put that into a non-parametric part of MetaboAnalyst 2. He has random forests, he has support vector machines, he has self-organizing maps. He's got time series analysis, which actually is quite useful; I haven't seen anything else that really does that at all, except MetaboAnalyst. There are tools for peak identification, for those of you who were actually looking at the data from what we did this morning; you may be able to start putatively identifying some of those compounds, and there are tools in MetaboAnalyst for doing that. I've said before that it's quite popular, and the fact that you have the developer right here, someone who's very responsive to requests and suggestions, means it's through workshops like these that we've been able to make improvements steadily over the years. Certainly at the Metabolomics Society conference in the last weeks, lots of people were using it and posting things from the website on their posters and presentations.

This is just a quick shot of the time series analysis, and some cool clustering options you can do. There's some of the quality checking that allows you to look at batch-to-batch variations, which is sometimes quite important. This is an example where things are going nicely, nicely, nicely, and then something went terribly wrong. Often people don't look at that closely enough, and this is a source of systematic error. It's actually more important than you think; if I had more time, I could spend the whole afternoon on the issue of quality checking. Most labs don't spend nearly enough time on it.
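To close, here is a small sketch of the kind of drift check you could run yourself on repeated QC injections. The linear-fit approach and the names here are my own illustration of the idea, not a MetaboAnalyst feature.

```python
import numpy as np

def qc_drift_check(run_order, qc_intensity):
    """Flag systematic drift across a run using repeated QC injections.

    run_order    : injection order of the QC samples (e.g. 1, 10, 20, ...)
    qc_intensity : measured intensity of one feature in those QC samples

    Fits a straight line of intensity against injection order; a slope
    well away from zero suggests instrument drift or a batch jump, while
    the residual spread reflects ordinary technical noise.
    """
    run_order = np.asarray(run_order, dtype=float)
    qc_intensity = np.asarray(qc_intensity, dtype=float)
    slope, intercept = np.polyfit(run_order, qc_intensity, 1)
    predicted = slope * run_order + intercept
    residual_sd = np.std(qc_intensity - predicted)
    drift_pct_per_run = slope / qc_intensity.mean() * 100
    return slope, drift_pct_per_run, residual_sd
```

Plotting the QC points against run order, as in the slide, is still the first thing to do; this just puts a number on what the eye sees.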