Thank you everyone for being here. I'd like to start with a brief overview of what we're going to cover today, which is unsupervised learning algorithms in the context of dimensionality reduction. Namely, we're going to review two techniques, principal component analysis and non-negative matrix factorization, and we're going to apply both of them to a specific example: exploring the decomposition of reactive materials.

If you're a veteran of this series, you'll know that we've gone through various modules exploring how we learn from data, which can come in the form of cyberinfrastructure and data repositories. In previous sessions, we developed predictive models and computed classifications for materials. For this session, we're going to look at dimensionality reduction: our data exists in some n-dimensional space, and we're going to see if we can extract a small number of features that still explain the vast majority of the data. This reduction lets us take our data further into post-processing without having to carry along everything we've captured, either in experiments or in simulation. At the end of the day, this is to help guide our design of experiments, to ensure the highest probability that the next experiment will be successful. As I said, this session will focus specifically on dimensionality reduction, but if you're interested in any of the other topics here, they will be posted on the website shortly.

So, again, we're going to go over dimensionality reduction algorithms, mainly principal component analysis and non-negative matrix factorization, put in the context of a chemistry example. I like to start by breaking machine learning into two general categories: supervised and unsupervised learning. In the earlier sessions, we looked at supervised learning, where you take inputs, pass them through a model, and get outputs that explain what's happening. For example, we covered linear regression, where we take in melting temperature and try to predict Young's modulus over a variety of training data. And of course we covered classification, where we read in features such as ionic radius, connectivity, and Young's modulus, and tried to categorize materials by their crystal structure.

What we're going to focus on here is the second branch, unsupervised learning. In this case, we take in a wealth of data captured either through experiments or simulation, and we try to abstract out certain features that explain the majority of what's happening in that data. This comes in two broad categories: clustering and dimensionality reduction. In the clustering example here, you can see that in this image of a person we can pick out certain features, such as the eyes, the hair, the hat, the sky, and the road, and a clustering algorithm, shown here as K-means clustering, is able to categorize them. What we're going to focus on in this session is dimensionality reduction: in our base example of the digit eight, we have an n-dimensional space, a certain number of components, that defines how the eight is drawn in our data.
What we want to do is reduce the dimensions to something lower while still retaining the overall outline of the original image, the digit eight.

I like to start by explaining dimensionality reduction through the algorithm that principal component analysis actually performs. Here we have a simple molecule composed of three atoms, and we want to understand whether we can describe the motion of these atoms over time. We can break this complex motion apart into a linear combination of independent motions. For n atoms in three-dimensional space, the complex motion of the molecule is a 9-by-time matrix (3n rows for n = 3 atoms), and it can be decomposed into a linear combination of independent, uncorrelated motions.

To solve this problem, we need to compute the covariance matrix. As the name suggests, this matrix contains the correlation between any pair of input features at the corresponding index: the correlation between feature one and feature two goes into index (1, 2), and so on. Since this is a linear algebra problem, we want to diagonalize the matrix, which means eliminating the off-diagonal terms of our n-by-n covariance matrix. In doing so, we decouple, or decorrelate, the features from one another, maintaining orthogonality, that is, making sure the motions are independent. Solving this eigenvalue problem gives us both eigenvectors and eigenvalues. Anyone from chemistry, physics, or computer science will recognize that this is akin to normal mode analysis: the eigenvectors determine the normal modes, the correlated motions of the molecule broken down independently, and the eigenvalues correspond to the frequencies.

With that toy example done, let's move on to feature extraction using PCA for determining facial expressions. To review the mathematics behind PCA: first we compute the mean; using the mean, we calculate the covariance matrix defined here; and then we solve the eigenvalue problem. In the facial recognition example, when we compute our eigenvectors, each one describes a certain amount of the original image, going from the first to the m-th eigenvector, in this example from the first to the 500th. They are ordered so that the first eigenvector describes the most information in the original image. What I want you to take away is that eigenvectors encode features: the eigenvectors of our PCA relate to features of the original image, in this case noses, eyes, and mouths, and a linear combination of them sums up to recover the original image, the face.
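Before we jump into the notebook, let me put that recipe in symbols in one place. This is standard PCA notation, mine rather than a verbatim copy of the slide. For data vectors $x_i$, $i = 1, \dots, n$:

$$
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
C = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \mu\right)\left(x_i - \mu\right)^{\mathsf{T}},
\qquad
C\,v_m = \lambda_m v_m .
$$

Here $C_{jk}$ holds the covariance between features $j$ and $k$; diagonalizing $C$ zeroes those off-diagonal entries, the eigenvectors $v_m$ are the decorrelated, mutually orthogonal directions, and the eigenvalues $\lambda_m$ measure the variance along each direction (in the normal mode analogy, they relate to the vibrational frequencies).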
So let's get started with our first Jupyter notebook. Hopefully you have a nanoHUB account. Please follow me and go to the link here: nanohub.org/tools/, followed by the tool name, dim red map decomposition. I'll jump over there. If you open up your web browser and go to that link, you should pull up this tool, and we're going to launch it. Doing so should take you to the landing page.

I'll wait a couple of seconds for everyone to catch up. If you've followed along and made it to the landing page, we're going to look at the very first Jupyter notebook, which is an example of PCA. Clicking on it takes you to this browser here, and it walks through an example of PCA with visualization. One qualifier: I've added some comments to guide this example in more detail, so you'll probably see more comments here than in the actual tool, but hopefully it will make sense if you're following along with this recording.

In this example, we're going to understand how PCA can extract correlations in our data set, as I described previously. We want to use PCA to decorrelate our data, show that the eigenvectors we obtain are actually orthogonal to one another, and understand how the algorithm works. This is a toy example with only two input variables, but as I stated, you can do PCA on an n-dimensional space; it will handle it. I will also bring up two cases where PCA will actually fail, since this example was set up to succeed.

In this scenario, we have a population, and we state that within this population there is some correlation between the height and weight of individuals. We don't know the correlation, but we know some relationship exists. Imagine sampling, or querying, 500 different people and obtaining their information. From the heights and weights we get from these individuals, we normalize the data, and we assume the normalized data follows a standard normal distribution. The following lines of code set up a scenario assuming we've already done that normalization.

Going through the code: we import two libraries. The first is NumPy, a very popular numerical package in Python. The second is Matplotlib, which allows us to do plotting and visualization. Now, this next statement might be new to some of you, even if you use Python: we use the random class in NumPy to define a RandomState. This defines a container for a pseudo-random number generator, with 1 as our seed number, and lets us generate random numbers drawn from a variety of probability distributions. You'll see where that comes up later.

We're going to do a little bit of math here, but essentially we're just trying to set up our data so that it shows some correlation. We define two variables, A and B. A is a two-by-two array populated with random samples from a uniform distribution over zero (inclusive) to one (exclusive). B is a two-by-500 array populated with random samples drawn from a standard normal distribution. Again, don't get bogged down in what the individual functions do; we're just setting up data in a way that exhibits correlation. Then we take the dot product of the two matrices.
If you remember your matrix multiplication, the result of a dot product has the rows of the first matrix and the columns of the second, and then we take a transpose. So at the end of the day, it returns an array of 500 individual one-by-two arrays. I wrote out a print statement just to see what the X variable looks like, showing the first 10 entries. If you print it, you see 10 one-by-two arrays of data that you can treat as heights and weights that have been normalized somehow, some way.

Following that, we plot the normalized height and normalized weight as a scatter plot. We set the grid line color to gray with line width 0.5, make sure the axes are equal in length, and then show it. If you run that cell, you should see a plot similar to this one, where, again, the data has been forced to show a trend: there is some correlation between normalized height and normalized weight. We're going to see if PCA can actually learn it.

I want to point out that in generating these data points, you don't see data in the top-left or bottom-right corners of the plot. This is deliberate: because PCA is a linear decomposition algorithm, it doesn't handle nonlinear data very well, and it can't really handle multi-Gaussian data distributions. This example is set up so that PCA works perfectly, but know that it works best when the data has some sort of linear correlation. So we're going to use PCA to attempt to learn this relationship.

Moving on to the second code cell, we're going to interpret what PCA is actually doing. We import the PCA object from the sklearn.decomposition module. If you like, you can go to the sklearn.decomposition documentation and read about the arguments PCA takes. We're going to pass one very simple argument: we define our PCA to decompose the data into two components; we just want a two-component fit. But if you want, you can look up more arguments online. We assign that to a variable, and then we call fit on our X data. To run the cell, apologies if I forgot to say, hit Shift+Enter, or hit the Run button. If you print the PCA, you'll see that it's an object: it does the fit, but it doesn't output or visualize what the fit is doing. You can see the PCA applied to our data, and the arguments you can modify, also shown in the documentation; the only thing we've changed is setting n_components to two.

When we do this fitting, we can extract some quantities of interest, and the returns we'll look at are the components and the explained variance. So in this example, we do some prints to see what they actually mean. Here I've printed the components of our two-component fit, and the explained variances of the same fit.
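Since the notebook cells aren't reproduced in this recording, here is a minimal sketch of what the cells just described might look like. The seed, the sample count, and the plot styling are my reading of the walkthrough, not a verbatim copy of the tool:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Container for a pseudo-random number generator, seeded with 1 for reproducibility
rng = np.random.RandomState(1)

A = rng.rand(2, 2)       # 2x2 array, uniform over [0, 1)
B = rng.randn(2, 500)    # 2x500 array, standard normal draws
X = np.dot(A, B).T       # dot product then transpose: 500 correlated 1x2 rows
print(X[:10])            # first 10 (normalized height, normalized weight) pairs

# Scatter plot of the correlated data
plt.scatter(X[:, 0], X[:, 1])
plt.grid(color='gray', linewidth=0.5)
plt.axis('equal')
plt.show()

# Two-component PCA fit
pca = PCA(n_components=2)
pca.fit(X)
print(pca.components_)           # the two eigenvectors, one per row
print(pca.explained_variance_)   # the variance each component explains
```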
This first array is the eigenvector of our first component; this is the eigenvector of our second component; and these are the explained variances. What these values actually mean will be covered on the next slide, but I can go over explained variance now, since it's fairly easy. In our two-component fit, we're decomposing our data into two components, and this is how much each principal component explains of the variance in our data. You can quickly sum them and show that, for a two-component fit, the cumulative explained variance of the two components represents 77% of the variance in our initial data. Hopefully that made sense; we're going to interpret the components and explained variance more visually in the next code cell.

What I want you to know is that each component defines a direction. It's an eigenvector, so it's a direction, in this case a two-dimensional vector in normalized height and normalized weight space, and the explained variance relates to the magnitude of that vector.

Going through this third code cell: with this definition we create code that can draw the vector, the arrow, overlaid on our data. This is a fancy way of defining a dictionary that specifies the style of the arrow: the arrow style we want, the line width, a bunch of other parameters, and the color, which we want to be red. When we call annotate, it reads in the variables, and whatever we pass in, it puts the arrow in the figure using the defined style.

Then, again, we plot our data. This is the original scatter of normalized weights and normalized heights, and the alpha just adds transparency so the points don't visually overlap. We then iterate over both the explained variances and the components together, calculate a vector v for each, and draw that vector. That's all these lines are saying.

If you hit Shift+Enter, and trust me that the code works, you can see our original data as lightly shaded blue open circles, and our two principal components drawn on top, projected onto the original data space. What this shows is that the first principal component explains the largest variance in our data, and the second principal component explains the second-largest variance. These are the principal component axes along which the data lies. So hopefully it now makes sense what the component arrays are doing, and what the explained variance means, when you do the mathematics and project onto the original space: this component had the largest explained variance, so its magnitude is longer, and the component itself is an array that defines this direction; this one has the second-largest explained variance. We only have two arrows because we only asked to fit our data with two components.
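A sketch of that arrow-drawing cell, in the spirit of what was just described; the exact style dictionary and the three-standard-deviation scaling are my assumptions:

```python
def draw_vector(v0, v1, ax=None):
    """Draw a red arrow from point v0 to point v1 using a style dictionary."""
    ax = ax or plt.gca()
    arrowprops = dict(arrowstyle='->', linewidth=2, color='red', shrinkA=0, shrinkB=0)
    ax.annotate('', v1, v0, arrowprops=arrowprops)

# Original data, with transparency so the points don't visually overlap
plt.scatter(X[:, 0], X[:, 1], alpha=0.2)

# Iterate over explained variances and components together: each arrow points
# along a principal axis, with its length set by the variance it explains
for length, vector in zip(pca.explained_variance_, pca.components_):
    v = vector * 3 * np.sqrt(length)
    draw_vector(pca.mean_, pca.mean_ + v)
plt.axis('equal')
plt.show()
```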
Now let's move on to actually reducing the dimensions of our data using PCA; this is dimensionality reduction. We develop our model object again. I'm going to call it pca2 this time, but it's the same PCA object, and now we want to describe all the data with only one component. We're also going to use another function in the PCA module: fit_transform. Remember, previously we just called pca2.fit; that's what I'll now call step one. What fit does is create the object and learn the fit under the arguments we passed to our PCA, here n_components equal to one.

But note that this function is slightly different: we're adding a transform on top of the fit. If I uncomment these lines and do step two, then after we've done the fit, we transform the data. I'll print the first 10 values after step two, after we fit and after we transform. You can see what we've done: originally we had 500 one-by-two arrays, because we had a two-component fit; now we have 500 one-by-one arrays, because we're fitting to one component. What fit_transform actually does is both operations in one: it does the fit, then transforms the data, manipulating it so that we return only the one component. It's a two-in-one step.

What I want you to notice is that this gives us our data in the reduced dimension defined by the fit: we go from a 500-by-2 shape to a 500-by-1 shape. This is our transformed data, starting as two components, 2D, and reduced down to a one-dimensional object. Because we have one component, we have only one eigenvector, and therefore only one explained variance, that of the first component.

Now that we've transformed our data and reduced it to a single dimension, let's use another function that lets us visualize going from the principal component space back into the actual data space of normalized weights and normalized heights. We do an inverse transformation of our transformed data to put it back into the original space. Running this cell defines a new data set, x_new, and plots both on top of each other: the original normalized height and weight data with some transparency, and on top of it x_new, the inverse-transformed data after PCA, back in normalized height and weight space, also with some transparency; the axes are set just as in the lines above.

What you now see is that we've taken data that was explained by two different principal components, remember, the first vector pointed in this direction, with the most explained variance, and the second vector pointed in this direction, with the second-most, and we've gone from two components to one by completely discarding the component that explained the least amount of variance.
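Again as a hedged sketch of the cells just walked through, the one-component fit, transform, and inverse transform could look like this:

```python
# Step one and step two in a single call: fit the one-component model, then project the data
pca2 = PCA(n_components=1)
X_pca = pca2.fit_transform(X)    # shape goes from (500, 2) to (500, 1)
print(X_pca[:10])

# Inverse transform: map the reduced data back into (height, weight) space
x_new = pca2.inverse_transform(X_pca)

plt.scatter(X[:, 0], X[:, 1], alpha=0.2)          # original data
plt.scatter(x_new[:, 0], x_new[:, 1], alpha=0.8)  # data collapsed onto the first principal axis
plt.axis('equal')
plt.show()
```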
So all the points that had variability along that discarded principal axis have been projected down onto this line, the first principal component axis. Essentially, we've collapsed all the data onto the major principal axis. That is what PCA does when you do dimensionality reduction: the components explain the largest variance in your data, and when you reduce dimensions, you discard the principal axes that explain the least variability and project what remains onto the principal components that explain the most.

Hopefully that gave you a flavor of how PCA works. With that, I'm going to go back to the presentation and move along to the next portion. As I stated, we started with normalized weights versus normalized heights, a two-dimensional space; we used PCA to reduce it to one dimension; we projected the data into that one-dimensional space; and we successfully learned the linear correlation. Of course, this was a toy example set up in a way that is understandable. So let's move on to another example, in the context of chemical decomposition.

Before we do that, I'd like to bring up the concept of non-negative matrix factorization, since we've been focusing on PCA the entire time. This is a very good paper, and I urge you to read it at your leisure. The authors wanted to compare PCA and non-negative matrix factorization on facial recognition of the original data, which is just an image of a face. When they ran the PCA algorithm on the face, they were able to break it down into a product of two sub-matrices: the principal components and the weights. If you take the dot product, you recover the contribution of, say, the first principal component to the initial image. What I want you to take away is that the weights vary in color, white, black, and red, and what red indicates in this paper is weights that are negative. If you think about it in the context of the algorithm, all we're doing is diagonalizing the covariance matrix, so it makes sense that we can get both positive and negative coefficients. But when you think about it in terms of image recognition and interpretability, negatives really don't make any sense. What do negative eyes mean? It doesn't help in interpreting the PCA algorithm.

Then they looked at non-negative matrix factorization, and you see vastly different results. Shown here are only white and black, that is, zero and positive values, for the weights, and the components are likewise positive or zero. This is the hard constraint behind the "non-negative" part of this matrix factorization technique. As you can see, it's more interpretable, because we can pick out specific features in the components that contribute to the overall facial expression: eyebrows, maybe a little bit of the mouth, and so on. So NMF actually provides interpretable features when you sum them up to recover the initial image. Let's move on to the next portion and explain this in terms of a chemistry example.
Let's see how interpretability makes sense here. Imagine a simulation or experiment where we start off with a bunch of reactive material, stable at ambient conditions, and then we do something to it. All of a sudden the material takes off: reactions start to occur, things start to happen, the atoms interact with one another, and it's a very complex process. What we want to do is use the information obtained from this, and down-sample, or extract, common features of our complex chemistry and explain it with a few very simple curves, ones that someone from an introductory chemistry course would recognize: say, a reactant starting at a fraction of one and going down over time, intermediates forming, and final gas products. The end goal is to develop rate equations, which we can then pass into mesoscale or continuum models, and actually use this data in a meaningful way.

A naive approach would be to track all the atoms over time and how they interact with one another, bonds breaking, bonds forming, positions evolving. You can imagine that such a data set would be very, very large and very hard to handle in post-processing. What we're proposing is that you can run PCA or NMF on some manipulation of the data and extract only a finite number, 280 different bonding environments, that explains the overall behavior of what's happening. What you can see from these curves are shapes that map more or less onto the commonly understood picture of reactants, intermediates, and products: curves going down, curves coming up and going down, and curves going up. You can also see a huge bundle of curves more or less centered at zero; that's information we can throw out. We can do dimensionality reduction to eliminate those unnecessary curves.

So please move on to the second notebook: follow along, go back to the landing page, and open the second example, shown here. Clicking on it brings up this notebook. The overview covers what I just explained about the context of the problem, and at your own leisure you can read about PCA and NMF and what they do, but in the interest of time I'll get started with the actual notebook.

We run the first code cell, Shift+Enter, and we plot the data we're reading in. This is our file. What's interesting is that we read the data and show the first few lines using pandas to manipulate and display it. This line of code says we're skipping the first six rows, which are useless header information, and then we show five rows plus the very first remaining one, which is now treated as the header. You can see that our data is laid out in a way that requires some manipulation. Because this code takes about a minute to run, I'll ask you to run it now, and then we'll go over it briefly: Shift+Enter to run the cell, and you'll see it start running. Going back to the top: what we're doing is manipulating the data into a more readable form, so that we capture each of the individual bonding environments among those 280 different combinations, or features.
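As a sketch of that read-in cell ('bonds.out' is a placeholder name, not necessarily the tool's actual data file):

```python
import pandas as pd

# Skip the six useless header rows; the next line is treated as the header
df = pd.read_csv('bonds.out', skiprows=6, sep=r'\s+')
print(df.head())   # five rows plus the header
```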
The reason we arrive at 280 is that for C, H, N, O atom types, this being an organic reactive material, there is a finite number of combinations, and we assume from basic chemistry a maximum of four bonds. Here are some examples of how the bonds can look: different molecules we know from intuition can occur in our system.

I'm going to import a bunch of libraries that just help with post-processing our data, and I'll go over some of the code. This is a simple line: if you're not familiar with Python, we're opening our file and reading it; that's the 'r'. We loop over all the lines in the input file and split each line on spaces, or in this case any whitespace character. This is how you define a conditional statement in Python: we take the second item of the split line and check that it matches the expression "Timestep" exactly. Why the second element, you might ask? Because Python counts from zero; that goes for loops, comparisons, arrays, and lists. If the condition is met, we take the third item and assign it to a variable. Hopefully that gives you a flavor of the Python we're using, at a very rudimentary level.

So I'll scroll down, and, yes, the bond table is done. One thing I will briefly note: here we're writing a file and then reading from that file, which is okay, but you need to know that if you write a file out and then immediately read from it, it might not actually have been written completely in time. So take care to flush your output file.

Next we visualize the bond environments; this brings up the curves you saw previously. To review: we want to extract three commonly correlated curves, which could be represented as a loss of the initial structure, something coming up and going away, and something coming up. How many global concentration profiles, or similarly shaped curves, do you observe? From our basic chemistry picture, we'll say three; it looks like we can explain this with three correlated curves.

So, again, we build the PCA model as you've seen previously: we import the PCA object, define the number of components to be three, pass that in with a bunch of other arguments, but the important part is that we define our PCA with three components and fit the model. And again, to actually obtain information from it, we need to transform the model after fitting. In the interest of time, I urge you to run this code, and you can see the amount of variance explained by each component: the first explains over 93%, the second 5%, and the third 1%, for a cumulative explained variance of over 99.6%. This is a great fit; three components are more than enough to retain the information in our original data. Then we visualize what these three components actually do.
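Before we look at the plot, here is a minimal sketch of the parsing idiom and the three-component fit just described; the filename, the loop body, and the stand-in matrix are placeholders for the tool's actual bookkeeping:

```python
import numpy as np
from sklearn.decomposition import PCA

# Parsing idiom: split each line on whitespace, check the second item (index 1,
# since Python counts from zero) against 'Timestep', and grab the third item
with open('bonds.out') as infile:                 # placeholder filename
    for line in infile:
        fields = line.split()
        if len(fields) > 2 and fields[1] == 'Timestep':
            timestep = int(fields[2])
            # ... count the 280 bonding environments for this frame here ...

# If a cell writes an intermediate file and a later cell reads it back,
# call outfile.flush() before reading so the contents are completely on disk.

# Three-component PCA on the resulting (frames x 280 environments) matrix;
# random stand-in data here, just to show the calls and shapes
bond_matrix = np.random.rand(90, 280)
pca = PCA(n_components=3)
scores = pca.fit_transform(bond_matrix)
print(pca.explained_variance_ratio_)        # on the real data: ~0.93, ~0.05, ~0.01
print(pca.explained_variance_ratio_.sum())  # cumulative, ~0.996 on the real data
```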
And, uh-oh, it doesn't look like what we'd expect from a concentration profile. We actually have some negative values, which, you might say, are not really interpretable as concentrations. So maybe PCA is not the way to go; maybe instead we should apply the NMF algorithm.

Moving on to this code cell and running it: again, we build the NMF model, calling the NMF object, defining three components as our argument, doing a fit_transform as you saw previously, and extracting the components. The code runs very quickly, and we've completed the application of the model. Now let's visualize what's going on here. And, wow, that actually looks very similar to what we'd expect. First, all the curves are greater than or equal to zero. We have a curve going down, a curve coming up and going down, and a curve coming up and staying at a value greater than zero. However, you might think: this seems interpretable, this is great, but we still need to normalize, because remember, concentrations live on a scale of zero to one, or zero to one hundred percent. We do that in this code cell, normalizing the concentrations, and what you see is the data now in the range zero to one: a component in red, which looks like the reactants; a curve in blue, which looks like intermediates; and a curve in green, which looks like products.

With that, I'd like to bring you back to the presentation. This is what we just saw: we extracted a three-component fit using NMF, which gives greater interpretability of our data, and now we can fit it with kinetics equations. We've done so for a variety of conditions, namely the initial temperature of the system, and you can see we get a fairly decent fit of our kinetics model to the post-processed data after NMF. So this is really useful in reducing the dimensionality of the problem.

I'd like to go over very briefly what NMF actually does in terms of matrix factorization. If we have our V matrix, the encoded matrix, all the data post-processed in some way so we can run the NMF algorithm on it, we can express it as, equate it to, a dot product of two sub-matrices: our W matrix, which is our features, and our H matrix, which is our weights or coefficients. You'll notice that p is usually of smaller dimension than m or n. In the previous example, one dimension was the 280 different bonding environments, or features, and the other was our time variable; we picked p equal to three, which is of course smaller than 280 and smaller than the roughly 90 frames of time. What makes it non-negative, in contrast to PCA, is the hard constraint that the entries of both the feature matrix and the coefficient matrix must be greater than or equal to zero: they must be non-negative. A common approach to achieving this constraint is to minimize the Frobenius norm of the difference between the initial data and the product of the feature and coefficient matrices. If you're interested in the algorithm, I urge you to look at this paper; it goes into great detail on the pseudocode and how to actually compute the W and H matrices.
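Written out, the factorization and constraint just described are

$$
V \approx W H,
\qquad
\min_{W \ge 0,\; H \ge 0}\ \lVert V - W H \rVert_F^2 ,
$$

and a minimal scikit-learn sketch, with random stand-in data in place of the real 280-environment-by-roughly-90-frame matrix, might look like this:

```python
import numpy as np
from sklearn.decomposition import NMF

# Stand-in for the encoded matrix V: 280 bonding environments x 90 time frames,
# all entries non-negative, as NMF requires
V = np.random.rand(280, 90)

model = NMF(n_components=3, init='nndsvd', max_iter=1000)
W = model.fit_transform(V)    # (280, 3) feature matrix: which environments make up each component
H = model.components_         # (3, 90) coefficient matrix: each component's profile over time

# Normalize each frame so the three profiles read as concentration-like fractions
H_norm = H / H.sum(axis=0, keepdims=True)
```

The Lee and Seung paper referenced above derives multiplicative update rules that carry out this minimization while keeping every entry of W and H non-negative.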
So I'd like to summarize and wrap up this discussion by going over the characteristics of both PCA and NMF. We've shown that PCA gives you both positive and negative weights; remember the image where facial features could appear with positive and negative weights. PCA ensures that the eigenvectors we obtain are orthogonal to one another, and it's a global transformation: you may have noticed in the PCA example that some of the eigenvectors contained both faces and noses together. It wasn't very localized; it was more a global transformation of the original space. The PCA algorithm defines the principal components as those having the largest variance, and when you do dimensionality reduction, you remove the components with the least explained variance.

NMF, on the other hand, is more specialized. As I just stated, the entries of the coefficient and component matrices must be non-negative, which makes it purely additive. We also saw in the face example that some of the features were parts-based: we were able to extract just eyebrows, or just noses. This makes the results more interpretable in those specific examples.

With that, I'd like to thank you for your time. Hopefully you followed as we went through the rundown of principal component analysis and non-negative matrix factorization. And I'd like to reiterate the grand scheme of data science, machine learning, and science and engineering: we have cyberinfrastructure that allows us to learn from a variety of data; we've shown that we can create predictive models and do classification; in this session we focused on dimensionality reduction, going from n-dimensional states down to very simple three-dimensional features; and all of this helps guide the design of experiments. With that, I'd like to open the floor to any questions you may have at this time. Thank you.

I'll start with a question. In your first example, Michael, you were describing PCA with two different components. Yes. How much of the variance can be explained by the first component? In PCA, we're trying to explain a complicated phenomenon using simple components; how much can we do with the first component alone, for your toy example?

I'll go back to that right now. As shown in the first example, the explained variance does not change when you do this dimensionality reduction. In this example, let me find it, the first component already explained about 75% of the variance in the original data space. And you can see that when we do the dimensionality reduction, I believe this value should not change at all, because we're still explaining the data by the exact same first principal component. So to answer your question: we were able to explain about 75% of the variance through the first component, and the second one really didn't add much; it didn't explain much more variance. This was an example meant to show that, yes, you can do a dimensionality reduction from two components down to one and still more or less retain the majority of the explained variance in your data. So, 75%. Thank you.

So I was wondering, can you hear me, Michael? Yes.
I was wondering whether these dimensionality reduction techniques could be used in tandem with machine learning, as a first step to reduce your problem, and then you could, for example, use some of the machine learning techniques that Saquette described in the previous sessions of the workshop. Do you have any comments on that, or Saquette?

I believe the main thing is to make sure you do the correct post-processing: keep track of which space you're mapping onto. If you're doing dimensionality reduction, make sure you map back into the original space; don't just do the fit and say, hey, this is our data. You need to do the inverse transform after you fit. But I think there's definitely value in taking what we've done in the reduced-dimensional space and moving forward with it. I agree that you can do machine learning, training a neural network, with this information. All we were trying to do is remove unnecessary variance, or, more or less, remove values that might contain noise in our data set. So I'm pretty sure you can link this forward into machine learning algorithms. Maybe someone else can provide some input on that.

Yeah, I was just going to say that in the past few years there have been many examples of autoencoders, or encoder-decoder pairs of networks, that try to do the same thing: one network learns the minimum number of features you need in order to then predict the quantity of interest. If you can do that reduction using non-negative matrix factorization or PCA, it simply fits in and replaces the encoder part of the encoder-decoder network, and the subsequent parts stay the same. Right. So if you are interested in describing a given material, you can have elemental descriptors, or descriptors that come from DFT calculations, and maybe not all of them are necessary; a technique like this can distill them down to the set of features you actually need to make predictions of, say, conductivity or strength.

There are many cases, in almost every engineering or science application, where, although you have a high-dimensional space, the actual states of your system live on a much lower-dimensional manifold within that very big space. Consider importance sampling techniques for computing integrals: the need for them arises because brute-force exploration of your space spends most of its time in areas that are completely irrelevant. You can see that in the 2D example Michael showed: if I throw points randomly into that square, a lot of them land in the blank spaces that are completely irrelevant for my problem. In 2D that's not a big deal, but you can quickly see that in higher dimensions you'd be completely wasting your time doing brute-force exploration of the space. That's the reason for things like Metropolis, Markov chain Metropolis-type algorithms: to do importance sampling. They're all based on the fact that in engineering and science we always deal with a much lower-dimensional manifold that we care about, and identifying that manifold is key before people can even start thinking about solving the problem.

Here's a question from the chat: does non-negative matrix factorization capture nonlinearities, or nonlinear correlations?
I do not believe so. The algorithmic approach is very similar to PCA, but with the added constraint of non-negativity. So, just as PCA doesn't handle nonlinear behavior, as I was saying, if you have data that exists here and data that exists there, or if you imagine the data space being more circular, NMF is not going to handle it very well either. I am aware of approaches that can handle such behavior; kernel PCA is one, KernelPCA in scikit-learn. If you're interested in taking this further, this was just a flavor of PCA and NMF, I'd urge you to go to the scikit-learn decomposition documentation, sklearn.decomposition; it walks through a variety of decomposition techniques for handling nonlinear manifolds, nonlinear behavior. Kernel PCA is the one I'm aware of at this moment, but there are many more listed there, in much more detail, with a lot of examples you can read through on your own time. In the interest of time, I only wanted to focus on PCA and NMF in this tutorial.

Another question from the chat: it says manifold learning algorithms can capture nonlinear data; are you familiar with manifold learning algorithms? I think there was one that I explored, but I can't remember it off the top of my head. I am aware that they exist, but for the interest of this research we stuck with linear data; these are linear transformations, eigenvalue and eigenvector problems we're essentially solving. So yes, I am aware of them; they just weren't applied in this tutorial. I think the answer is yes, and it would be good, maybe we can find a volunteer expert to give us a seminar, one of these hands-on tutorials, on manifold learning algorithms that go beyond PCA. That would be an interesting topic. Yeah.

Are there any other questions? I just want to interject: if you don't have questions now, or they come up at a later time, you're more than welcome to email me, and I will try to address them and get back to you as soon as possible, if you're not comfortable asking here. And if you're interested in helping us with some of these sessions, if you have expert knowledge in any of these areas, as Professor Strachan said, we'd more than welcome your ideas and could discuss with you how we can further serve the community. Yeah, if you want to volunteer to run one of these, or if you have topics you would like us to cover, let us know.

Okay, if there are no other questions, please unmute your mics and join me in thanking Michael for this session. Thanks, everyone, for coming.