So we have the beginning of module six here, and there are two announcements I wanted to bring up. The first is that at the end of this session, which will finish up in about an hour and a half, we'll have a class photo. It'll be very quick: a Zoom picture, so a screenshot. If you don't want your image in it, you don't have to turn your camera on, but obviously for a class photo it helps to have your video on so we can see you. We'll remind you just before we wrap up.

The other thing I want to talk to people about, and we've been seeing this already, is that the lab sessions are intended, especially today now that everyone is up to speed with machine learning and the terminology, for you to start asking the TAs and me about your own work and your own problems. That's partly why we're here, partly why the course is being offered, and partly why we asked you at the very beginning about your own work and how you think you could use machine learning. This is something that's happened when we've given this course before: typically people struggle with the concepts on the first day, getting acquainted with the terminology and the ideas. The second day they get a little more familiar and comfortable, and people start speaking the same language. That's really the time to start asking us about your problems. Some of them are perhaps very simple to ask and answer; others are fairly complicated and might require more detail. Obviously you can do the labs and the exercises, but they're not intended to be that difficult, and at this stage some of them are more of a "gee whiz, I can do that now," or working out some of the challenges of Colab. They really weren't intended to occupy your time; they were meant to inspire people to say, okay, I get it now, how can I make this work for my work or my lab or my project? So certainly for the next two labs I really encourage people to do that. Ask your TAs anything. They're all really bright, and they'll probably give you some very useful answers. In some cases some very useful collaborations have started, and hopefully that helps you with your research.

I'm going to dive into module six now, same slides as per usual. We're going to be learning about Keras and scikit-learn today, and we'll spend both module six and module seven on these. The lectures will take up most of the time; the labs will be relatively short, which, as I say, is a good opportunity for you to start asking your TAs in the breakout rooms, and me, either during the lectures or when I drop in, about your own work. So we're introducing scikit-learn and Keras, as I mentioned, and the reason is to show you how the code that we spent so much time explaining, which people in my group had written in pure Python and NumPy, becomes much shorter, much easier, and in some cases even better when you write it using scikit-learn and Keras.
And we're going to show some specific examples. We're going to take the iris classification problem that was done with decision trees written all in NumPy and Python and show how easy it is to do in scikit-learn, and then we're going to do the same thing with the neural net classifier and show the same thing.

So what is scikit-learn? You can also call it sklearn, and that's probably how I'll refer to it most often. It's an open source library in Python with algorithms for machine learning: you can implement decision trees, random forests, and a number of other standard techniques. It's been around for about 13 years and it's a very, very popular library, though not the only one. You can click on the link on the website to get a bit more information about its history, its development, and what's going on in the news.

So that's scikit-learn; then there's another thing called Keras, and another thing called TensorFlow. TensorFlow is a deep learning framework that Google created about seven or eight years ago, and Keras is a neural network library written specifically for TensorFlow. So scikit-learn is for decision trees, random forests, and SVMs, but TensorFlow and Keras are really for the neural net stuff. They have application programming interfaces, or APIs, to develop and evaluate standard artificial neural nets, the ones we've been using, and deep neural nets, which have multiple layers and are sometimes very much more sophisticated: convolutional neural nets, recurrent neural nets, or graph neural nets. And again, we've given you a link to TensorFlow if you want to learn more about these things.

TensorFlow got its name because it's named after tensors, and tensors are a fancy name for matrices or arrays or vectors. TensorFlow also makes use of computational graphs, which allow you to visualize some of the mathematical processes. The graphs contain units of computation, or operation objects, and the tensors are the objects that represent the data flowing between those operations. Within the whole graph structure you have a session. This is sort of visualized here, and if you're a computer geek it may make some sense to you; for most of us it's a little obscure. But anyway, you've got a data flow graph, and you've got nodes in the graph connected by edges, just like we talked about for the decision trees. Each node is an operation, a mathematical operation, drawn as a circle, and the edges are the data on which the operations are performed, so the data flows through the graph. That's basically how you'd describe most algorithms in many cases, and it's also how we describe graphs in TensorFlow: the edges, the arrows, are the tensors; the nodes, the circles, are the operations; and the graph describing the whole connection of those nodes and edges forms a session. Again, it's a conceptual thing, something you probably don't have to care about too much if you just want to get programming done and problems solved, but it explains the origin of the name.
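To make that concrete, here's a minimal sketch of the "tensors flowing between operations" idea, assuming TensorFlow 2 is installed; note that modern TensorFlow executes these operations eagerly, so the graph and session bookkeeping described above is handled for you behind the scenes:

```python
import tensorflow as tf

# Two 2x2 tensors: these are the "edges", the data that flows
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

# Two operation "nodes": element-wise addition, then matrix multiplication
c = tf.add(a, b)
d = tf.matmul(a, c)

print(d.numpy())  # the tensor that flowed out of the graph
```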
So when we talk about Python, we've got various modules that can contain functions, some that contain groups of functions and classes, and then there are obviously variables and variable names, which are what you standardly use in a program. We've seen how you can call some of these functions, which helps you save on code. And we've also used NumPy and pandas, which made some of the mathematical operations and data handling easier: we would import numpy and import pandas, and that made a lot of our work easier. Keras contains classes and functions in just the same way, ones that let you create things especially for neural nets: artificial neural nets, deep neural nets. And scikit-learn contains classes and functions that let you do a lot of the preparation of data sets and training of models for decision trees and random forests. So: scikit-learn for decision trees, random forests, and I think SVMs too, and Keras for the neural net things.

How do you get TensorFlow to work? Well, how did we get pandas and NumPy? We saw how to do that. In Colab, you type pip install tensorflow. That's a command, and pip is what's called a recursive acronym: pip installs packages. It's just bringing in that package, or module, so you can use TensorFlow, and you do the same sort of thing to get sklearn.

All right, so that's the setup; that's the background and rationale for sklearn and TensorFlow and Keras. We're going to start with the iris decision tree, which means we use sklearn, not Keras, because Keras is for the neural nets, and sklearn is for decision trees and random forests and SVMs. Again, when we wrote the decision tree yesterday, we used NumPy and pandas, imported into iris DT4, which you guys used. In this case, we're going to use sklearn with its own function called DecisionTreeClassifier, and that is the thing you call. So that's the preface before we start.

As with every machine learning workflow, here are the six steps. We define a problem, and our problem, as before, is: how do I classify iris flowers based on their floral dimensions? It's exactly the same problem we had with the original decision tree. We get our data set, and in this case it's the same one we've talked about: setosa, virginica, versicolor, 50 of each, 150 in total, with petal and sepal dimensions measured, the same data set published in 1936. Everyone kind of knows it now. We have to transform our data set; well, in fact, because it's a decision tree, we don't, but we put it into a nice table so it's easier to read. We've got our labels in the column under species and our petal and sepal dimensions in the other four columns, so it's five columns by 150 rows. We do exactly the same thing we've done before: we pop open module six and open the iris decision tree sklearn notebook in Python.

I think it's important at this stage just to remind people what the algorithm looked like last time. We had a function to read the data; we checked the data; we created the training set, which was 70%, and the testing set, which was 30%. We had a splitting function that made decisions about how to split the data, and we used the Gini index calculator to assess the best splitting point. We had a split function, and we had a way to determine when we reached a terminal node. And then we had to be able to do all of this recursively, because we're always cutting, cutting, cutting, always making more decisions. And we also managed to make a program that could take any new input data and run it through the decision tree. So that had 10 steps.
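For anyone who wants a reminder of what the hand-written version involved, here's a sketch of a Gini index calculator in the standard textbook form; the function name and layout are illustrative, not the exact code from the iris DT4 notebook:

```python
def gini_index(groups, classes):
    """Weighted Gini impurity of a candidate split (label in last column)."""
    n_instances = sum(len(group) for group in groups)  # samples at this split
    gini = 0.0
    for group in groups:
        size = len(group)
        if size == 0:
            continue  # avoid dividing by zero on an empty branch
        score = 0.0
        for class_val in classes:
            # proportion of this class within the group
            p = [row[-1] for row in group].count(class_val) / size
            score += p * p
        # weight the group's impurity by its relative size
        gini += (1.0 - score) * (size / n_instances)
    return gini

# Two perfectly pure groups give a Gini index of 0.0 (a terminal node)
print(gini_index([[[1, 0], [1, 0]], [[1, 1], [1, 1]]], [0, 1]))
```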
This one, the new decision tree algorithm with sklearn, is simpler. We just read the data using sklearn.datasets; then we call a split function, train_test_split; then we call DecisionTreeClassifier; and then we repeat four times, which is our choice of maximum depth. So: 10 steps down to four, calling essentially a whole bunch of pre-written functions.

We still have to do some math, so we still import NumPy, and we still have pandas to help read some of our data. The sklearn datasets function is now what we use to read, and these are the commands: six commands, pretty simple. The datasets function is the way we read our data. We have a training and a testing set, and we can import the train_test_split method, which takes the data and produces the training and testing sets; again, it's just a function call. Next we have to choose our model, and obviously we've already chosen the decision tree. Because sklearn has DecisionTreeClassifier as its function, we just call it: we import DecisionTreeClassifier and fill in a couple of parameters. We choose a maximum depth of four and a random state of zero, and then we train the model. The other thing we have to do, of course, is repeat, so we execute the decision tree building with a maximum depth of four. We're also identifying the feature importances, and we print out how well we're doing by printing out the accuracy. You can see the same call to DecisionTreeClassifier, made while we're still within the maximum depth range of four. It may not be trivial or obvious, but there's a syntax to calling these things, and we've given you that.

The total number of coding lines is just 52. It's not as heavily commented; we probably should have added a few more comments. Training time is very short, and a test run takes a couple of seconds. Compare that to the old program, which was 123 lines, of which 91 were code: roughly the same amount of time to train and test as the sklearn version, but roughly twice as long in terms of code.

You can also test your data. What we're doing here is importing a graphing function, graph this, which allows you to actually visualize the tree, along with important things like the feature names and target names, and to explain how some of the decisions were made; you'll see exactly what it does. This is a really useful function, specifically for the decision tree and random forest components of sklearn.
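Pulling those steps together, here's a hedged sketch of the pipeline just described, not the notebook itself: test_size=0.25 is an assumption chosen to match the 112/38 split on the slides, and sklearn's built-in plot_tree stands in for the lecture's graphing helper:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix

iris = load_iris()  # the 150-sample data set that ships with sklearn

# Hold out a test set (0.25 of 150 gives 112 training / 38 testing samples)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

# The settings from the lecture: maximum depth of four, random state of zero
clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X_train, y_train)

print("Feature importances:", clf.feature_importances_)
print("Test accuracy:", clf.score(X_test, y_test))
print(confusion_matrix(y_test, clf.predict(X_test)))

# Draw the annotated tree: Gini values, sample counts, class names
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()
```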
So this is what you get when you graph it. I think this is just from the training set, so we're using 112 samples, the whole collection at that point. Anything with a petal width of less than 0.8 centimeters makes our first split: all of those go into setosa with a Gini index of zero, which is great, that's very good. The ones with a petal width greater than 0.8 have a Gini index of 0.496, which isn't great. There are 75 samples that fall in there, and that includes both virginica and versicolor. Then we make another split: using a petal width of less than 1.75 (or 1.65, I forget which), we can separate virginica out very nicely, and you can see the Gini index values get close to zero. Generally, whenever the Gini index ends up at zero, we've reached a terminal node, and when the Gini index is more than zero, we're able to do some more splits. The figure is also showing you the depth: we start with the full set at the very top, and there are four layers starting at the orange one, layer one, layer two, layer three, and the very bottom is layer four. You can see that all the terminal nodes have Gini indices of zero, which means we've done a correct job of handling petal length, petal width, and sepal width to make the final classifications.

So on that collection of 112, we perform perfectly; that's what you can see with the Gini index, everything at zero. We correctly predicted the setosas, the virginicas, and the versicolors. That's the confusion matrix for the training set. Then for the remaining 38, the testing set, it's not perfect: setosa is fine, but we get some confusion between versicolor and virginica, with about 6% misclassified. If we look at the old code, we got the same result on the training set, it was perfect. On the testing set we were more confused, again on versicolor, because it's hard to distinguish virginica from versicolor. But actually, with the old code, the performance is slightly worse: 93%, whereas this one is 94%. So, again, it's still a relatively trivial exercise, but the whole point is that you can do this decision tree with sklearn, save code, get a slight improvement, and get some great graphics that actually draw out the decision tree.

We made this analogy before: we spent most of yesterday and part of today learning how to code from the very basics, which is like walking up the mountain. With sklearn, and with Keras, you basically get a helicopter to the top. It's a lot easier, and you get a nicer view. I think that's the central point for these two modules on sklearn and Keras. They are tools, and they're not the only tools; we talked about a few others on the first day, like Microsoft Azure and Weka, an old one produced in New Zealand. But these are tools that let people implement and write machine learning relatively intuitively and relatively easily, generate pictures, interpret their material, and do the work without having to get into the nitty-gritty of hard coding.

So I'll stop here and maybe ask if there are any questions or comments. But now we're going to switch from the decision trees to the artificial neural net, with the same set of slides and the same pathway. What's our problem? What's our data set? We know these already. We're going to go to the Python code, and in this case we're going to look at the iris ANN, not the decision tree, and it's with Keras. We're importing numpy, pandas, matplotlib, and seaborn. As before, we're going to upload the data; it's actually the same code that was used in the original neural net: reading the data, making sure there's no missing data, and flagging if there is any. We're also assigning things, flattening, and doing one-hot encoding.
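For anyone who wants to see what one-hot encoding looks like in code, here's a minimal sketch using pandas; get_dummies is one standard way to do it, not necessarily the notebook's exact helper:

```python
import pandas as pd

labels = pd.Series(["setosa", "versicolor", "virginica", "setosa"])

# One column per species; each row gets a 1 (True) in its own species
# column and 0 (False) everywhere else
one_hot = pd.get_dummies(labels)
print(one_hot)
```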
So this is some extra work that we're doing, and it's something that would have to be in any and every neural net that you're going to use. So it's not quite as powerful, I guess, as the decision tree one, where you didn't have to do your one-hot encoding or anything else. We've done the flattening still in pure Python, and we have the split of the dataset into test and train; this is a function we can use, different, I think, from the one we wrote before.

This is where the Keras package comes in. This is where we import tensorflow and the Keras layers, specifically Dense and specifically Sequential. Sequential creates the framework. We also bring in something a little different: we've talked about activation functions like sigmoid functions, and this one is called the rectified linear unit, or ReLU, activation function. It's the one used in the hidden layer; it returns the maximum of zero and the input value, and it's used a lot in neural nets. Dense defines the architecture of the layer being added, and add is what adds the layers, so you decide how many hidden layers you have, how deep your neural net is, if you want. In this case, the ReLU function is used for the hidden layer and, in this case, for the input layer too. These are different from what we used in the original neural net: softmax is used here for the output layer, whereas in our original one, I think, sigmoid was used in the input layer and softmax in the hidden layer. So we're using ReLU and softmax for this one. Again, this is just a call to Dense, replacing the code we had before: if you go back to the neural net we coded for the iris problem, there were about 20 lines where we dealt with both the softmax and sigmoid functions, calculated the derivatives, and handled everything else.

As I said, we've already chosen our neural net, but with Keras we then do something called compiling. We use the compile function, which is named along the lines of the compilers that other programming languages like Fortran need. It uses the data from the first layers and lets you choose the optimizer algorithm and the loss function. The optimizer in this one is called Adam, and it's a gradient descent optimizer. You've heard about gradient descent, and we went through a lot of the derivatives we were calculating and all the details in the neural nets that were sort of excruciating and difficult to follow; all of that is invoked just by choosing Adam. We also have a cost function, called cross entropy, and again it's a simple thing we didn't have to calculate: we just call it and say loss equals categorical cross-entropy. And then we use the accuracy of the prediction as our metric, which is also easy to call.

We use a fit method, with a batch size and a number of epochs, and these were also done with quite a bit more code in the original neural net; remember, we used mini-batches and we had epochs there too. In this case we've chosen a batch size of 10 and 100 epochs. So one line, classifier.fit, does what was done on the left side, which was about 30 or 40 lines in our original neural net.
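Here's a hedged sketch of the kind of Keras model just described: ReLU hidden layers, a softmax output, Adam, categorical cross-entropy, a batch size of 10, and 100 epochs. The layer sizes and the stand-in training data are illustrative assumptions, not the exact slide values:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(8, activation='relu', input_shape=(4,)))  # 4 iris features in
model.add(Dense(8, activation='relu'))                    # hidden layer
model.add(Dense(3, activation='softmax'))                 # 3 species out

# Adam does the gradient descent; the loss is categorical cross-entropy
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Stand-in data shaped like the prepared iris training set:
# a (112, 4) feature matrix and (112, 3) one-hot labels
X_train = np.random.rand(112, 4)
y_train_onehot = np.eye(3)[np.random.randint(0, 3, 112)]

# One line replacing 30-40 lines of hand-written training code
model.fit(X_train, y_train_onehot, batch_size=10, epochs=100, verbose=0)
```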
Now I think it's important to remember that we still had to write a fair bit of code at the beginning of this neural net: the reading and checking of data, the one-hot encoding. But when it comes to actually invoking the neural net analysis, doing the differentiation, the forward and back propagation, which was a lot of other code, it's reduced to a couple of lines. So there's a real win at what I'll call the back end of the neural net, but no difference at the front end, because we still have to do all of that text manipulation, reading, and encoding.

We also have a function we can call to predict, which is what we have in all our other programs: we call our neural net, or our decision tree, and say, okay, I've done my training, how do I do on my test data? So we feed in our X test data and compare the predictions to the y test labels. Again, without Keras it was about 15 lines; with Keras, testing is just one line. We can also calculate our confusion matrix, and this is where we use seaborn and sklearn.metrics, and matplotlib is used to plot things out in color, so we get some nice pictures.

We're now going to show them; this is what you get in terms of results. Without Keras, going back to the original result, we had pretty good results overall: a diagonal of mostly ones, except for virginica versus versicolor. With Keras, we're actually slightly worse: instead of a one at the very bottom, we get a 0.94. But in terms of coding, the Keras one was 136 lines and the iris ANN was 250 lines. Most of the lines in the Keras version, about 100, are text reading and text manipulation, and those same 100 lines are in the iris ANN, so realistically it's about 30 lines of code to do all the neural network work with Keras, versus about 100 lines for the NumPy version.

There's an R version as well. In the R version you don't use Keras or sklearn; you use packages called neuralnet and dplyr. In terms of runtime with those functions in R, it's quite a bit slower, about five times slower. When we wrote it in pure R, at least for the decision tree, it's quite a bit faster. So in some cases, and this is one of them, the neural net as a method is, one, overkill, and two, less accurate: more code, slower, less accurate. The decision tree is less code, faster, and more accurate. But this is a didactic toy problem, and we wanted to be able to show how you could do both decision trees and neural nets. In many cases neural nets are the better choice for more difficult problems, but there are also examples where decision trees, or especially random forests, do even better than neural nets on complex problems. Again, it's a matter of trying them and seeing what works.
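And just to tie the evaluation pieces together before we move on, here's a rough sketch of that predict-and-confusion-matrix step, continuing from the Keras sketch above; the stand-in test data is again an illustrative assumption, not the notebook's exact variables:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Stand-in test data shaped like the held-out iris set
X_test = np.random.rand(38, 4)
y_test_onehot = np.eye(3)[np.random.randint(0, 3, 38)]

y_pred = np.argmax(model.predict(X_test), axis=1)  # most probable class
y_true = np.argmax(y_test_onehot, axis=1)          # decode one-hot labels

# Row-normalize so the diagonal reads as per-class accuracy (the 0.94, etc.)
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2], normalize='true')

species = ['setosa', 'versicolor', 'virginica']
sns.heatmap(cm, annot=True, xticklabels=species, yticklabels=species)
plt.xlabel('Predicted species')
plt.ylabel('True species')
plt.show()
```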