My name is Rishabh; you can call me Rish. I'm from National Junior College and I represent Building Blocks, an organization in Singapore led by students, created by students, for students, to promote enrichment of computer science education; we want to bring more people on board into the computer science journey. Before we begin, let me introduce myself. I am a final-year student at National Junior College, so yes, unfortunately, I'm taking my A-Levels this year. I am certified in machine learning and deep neural networks from Stanford University. I've worked at a couple of places in and around Singapore: I was a machine learning researcher at A*STAR, I co-founded an AI startup last year with a friend of mine, and now I head the AI lab at that startup, Unscramble AI. I write articles on machine learning, tech in general, and reviews on Medium, and I open-source things on GitHub. I joined Building Blocks this year and so far the journey has been great. If you're a student, please take note that Building Blocks is hiring, so you can join us if you're looking for some fun.

For today's workshop, I'm assuming you know a bit of Python (how to work with files and manipulate data in a basic way) and a bit of high school math: algebra, vectors, graphs, and a bit of calculus, namely differentiation. Don't worry, the math isn't that hectic or crazy, and if you have any confusion or doubts, just shoot a question.

Machine learning is a subfield of artificial intelligence where we enable computers to learn without explicitly giving them rules. Given some training data, a machine learning algorithm is able to find patterns in the data and give predictions when released into the world. For example, Google Translate works using machine learning: it was trained on huge bodies of text in two different languages, and now it's live online; a quick Google search opens up Google Translate, you type in anything, and it translates it for you. That is one form of machine learning.

Before machine learning came into play, in the growing years of tech since the internet boom of the early 2000s, we didn't have much data, and at the same time we didn't have enough computing power to run really huge algorithms. So we resorted to traditional rule-based algorithms. What do I mean by rule-based? Say you want an algorithm that can classify two different types of Iris flowers, Iris setosa and Iris virginica: I give you an image of a flower, and you tell me which of these classes or labels it belongs to. A traditional rule-based programmer would write rule after rule after rule, until at some point he gets sick of it, because how many rules can you write? And it's worse if I give you an anomaly: if I gave you a picture of one of the flowers but your rule-based algorithm gave a wrong prediction, saying it's something else, we have a huge problem on our hands if the system has already been released to the public for use.
Say I gave it a picture of Iris virginica; it would be a huge problem if your rule-based algorithm predicted Iris setosa just because, in the image provided, the two look almost the same. This is exactly why we have relied on machine learning ever since we've been able to get lots of training data and lots of compute, from around 2013 onward, with Google, DeepMind, and all these huge companies embarking on AI experiments and adventures so that life becomes much more convenient for all of us. In current times, data as a whole is really complex: it's not perfect, it often needs a bit of cleaning and dusting, and it's not readily available for use in machine learning. At times data can be of low quality; it may come from surveys or questionnaires, but it's incomplete, and you need to bridge the gaps, fill them in, to make sure your training data is good for use. And as in the rule-based example I mentioned earlier, you can't keep writing rules: there comes a point where you can't think of new rules and your algorithm keeps failing, predicting the wrong thing because two things look the same, which is again why we need machine learning.

Ever since machine learning came into play, we've had a huge boost in algorithm performance, and whenever you need accurate predictions you go to machine learning, whether some form of statistical machine learning or neural networks, if you've heard of them. The beauty of machine learning, and of neural networks in particular, is the ability to find even more complex patterns in data, which makes them far more versatile than their rule-based counterparts in many use cases. Machine learning has penetrated almost every industry known to man: banking, finance, social media, the recommendation algorithms on Instagram and YouTube, and even assistants like Google Assistant and Siri all use some form of machine learning to make your life more convenient. Funny thing, it's also being used in cucumber farming: there was one person in Japan who used a machine learning algorithm on his family's cucumber farm, helping his dad by sorting the different types of cucumbers into different baskets; the system scanned each cucumber, determined its type, and put it into the right bucket. So I think you now have some picture of why machine learning is so useful, not only in tech but also in other fields such as agriculture. We're seeing a huge boom in agri-tech, with startups using AI to transform the lives of many people who perform traditional manual labor, by leveraging the immense power of machine learning.

At the end of this workshop, one takeaway would be to ask yourself how valuable you would be if you knew machine learning. You'd have so much power in your hands. I like to call machine learning engineers wizards, because they hold the key to a better, more convenient future for all of us. This workshop is roughly two hours, but I'll keep it short; I know it's late into the night, so thank you for coming at this hour.
I'll be running through a basic machine learning algorithm called linear regression, and then we'll do some theory. Math warning: yes, math is involved, but don't worry, if you have a question just shoot. Then, if everyone is okay, I'll move on to a basic neural network architecture called the single layer perceptron. We'll apply all the knowledge we gain in this workshop: I'll do a live coding session with all of you, and we'll test it on real-life data sets. Hopefully that sparks a bit of interest, now that you know where this is applied.

Okay, linear regression. Linear regression is a linear model: we assume the relationship between variables x and y is defined by a best fit line. The best fit line has the relationship f(x) = mx + c, or y = mx + c, as some of us learned in high school. Here m is the gradient and c is the intercept, and together with the x and y variables they define the best fit line on the graph.

Before we delve into the math and the concepts, I'd like you to visualize what linear regression really does. Say you're playing football. I blindfold you, spin you around, and now you have no sense of direction or orientation; you have no idea where the goal is. Then I tell you: take a kick at the goal immediately, I don't care whether you know where the goal is or not. You take the kick, and obviously the ball flies somewhere else, because you have no idea where the goal is or where you are in relation to the goalpost. Then I ask you to take off the blindfold and see how far off your kick was, how badly you kicked in a random direction. I tell you to put the blindfold back on and kick again. Now you kind of know where the goal is, because when you took off the blindfold you saw where you were and where the goalpost was, so you have a rough idea of which direction you're supposed to kick in. You repeat this process multiple times, and you miss less and less frequently; the distance between your kick and the goalpost keeps decreasing as you get a better understanding of where you are in relation to it.

Similarly, linear regression takes random values for the gradient and the y-intercept (or vertical intercept) and draws a line, which is analogous to the football player taking that random kick. It sees how far off that line is, analogous to taking off the blindfold and seeing where you are, where the ball went, and where the goalpost actually is. Then it corrects the variables m and c, the gradient and the y-intercept, slowly, just as the football player corrects his stance, positioning, kicking speed, and angle. This process continues until the player gets the ball into the goal, or in our case, until we achieve the best fit line or something close to it.
This is linear regression in action: the green line is the best fit line, the thing we're trying to emulate, and the red line is what we're constantly updating. You can see the red line moving, with its gradient (slope) and y-intercept changing on every update, which shows the linear regression algorithm trying to come as close as possible to the best fit line. Today we're going to do exactly this.

Earlier I mentioned that the relationship between x and y (or x and f(x), which is basically y) is y = mx + c, where m is the gradient and c is the y-intercept. In machine learning we call this the hypothesis function; the hypothesis function is basically the guideline or rule we follow to get the best fit line. If you were to convert this to simple Python, you could define a hypothesis function that takes in the y-intercept c, the gradient m, and the x value, and returns mx + c.

Once we've plotted the line, we have values of m, x, c, and y and a decent relationship y = mx + c, but as I said, the football player's first kick is wrong: he has no idea where the goalpost is, where the ball is, or where he is. Similarly, the values initialized for the gradient and the y-intercept are completely random at the start, and slowly, through the process of linear regression, we optimize them. So when we've predicted a line y = mx + c using these variables m and c, we want to know how far off the kick was, that is, how far our line is from the best fit line it's supposed to be. To find that error we use something called the loss function (also the error function or the cost function; there are many names, but in machine learning we usually say loss function). The loss function is a mathematical formula that takes in your variables, compares them with the real values you're supposed to reach, and gives you the difference between where you currently are and those real values.

So this is the loss function. It may look scary, but all it does is take the difference between f(x), which is our line, and y, the real value, and square it. The sigma notation here, which some of you may have learned in high school, takes the sum of these squared differences over all the examples in the training data set we're feeding the model, and then takes the average, so that the error is consistent across all the examples: written out, the loss is (1/2N) · Σ (f(xᵢ) − yᵢ)², where N is the number of training examples. Again, converting this to simple Python, we define a loss function that takes in the gradient, the y-intercept, x, and y; we initialize a variable called total loss, which is the loss across all examples for this round. When I say round, remember how the football player takes the kick and then checks where the ball landed relative to the goal; I'll call that a round here, but in machine learning we call it an epoch or an iteration, one step closer to our final objective of optimizing that line toward the best fit line.
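(For reference, since the slide code isn't reproduced in the transcript, here is a minimal Python sketch of the hypothesis function as described:)

    # Hypothesis function: our line y = mx + c.
    # m is the gradient, c is the y-intercept, x is the input value.
    def hypothesis(m, c, x):
        return m * x + c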
Again: we take the hypothesis function, y = mx + c (or f(x) = mx + c), find the difference between our line and the actual real value, square it, sum across all training examples, take the average, and accumulate it into the total loss variable, which we then return. So what exactly does the loss function do? It takes our line, as mentioned earlier, compares it with the real values the line should pass through, and gives you the error margin: how far off you are in your prediction of the line, or equivalently of the values for the gradient and y-intercept. The step in the brackets, (f(x) − y)², is what's pictured here: the blue line is the line we currently plotted using our random values of m and c, the pink dots are the actual real-life values, and the red segments are the differences f(x) − y. If we take the real y position and the y position we predicted, the red segment shows the difference, which is exactly the inner term of the formula I showed you.

But after taking a random kick and deciding to correct ourselves, we need to know by how much to correct. The loss function only shows our error; the actual process of taking that error and using it to optimize the gradient and y-intercept is what we perform next, so that we're one step closer to our final objective of the best fit line or something close to it. In high school, most of you may have learned about differentiation and derivatives; when we differentiate the formula I just showed you, it gives us the change in the error between where we were in the previous round and where we're supposed to be in the next round. It may sound abstract (machine learning is a bunch of abstract concepts), but hopefully through this workshop you'll have a clear understanding of what really happens in these kinds of algorithms.
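(A minimal sketch of that loss function in Python, assuming the hypothesis function sketched above and equal-length sequences X and Y:)

    # Mean squared error loss: J(m, c) = (1 / 2N) * sum((f(x_i) - y_i)^2).
    def loss_function(m, c, X, Y):
        total_loss = 0
        N = len(X)
        for x_i, y_i in zip(X, Y):
            f_x = hypothesis(m, c, x_i)      # our line's prediction for x_i
            total_loss += (f_x - y_i) ** 2   # squared difference from the real value
        return total_loss / (2 * N)          # average over all training examples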
If we calculate the derivative of that loss formula and equate it to zero, that means the change is zero, and when the derivative is zero, the variables it depends on, the gradient and the y-intercept, are at a maximum or a minimum. Here we want the value of the loss function that gives the least possible error, the minimum. If you plotted a curve of the rounds against the loss values, you'd get a U-shaped, or convex, function, and the plateau at the bottom of that U-shape is where we want to reach; we gradually go down the slope until the gradient is zero, the minimum. In machine learning, and in linear regression and statistical algorithms generally, we call this gradient descent: we're descending along gradient values of the loss function toward the point where the loss is minimal.

There's a type of differentiation called partial differentiation. It may be outside the high school syllabus, but if you've taken a math or statistics course in university you may have come across it. You don't need to know exactly how it works for this workshop; just note that it's what we use to get the derivatives of the loss function. We take the derivatives of the loss function with respect to the gradient m and the y-intercept c, and after performing partial differentiation we get these formulas: dJ/dc = (1/N) Σ (f(xᵢ) − yᵢ), the average of our line minus the real values, and dJ/dm is the same f(x) − y term except, if you actually do the math, multiplied by the x value: dJ/dm = (1/N) Σ (f(xᵢ) − yᵢ) · xᵢ. In simple Python, we define a function called get_derivatives, where we set the derivatives with respect to the y-intercept c and the gradient m, get the hypothesis function, and, as shown by the formulas here, accumulate into dc and dm, the derivatives with respect to c and m; then we average them and return them.

Now that we have the change in value for the gradient and the y-intercept, we can finally update them: we take the current values of m and c and subtract that change in error from them, after multiplying the change by something called the learning rate. In machine learning it's denoted by alpha or eta, but here, for simplicity's sake, since most of us are familiar with alpha, we'll use alpha. What alpha does is decide the extent to which the values are updated. So the update formula is m ← m − α · dJ/dm and c ← c − α · dJ/dc, where dJ/dm and dJ/dc are the derivatives of the loss with respect to m and c; alpha dictates the proportion of the current values that gets overwritten in the next round.
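(A minimal sketch of those two steps in Python, again assuming the hypothesis function from before:)

    # Partial derivatives of the loss with respect to c and m:
    #   dJ/dc = (1/N) * sum(f(x_i) - y_i)
    #   dJ/dm = (1/N) * sum((f(x_i) - y_i) * x_i)
    def get_derivatives(m, c, X, Y):
        dc, dm = 0, 0
        N = len(X)
        for x_i, y_i in zip(X, Y):
            f_x = hypothesis(m, c, x_i)
            dc += f_x - y_i
            dm += (f_x - y_i) * x_i
        return dc / N, dm / N

    # One gradient descent step: move m and c against the derivatives,
    # scaled by the learning rate alpha.
    def update_parameters(m, c, X, Y, alpha):
        dc, dm = get_derivatives(m, c, X, Y)
        m = m - alpha * dm
        c = c - alpha * dc
        return m, c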
In simple Python, we get the derivatives using the get_derivatives function we wrote earlier, obtaining the values of dc and dm; we update the next round of values by subtracting the change from the current values, and we return the better, optimized values of c and m. A quick progress check: I understand most of you may still be confused by the math, but as we write the code it should become clearer what exactly we're doing, and we'll be writing these helper functions along the way in the live coding session we'll do now.

So let's perform linear regression using the functions we've covered in this first part of the workshop. We initially get random values for the gradient and y-intercept; we plot the line; we find the error, that is, how far off the line is compared to the real values; we find the derivatives, the change in the loss function pointing toward where the error is minimal; we update the parameters m and c with that change; and finally we redraw the line, which of course will be much closer to the best fit line. Summing all of that up into the final linear regression function: we take in the values of x and y that we're trying to find a linear relationship for, and we randomly initialize c and m, as shown by the first two lines of the function here. Alpha we set to a really low value: not so low that the algorithm becomes too slow in learning the optimal values of m and c, but not so large that it takes even longer because it keeps bouncing back and forth past the optimal values. The number of rounds, or iterations, we set to a thousand, hoping that within a thousand rounds we get values close to the optimal m and c. Using the update_parameters function we wrote earlier, we pass in the learning rate, x, y, c, and m, and by the end of those thousand steps it will have kept repeating and optimizing the gradient and y-intercept, so that at the end of the algorithm we can plot a line that's basically the best fit line, or something close to it.

Okay, now we'll do the live coding session. Do all of you, or most of you, have an internet connection? If you don't, please raise your hands. Okay, we'll start in two minutes; please do configure the Wi-Fi, because we'll be using something online. So far, does anyone have any questions or doubts about what's happening here? No? Okay. Does anyone still not have an internet connection? Okay, I'm assuming all of us are connected, so now it's coding time. I'm assuming all of you have a Google Drive or Gmail account, so please go to drive.google.com and access your Google Drive. Anyone not on the Drive page yet? Okay, everyone's in. When you're in your Google Drive, click the New button and you'll get a drop-down menu; at the bottom of it you'll see an option called More, and if you hover over More you'll get a second drop-down menu with something that says Colaboratory, with a CO icon or logo. In some cases you may not have the Colaboratory app, but you can install it: in Google Drive, from that drop-down menu, you can add apps to your Drive.
It'll open a pop-up window where you can search for Colaboratory, or Colab; it's spelled C-O-L-A-B, and if you type the first few letters you'll probably find it. Anyone who hasn't found it yet? Is everyone okay on this side? Again, if you haven't connected to the Wi-Fi, please do, because we'll be coding on an online platform. At the end of the workshop I'll be open-sourcing all of this on GitHub, and on Twitter, LinkedIn, and Facebook I'll give you my contact details and account, so you can check out these slides as well as the Colab notebook for future reference. It'll be better if you can do this on a laptop, because we'll actually be writing some code. Okay, give me a second. Does everyone see this on the screen? Anyone still unclear? Let me just increase the font; can everyone see this, can the back row see this? Okay, cool.

What you've just opened is called a Colaboratory notebook. It's an online coding environment specifically meant for machine learning; anyone can access it, it's free, and best of all, if you're doing really cool projects that are intensive in terms of computational power, they can actually give you a free cloud GPU for 12 hours, which is pretty cool. A notebook is a cell-based editor where you type code in cells and can run individual cells; it's great for visualizing things and for interactive presentations.

First off, we'll start by importing all the libraries we need. Please make sure you're connected to the server: just click the Connect button if you're not connected, and once you are, it'll show Connected along with some memory usage. Everyone good so far? Everyone's connected? Okay. For this example we'll be using a few Python libraries, or packages, that come pre-installed in Colab, and that's the cool part: machine learning needs all sorts of libraries and modules, but Colab has them pre-installed, so you just import them and use them. We'll be using a library called NumPy; has anyone heard of NumPy before? Great. NumPy, for those who don't know, is a library that enables numerical computing: anything to do with vectors, differentiation, matrices, or matrix operations, NumPy is your go-to library. Next we'll import Matplotlib; anyone heard of Matplotlib? We use the Pyplot submodule from the Matplotlib library, which enables us to draw graphs, so we can give really cool presentations and visualizations with colorful charts, in 2D or 3D, and you can even display images with it. To run a cell, there's a play button over here; when you click play, it compiles and runs the code you wrote in that specific cell. Now the code has run; does anyone have any errors so far? Everyone's cool? Okay, great. There's a button called Code, and when you click it, it'll give you a new cell.
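(For reference, a minimal sketch of that first imports cell, under the usual aliases:)

    import numpy as np               # numerical computing: arrays, matrices, math functions
    import matplotlib.pyplot as plt  # Pyplot submodule of Matplotlib, for drawing graphs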
Another interesting fact about Colab: you can keep adding cells and run them individually, and each cell can use the variables you declared in the cells above, so you don't need to rerun the entire thing again and again; you can just run cell after cell, because it all depends on what you've already run. Now, to start with linear regression, we need data, right? For now we'll create some dummy data before we apply our algorithm to a real-life data set like the Iris classification data set. We need x and y, and NumPy comes with functions that create data values for you that you can plot on a graph. What np.linspace does is create a linear space: evenly spaced values over a range, say from one to a hundred. Here we want values from one to a hundred, and we do the same for y. Usually in machine learning we denote the inputs with a capital X and the targets with a lowercase y; it's just notational convention. If we run this and then print these values, we just get values in the range one to a hundred. This allows for quick prototyping: if you just want to test your algorithm's performance on a throwaway data set, you can create one with just this linspace call.

Next, for a line we need random values for m and c. If we print the values of m and c: at the beginning I told you that linear regression assumes the initial values of the gradient m and the y-intercept c are completely random, which is going to result in a horrible first line. If we run this, we get two random values for m and c. Now, since we have m, c, x, and y, let's plot the line. We'll write a helper function called plot_line, which takes in m, c, x, and y; for reference, I've already written this function to save time. We're using Pyplot's graphing capabilities: given the values of the gradient, the y-intercept, x, and y, we can actually plot them. Let's call the plot_line function on our random values, below over here; what it does is take our values and convert them to a line. If we run this function (we're just experimenting with different values to show how far off the line is), you can see the points are all represented by these blue dots, and the line we plotted is pretty far from what we're supposed to get. Let me scroll up so you can get the code: over here we're taking the values we gave the function and plotting them on the graph; we're creating a scatter plot of the x and y points we created above, drawing the line, and labeling it the linear regression line. Does everyone have the code now?
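(Since the helper itself isn't reproduced in the transcript, here's a minimal sketch of the dummy data and a plot_line function consistent with the description; the exact styling is an assumption:)

    X = np.linspace(1, 100)   # dummy inputs: evenly spaced values from 1 to 100
    y = np.linspace(1, 100)   # dummy targets over the same range

    m, c = np.random.randn(2) # random initial gradient and y-intercept

    def plot_line(m, c, X, y):
        plt.scatter(X, y)                          # the points we're fitting
        plt.plot(X, m * X + c, color='red',
                 label='linear regression line')   # our current line
        plt.legend()
        plt.show()

    plot_line(m, c, X, y)     # the first line will be far off, since m and c are random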
If you haven't finished copying it, please raise your hands. The thousand there basically creates a larger range for the graph to be displayed over, so we have a rough idea of how far off the line is. Is anyone still copying the code? Every time you randomly initialize m, c, x, and y, it'll give you a different graph, a different line, which shows that linear regression starts off with random variables and optimizes them as we gradually move along the rounds. Is anyone still running the code? Okay, I'm assuming everyone's written the code and run the cell. Who doesn't see the graph yet? Everyone should have a graph by now. We can see that this red line over here is really far from what it's supposed to be; it should be something close to these blue dots, which represent your points. These are the current values for m and c, and through the process of linear regression we'll adjust them so that they represent the best fit line for this set of points.

The next step: at the start I mentioned that we follow the hypothesis function, which is a rule or guideline that we follow as we optimize values for it. So here we define the hypothesis function, and we return mx + c. Is everyone good to go?

Next we want to create the function that represents this math formula here, our loss function. As a refresher, the loss function takes our line, compares it with the real values, and sees how far off our predictions are from what they're supposed to be. In that function on the slide, we initialize a value called total_loss to zero, and then we iterate through the data to find the average loss for the one round we're running this for. In the math formula you saw the terms xi and yi: xi and yi represent a single training example in our data set. Say I have a data set comprising cat and dog images; one image, whether it's a cat or a dog, represents a single training example in the data set, and in machine learning we call this a training instance. So we iterate through the entirety of our x and y values, looking at them example by example, instance by instance, and we use the hypothesis function we wrote earlier, passing in the values of m, c, and xi (sorry, this is supposed to be xi, because we're calculating the hypothesis for that one specific example). Next we create a variable called diff, or difference, which is our line minus what it's supposed to be. Then the formula squares this difference, and for that we call np.square: earlier I mentioned that NumPy is used for numerical computing, and one cool thing about it is that it provides math functions like absolute value, square root, square, and power, so we can perform these kinds of arithmetic operations on numbers really easily.
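(A quick illustration of the NumPy functions being used here:)

    diffs = np.array([1.0, -2.0, 3.0])
    print(np.square(diffs))           # element-wise square: [1. 4. 9.]
    print(np.sum(np.square(diffs)))   # sum of squared differences: 14.0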
We pass in the value of the difference. Next, after we have the square, we want the sum: in the slides we have this sigma notation, which sums over all the instances. I'll call this variable summation rather than sum, because sum is the name of a built-in function in Python, and we compute it with np.sum of the square. Has everyone written the code for this so far? If I'm going too fast, please do tell me and I can slow down. Now we take the average, which is 1 divided by 2N times the summation, where N is the length of x, the number of examples in our training set (I'm writing N so we don't confuse it with the gradient m); if the length of x is a hundred, that means we have a hundred instances. More than just an average, we can call this the error itself, or the loss, because at this point we've successfully taken the math formula and converted it into four simple lines of code; those four lines represent our error for this round. Finally, we add this to the total_loss variable and return total_loss. Are we good to go?

Okay, we run this cell and create another one. Up next, we have to find the partial derivatives of the loss function, so that we can perform the gradient descent process I mentioned earlier. We've written the loss function, and now we find the derivatives with respect to the gradient and the y-intercept, which again converts to simple Python. We define a function called get_derivatives, passing in the gradient, the y-intercept, our x values, and our y values; we initialize the derivative with respect to c and the derivative with respect to m as zero, and just as in the loss function, we let N denote the number of training instances in our data set. Again, we want this change to be consistent across our whole data set, which is why we iterate through all the instances. The zip function is a built-in in Python that combines two arrays, pairing them up element by element so you can iterate through each training instance, each pair of values, together. Again, we use the hypothesis function to get our f(x), which is our line: as written here, it takes in the gradient, the y-intercept, and x of i, the i-th training instance. Referring to the formulas over here, we take the sum of the per-example derivatives and then average them: in simple Python, dc, the derivative with respect to c, accumulates f(x) − yi, as the formula denotes, and dm accumulates (f(x) − yi) multiplied by xi. If you actually do the partial differentiation of the loss function with respect to both variables c and m, you'll get exactly these formulas, and we're representing the two equations in Python.
Since we accumulated those per-example derivatives in the loop, the last step is to take the average across all training examples, which we can do with a simple /=, dividing the values of dc and dm by N, the number of training instances; then we return them, and we run it. In the meantime, let me go get my laptop charger; if you have any questions or doubts about the output you're getting, please do ask. Excuse me for a moment. I'll scroll up to the plot function, because some of you may not have caught the exact functions.

I understand some of you arrived late, so I'll do a quick recap so that everyone has a rough idea of where the code is going and our progress. We start off by importing the libraries NumPy and Matplotlib: NumPy for numerical computation and Matplotlib for drawing the graphs and curves. We have the x and y data set, which is values in the range one to a hundred, represented by capital X and lowercase y. Next we randomly initialize the values of the gradient and the y-intercept using np.random.randn, which gives two values drawn at random, different on each run. Next we have the plot_line function, which takes our gradient, y-intercept, x, and y, and plots a line. If I run all of this: these blue dots are the points we're trying to find the best fit line for, and this red line is our estimate using those random values. You can see it's really far off, showing that linear regression starts with a randomly drawn line, and we're going to optimize it so that by the end of the linear regression function it produces something close to the best fit line for these blue points.

Next we have the hypothesis function, the guideline or rule we're trying to follow; it assumes a linear relationship between the variables x and y. We run this. After we plot the line, we want to see how far off the prediction is, so we have the loss function, which takes the difference between our prediction and what the real value of y should be, using the loss formula: we calculate the hypothesis function (our prediction), take the difference f(x) − yi (our line minus what it's supposed to be), square the difference using the pre-built np.square function, sum all these errors together, take the average, and finally add it to the total_loss variable and return it. We run this. Then, in the get_derivatives function, we're working toward the point where the change in error is minimal: when the derivative of the loss is zero, the dependent variables, the gradient and the y-intercept, are at the values where the error is at its minimum. Converted to simple Python, this function calculates the prediction f(x) and uses the formulas f(x) − yi and (f(x) − yi) times xi; if you do the math you can derive these formulas yourself, but for time's sake we'll leave it at that. Then we
calculate the derivatives with respect to c and m, take the average, and return them. That's a quick recap of what we've written so far, for those of you who entered late.

After we get the derivatives, the next step is to update the values of m and c. We use our update formula: m equals m minus alpha multiplied by the respective derivative, and likewise for c. Converting this to Python, we define a function called update_parameters, which takes in the values of m, c, x, and y. Earlier I mentioned the learning rate: the learning rate tells us how quickly the algorithm is going to converge to the optimal values. By converge, I mean that through rounds of iteration, of drawing the line, seeing how far off we are, and correcting that error margin, the alpha value dictates how fast or how slow we get there. Choosing the right, optimal alpha value is really important when doing linear regression, because the speed of learning depends on alpha. Taking reference from the formula again: we call the get_derivatives function from above, passing in m, c, x, and y (the gradient, y-intercept, and our x and y points), which returns the values of dc and dm; then we apply the update rule, the update equation, to get the next round of values for the gradient and the y-intercept: m equals m minus alpha times dm, and c equals c minus alpha times dc. After this, we return the values of m and c. Through rounds of linear regression, through iterations or epochs, the values of m and c get closer and closer to what they're supposed to be; they become really close to giving us the best fit line.

After the update_parameters function is written, the next step is to perform linear regression itself. All the functions above (the hypothesis function, loss function, get_derivatives, and update_parameters) were helper functions that will aid us in finally writing our linear regression algorithm. For linear regression, since it's the final function, we just need to pass in x and y, because the values of m and c are created internally in this function. Taking reference from the code, m and c are the first two lines: we create random values for them, and in this case we multiply them by 0.01 to ensure the values stay small and easy to compute with, because really large values can overflow your computer's arithmetic. Again, we create the random variables using the pre-built np.random.randn, which gives us two random values for the y-intercept and the gradient. Now that we've initialized random values for m and c, the next step is to initialize alpha, the learning rate, which dictates how fast the algorithm learns the relationship between x and y. Here we'll set it to something really low, like 0.05. Alpha values are usually set to something small
so that they converge smoothly, rather than using a really large value where the parameter keeps bouncing back and forth around the optimum. Suppose the optimal value is 2: a large alpha will keep jumping from 1 to 3, or 0 to 4, multiple times before it finally converges at 2, but with something like 0.05 we might go from 1.5 to 2.5 and then settle at 2, so learning is smoother, faster, and more efficient. Up next is the number of rounds, which denotes the number of iterations or epochs the algorithm is going to run for; by the end of a thousand rounds, we hope the algorithm has found decent values for m and c, optimized to give us something close to the best fit line. So we iterate through these num_rounds, these epochs, and on each round we get c and m from the update_parameters function, passing it the gradient, the y-intercept, x, y, and our learning rate alpha, so that it calculates the next values of m and c, closer to the optimal ones, using the formula shown earlier. By the end of this training loop (we call this the training loop, or the training job, or the training cycle), we hope the values of m and c have converged to what they're supposed to be, and then we return m and c.

Remember the plot_line function we wrote earlier? We can call it inside our linear regression algorithm right after we initialize the random values of m and c, to give us a rough idea of how far off the line is when we first start, and then we can plot again after the training cycle, which shows us whether the line has improved at all. For greater visibility we can also print the initial values of m and c and the final values after the whole process has ended. So this is the code for the final linear regression algorithm: it takes the values of x and y, our data set, assigns random values to m and c, and by the end of the training cycle optimizes those values into something much better at giving us the best fit line. One silly mistake on my part: the update_parameters function returns m and c, so in the linear regression function above we have to assign them back as m and c in the same order; we don't want to mix up the two values. Then, after writing all the helper functions and converting the math into code, we can finally call linear_regression on the x and y we initialized in the cell over here. Now it's the moment of truth: all the functions written above contribute to this final linear regression algorithm that optimizes the line. Let's run it, and hopefully the m and c values get optimized. We see a horribly drawn line over here with the initial values of m and c, and then we see that the values of m and c have improved in some way through 1000 iterations.
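(Putting it all together, a minimal sketch of the full linear_regression function as described; the exact printout format is an assumption:)

    def linear_regression(X, y):
        # random small starting values for the gradient and y-intercept
        m, c = np.random.randn(2) * 0.01
        alpha = 0.05        # learning rate
        num_rounds = 1000   # iterations / epochs

        print('initial m, c:', m, c)
        plot_line(m, c, X, y)            # how far off we are at the start

        for _ in range(num_rounds):      # the training loop
            m, c = update_parameters(m, c, X, y, alpha)

        print('final m, c:', m, c)
        plot_line(m, c, X, y)            # the (hopefully) improved line
        return m, c

    m, c = linear_regression(X, y)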
Still, this line is clearly not the best fit line, since it isn't really aligned with these blue points. It shows us either that the number of rounds isn't enough, so we'd need to pump it up to, I don't know, 10,000 or 100,000, or that the initial values of m and c are really small, so we could play around with them such that they can be optimized further within these thousand rounds. Here we get something that's almost close to the best fit line: when we initially randomize the variables m and c we get a really horrible line, and by the end of 1000 iterations we get something much closer to the points we plotted; we can even see that the values of m and c have changed from what they previously were, which shows that the linear regression algorithm is doing something to optimize them. For some of you it may work well and for some of you it may not, probably because of the random state of your machine: when we initialize random variables on a computer, they differ from run to run and from machine to machine, so your line may be different from the person's sitting next to you, which is why some of the randomly assigned starting values here may not fully reflect the true capabilities of the linear regression algorithm. In the slide I showed you, there's a visualization of linear regression taking in the x and y variables and doing what it's supposed to do; that's the true representation of the linear regression function optimizing m and c to give us something close to the best fit line.

Due to time constraints I'll be moving a bit faster; hopefully that's not too much trouble. Now that we've performed linear regression, and most of us have a bare-bones concept of how it works, we'll move on to something called the single layer perceptron. The name comes from the root word percept: the ability to perceive. It's called single layer because, as I'll show you later, it consists of a single non-linear function, which is able to give us a better representation of the data set. Linear regression, in its most basic form, can only take in one value of x for a corresponding value of y, but a single layer perceptron can take in multiple values of x, which we call features. For a flower, we have multiple features: the length of its sepal, the length of its petal, the width of its petal; and we can use all these attributes, features, or characteristics of the flower to predict what type of flower it is. (Before I begin: this might be a problem, because my charger can't fit in. Okay.)

Linear regression assumed a linear relationship between the variables x and y; what single layer perceptrons do is assume a non-linear relationship between x and y. Linear relationships are restricted in the kinds of mappings they can express between two variables, because we're confined to a line, but a curve can take almost any shape and form, which makes it really flexible in giving us an ideal, or better than ideal, mapping between x and y. As mentioned in the slides, we can get the best fit curve for the data using a non-linear method such as a single layer perceptron.
But before we begin: the inspiration for the single layer perceptron came from the biological neuron in the human brain. A neuron receives information from the neurons around it, processes that information, and sends it outward to other neurons nearby; information in, processing, information out the other end. The computational version, the mathematical model that researchers gave this biological neuron, is the single layer perceptron. When I say single layer, it means there's only one function, a single function, that we perform between x and y to get the relationship between them. As I said, it can take in multiple inputs, x1 all the way to xn; if we consider flowers such as the Iris flowers, we can have things like the width of the sepal and the width and length of the petal, and all of these can be individual numbers, attributes, or features that we pass into the single layer perceptron to get the final prediction: Iris setosa or Iris virginica. In this workshop we'll look at a single layer perceptron that gives multiple outputs, where the prediction with the highest probability is the one the inputs most likely denote. If I feed in multiple features as my x values, it gives me probabilities for y1, y2, and y3, which are basically predictions, and the one with the highest value among them tells us what the input is. So if one type of flower is y1, another is y2, and another is y3, and at the end of the training cycle y2 is the highest among them, we can say the flower we gave it belongs to the second type. The perceptron takes in these attributes or features, performs the summation function (the same as in linear regression), applies something called an activation function, which I'll show you later, and gives multiple outputs, y1 to y3, or however many your use case needs; the one with the highest value or probability is the class the input belongs to.

A fun thing about single layer perceptrons is that we only need the gradient to optimize; we can completely ignore the y-intercept, which makes this more efficient compared to linear regression: in linear regression we were optimizing two values, but in single layer perceptrons we optimize just the gradient values. In single layer perceptrons we'll be working with vectors and matrices. At the start, in this slide, I showed you that it can take in multiple inputs, and these multiple inputs can be represented as a single vector. A vector is a one-dimensional matrix, an n-by-1 matrix; you can picture it as a column of values, one after another, denoting different things like the petal width or the petal length. We perform the summation function over the different attributes, multiplying each by the corresponding value of m (we don't need c, which, as I said, is the key that makes single layer perceptrons more efficient), and then we apply the activation function. So what exactly is an activation function? The activation function is what introduces non-linearity. Initially I said that the relationship the single layer perceptron starts from is y = mx, but that's still linear.
What the activation function does is convert that y = mx into something non-linear: it takes the linear relationship and applies a non-linear function, the activation function, on top of it, so that the model is able to find even more patterns in the data, which is basically the task of machine learning. In this workshop we'll use an activation function called sigmoid. Whenever the sigmoid function takes an input, it performs the calculation σ(x) = 1 / (1 + e^(−x)), which squashes any number, of any size, into a number between zero and one. If you have a calculator with you, you can key in this formula with any value of x, and it'll give you something between zero and one. In probability theory, we can treat the probability of an event occurring as a number between zero and one: if the output is close to one, we can say there's a high chance of the event occurring, and if it's close to zero, it's probably not going to happen.

Again, we need to find the errors in the prediction, and the fun fact about single layer perceptrons is that they follow the same loss function as linear regression, the math formula shown earlier. What happens next is that we take the same partial derivatives, but in single layer perceptrons we perform something called backpropagation, which takes the derivatives of the parameters and passes them back up the chain over here. When we make a prediction, we take the inputs and carry them through from left to right, but in backpropagation we calculate the derivatives and send them back through, using differentiation. You can think of it as a blame game: the final prediction blames the activation function for being wrong, the activation function blames the y = mx step for being wrong, and through this whole blame game we can find exactly which step is giving the wrong value and optimize the value at that specific position; that's what backpropagation does. But again, due to time issues, we won't be covering backpropagation in detail. (Hold on, I think my battery died; can I get some help over here? The charger's not fitting into the plug point. Is there an adapter?)

Single layer perceptrons and linear regression models are really similar in some aspects but different in others, and the differences show up in performance: if you compare a neural network such as a single layer perceptron with a basic traditional algorithm such as linear regression, you see lots of differences in performance, accuracy, precision, everything that gives it its final form. Okay, great. So, a final study of single layer perceptrons versus linear regression. The similarities: both of them start from a relationship of the form y = mx, a linear relationship; both compute the errors, and the change in errors, using the loss function; and both use some form of gradient descent (in single layer perceptrons we use backpropagation, while in linear regression we use gradient descent directly, but both invariably do the same thing of passing the error from the prediction back to the functions that produced it). The differences: linear regression, in this basic form, works with just single values, whereas single layer perceptrons can pass in multiple inputs, making them more versatile for many use cases; and single layer perceptrons apply activation functions, which introduce non-linearity to the model, and since curves have more flexibility, they're able to find better patterns in the data and give more accurate predictions.
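(A minimal NumPy sketch of the pieces described above: the sigmoid function and a single layer perceptron forward pass over a feature vector; the specific feature and weight values are hypothetical:)

    def sigmoid(x):
        # squashes any number into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    # hypothetical feature vector: sepal length, sepal width, petal length, petal width
    x = np.array([5.1, 3.5, 1.4, 0.2])

    # one weight (gradient) per feature, per output class; no y-intercept term,
    # matching the formulation used in this workshop
    W = np.random.randn(3, 4) * 0.01   # 3 candidate outputs y1..y3, 4 features

    y = sigmoid(W @ x)                 # summation, then activation
    print(y, '-> predicted class:', np.argmax(y))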
So now let's code a single-layer perceptron. In the linear regression example we saw how math-heavy it was, and most researchers don't have the time to write all the individual helper functions and convert those formulas to code by hand. That's why developers at companies such as Google and Facebook created open-source machine learning libraries: they take all those formulas and helper functions and squash them down, so that your entire machine learning program can be written in just a few lines of code. So now we'll be using one such machine learning library to build a single-layer perceptron, and we'll be pre-processing the data beforehand. The thing about machine learning is that a machine learning engineer or technician usually spends about 80 percent of their time wrangling with the data, cleaning, dusting, and cleansing it, so that they can spend the remaining 20 percent feeding it into the model. Some of you may think the library is scikit-learn, if you've heard of it, but no, we'll be using TensorFlow. TensorFlow is an open-source machine learning framework. It was released about three or four years back, and version 2.0 was just announced last week at the TensorFlow Dev Summit, which makes this even more exciting, because TensorFlow is up and coming and it'd be great for you to get hands-on experience with it. Nowadays most machine learning and research projects use TensorFlow, or some abstraction over TensorFlow, because it allows for quick prototyping and experimentation: you write a few lines of code and you have an entire suite of machine learning tools at your disposal. If you want to read more about TensorFlow and its applications in the real world, you can visit tensorflow.org, where you'll find hands-on codelabs like the ones I've shown you, examples of different real-life use cases, and datasets you can play with.

So now we'll be using TensorFlow to predict the flower type. As I said, in this workshop we'll be taking two kinds of Iris flowers, Setosa and Virginica. We'll be using four features, the sepal length and width and the petal length and width, and from those four attributes we'll predict the flower type for a new example that the model has never seen. So if we took our model today and released it online, anyone could give it different values for those length and width variables and it would predict what type of flower it is. For the dataset, can you all please visit the tinyurl link on the slide: capital F, capital A, then IRIS SLP, all caps. Hold on, let me just connect to the internet again. Okay, you should be seeing something like this. Does everyone have something like this? Yeah.
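For reference, the raw page is just a CSV, one comma-separated row per flower, along these lines (the measurements shown are classic Iris values and purely illustrative; the exact label strings depend on the file):

```
5.1,3.5,1.4,0.2,Iris-setosa
6.3,3.3,6.0,2.5,Iris-virginica
```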
So this is the dataset we're going to use, but you don't need to download it; we'll download it automatically within Colab. One thing that's cool about Colab is that it lets you run command-line programs inside a cell: if you put an exclamation mark in front of a command, it runs on the command line. You can use the same Colab notebook we used for the linear regression example, but just to keep things clean, I'll create a new one for the single-layer perceptron. Okay, let me just close this. Again, if you don't know how to access a Colab notebook: go to your Google Drive and log in with your Gmail account, really simple; click the New button, and in the drop-down menu you'll see an option called More; click on More and you'll get another drop-down, where you'll see Colaboratory, with the CO icon. Everyone's able to access a Colab notebook, right? Okay, great. I have my Colab notebook up and running, so let me just connect it. Now, on the dataset page, just copy the link at the top, the one that leads to this page; you don't need to download the file onto your physical or local machine. What you can do is type an exclamation mark followed by wget. wget is a command-line tool that downloads content from the internet, so here it pulls the file straight into the Colab environment instead of onto your own computer. Paste that long URL after the exclamation mark and wget, run the cell, and it downloads the whole thing. Yep, it's downloaded, and it's called iris_slp.csv. Additionally, if you're not really familiar with command-line tools, you can import a library called pandas. Pandas lets you work with datasets: you can form tables with them, which allows for really easy visualization and representation of your data. Pandas comes with a built-in pd.read_csv function, and you can pass the link in and it will read from that online URL. What we'll do next is convert this dataset into a DataFrame. You can think of a DataFrame as an Excel spreadsheet: we're converting the raw content you saw, all those numbers, values, and flower types, into something that looks and works like a spreadsheet. That makes things much easier, because I'm assuming most of us have worked with Excel before and know that we can easily manipulate data there. You can print the first few training examples from the dataset by typing dataset.head(). Let's see if this works. Okay, it turns out the download link doesn't work directly; if you're not using a Colab notebook, you can use the pandas alternative, but since Colab comes with this method that automatically downloads the file, we'll just use that. So again, we import pandas as pd and call this variable dataset. In your download you should see something called iris_slp.csv. CSV stands for comma-separated values: as you can see, all the values here are separated by commas, and you can think of each comma-separated value as one cell in an Excel sheet. So we can say pd.read_csv('iris_slp.csv'), and since Colab has already downloaded this file for us, we can use it right off the bat.
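Putting those steps together, a sketch of what the notebook cells look like (the dataset URL is whatever the tinyurl link resolves to, and iris_slp.csv is the filename as downloaded):

```python
# In Colab, a leading "!" runs a shell command from inside a cell:
# !wget <paste-the-dataset-url-here>

import pandas as pd

# Load the downloaded CSV into a DataFrame (a table, like a spreadsheet).
dataset = pd.read_csv('iris_slp.csv')

# Peek at the first few training examples to check it loaded correctly.
print(dataset.head())
```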
Yep, so now we have successfully loaded our dataset into Colab. Again, as I mentioned earlier, for machine learning engineers and tacticians, those who work with ML on a daily basis, most of their time goes into doing stuff like this: opening and loading datasets, checking for missing values, patching up errors in the data, and only a small portion of the time is actually spent creating and training the model. So, now that we've successfully loaded our dataset, let's run this. Yeah, as you can see, it looks like something off an Excel sheet, with individual columns and cells, and you can see we're using the sepal length, sepal width, petal length, and petal width to predict the final type of the flower.

Let's create another cell. What we'll do now is convert the dataset into something the algorithm can read. The pandas DataFrame comes with a built-in attribute, dataset.values, which converts all of this into an array that we can use. We'll create a variable called features; when I say features, I'm referring to the values such as sepal length, sepal width, petal length, and petal width, the attributes that describe the flower. In the slice we write, the first colon says we want all the rows; the comma states that we're moving on to the next axis, the column axis; and ':4' says we want everything before the fourth column. In Python, indexing starts from zero, so the columns are numbered zero, one, two, three, and this slice takes everything before index four. Then we create another variable called labels using -1: -1 simply grabs the last column, no matter how large your number of columns is, and the colon again says we want all the rows, all hundred training examples. Now if we print the first example in our features array, yep, it corresponds to the values over here: 5.1, 3.5, 1.4, and 0.2. We can do the same for the labels, and it just returns Iris Setosa. Now that we have the features, we can finally import NumPy, because we're getting to the exciting part: we're going to transform the data so that it's usable by our machine learning algorithm. For the features we call np.array: np.array takes a regular array and creates the NumPy version of it, which means it's ready for computation, and we just feed in the features variable. But since the labels are text values, Iris Virginica or Iris Setosa, we need to do a bit more pre-processing: we'll iterate through the labels and check whether each entry is equal to Iris Setosa or not.
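Here's roughly what those slicing steps look like in code (a sketch; dataset is the DataFrame we loaded above):

```python
import numpy as np

values = dataset.values   # DataFrame -> plain 2-D array

# All rows, columns 0..3 (everything before index 4): the four measurements.
features = values[:, :4]

# All rows, just the last column: the flower-type strings.
labels = values[:, -1]

print(features[0])  # e.g. [5.1 3.5 1.4 0.2]
print(labels[0])    # e.g. 'Iris-setosa'

# Make the features a numeric NumPy array, ready for computation.
features = np.array(features, dtype=float)
```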
Above that loop, we first create an empty list called y, which plays the role of our y variable, with the features as x. If the loop sees Iris Setosa, we append [1, 0] to the list. This [1, 0] notation is called one-hot encoding in machine learning. One-hot means that when we have multiple classes or types to predict, a single 1 marks which type an example belongs to. So when the model sees these kinds of values for the petal and sepal lengths and widths, we're saying that out of the two categories of flower it can belong to, this example is Iris Setosa: a 1 in the first position represents Iris Setosa. Then we create another if-statement: if labels[i] is equal to Iris Virginica, we append [0, 1], which is saying that if you see a label corresponding to Iris Virginica, the second type, we want y to be appended with [0, 1] this time, showing that this example's features belong to Iris Virginica and not Iris Setosa. Then we can finally create the NumPy array for y, and for the features, since we're pretty much done pre-processing them, we can call them x, because now we're finally ready to pass everything on to our machine learning algorithm. And we run it. Okay, hold on. Okay, my bad. Okay, that's the pre-processing done.
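As a sketch, the one-hot encoding loop looks something like this (the exact label strings, written here as 'Iris-setosa' and 'Iris-virginica', depend on how they appear in the CSV):

```python
import numpy as np

y = []  # one one-hot vector per training example
for i in range(len(labels)):
    if labels[i] == 'Iris-setosa':
        y.append([1, 0])   # first slot hot: Setosa
    elif labels[i] == 'Iris-virginica':
        y.append([0, 1])   # second slot hot: Virginica

y = np.array(y)   # the labels, now numeric
x = features      # the pre-processed features become our x
```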
So now we can actually go on to building our model. Since Colab was created by Google, and TensorFlow was also created by Google, naturally we'd expect Colab to have support for TensorFlow, so we can simply say import tensorflow as tf. Again, if you want more detail and documentation for TensorFlow, you can visit tensorflow.org and see the playgrounds and datasets they provide for quick experimentation. Within TensorFlow there's a submodule called Keras, which comprises smaller functions that help us build a model with even fewer lines of code. We can say from tensorflow.keras.models import Sequential. Sequential is a class that holds the different layers of a neural network: neural networks consist of multiple layers, information passes through each layer, and each layer performs different computations depending on what kind of layer it is; Sequential is basically a container for all of these layers. We'll be using something called a Dense layer, which represents one layer, one set of neurons, and we write from tensorflow.keras.layers import Dense. So far, does anyone have any questions on the pre-processing steps? Because now we're going to dive into creating TensorFlow models and passing data through them.

Since we've imported the modules we need from TensorFlow, we create a new cell and write model = Sequential(). Again, Sequential is an object that represents a container for the layers, and since a single-layer perceptron consists of a single layer, we'll just call model.add once; add is a function that takes a layer and puts it into our Sequential container. Dense is the object that takes in the input and gives the output: earlier I said the model assumes a y = mx relationship and then applies the activation function on it, right? In this Dense layer we'll be doing exactly that. There are two kinds of flowers we're trying to predict, Iris Virginica or Iris Setosa, so we set the number of output units to two; depending on the probability value at each of those two units, we'll know which kind of flower the input belongs to. Next we have the input shape, which is the dimensions of the training data we're putting in. Since we're using only four attributes here, we'll just say four comma, because four represents the petal length, petal width, sepal length, and sepal width. Then the activation: earlier in the slides I mentioned we'd use the sigmoid activation function, which takes really large values and squashes them to values between zero and one, giving us a probability of what the inputs could correspond to.

Next up we have the compilation step. In TensorFlow you need to compile the model, which basically means giving it the loss function. For the loss you can just type mean squared error; the formula I showed you earlier for the loss function is called the mean squared error loss function. There are different kinds of loss functions, and you can find documentation for them on the TensorFlow website: things like hinge loss, absolute loss, and categorical cross-entropy, but those are terms we won't be discussing today. Mean squared error is our preferred choice of loss function. Then the optimizer. The optimizer is out of the scope of this workshop, but what it does is make the training of the algorithm run much faster and more smoothly. Here I'll just use Adam, which stands for adaptive moment estimation, another machine learning term we won't get into; you can ignore the optimizer bit, but do add it in your code. Then the metrics: the metrics are what we monitor while the model trains. Of course we're going to look at the mean squared error, because through gradient descent and backpropagation you should observe the loss values decreasing over time. If I took the loss values from our original linear regression code, which I was actually supposed to print out, you'd see the values decreasing, which shows the line is getting closer to the best-fit line because it's giving a smaller error. So for the metrics we want to observe the mean squared error, that difference, and we also want to measure the model's accuracy. Now that we're done compiling, in TensorFlow you can get a summary of your model: it prints out a really detailed table. As we can see, this is like the container, you can think of it as a box, and the different layers are the things we put into that container. Over here we passed in one Dense layer, the single layer of our single-layer perceptron, which is shown over here, and we can see that we have two output units, one for Iris Setosa and one for Iris Virginica; the one with the higher value at the end tells us which flower type the input belongs to. Next up, we can finally train this algorithm.
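Put together, the model definition and compilation would look roughly like this (a minimal sketch of the Keras calls just described):

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()

# One dense layer: 4 input features in, 2 output units out
# (one per flower type), each squashed by sigmoid into (0, 1).
model.add(Dense(2, input_shape=(4,), activation='sigmoid'))

# Mean squared error as the loss, Adam as the optimizer,
# tracking MSE and accuracy during training.
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['mse', 'accuracy'])

model.summary()  # prints the detailed table of layers and parameters
```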
Now that we have the x and y variables in our hands, and we have the model, we can say model.fit. I'm storing this training cycle in a variable called history, the training history, which will contain our metrics: the values for the mean squared error, that difference, as well as the accuracy. I pass in x and y, and just as in linear regression, we pass in the number of rounds. Again, like the football player kicking the ball: he takes multiple rounds of practice to finally score the goal, and similarly we run multiple rounds of optimizing the values. Here we'll take something like 50 epochs, because since it's a neural network and it can capture a non-linear relationship, the good thing is we don't need to train it for something like a thousand rounds; we can train for something closer to 10 or 50, depending on how much data you have. And if you run this, TensorFlow gives you a really nice visualization of your training cycle: for all 100 examples, it completes each epoch and shows your metrics, which we said would be the average mean squared error and the accuracy for each round. If you compare across rounds, you can see that on average these values keep decreasing, and this shows that our model is actually learning: the lower the loss value, the better the model has learned, because the predictions are less off from what they're supposed to be. We can see that the accuracy is also improving, which is a good thing, and wow, we actually achieved 98% accuracy on this. This shows that the training cycle went really well and the model has performed really well: if you took this model, put it into production, and shipped it so that billions of people could use it across the world, it would be really accurate in its predictions.

Next we want to check whether the loss values are decreasing. We can convert the training history object into a dictionary by typing __dict__, double underscore, dict, double underscore, which gives us a dictionary form of our training history. I'm assuming most of you know how to work with dictionaries in Python: dictionaries consist of key-value pairs, where each value corresponds to a specific key. In this dictionary object we can see different things, like the validation metrics and params, which are your parameters, and we'll be accessing the history key, and under that, the loss values. If we print them, you can see the values decreasing over time, which shows the model is training really well. For some further analysis, we can put those values into a variable called losses, and for the epochs, the rounds, we can use NumPy: there's a function called arange which gives you 0, 1, 2, 3, all the way up to, but not including, the number you specify. Since we've been training for 50 epochs, we'll put 50. We run this, then we import matplotlib for some graphing, to check that the loss really is decreasing over time and to see the performance, and we plot the epochs against the loss.
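A sketch of those training and plotting cells, continuing from the model above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Train for 50 rounds; history records the metrics for every epoch.
history = model.fit(x, y, epochs=50)

# The per-epoch loss values live under the 'history' key.
losses = history.history['loss']
epochs = np.arange(50)  # 0, 1, ..., 49

# Plot loss against epoch: it should trend downward as the model learns.
plt.plot(epochs, losses)
plt.xlabel('epoch')
plt.ylabel('mean squared error loss')
plt.show()
```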
And you'd get something like this: we can see that the loss keeps decreasing throughout our training cycle, and we don't see any sudden peaks that would show the model suddenly messed up partway through, which suggests the model is in good shape, maybe even ready for production. But one of the problems with machine learning is that sometimes we have too few training instances. Here we only had 100, and it's actually surprising that we got an accuracy close to 98%, because when I tried it earlier I got something that was less than 50% accurate. So the number of training examples you have determines how good the model will be when fully trained: the more examples, the better the performance. It's analogous to doing math problems. Say you're really bad at one type of math problem; the more of that kind you solve, the better you get, so that in the exam you can score really high marks on those problems, because you trained well on that kind of problem. So now that we've finished building our single-layer perceptron, that's it for the coding part, and it's time for some concluding words.

Now that we've built both linear regression and a single-layer perceptron, we can see that machine learning is really helpful in real-life cases, across different use cases and industries. But the thing is, machine learning is currently at a state where it's still inaccessible to many, which makes me really happy that youths from junior college all the way up to professionals are coming to these kinds of talks. So shout out to all of you, because you're progressing and putting your best foot forward to learn machine learning. You can apply it to any dataset you have, to any real-life problem, and with the power you now possess to build machine learning algorithms, you can tackle major problems the world is facing and optimize for anything you want. The problem with machine learning, as we saw from the linear regression example, is that there's lots of math involved and lots of concepts that are really abstract for some to understand, which is why it takes lots of time and effort to learn. This workshop is an attempt to democratize AI education. A professor at Stanford University, Professor Andrew Ng, was from Raffles Institution in Singapore; he went to Stanford and is now a professor there, and his big aim is to democratize AI, to put AI and machine learning into the hands of people of all ages, so that anyone is capable of solving world-scale issues and problems. This workshop is one of many steps taken by governments and organizations around the world so that anyone can be motivated and interested in learning machine learning, and computer science in general, because you are the wizards of tomorrow, and the solutions you build are limited only by your imagination. And with that, thank you for attending this workshop. If you have any problems or issues, or you just want to chat about anything related to tech or machine learning, you can contact me on GitHub, LinkedIn, Medium, or Twitter.
And if you're a student who takes computing in secondary school or for your A-Levels, and you're keen on riding the computer science wave, or you're generally interested in teaching others how tech works and how computer science is able to change industries, then, as I mentioned earlier, Building Blocks is hiring. Please do contact us if you're interested in joining and giving these kinds of talks, because not only is it an investment in your portfolio, you're also enabling others to learn and improving their lives. And with that, thank you all for attending this workshop. Any questions? Single-layer perceptron related, linear regression related, or anything in general about machine learning? Hold on, let me just get the mic to you so that everyone can listen. The question asked was: "I was thinking about chaos theory, and are there any ways to actually go along with that, with chaos theory?" Okay, so there's this thing called KL divergence, which is a really complicated piece of machine learning terminology. KL divergence is one kind of problem that really hasn't been optimized yet; those of you who have some knowledge of machine learning may know that it's an open problem we're still working toward, but as of now, computing KL divergence in general hasn't been solved. That's a really high-level question, though, so you don't need to worry about it. Any other questions? Anyone? All right then, I think we can wrap it up for tonight. It's a Friday night and it's been a really long day for most of you, so thank you all for coming. I'll just be standing over here, so if you're too shy to ask in front of everyone, you can come talk to me here.