So we want to start slowly but surely with the serious stuff and then go deeper and deeper, and hopefully before reading week I want to have finished the neural network part, so that after reading week we can do other stuff. Okay, today we will have the SVM tutorial. Since I am doing 100% conventional lecturing just on the board, I don't have any visualization aid to reinforce the material, so the tutorial becomes more important: the TAs will show things that I cannot write on the board. I don't want to use a laptop; it's just not my style. It would have some benefit if I showed things from time to time, but I guess the tutorials are there for that purpose.

Okay, so let's get serious. When we get serious we always start with the problem. So what is the problem? The problem is understanding the input-output relationship y = f(x), where f(x) is unknown. That is fundamentally what all sciences deal with, but we have some additional challenges. And x can be a vector, of course, not just a scalar, because otherwise the problem becomes too simple. If we knew f(x) we wouldn't be here. Nobody knows f(x); there is no equation that describes the relationship between input and output if it is a tough problem. Instead, what we have is data from the past. That's all we have. You don't know f(x), but you have data from the past, and data can be anything: text, videos, images, numbers, symbols, actions, anything.

Which means, if you go back to the simplistic world of an x-y two-dimensional coordinate system and you do some measurements, the question is: can you find a function that fits the data? This function is not actually the f(x) that it should be, so we don't call it f(x); we call it f-hat of x, an estimate that fits the data. You can find such a function if you grasp the tendency of the data. If I look at it, I can say, okay, it's going this way, I see the trend — the problem is that we cannot look at the data once it gets hyper-dimensional. I can do some statistical magic and get the trend: find the average here, the average there, and connect the averages. That is the simplest fit you can possibly come up with. As long as you don't have a nonlinear, nasty, non-stationary, complicated, chaotic, outlier-driven, noisy mess, you're going to be fine. But that nasty stuff usually happens, so we have to deal with it in a different way. So f-hat of x is an estimate of the unknown function f(x); nobody knows f(x).

What is the simplest solution? This is not new; AI people have not invented it, and we have solutions for it. A simple solution that has been around for quite some time is, of course, linear regression. Regression has been around for quite some time. If you look into the dictionary, to regress means moving backward, or reasoning backward. Basically it means learning from the past, and the past has a special place for us: the past is everything we have in AI. And we say, as humans, as members of the species Homo sapiens, that we learn from the past — which is of course not the case.
We have never learned from the past. Every generation makes the same mistakes again and again in new forms. If we had learned from the past there would be no war, no poverty, no destruction of the environment, no violence — everyone good, because we learned from the past. No. Maybe AI is the projection of that dark corner of the human spirit: we cannot learn from the past, so let's create something that can learn from the past. There you go, you've got AI. So: fitting the data from the past.

Basically you say that my function is y = alpha + beta*x + epsilon. This is very simple, and epsilon is something that should quantify noise. We add it because, as you see here, this is not really a clean curve; it goes up and down, it's jittery, so there is some noise. And then we have a linear combination of alpha and beta — addition is a linear combination, right? So I have a linear combination of two parameters, alpha and beta, which will give me a line that fits into my data from the past. Of course this is ridiculously simple, but we want to understand the basic concept. And even this, if you really want to formulate it, may be too much, so we simplify further: I set epsilon to zero. I assume I'm in a perfect world; there is no noise, no fluctuation. Because I want to come up with a mathematical model — this is about linear regression, so let's say AI has not been invented yet. I have the data from the past, the function is unknown, I want to do something about it. What tools do I have? I have to come up with a mathematical model. What is a mathematical model? A set of equations.

So then you get y = alpha + beta*x, which is saying: what is the expected value of y given x? And that is alpha + beta*x, which we can equally write as w_0 + w_1*x. Why? Because I want to connect the old and the new, the linear regression and the nonlinear regression that we will talk about, namely neural networks. In neural networks we love to use the letter w for the weights. If I use alpha and beta, that doesn't tell everybody I'm doing something special — and I want to be special, so I use w_0 and w_1; I have some weights. Okay, whatever makes you happy.

So what is then the issue? Here we arrive at a junction in history, and some of us go one way and some of us go the other. Either you get conventional and try to come up with some equations and estimate them, or you say no, I want to go model-free. You can solve this with a model, or you can solve it model-free. So let's see — I want to build a model once, and then we don't do models anymore, because we are AI; we don't do models, AI is model-free. And when we talk about a model, again, it is a set of equations. People talk about models without knowing what they mean; a model is a set of equations. So what is intelligence here? Intelligence is to find w_0 and w_1 — we have to accept that for the time being. We will do the same thing in a model-free way later; I'm not writing that yet. If you do it in a model-based way, that's not very intelligent, because there will be no dynamic adjustment in it. So let's write it in a more fancy way.
So what is the expected value of w_0 and w_1 given x? Now I want to formulate it in a way that lets me come up with a model. What is the expected value of those weights when x is given? And x is data from the past. So I write the error as

E = (1/2) * sum from i = 1 to N of ( y_i - (w_1*x_i + w_0) )^2,

where N is the number of measurements. We do not freak out: a sum of squared differences — we will see this pattern again and again. Here y_i is the desired output. Of course I have the data from the past; I have x and I have y. We know from the past that for x_1, x_2, x_3, ..., x_N I get the corresponding y — that table again. So you know what it should be, and you are guessing it: your estimate is computed by looking at the x_i and these weights. And of course I want to see, when I estimate it, am I far off from what it should be? The difference between the desired output and the estimate should ideally be zero — not just for one measurement, but for all measurements. If you have a million measurements from the past, you sum over all of them and the difference should be zero; then you have a perfect fit. That would be fantastic. Can we do that?

Now I'm doing conventional modeling, so I need a set of equations. Again, if you are being probabilistically cautious, you say that what you find is not the actual value but the expected value, because depending on the x-y table that you have, you only get a sample; you don't have the entire population. So what you get are expected values, not the true means or the actual numbers. This is accounting for the uncertainty, the imperfection, that we talked about. I can leave that subtlety out and simply say: the error is the difference between the desired output and what you calculate as output.

So how do we do this when there is no AI — and actually, even with AI? How do you know how things change, and how they change in your favor? You have to build derivatives — in this case partial derivatives, because I have two variables. The x_i and y_i are fixed; they are data from the past. What is to be determined are w_0 and w_1, so I have two variables, and with respect to the w's I have to build the partial derivatives and simplify. I don't want to go through all of it here: build the partial derivative with respect to w_0 and the partial derivative with respect to w_1, and do the magic by bringing terms from one side to the other. It should not be a big deal to differentiate this: this term is a constant, that term is a constant, I have a power of 2, I can just differentiate. Don't get scared by the sum — think it away when you build the derivative and then put it back in afterwards. Do this for fun this weekend: build the derivatives, simplify, and see whether you get what I get. After some simplification you get, for one derivative,

sum_i y_i = N*w_0 + w_1 * sum_i x_i,

and for the other,

sum_i y_i*x_i = w_0 * sum_i x_i + w_1 * sum_i x_i^2,

where all sums run from i = 1 to N. So I build the two derivatives, simplify them, and move the terms around.
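For anyone who wants to check the weekend exercise, here is a sketch of that simplification — standard least-squares algebra written in the lecture's notation, with both partial derivatives of E set to zero:

```latex
E = \frac{1}{2}\sum_{i=1}^{N}\bigl(y_i - (w_1 x_i + w_0)\bigr)^2

\frac{\partial E}{\partial w_0} = -\sum_{i=1}^{N}\bigl(y_i - w_1 x_i - w_0\bigr) = 0
\quad\Longrightarrow\quad \sum_i y_i = N\,w_0 + w_1 \sum_i x_i

\frac{\partial E}{\partial w_1} = -\sum_{i=1}^{N}\bigl(y_i - w_1 x_i - w_0\bigr)\,x_i = 0
\quad\Longrightarrow\quad \sum_i y_i x_i = w_0 \sum_i x_i + w_1 \sum_i x_i^2
```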
I want to have y and x on one side and everything that has w in it on the other side. Why? Because everything we have at our disposal is linear algebra; I want to write it in vector and matrix form. Having a model, which is a set of equations, and writing it in matrix form will make my life a lot easier. So I want to write the problem in the matrix form

A w = y,

where A is a matrix and w and y are vectors. That's the reason I built the derivatives and simplified them: I brought the y and x terms, which are constants, to one side and the w's to the other side, and now I can put it in matrix form. This is where some AI people who don't like too much math show a dangerous tendency, because creating a model is a lot of mathematical work, and most of the time we cannot even do it. So A is

A = [ N            sum_i x_i
      sum_i x_i    sum_i x_i^2 ],

w is the column vector (w_0, w_1), and y is also a column vector, (sum_i y_i, sum_i y_i*x_i). So I have my weights w_0, w_1 and my matrix A. Why did I write it as a matrix? Because that model — that line, that curve — has to be estimated every time you give me data from the past for another problem; I am generating a general recipe. One more thing is missing: how do you get the w's? Well, if I write it this way, then

w = A^{-1} y.

That's the solution. Life has a structure, life is good, so simple — sure, if your problem behaves, that makes us happy.

Now somebody says: look, you started with x, and x is of power one — is that the reason you call it linear? Okay, what about y(x) = w_0 + w_1*x + w_2*x^2? Now it's not linear anymore, is it? Sorry — is it still linear? It is. It is not about the exponent of the variable; it is about the fact that you are still working with a linear combination of some weights. This is still linear. Forget about the exponent — the exponent is a fixed number; the problem is still assumed to be linear. You can have x cubed or x to the power 125, it doesn't matter. If you go with this type of expression and you have alpha, beta, gamma — or w_0, w_1, w_2, up to w_10 — you still have a linear model.

So if this is still linear regression, then what about nonlinear regression? What is nonlinear regression? Let's take a guess: do you think face recognition is linear or nonlinear? Of course it's nonlinear — illumination changes, the way you are looking, you're smiling, you're crying, you have a baseball cap on; it is highly nonlinear. Guessing whether somebody has cancer or not — linear or nonlinear? Of course nonlinear. All interesting problems are nonlinear, and linear regression — well, this is fantastic: when you see this, you can walk through the halls with confidence, the world has a structure, I have some matrices, they solve my problems. But those problems are easy. So how do we handle the rest? Well, at the moment we have one major tool, and that's neural networks. Neural networks are nonlinear regression. If you don't know the f(x) and the problem is not linear, there is no way you can solve it like this.
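Before we abandon the linear world: here is a minimal NumPy sketch of exactly that A w = y recipe, plus the polynomial case to show it is still linear in the weights. The array names and the use of np.linalg.solve/lstsq (instead of an explicit matrix inverse) are my choices, not from the lecture:

```python
import numpy as np

def fit_line(x, y):
    """Solve the normal equations A w = b for y ≈ w0 + w1*x."""
    A = np.array([[len(x),  x.sum()],
                  [x.sum(), (x ** 2).sum()]])
    b = np.array([y.sum(), (x * y).sum()])
    w0, w1 = np.linalg.solve(A, b)          # same result as A^{-1} b, numerically safer
    return w0, w1

def fit_polynomial(x, y, degree):
    """Still a *linear* model: linear in the weights, even with x^2, x^3, ... columns."""
    X = np.vander(x, degree + 1, increasing=True)   # columns: 1, x, x^2, ...
    w, *_ = np.linalg.lstsq(X, y, rcond=None)       # least-squares fit of the weights
    return w
```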
There is no matrix formulation for face recognition, for object recognition, for robot navigation, for financial forecasting. Not going to happen. All interesting problems are nonlinear. Okay, that's fine. So we can start and see what we get and what we do. Let's go back to history.

1943: McCulloch and Pitts come up with the idea of neural networks as intelligent machines. But at that point we don't really know what they are; from the raw idea to something we can actually build, it takes decades, even centuries. 1949: some ideas for a rule for learning. How do we learn, if I cannot put my problem in the form of matrices and vectors and then do my inverse-matrix magic and solve it? It means I have to learn the w's, and we said that this learning is intelligence. Okay, you have to learn something — how do I learn? You learn from the past. You are not a human being, you are an AI agent; you can learn from the past. So: a rule for learning, where learning is adjusting the weights. I can go one step further: intelligence is adjusting the weights when there are no matrices. How do you do that? Our entire classical mathematics becomes useless for this case, so whatever can help me out, it cannot be stupid.

1958: you get Frank Rosenblatt, and now you're talking about supervised learning, which is the perceptron. I don't want to go deeper than that into the history; many people contributed. The idea of Frank Rosenblatt was still raw — he just hinted at it, and that was it: take the model of the human central nervous system, or the central nervous system of any animal, basically. We are not very special, as much as we like to think we are. Take the central nervous system of a zebrafish and you get some insight into this. There is a building block for learning, and he called it the perceptron — a perceiving automaton. He did his PhD on this, and this is the first time you see people draw biological diagrams and figures and put equations underneath them. That was insane. These are the things where, if you are in AI and it's Sunday morning and you're bored and you say, let's go back and read the PhD thesis of Frank Rosenblatt — you read it and say, oh my god, how many years ahead of their time were these people? How did they do that? How can you think outside of the box like that? Everybody runs after deep networks, and I say, oh, let's do a shallow network — who does that?
Nobody. We always go with the flow, we always go with the majority. These guys did not go with the majority.

Okay, good. So one of the things these people tried to do was go back and look at the neuron, because by then we knew that the neuron is the building block of the central nervous system. If this is a neuron, this is the cell body and we have the nucleus, and these branches are called dendrites, which are the inputs into the neuron: they bring electrochemical signals into the cell body. Then this special one is the axon, which is basically the output of the neuron. A lot happens when you look at the axon: it splits into many, many branches — what we call arborization. So the signal gets split, and you give the same signal to many other neurons, and you also get many, many inputs from other neurons. This is a very simplistic picture that I'm drawing; please do not take it too seriously, it is still an abstraction.

So what is the magic of a neuron as a special type of cell in the brain of any animal, including the Homo sapiens animal? You see that there is an axon coming from another neuron, and this axon also has arborization, and then something magical happens: you get a synapse, where the axon branch of one neuron connects to the dendrite of another neuron. This synapse is what we in AI usually model with a w. Between these connections a lot of electrochemical processing happens. We have some understanding of part of it; some of it we don't understand. What we know is this: if the synapse is strong enough, a signal goes through — we say the neuron, or that connection, is working in excitatory mode; it is excited. If it is not strong enough, it will inhibit the signal — we say the neuron, or that connection, is in inhibitory mode; no signal is getting through.

Now of course I have a synapse here, and one here, and one here — I can go crazy, thousands of them all over — and this is still a kindergarten diagram of the actual neuron. Imagine those synapses to be light bulbs, and you have a million of them. Imagine you have a hundred neurons and you connect every neuron to every other neuron, and the synapses are modeled with light bulbs, and now let them turn on and off — magic starts to happen. You can encode a lot of information. How much? Intelligence is changing the synapses. But now this is not AI anymore; now this is neuroscience.
This is neural science and Therefore, I don't go deeper because I'm not a neuroscientist I'm just a some Amalgamation between an engineer and a computer scientist, so I'm not a neuroscientist The little I have learned is enough for me to develop a good understanding of the abstraction that we use in computer science so, okay So it could be in excited mode Send the signal the light bulb is on or in the inhabited mode no signal the light bulb is off and There are many light bulbs So imagine you do zero and one light bulb off is zero light bulb on is one Put that put that in an in a vector or in a matrix or in a tensor How many possibilities can you encode things get interesting how many neurons how many Neurons, let's look at two examples Zebra fish well Around 250,000 250,000 neurons what about your your Guess it's fair to say that you're the most linguistic species on the planet. So what about humans? well adult humans Because kids and young people still grow and we didn't know I even up to I don't know 10 years ago Maybe 15 years ago That the brain development does not stop until you are 24 25 Which puts many things in question. So is is the can we call somebody with 18 adult? I don't think so not according to neuroscientists But what I don't want to put that on young people because then you cannot drink beer until you are 24, so so What about what about adult humans? We have around 10 to 10 to 10 to 12 neurons In our brain so And you hear different numbers because it's all based on estimates 80 million to 100 million Sorry 80 billion 200 billion neurons so Is that it is that is that that so I have 10 to 10 neurons in my brain and that's why I Have the deceit to say that I'm an intelligent being It doesn't look like it. It's not about number of neurons but more importantly 10 to 14 Connections or synopsis Because the number of neuron is one thing number of synopsis is the number of connections between neurons So you have 10 to 4 you have very more synopsis that you have new ones Okay synopsis that increase The potential Increase the potential what potential what do you think electrical potential? So increase the potential are in the excitatory mode or state and synopsis that Decrease the potential Decrease the potential Habitory in habitory Mode or state so again light bulbs on or off and And apparently the number of the interconnectedness of neuron is what makes the brain a powerful machine Machine is an insult actually machines are stupid machine have been around 400 I don't know maybe the simple one mechanical one what 400 years 2,000 years whatever The central nervous system has been around for at least three billion years So it's not a comparison. So we do not insult the evolution by calling that brain a machine Good so one thing we know from neuroscience is that the synoptic Networks or plastic the synoptic networks or plastic This we didn't know even 10 years ago 15 years ago most people you're good to psychologists And say I have a problem with some people. I don't know I'm depressed something anything. I said sit down. Tell me about your dad Did your dad ever say he loves you? So you would go back and I say, okay, so it's too late your your personality has shaped you cannot change anyone So we seriously thought in psychology that with six years when I'm six years old. That's done That's a done deal. My personality is shaped Well, neuroscience is nonsense You can't change if you're 92 years old. 
You just don't want to change. This is what neuroscience is telling us, because the synaptic networks in the human brain are plastic: they can change. It becomes harder — it's easier to learn and memorize things when you're 10 and it becomes much more difficult at my age — but it is still possible. "You know, I've been smoking for 40 years, I don't want to give it up, I'm just used to it." Get over it. You can do it if you want to; you just don't want to change, because change is not convenient. In AI, too, change is not convenient: I need a GPU, I need a design, I need tons of work. Change is never easy. But you can change.

Plastic — this is not my term; it comes from neuroscience: plasticity. Things are plastic, which means they can change — not plastic as in the material. I guess the word relates to neoplasm; plasm is the new material that forms in the human body when cells reproduce themselves. That kind of plastic: something that is born and can change and is flexible, with mitosis and meiosis and all that. Plasticity is the most obvious manifestation of intelligence. If you really want to nail it down and say, okay, tell me, what is intelligence? Plasticity. What is that supposed to mean? That you can change. So does it mean that if I'm not changing, I'm an idiot? Most likely, according to neuroscience. That's not an insult, that's a scientific statement. "I'm just used to it." Or the other favourite argument: "I just like it." What do you mean, you just like it? You are simply not willing to give something up.

So suppose I have some sort of network — let me draw some random network; say this is a subnetwork in the human brain. I will try to redraw the same network — it is something random that I drew, so it may be a bit difficult to reproduce exactly. And then you learn: oh my god, it is so nice to smoke. It's just a lot of pleasure — when you are upset, when you are happy, when you drink something, when you eat something, you name it. Every time, a lot of pleasure to smoke. And then it gets encoded; our mind gets conditioned. Everything is physiological — this hormone, that hormone. We may not be able to pinpoint it exactly, but with functional MRI and other types of imaging we can show that this part of the brain is active when you do this; we cannot really map these sub-circuits yet. Then you get serious with life: come on, get serious, get some patches or something. And you do some serious thinking: no, okay, I'm 35 years old, I'm just destroying myself, smoking is garbage. Same network, but you reprogram it — literally, you reprogram it. What changed? The synaptic connections.

And one thing the human brain does which is very sneaky — in spirituality they call it the self, the ego; it's very sneaky. Whatever you like, there is something called the myelin sheath: you wrap protective material around these connections such that nobody can come and randomly change this network, because I like to smoke. I'm intentionally taking the smoking example, because people say, what, that's an addiction — what do you think addictions are?
I'm not talking about the ones that are genetically conditioned, where people cannot do anything about it. If you put a myelin sheath around the connections, it gets almost impossible to change them; it takes additional effort to get rid of the protective layer first and then try to change the values of the connections — smoking good, smoking bad. But neuroscience is telling us you can do it, virtually for everything: any trait, any attribute, any feature, any way of life, any style — you can change it if you want to. People who change constantly — we actually have a bad image of people who change constantly: this guy just goes with the wind, wherever the wind blows he goes. Maybe he's a smart guy, adjusting himself to the circumstances.

So here, the same network can exhibit different subgraphs of connections — these are the red ones that I drew. Of course this is a made-up example, of course it is not that simple, of course things are much more complicated. Of course the network that is responsible for smoking is much bigger than that, and it draws on other parts of the brain and gets reinforced by other habits, and so on. But this happens, and this is what we call conditioning of the mind. Our mind gets conditioned, programmed, to behave in a certain way, and it is common sense that if you behave in that way and do not change, even when you see the damage of it, this can hardly be called intelligent behavior.

Okay. Sometimes we have to make this detour, because you say "artificial intelligence" — how can you talk about the artificial version of it if you don't have a basic understanding of intelligence itself? So now we have an objective handle on it. I'm not talking about this nonsense of IQ measuring, where you go to a website, you answer 20 questions, and it says your IQ is 132, and that night you sleep well — oh my god, I'm so smart. Of course it's nonsense; you cannot measure intelligence. What they measure are some cognitive capabilities. Nobody can measure intelligence. The mere attempt to measure it assumes that intelligence is so simple that with 20 questions you can capture it. Cognitive capabilities are not intelligence; they are part of it, they contribute to it, or they are some manifestation of intelligence, but they are not intelligence. Okay, a tangent, but important and necessary, because you want to go in with open eyes.

So, got it: neurons are important, synapses are important, excitatory, inhibitory, signal goes through, signal doesn't go through. Okay, back to AI. What can I do with this? Well, you need an abstraction of neurons. If neurons are the building block of our central nervous system, and we assume that adjusting the synapses constitutes intelligence, how can I bring that into the computer? How can I come up with an abstraction, a computer model, of the neuron? We said that of course you will get some inputs, and these inputs are generally synapses from other neurons. It could be that some neurons directly take input from sensory information — you touch something, you see something, you smell something, and that's direct input — but the absolute majority of them take input from other neurons.
So the processed signal — this input — goes into some abstract model, and somehow you have to accumulate the signals. All of them are signals. In that simplistic picture I drew of a neuron, many connections come in from the axons of other neurons — thousands, tens of thousands, hundreds of thousands of connections go into a neuron. You have to take them in and accumulate them somehow: this one is excitatory, that one is excitatory, this one is inhibitory, and many of them are firing, as we say, which means they are on.

Now, if I accumulate stuff, basic engineering principles apply. Even if each connection comes with 0.001 milliamperes, or, I don't know, one microampere, and you accumulate all of them — that can explode. How much signal can you take? So you need some sort of limiter. If I'm connected to 10,000 others that send me signals and I just accept all of them, I have to be selective at some point: you know, guys, I cannot take more than this, and if 100 more come and knock on my door, I don't care, because I already got the signal that I wanted. Then something goes out, which is the output, and of course this output will go to other neurons.

So we came up with this abstraction. We said: the inputs are x1, x2, x3, up to xn. They come in and approach the neuron, and each one of them has a synapse: w1, w2, w3, ..., wn. Then we just add everything up — what else can I do? Signals come in: you're giving me 2 milliamperes, 4 milliamperes, 6 milliamperes, 8 milliamperes; okay, I'm just adding them up. Then I need some sort of limiter — say a function like this, a sigmoidal function, a function that limits the input. When we say limit, the limit is here: even if you send me a million more signals, the output cannot go beyond that. Which means every neuron has a saturation level: by the 500th signal I'm saturated and I will fire, because I got enough signals from the others to fire. And then something comes out: the output y. This is our logistic function — sometimes we call it the logistic function, sometimes the activation function. The logistic or activation function is a limiter, a threshold. I add the signals up, but at some level I say, that's enough, I cannot take any more. Everything has a capacity — the hard disk is full at some point, something has to give — so when I get enough, I fire. I send out a signal: guys, I got enough, my light is on.

It's crucial that, even though it is simplistic, we understand this abstraction, because if you don't understand the abstraction it will be really difficult to follow the little bit of mathematics that will come. Let's call this a perceptron. The perceptron is the most basic learning machine. Now this is in the computer, so I can call it a machine; it's not in the human brain anymore, it is my abstraction. I can write a Python function for this: neuron(inputs, weights) — sum them up, send the sum through the sigmoidal function, get the output. Nothing deep, but it can do magic if you put many of them together, because we are imitating something sophisticated, something majestic, the result of millions of years of evolution.
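Since the lecture points at a Python function, here is a minimal sketch of that abstract unit, assuming a logistic sigmoid as the limiter; the names and the explicit bias argument are mine:

```python
import numpy as np

def sigmoid(s):
    """Logistic/activation function: squashes the accumulated signal into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

def neuron(inputs, weights, bias=0.0):
    """Abstract neuron: weight each input (synapse), accumulate, then limit."""
    s = np.dot(weights, inputs) + bias   # accumulate the incoming signals
    return sigmoid(s)                    # saturating output: the light bulb's brightness

# usage sketch: neuron(np.array([2.0, 4.0, 6.0]), np.array([0.1, -0.3, 0.2]), bias=0.5)
```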
So in the perceptron the logistic function is, for example, a hard limiter, or in general it is basically a threshold: if the sum of the electrical signals that come in exceeds this limit, stop — don't add them up anymore; I will not send out more than this. So the sum is

s = sum from i = 1 to n of w_i * x_i.

Take the inputs, multiply them with the weights, and sum them up — that's my s. And again, synapses can be inhibitory or excitatory: if they are inhibitory they go toward zero, so zero times x1 is zero, nothing happens; if they are excitatory they go toward one and they let x1 go through. Simplistic, but that's the basic principle.

However, things may still get out of hand, so maybe I add a bias: maybe I equip every processing unit with a bias that says, whatever comes in, add a bias to it. It is maybe too early to talk about this, but I guess all of us understand that w1*x1 + w2*x2 is a line, right? If you don't give it a bias, that line will go through (0, 0); you cannot shift it around. The bias gives me the possibility to draw the line in many different places.

Okay, well, this is a line — which, in general, is really the wrong thing to say: it's a hyperplane, a plane in n dimensions, because I have n inputs. Hence the perceptron can separate two classes. You don't even know yet how the perceptron learns — how would this weird-looking abstraction even work? — but when I write my summation, it implies that this will be a hyperplane. Which means, if this is my simplistic x-y coordinate system, then you will have a line like this: one side is one class, for w^T x >= 0, and the other side is the other class, for w^T x < 0. So if I draw a line, this is one class and that is the other class. SVM does the same thing, but when the perceptron was born there was no SVM; we are going in reverse historical order. So if I draw a line, and the bias gives me the possibility to move it around, and I can play with the slope and shift it, then I have the possibility to learn and say: these are the circles, whatever the circles mean, and those are the triangles. I can separate them, and they are linearly separable, which means they can be separated with a line or a hyperplane. If you have any doubt that this sum spans a hyperplane, you have to do some homework, because everything else we say is based on this, and things will get progressively complicated when I put two of these together, ten of these together, a million of these together — then I have to understand what's going on.

So this is what we call the decision boundary. You hear people say that all the time: this is my decision boundary. If you have a decision boundary, you know this side of the line is this and that side of the line is that — cancer, healthy. If only it were that simple; it's not, but this is just a perceptron, we're just getting started. So let's stick with it a bit: the decision boundary is w1*x1 + w2*x2 + bias = 0.
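A tiny sketch of that decision rule, just to make the two sides of the hyperplane concrete; the +1/-1 labelling of circles and triangles is my convention, not from the lecture:

```python
import numpy as np

def which_side(x, w, bias):
    """Which side of the hyperplane w.x + bias = 0 does the point x fall on?"""
    return +1 if np.dot(w, x) + bias >= 0 else -1   # +1: circles, -1: triangles, say

# e.g. with w = np.array([1.0, -1.0]) and bias = 0.5,
# the decision boundary is the line x1 - x2 + 0.5 = 0
```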
So this is my decision boundary, and I have to find w1 and w2 given the bias. SVM did that nicely and elegantly and gave me guarantees — okay, but that was 1995, and here we are still back in the 1950s.

For simplicity we can say: x_0 is +1 and w_0 is the bias, such that I can write things in a simpler way and say my sum is actually the sum of x_i * w_i (or the other way around, it doesn't matter); this is more convenient. So you usually don't see the bias written separately. For the first component we say the input is one, and the bias is a certain number: you initialize it and you go. We never write the bias separately, because then the things you want to do later get nasty. Again, for mathematical convenience, we just work with this and say x_0 and w_0 are my bias terms, the first ones. You have to incorporate that in your implementation, of course.

Okay, so how far can we get today? You're getting closer to the interesting stuff, but you're not there yet. Now we have to iterate over the data:

s(n) = sum from i = 0 to m of w_i(n) * x_i(n),

where m is the number of inputs and n is the number of the iteration. Now I'm learning. Of course I can write it as w^T(n) x(n) — this is a vector, that is a vector. Who said I should give up those nice matrix operations? I still use them; I just will not build a mathematical model with them. I use them for local operations, to have a nice, efficient implementation. So that's the vector notation.

Now again, if I have a class arrangement like this, the world is easy, because I can do this: linearly separable. The world can get nasty when I get something like that: you draw the line and you make a mistake — this is non-linearly separable. Which means: if you have an easy problem, you can use the perceptron; if you have a difficult problem, it doesn't matter how you play with this line — shift it around, draw it this way or that way, you always make mistakes somewhere. You cannot push the error toward zero. It is not a matter of adjusting the synapses or the weights; the problem is more difficult than your solution's capability. We could also say here — maybe the first time I use this word, cautiously — that you are actually underfitting: the problem is more complicated than you imagine. Most of the time we are worried about overfitting, but sometimes you are underfitting.

Okay, what about the weight adjustment? This is the entire intelligence, we said. How do you adjust the weights? That's all there is to it. Let's say x(n) is correctly classified by w(n) — and again, I'm giving up the strict vector/matrix notation; we just figure out from the context whether I'm talking about a vector, a matrix, or a scalar. We say that

w(n+1) = w(n)   if   w^T(n) x(n) > 0 and x(n) belongs to class C_i.

So: no change, since the classification is correct. If the classification is correct, why should I change anything? If I'm on the right side, why should I change anything? If I'm on the wrong side, I have to change something. If I'm making a mistake, I will change it; I should not touch the weights, the synapses, if they are delivering the right action. The weights for the next iteration stay the same if this is my boundary and it is correct.
We are doing binary classification, so the label is implicit — one class or the other — and I can repeat the same statement for the less-than-zero, negative case: we don't change anything there either if the boundary decision is correct. You can also write it as x(n) belongs to C_i or to C_j, meaning negative or positive: either you are on this side or on that side, you are making the right decision, no change, the world is good.

If misclassified — now suddenly I have a triangle on the wrong side — then I have to do something about it:

w(n+1) = w(n) - eta(n) * x(n)

(or plus, depending on the side of the mistake — don't worry about the sign at the moment), where eta is what we generally call the learning rate, as a function of the iteration. So: I misclassified, I should get punished, let's say. That means the weights I had were not good enough, and the new weights have to be the previous weights plus or minus something — a contribution of the input. Why the input? What else do you have? You work with whatever you have. We will talk about this so much you will be sick of it, so don't be afraid if we don't understand everything at the moment. This is just the perceptron; then we get to multilayer perceptrons, then we talk about backpropagation, then we get to autoencoders and revisit backpropagation again and again. So: if the decision is correct, no change; if misclassified, I make a change, and at the moment I take a weighted subtraction of my input value as my adjustment. Who said this is the right thing to do? "Things that fire together wire together," Hebb told us, though we still don't fully understand what he meant by it. We will come back to him when we get to backpropagation. He was a smart guy.

If eta(n) is a positive value, then we have fixed-increment adaptation. However, eta is actually not very important; it's just a factor, not a weight. Even if you change it during every iteration, it is not a major factor in the learning process, as long as it is positive. Eta is a factor that we invented, we made it up. What does it do? It just scales the contribution, that's it. You may need a little bit more or less, but it is only scaling the contribution, so it's not a gigantically important factor — but it helps, we have it in the learning rule, and we work with it. The perceptron can be proven to converge for eta = 1: if you set eta equal to one, we can prove that this will converge and do the same job as linear regression with matrices, if you have a problem where you can compare them. Most problems that we solve with perceptrons are actually a little bit more difficult than the ones with matrices, but both of them do linear stuff, so we can compare them.
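Putting the pieces together, here is a hedged sketch of the fixed-increment rule with x_0 = +1 absorbing the bias, as described above. The data layout (one row per sample, labels +1/-1) and the epoch loop are my assumptions:

```python
import numpy as np

def train_perceptron(X, labels, eta=1.0, epochs=100):
    """Fixed-increment perceptron learning; labels must be +1 or -1."""
    X = np.hstack([np.ones((len(X), 1)), X])   # x0 = +1, so w[0] plays the role of the bias
    w = np.zeros(X.shape[1])                   # initialise the weights (synapses)
    for n in range(epochs):                    # n counts the passes over the past data
        mistakes = 0
        for x, label in zip(X, labels):
            if label * np.dot(w, x) <= 0:      # misclassified: wrong side of the boundary
                w += eta * label * x           # adjust by a scaled contribution of the input
                mistakes += 1
            # correctly classified: no change, w(n+1) = w(n)
        if mistakes == 0:                      # converged; only guaranteed if linearly separable
            break
    return w
```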
So if I set eta equal to one, I can prove that this can learn a line — but we already did that with linear regression. Be patient.

As a prelude to the next lecture — which, by the way, will not be next Tuesday; next Tuesday we have a long tutorial, and then the day after we have a long lecture, because next Tuesday I'm away — artificial neural networks can basically be classified into the ones that take binary inputs and the ones that take real-valued inputs. Among the ones that take binary inputs we have supervised and unsupervised. The supervised binary ones are, for example, Hopfield nets or Hamming nets. We will not talk about them because we don't have time, and their performance is relatively limited, because in reality we don't have many examples with binary inputs. Unsupervised with binary inputs: the Carpenter-Grossberg net, as an example — networks that you don't hear much about, again because the input is binary. How many real-world engineering applications do I have where the inputs are binary? Usually we just get real numbers.

So the interesting stuff happens on the real-valued side. Again we have supervised and unsupervised. In the supervised category of real-valued networks we have the perceptron, the multilayer perceptron or MLP, CNNs, and autoencoders (AE). In the category of unsupervised learning with real-valued inputs we have the lonely fighter — I just like people who fight alone. We won't talk about that side much, although we could go back and talk about Hopfield nets for certain tasks; we are mainly here, with the supervised real-valued networks. We started with the perceptron and we will continue with MLP, CNN, and AE. MLP is shallow learning; CNNs and autoencoders are deep learning. We will talk about that — not next Tuesday; again, next Tuesday will be a long tutorial, and we are planning for that, and then Thursday I will use the tutorial time for the lecture. Maybe we do MLP and get to deep learning, I don't know. We will take a break in the middle of both sessions, because they are long events. So see you next week.