Hello everyone. This is Alice Gell. In this video, I'm going to discuss some limitations of perceptrons.

Let me talk a little bit about history. From the 1950s to the 1960s, there was huge hype around perceptrons. At the time, people believed that AI would be solved if we could make computers perform formal logical reasoning. Recall that we can use perceptrons to represent simple logical functions, like AND, OR, and NOT. These logical functions are the basic building blocks of a logical deduction system, which made people really excited about perceptrons. Unfortunately, in the 1960s, people discovered some limitations of perceptrons, and this discovery had a huge impact on research into artificial neural networks.

Despite the hype around perceptrons at the time, some people were skeptical of them. For example, Marvin Minsky, who was a co-founder of the MIT AI lab, and Seymour Papert, who was a co-director of the lab at the time. Minsky and Papert were skeptical about perceptrons, so they studied them, trying to understand their limitations, that is, the things we cannot do with perceptrons. They found a significant limitation, and they wrote a book about it, called Perceptrons: An Introduction to Computational Geometry. The book shows that it is not possible to represent the exclusive-or (XOR) function using a perceptron alone; we need a deeper network with at least two layers. Recall that the exclusive or of two inputs is true whenever the two inputs are different: either 1 and 0, or 0 and 1.

Now, this fact by itself is not a problem. If we need a deeper network, then we can construct a deeper network and train it to learn our target function. The problem was that, at the time, nobody knew how to train a neural network with two or more layers. People only knew how to train a perceptron, which is a single-layer network. These two results combined suggested that pursuing perceptrons might be a dead end.
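To make the idea of perceptrons representing simple logical functions concrete, here is a minimal sketch in Python. The step activation matches the one discussed later in the video; the particular weights and biases are one illustrative choice among many and are not specified in the lecture.

```python
def perceptron(weights, bias, inputs):
    """A single perceptron unit with a step activation:
    output 1 if the weighted sum plus bias is >= 0, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0

def AND(x1, x2):
    # Fires only when both inputs are 1 (sum must reach 1.5).
    return perceptron([1, 1], -1.5, [x1, x2])

def OR(x1, x2):
    # Fires when at least one input is 1 (sum must reach 0.5).
    return perceptron([1, 1], -0.5, [x1, x2])

def NOT(x1):
    # Inverts its single input.
    return perceptron([-1], 0.5, [x1])

# Verify the truth tables.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```

Each unit is just a weighted sum followed by a threshold, which is exactly what makes perceptrons attractive as building blocks for logical deduction.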
It is widely believed that this book, and the significant limitation it exposed, contributed to the first AI winter. Funding for neural network research froze, and it became challenging to publish any research paper on neural networks or perceptrons, since people no longer thought it was a promising research direction. I find this story very interesting. While it may appear that Minsky and Papert slowed the research progress on AI, I believe that they made a huge contribution by discovering and publishing this significant negative result. Although the result temporarily stopped research on artificial neural networks, it motivated researchers to eventually develop efficient algorithms for training multilayer neural networks.

I want to share a personal story here. One of my favorite papers from my PhD was a paper on a negative result. At the time, I was very interested in a class of algorithms called peer prediction methods. These methods are used to motivate people to contribute their honest opinions about something. Most work on peer prediction methods was theoretical: people proposed many variants of these methods and proved that they had nice properties. Of course, the methods also had some undesirable properties, but the theoreticians working on them dismissed these, claiming that the undesirable properties wouldn't arise in practice. I was skeptical of this. I believed that the undesirable properties would be problematic in practice, and I set out to prove it. With a few other people, I designed and conducted an online experiment to test these methods with real people. Our program allowed people to play a game in real time and rewarded them using the peer prediction methods. Our results showed that the methods didn't work in practice: they had significant problems, and the participants recognized those problems and took advantage of them to improve their rewards without doing the work.
So this huge negative result of mine helped push the field forward. After I published the result, people developed improved peer prediction methods that no longer have these undesirable properties. This is one reason that I'm very proud of this work. My takeaway from this story is that negative results are not all bad. A significant negative result may push the field forward by motivating people to develop better ways to solve the problem.

Let's come back to perceptrons. Given this history, there are two questions you might be thinking about. First, why can't we represent the XOR function using a perceptron? And second, how can we represent the XOR function using a multi-layer neural network? It turns out that we only need a two-layer neural network to represent XOR.

First question first: why can't we represent XOR using a perceptron? Let me give you a short and intuitive argument. A perceptron is a linear classifier, given that we use the step function as the activation function. However, XOR is not linearly separable, so we cannot use a perceptron to represent it. Let's look at this graphically. I'm going to draw three graphs, corresponding to AND, OR, and XOR. Each logical function is a binary operator, and the two axes represent the two inputs to each function, so we have four data points for each function. I've drawn all the data points on the graphs. A shaded circle is a positive example, where the function is true, and an empty circle is a negative example, where the function is false. For AND and OR, it's easy to see that we can draw a line to separate the positive and negative examples. Many lines will work here; I've shown one example on each of the two graphs. Now for XOR, it should also be easy to see that there is no way to separate the positive and negative examples using one line. If we really want to separate them, we'll have to draw a circle or use two lines.
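As a sanity check on the separability argument, here is a small sketch that brute-forces a grid of candidate weights and biases: it finds a separating line for AND but none for XOR. A finite grid search is an illustration rather than a proof, and the grid bounds and step size below are an arbitrary choice of mine.

```python
import itertools

# Truth tables: (x1, x2) -> label.
AND_DATA = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR_DATA = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separable(data, grid):
    """Return True if some (w1, w2, b) drawn from the grid classifies
    every point correctly under the step rule:
    predict 1 iff w1*x1 + w2*x2 + b >= 0."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b >= 0) == bool(y)
               for (x1, x2), y in data.items()):
            return True
    return False

# Weights and biases from -2 to 2 in steps of 0.25.
grid = [i / 4 for i in range(-8, 9)]
print(separable(AND_DATA, grid))  # prints True: a separating line exists
print(separable(XOR_DATA, grid))  # prints False: no line on this grid works
```

For XOR, the search fails not because the grid is too coarse but because no real-valued line separates the points at all, which is what the proof by contradiction establishes.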
I'm showing both options on the graph: two red lines, or the yellow circle. This means that XOR is not linearly separable, so we cannot use a perceptron to represent XOR. It's possible to prove this rigorously; for example, you can use a proof by contradiction. Start by assuming that we can represent XOR using a perceptron. This assumption leads to a series of inequalities, and the inequalities lead to a contradiction. If you're interested, try writing down this proof yourself. I've included a detailed explanation of the proof in a separate video.

Question two: how can we represent XOR using a two-layer neural network? This slide has an example of a two-layer network. It looks like we have three layers: the input layer, the hidden layer, and the output layer. However, the input layer is not a real layer, since it does not perform any computation; it simply contains the input values. The middle layer is called the hidden layer since it's not visible, whereas the input and output layers are the two visible layers. If we have a bigger network, then every layer in the middle is a hidden layer. Given this network structure, how can we set the weights so that the network represents the XOR function? For now, the best approach to this problem is to reason mathematically. Think back to your logic class: can you break XOR down into simpler logical functions that we already know how to represent using perceptrons? Once we do that, we can set the weights to represent those simpler functions. Pause the video and try solving this problem yourself, then keep watching for the answer. Here is one possible solution; there are many solutions to this problem. I've rewritten the XOR function as a combination of AND, OR, and NOT. We already know how to represent all of these simpler functions using perceptrons. Please watch a separate video for a detailed explanation.

That's everything on the limitations of perceptrons. Let me summarize.
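As a sketch of one such solution, assuming the decomposition XOR(x1, x2) = (x1 OR x2) AND NOT (x1 AND x2), the following two-layer network of step-activation units computes XOR. The specific weights are my own illustrative choice, not the only ones that work, and the video's slide may use a different but equivalent decomposition.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_network(x1, x2):
    """Two-layer network computing XOR as (x1 OR x2) AND NOT (x1 AND x2).
    Hidden unit h1 computes OR, hidden unit h2 computes NOT-AND (NAND),
    and the output unit computes the AND of the two hidden values."""
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # OR:   fires if either input is 1
    h2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)  # NAND: fires unless both inputs are 1
    return step(1.0 * h1 + 1.0 * h2 - 1.5)  # AND of h1 and h2

# Verify the XOR truth table.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

Each unit here is an ordinary perceptron; the extra layer is what lets the network carve out the two-sided region that a single line cannot.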
After watching this video, you should be able to do the following: explain what caused the first AI winter, explain why a perceptron cannot represent the XOR function, and construct a two-layer neural network that can represent the XOR function. Thank you very much for watching. I will see you in the next video. Bye for now.