Machine learning is a very challenging area of computer science and engineering, and there are many different approaches to building machine learning systems. As yet, there is no standard classification for these different approaches. But in his book The Master Algorithm, Pedro Domingos of the University of Washington provides a coherent and accessible overview of the different methods currently being pursued. The book describes what he calls five tribes, each of which emphasizes a different method of machine learning, with each being particularly well suited to solving some core challenge. Domingos begins by identifying five basic methods through which a computer can build a model and then associates these with the different approaches currently taken to machine learning. First is filling in gaps in existing knowledge through inverse deduction. Second, mimicking the human brain, which is associated with the connectionist neural network approach. Third, evolutionary selection, which is associated with techniques that enable the computer to simulate evolution. Fourth is reducing uncertainty through statistics and Bayesian inference. And lastly, making contrasts between old and new sets of information through the use of analogy. The symbolist approach is said to operate on the basis of formal logic, and more specifically the premise of inverse deduction. The idea is to think of inductive learning as being the inverse of deduction. Deduction goes from general rules to specific facts; the opposite, called induction, goes from specific facts to general rules. For example, if we know that 2 plus 2 is 4, then we can also fill in the gap in the equation where we know we have 2 and have to find what must be added to it to get 4. The system has to ask itself what knowledge is missing and acquire that knowledge through the analysis of existing data sets. 
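The gap-filling idea above can be illustrated with a toy sketch (a hypothetical example for intuition only, not Domingos' actual inverse-deduction algorithm): deduction applies a known rule to produce a fact, while the inverse step recovers the missing piece of knowledge from a known fact.

```python
# Toy sketch of the "gap-filling" intuition behind inverse deduction.
# (Illustrative only; real symbolist systems invert logical rules,
# not arithmetic.)

def deduce(a, b):
    """Deduction: apply a known general rule (addition) to get a fact."""
    return a + b

def fill_gap(known, result):
    """Induction as inverse deduction: given the fact (2 + ? = 4),
    recover the missing knowledge by inverting the rule."""
    return result - known

print(deduce(2, 2))    # general rule -> specific fact
print(fill_gap(2, 4))  # specific fact -> the missing piece
```

The point is only that the forward and inverse operations are paired: whatever the deductive rule derives, the inductive step can reconstruct when part of the input is unknown.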
A second approach is based upon the network structure of the brain and how the brain learns by encoding patterns within neural networks. This is the neural network approach. An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in the brain. Here, each circular node represents an artificial neuron, and an arrow represents a connection from the output of one artificial neuron to the input of another. The network is trained on data so that a specific set of connections between the nodes forms to represent a pattern. The weights of the connections between the nodes are altered with each iteration so that the output of the system better matches the desired output. For example, Google used this approach to train its computers to identify cats in YouTube videos. Many of the breakthroughs in machine learning in recent years have come from this approach, and because it is well suited to dealing with big data, we'll be looking more closely at how neural nets and deep learning methods work in the coming modules. Another very important approach is that of trying to simulate the process of evolution. Genetic algorithms work the way evolution does: through the production of variety, the exposure of those variants to an operating environment, and then selection, cross-mixing, duplication, and iteration of the whole process. You have a population of individuals, each of which is described by specific characteristics, and each of these individuals goes out into the world and is evaluated based upon its success at a given task. Those that perform well gain the payoff of a higher fitness value and will therefore have a higher chance of being parents of the next generation. Individuals that have performed well cross-mix, random mutation is added to create a new population, and the process is iterated. After some number of generations of this, you actually have individuals performing non-trivial functions. 
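The evolutionary loop just described, variety, evaluation, selection, cross-mixing, mutation, iteration, can be sketched in a few lines. This is a minimal illustration on a made-up task (maximizing the number of 1s in a bit string), with illustrative population sizes and mutation rates, not a production genetic algorithm.

```python
import random

random.seed(0)  # make the sketch reproducible

def fitness(individual):
    """Payoff: how well this individual performs at the task
    (here, simply the count of 1 bits)."""
    return sum(individual)

def crossover(p1, p2):
    """Cross-mix two parents at a random cut point."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(individual, rate=0.05):
    """Add random variation by flipping bits with small probability."""
    return [1 - b if random.random() < rate else b for b in individual]

# Start with a population of random 20-bit individuals (variety).
pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]

for generation in range(40):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # selection: the fittest become parents
    pop = [mutate(crossover(random.choice(parents), random.choice(parents)))
           for _ in range(30)]  # cross-mix, mutate, and iterate

best = max(pop, key=fitness)
print(fitness(best))  # after some generations, fitness approaches 20
```

After a few dozen generations the best individuals are close to the optimum, which mirrors the point in the text: the loop itself is simple, but iterating it produces non-trivial results.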
Indeed, algorithms can learn surprisingly powerful things this way. The Bayesian approach deals with uncertainty through probabilistic inference. Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability of a hypothesis as more evidence or information becomes available. You create a hypothesis that some outcomes will be more likely than others, then update the hypothesis as more data comes in. After some iterations of this, some hypotheses become more likely than others. The Bayesian approach could be used, for example, in filtering messages. The system has a bag of words for identifying certain categories of messages. It then goes through a message, and every time it finds more evidence confirming or disconfirming a hypothesis, it adjusts the probability of which category the message might fall into. Bayesian ideas have had a big impact on machine learning in the past 20 years or so because of the flexibility they provide in building structured models of real-world phenomena. Algorithmic advances and increasing computational resources have made it possible to fit rich, highly structured models that were previously considered intractable. The fifth approach is that of analogy. Analogy is a powerful and fundamental tool that our brains use to categorize new information by comparing it to what we already know, seeing how closely it resembles other things, and thus deciding whether we can place it into or near a category we already know. The general method is the nearest-neighbor principle: essentially asking what a thing is closest to, and then positioning it relative to other things based upon its similarity to them. A popular method here is the support vector machine (SVM). Given a set of training examples, each marked as belonging to one of two categories, a support vector machine training algorithm builds a model that assigns new examples to one category or the other. 
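The word-by-word message filtering described above can be sketched with Bayes' theorem directly. This is a simplified, naive-Bayes-style illustration: the word probabilities below are made-up values for two hypothetical categories ("spam" and "not spam"), not learned from real data.

```python
# Hedged sketch of word-by-word Bayesian updating for message filtering.
# P(word | category) values are illustrative, not learned from data.
p_word_given_spam = {"offer": 0.8, "meeting": 0.1}
p_word_given_ham  = {"offer": 0.1, "meeting": 0.7}

def update(prior_spam, word):
    """Bayes' theorem: each observed word is evidence that shifts
    the probability that the message is spam."""
    likelihood_spam = p_word_given_spam.get(word, 0.5)
    likelihood_ham = p_word_given_ham.get(word, 0.5)
    evidence = (likelihood_spam * prior_spam
                + likelihood_ham * (1 - prior_spam))
    return likelihood_spam * prior_spam / evidence

p = 0.5  # start undecided between the two hypotheses
for word in ["offer", "offer"]:
    p = update(p, word)  # each piece of evidence updates the hypothesis

print(p)  # after two "spammy" words, the spam hypothesis dominates
```

Each call to update is one iteration of the loop in the text: evidence comes in, and the probability assigned to the hypothesis is adjusted up or down accordingly.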
This analogy-based approach is at the heart of some extremely successful machine learning systems. Support vector machines were probably the most powerful type of learning in common use until recently. Handwritten characters can be recognized using SVMs, and support vector clustering is often used in industrial applications. Amazon's and Netflix's recommendation systems are based upon this method of analogy. If someone else has given five stars to something you have also given five stars to, and one star to something you have also given one star to, then by analogy the system extrapolates and recommends to you something that that person with similar taste has liked. The idea is that each of these approaches has a problem that it can solve better than the others, and each has a particular master algorithm that solves that problem. For example, the problem that the symbolists solve, which none of the others know how to solve, is learning knowledge that you can compose in many different ways, and they learn that knowledge with inverse deduction. Connectionists solve the credit assignment problem through the development of complex networks in which individual nodes and connections are adjusted based upon how well they contribute to matching the desired output, as we'll discuss in the next video. The evolutionary approach solves the problem of learning structure. The Bayesian approach can deal with uncertainty, the fact that all knowledge you learn is uncertain, and it knows how to update the probabilities of hypotheses as evidence comes in. The analogy approach uses comparison between things to categorize them based upon their similarities or differences.
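The recommendation example can be sketched with the nearest-neighbor principle. The ratings below are invented for illustration; this is a toy sketch of the analogy idea, not Amazon's or Netflix's actual algorithm.

```python
# Toy nearest-neighbor recommendation sketch (hypothetical ratings).
ratings = {
    "alice": {"matrix": 5, "titanic": 1, "inception": 5},
    "bob":   {"matrix": 5, "titanic": 1},  # we recommend something for bob
}

def similarity(a, b):
    """Agreement on commonly rated items; higher means more similar
    (negated total rating distance)."""
    common = set(a) & set(b)
    return -sum(abs(a[item] - b[item]) for item in common)

# Find the user whose taste is closest to bob's...
others = {user: r for user, r in ratings.items() if user != "bob"}
nearest = max(others, key=lambda u: similarity(ratings[u], ratings["bob"]))

# ...and recommend what that similar user liked but bob hasn't rated.
recommendation = max(
    (item for item in ratings[nearest] if item not in ratings["bob"]),
    key=lambda item: ratings[nearest][item],
)
print(nearest, "->", recommendation)
```

The extrapolation step in the text is exactly the last line: once a neighbor with similar taste is found, their highly rated but unseen items become the recommendations.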