Hello everyone, this is Alice Gao. In this video, I'm going to talk about some applications of artificial intelligence. Arguably, the grand goal of artificial intelligence is to build a generally intelligent agent. If I were to summarize the state of the art of AI, I would say the following: there hasn't been much success towards this grand goal, but there has been a lot of progress in many restricted domains. The best way to get a sense of what artificial intelligence is about is to look at some applications. Let me describe some examples of problems in AI research. This will be a biased introduction to AI, since time is so limited. For each example, I will describe the problem, the main result, and give you a high-level summary of the techniques used to solve it.

Let me start by talking about chess. I grew up with this story, and I still remember the glory of the seminal match from my childhood. However, if you are one of my CS486 students at the University of Waterloo, this match was likely before you were born. In 1997, IBM's Deep Blue program played Garry Kasparov, the world chess champion at the time. Deep Blue defeated Kasparov in a six-game match. This was the first time that a computer program defeated a reigning world chess champion under tournament conditions. Deep Blue used several strategies. It performed lookahead search. It made use of a complex evaluation function to evaluate the likely game outcome of a board position. The evaluation function was handcrafted with 8,000 features. Deep Blue also had a large database of grandmaster games, and it was encouraged to play moves that appeared in the database.

Next, let's look at the game of Go, or Wei Qi. This game was invented in China about 2,500 years ago. Two players take turns putting black or white stones on a 19-by-19 grid. The goal is to occupy a larger territory on the board. I remember watching my dad and my grandfather playing this game for hours when I was a child. They also watched TV shows of experts going through a game of Go and explaining the moves. Unfortunately, my understanding of Go has remained at an extremely basic level. Suppose that we want to surround one space on the board. In the middle of the board, surrounding one space requires four stones. On a side, we need three stones. Finally, in a corner, we need only two stones. This is why players tend to start by putting stones in the corners and on the sides, and only move to the middle towards the end of the game.

Similar to chess, Go is a game of perfect information: everything is visible on the board. Theoretically, we could solve Go by exhaustive search. Unfortunately, the enormous search space makes exhaustive search infeasible. Try to picture the search tree for Go, where the root is at the top and the leaf nodes are at the bottom. Each node has around 250 child nodes, and the depth of the search tree is around 150 levels, so the tree contains on the order of 250^150, or roughly 10^360, positions. This is an enormous search tree. The early Go programs relied on tree search algorithms and achieved only strong amateur play.

Around 2015, Google DeepMind achieved a breakthrough. They developed a program called AlphaGo, which defeated top professional Go players. In 2015, AlphaGo defeated Fan Hui, who had been the European Go champion for several years. In 2016, AlphaGo defeated Lee Sedol, a 9-dan professional player who has won 18 international titles. In 2017, AlphaGo beat Ke Jie, who was the number-one-ranked Go player in the world at the time. You might be wondering now: how does AlphaGo work?
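Before I answer that, it helps to see what the classical approach looks like. Here is a minimal sketch, in Python, of depth-limited minimax with an evaluation function, which is the basic idea behind Deep Blue's lookahead search and behind the early Go programs. To be clear, this is not Deep Blue's actual code: the `game` interface and the `evaluate` function below are hypothetical placeholders, and a real engine would add optimizations such as alpha-beta pruning.

```python
def minimax(state, depth, maximizing, game, evaluate):
    """Depth-limited minimax: look ahead `depth` moves, then fall back
    on a handcrafted evaluation function at the search frontier."""
    if depth == 0 or game.is_terminal(state):
        # At the depth limit (or the end of the game), estimate who is winning.
        return evaluate(state)
    values = [minimax(game.result(state, m), depth - 1,
                      not maximizing, game, evaluate)
              for m in game.legal_moves(state)]
    # The player to move picks the best value; the opponent picks the worst.
    return max(values) if maximizing else min(values)


def best_move(state, depth, game, evaluate):
    # Pick the move whose resulting position has the best minimax value.
    return max(game.legal_moves(state),
               key=lambda m: minimax(game.result(state, m), depth - 1,
                                     False, game, evaluate))
```

The deeper the search and the better the evaluation function, the stronger the play; Deep Blue scaled this idea up with special-purpose hardware and 8,000 handcrafted evaluation features. In Go, a branching factor of around 250 makes this approach run out of steam quickly, which is where AlphaGo's neural networks come in.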
AlphaGo was still based on tree search. However, it made use of two deep convolutional neural networks to reduce the effective depth and breadth of the search tree. AlphaGo trained the two networks using supervised learning on human data and reinforcement learning through self-play. The first network is called the policy network, which maps a board position to a probability distribution over actions. AlphaGo uses the policy network to sample actions when simulating a game. The other network is called the value network, which maps a board position to a numeric estimate of the game's outcome. AlphaGo uses the value network to predict the game outcome without simulating the game until the end.

After the breakthrough of AlphaGo, DeepMind kept working on the game of Go and developed programs that were superior to AlphaGo. AlphaGo Zero no longer made use of human data. It learned to play the game using reinforcement learning only, and it defeated AlphaGo by 100 games to zero. AlphaZero used one general-purpose algorithm to learn to play three games well: chess, shogi (Japanese chess), and Go. This was a significant step towards developing a general game-playing algorithm. If you're interested, I encourage you to check out the papers on the three programs. They were published in Nature and Science, which are two of the most prestigious journals in the world.

The next story is about poker. Poker is challenging to solve for two reasons. First, it is a game of imperfect information: each player cannot see their opponent's cards. Second, it is not a one-shot game. During a tournament, each player must strategize to maximize their chips over many games. Two research teams have made significant progress towards developing programs that play poker.

The team led by Professor Bowling at the University of Alberta solved heads-up limit Texas Hold'em. This is a game with two players, and there is an upper limit on the bet amount in each round. They modeled poker as an extensive-form game and solved for an approximation of the Nash equilibrium using an algorithm called counterfactual regret minimization. They declared that the game is essentially weakly solved, which means that a human lifetime of play is not sufficient to establish with statistical significance that the strategy is not an exact solution. The results confirmed that the game is a winning game for the dealer. Extensive-form games and Nash equilibria are game-theoretic concepts. If you want to learn more, search for them online or consider taking a course on game theory.

Around the same time, a team led by Professor Sandholm at Carnegie Mellon University was working on poker with more than two players. In a two-player zero-sum game like heads-up poker, any player following a Nash equilibrium strategy is guaranteed not to lose in expectation, no matter what the opponent does. This claim is no longer true if the game has three or more players. Therefore, the CMU team focused on developing strategies to beat top professional poker players without solving for Nash equilibrium strategies. The CMU team developed a program called Pluribus, which defeated elite human professionals in six-player no-limit Texas Hold'em poker. Pluribus developed its strategies as follows. It first simplified the game representation by eliminating some actions and combining some decision points. Next, Pluribus spent eight days developing a blueprint strategy. It started by playing randomly and improved by learning to beat earlier versions of itself.
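The improvement step in both teams' work is built on a simple idea called regret matching: for each action at a decision point, track how much better you would have done had you played that action, and then play actions in proportion to their accumulated positive regret. Here is a minimal sketch of regret matching at a single decision point; the class name and interface are my own, not from the papers, and full counterfactual regret minimization applies this update recursively across the whole game tree.

```python
class RegretMatcher:
    """Regret matching at one decision point: a minimal sketch of the
    update at the heart of counterfactual regret minimization (CFR)."""

    def __init__(self, n_actions):
        self.cum_regret = [0.0] * n_actions

    def strategy(self):
        # Play each action with probability proportional to its
        # accumulated positive regret (uniform if none is positive).
        positive = [max(r, 0.0) for r in self.cum_regret]
        total = sum(positive)
        n = len(positive)
        return [p / total for p in positive] if total > 0 else [1.0 / n] * n

    def update(self, action_utils, played_util):
        # Regret of action a: how much better we would have done by
        # playing a instead of what we actually played.
        for a, u in enumerate(action_utils):
            self.cum_regret[a] += u - played_util
```

In a two-player zero-sum game, the average of the strategies produced this way converges to a Nash equilibrium over many iterations of self-play. That is exactly the guarantee the Alberta team relied on, and the guarantee that breaks down with three or more players.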
After playing each action, Pluribus evaluated every other action using counterfactual reasoning: if I had chosen this action instead, how much better or worse would I have done? Finally, Pluribus also used real-time search during a game. Be sure to check out the two Science articles if you want to learn more.

Let's look at some video games. The next story is about Atari 2600 games. Check out the screenshots of the games from the paper. I remember playing Breakout and Space Invaders when I was a child. The goal is to create a single program that is able to play as many Atari 2600 games as possible. Just like a human player, the program learns to play a game solely from the video input, the reward, and the set of possible actions. The program was tested on seven Atari games. It outperformed all previous reinforcement learning algorithms on six of the seven games, and it surpassed a human expert on three games.

The program learned to play the Atari games using reinforcement learning, specifically Q-learning. Q-learning relies on a value function to choose an action in each state. The value function estimates the agent's long-term expected reward for taking a given action in a given state. The main contribution of this paper was learning a complex value function by training a convolutional neural network. In prior work, the value function was created using handcrafted features, which was infeasible for the high-dimensional video input of Atari games. Instead, the program used a convolutional neural network to represent the value function. The convolutional neural network automatically extracted high-level features from the video input and used them to determine the value. I'll show a small code sketch of the Q-learning update at the end of this segment.

StarCraft is one of the most challenging real-time strategy games and one of the longest-played esports of all time. Set in a distant part of the Milky Way galaxy, the game revolves around three intelligent species fighting for dominance. StarCraft emerged as a grand challenge for AI research for several reasons. First, it is a multi-agent problem: several players compete for influence and resources, and each player controls hundreds of units, which need to collaborate to achieve a common goal. Second, it is a game of imperfect information: each player can only observe the game via a local camera. Third, the action space is vast and diverse: there are approximately 10^26 possible choices at each step. Fourth, the game lasts for tens of thousands of time steps, and a player's strategy must balance short-term payoffs and long-term gains.

In 2019, Google DeepMind developed a program called AlphaStar to play StarCraft. AlphaStar played anonymously against human players on Battle.net. It achieved grandmaster level for all three StarCraft races and placed above 99.8% of all the ranked human players. How did AlphaStar learn to play StarCraft? Well, AlphaStar trained one agent for each race. First, each agent was trained to predict human play by supervised learning. Next, the agent was trained as part of a league of several types of agents. The main agents in the league focused on learning rapidly through self-play. Other agents in the league tried to identify exploits in the main agents or systemic weaknesses of the entire league. Each agent's goal was to maximize its win rate against a non-uniform mixture of the other agents through reinforcement learning. Check out the paper and the YouTube video for more details.
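As promised, here is the Q-learning sketch. This is the classic tabular version of the update; the Atari paper's contribution was to replace the table below with a convolutional neural network trained toward the same target, since a table over raw video frames would be hopelessly large. The `env` interface here is a hypothetical placeholder, not from the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learn Q(s, a), the long-term expected reward
    of taking action a in state s."""
    Q = defaultdict(float)  # missing entries default to 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimates,
            # but try a random action some fraction of the time.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Nudge Q(s, a) toward reward + discounted best future value.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            target = reward + gamma * best_next * (not done)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```

Once training is done, the greedy policy simply picks, in each state, the action with the largest Q-value.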
The next story is about Jeopardy, a popular TV game show. In each game, three contestants compete against each other, trying to understand and answer rich natural-language questions very quickly. For each question, the contestants compete for the first chance to answer via a handheld buzzer. What's special about Jeopardy is that the questions and the answers are reversed. For example, suppose that a player chose the category "AI" for $100. The host provides a clue in the form of an answer: "This popular TV quiz show is the latest challenge for IBM." The contestants must phrase their response in the form of a question: "What is Jeopardy?"

In 2007, IBM Research took on the grand challenge of building a computer system that could perform well on open-domain question answering — specifically, a system that could win the game of Jeopardy. In 2011, a system called Watson beat two of the greatest players of all time, Ken Jennings and Brad Rutter, in a two-game Jeopardy match.

To succeed at Jeopardy, players must overcome several challenges. First, questions come from a broad domain and use rich and varied natural-language expressions. Second, players must answer questions with high precision and high confidence. Third, players must answer the questions very quickly. On average, champion players correctly answer at least 85% of the questions they buzz in for, and they buzz in for at least 70% of all the questions.

How did Watson become so good at playing Jeopardy? The architecture behind Watson is called DeepQA. Contrary to some popular misconceptions, DeepQA does not simply look up the answer in a database. It makes use of sophisticated natural-language processing and search algorithms to answer a question. First, DeepQA tries to understand what the question is asking. Then, it finds some potential answers. DeepQA stores 200 million pages of information, including Wikipedia articles and relational databases, and it is not allowed to access the internet. For each potential answer, DeepQA uses hundreds of algorithms to study the evidence and assigns a score to the answer. Finally, DeepQA generates a ranked list of answers. I'll show a toy sketch of this scoring-and-ranking step at the end of this segment. The Jeopardy challenge was a means to an end. Subsequently, IBM adapted Watson for healthcare, and by 2012, two healthcare organizations had started piloting Watson.

For our next application, let's look at self-driving cars. To stimulate the research and development of self-driving cars, DARPA, the Defense Advanced Research Projects Agency, organized three competitions in the 2000s. In 2004, the first DARPA Grand Challenge required self-driving cars to navigate a 142-mile course in the Mojave Desert, USA, within 10 hours. Unfortunately, all competing cars failed within the first few miles. The DARPA Grand Challenge was repeated in 2005. This time, the course was 132 miles long and contained flats, dry lakebeds, mountain passes, narrow tunnels, and lots of sharp turns. Four cars completed the route within the time limit. Stanford University's car, Stanley, claimed first place, and two cars from CMU, Sandstorm and Highlander, finished in second and third places. In 2007, the DARPA Urban Challenge took place at the former George Air Force Base in California. Cars needed to complete a 60-mile course in a simulated urban environment, interacting with other self-driving and human-driven cars, within six hours. The first three places went to CMU's car Boss, Stanford's car Junior, and Virginia Tech's car Odin.
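Before we look at how these cars work, let me loop back to Watson with the toy sketch I promised. The sketch below scores each candidate answer with several independent evidence scorers and returns a ranked list. The scorer names and the fixed weighted sum are my own illustrative stand-ins: the real DeepQA ran hundreds of scoring algorithms and learned how to combine their scores.

```python
def rank_answers(question, candidates, scorers, weights):
    """Toy DeepQA-style ranking: score each candidate answer with many
    independent evidence scorers, combine the scores into a confidence,
    and return the candidates from most to least confident."""
    ranked = []
    for answer in candidates:
        # Each scorer inspects one kind of evidence and returns a number.
        scores = [scorer(question, answer) for scorer in scorers]
        confidence = sum(w * s for w, s in zip(weights, scores))
        ranked.append((confidence, answer))
    return sorted(ranked, reverse=True)

# Hypothetical usage with made-up scorers:
#   scorers = [passage_support_score, answer_type_score, popularity_score]
#   weights = [0.5, 0.3, 0.2]
#   rank_answers(clue, ["Deep Blue", "Watson"], scorers, weights)
```

The key design idea is that no single algorithm decides the answer; the confidence emerges from combining many weak pieces of evidence.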
So, what does it take to build a self-driving car? A self-driving car consists of two main parts: the perception system and the decision-making system. The perception system tries to understand the environment, whereas the decision-making system determines what the car should do next. The perception system needs to perform tasks such as locating the car in the environment, recognizing static and moving obstacles, detecting lane markings on the road, and understanding traffic signs. Many of these tasks can be accomplished using supervised learning algorithms such as support vector machines and convolutional neural networks. The decision-making system needs to perform tasks such as planning a route, determining what to do next, and avoiding obstacles. These tasks require algorithms for searching and planning — for example, search algorithms such as Dijkstra's algorithm and A*, along with finite state machines and Markov decision processes. I'll show a tiny sketch of A* below. If you are interested, check out the survey paper for a comprehensive literature review on self-driving cars.
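Here is the promised A* sketch. A* expands states in order of g(n) + h(n), where g is the cost of the path so far and h is a heuristic estimate of the remaining cost; if h never overestimates, the first path found to the goal is optimal. The grid example in the comments is a made-up illustration, not from the survey paper.

```python
import heapq

def a_star(start, goal, neighbors, cost, heuristic):
    """A* search: expand states in order of g(n) + h(n)."""
    frontier = [(heuristic(start, goal), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost to reach each state
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for nxt in neighbors(state):
            new_g = g + cost(state, nxt)
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                new_f = new_g + heuristic(nxt, goal)
                heapq.heappush(frontier, (new_f, new_g, nxt, path + [nxt]))
    return None  # no route exists

# Hypothetical usage: route planning on a 2D grid with unit step costs
# and a Manhattan-distance heuristic.
#   neighbors = lambda s: [(s[0]+dx, s[1]+dy)
#                          for dx, dy in [(1,0), (-1,0), (0,1), (0,-1)]]
#   a_star((0, 0), (3, 2), neighbors, lambda a, b: 1,
#          lambda a, b: abs(a[0]-b[0]) + abs(a[1]-b[1]))
```

There are so many other applications of AI that I don't have time to talk about. Here are some examples: solving partial differential equations, discovering new types of antibiotics, and playing hide-and-seek games. Be sure to follow the links if you want to learn more. That's everything on the applications of artificial intelligence. Which one was your favorite application? Please feel free to let us know by posting on Piazza. Thank you very much for watching. I will see you in the next video. Bye for now.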