Hello everyone, this is Alice Gao. In this video, I will discuss some properties of the Q-learning algorithm.

Thinking about Q-learning, the first thing that may come to your mind is that our version of Q-learning learns Q(s, a) instead of V(s). This distinction is not important; it's straightforward to write an equivalent Q-learning algorithm that learns the V values instead.

Q-learning is a model-free algorithm, since it does not require us to learn the transition probabilities. In contrast, ADP is a model-based algorithm. Because of this, Q-learning requires much simpler computation than ADP.

Q-learning is not guaranteed to converge to the optimal Q values. However, if the agent explores enough, then Q-learning learns an approximation of the optimal Q values. How good is this approximation? Can we improve it? We can improve the convergence by adjusting the learning rate alpha. The smaller alpha is, the closer Q-learning will converge to the optimal Q values, but the slower it will converge. This makes intuitive sense: if alpha is small, the magnitude of each update is small, so we would be adjusting the Q values very slowly and very cautiously until they converge.

Let's compare ADP and Q-learning. They are both reinforcement learning algorithms, but they differ in some significant ways.

Number one: does the algorithm require the agent to learn the transition probabilities? ADP is model-based and requires the agent to learn the transition probabilities. Q-learning is model-free and does not need to learn them.

Number two: how much computation is performed per experience? Comparing the two, ADP requires more computation per experience. After receiving each experience, ADP tries to maintain consistency among the utility values by adjusting them using the Bellman equations. On the other hand, Q-learning performs a simple update based on the observed transition only. It does not try to keep the Q values consistent between neighboring states.
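The per-experience update described above can be sketched in a few lines of Python. This is a minimal sketch, not the version from the lectures: the state and action names, the reward, and the default alpha and gamma values are all hypothetical, but the update rule itself is the standard temporal-difference Q-learning update, which uses only the single observed transition and no transition probabilities.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning update from a single observed transition (s, a, r, s_next).

    No transition model is needed: Q(s, a) is nudged toward
    r + gamma * max_a' Q(s_next, a') by a step of size alpha.
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

# Hypothetical example: one transition in a two-state problem.
actions = ["left", "right"]
Q = defaultdict(float)          # all Q values start at 0
q_update(Q, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
print(Q[("s0", "right")])       # 0.5 * (1.0 + 0.9 * 0 - 0) = 0.5
```

Note how alpha scales the step: with alpha = 0.5 the new estimate moves halfway toward the observed target, while a smaller alpha would move it more slowly and cautiously, matching the convergence trade-off discussed above.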
As a result, Q-learning requires less memory and computation time.

Number three: how fast does the algorithm learn? ADP typically converges faster than Q-learning. Q-learning learns more slowly and shows much higher variability. In general, a model-based algorithm like ADP is more efficient in terms of experience: it requires fewer experiences to learn well.

That's everything on the properties of the Q-learning algorithm. Let me summarize. After watching this video, you should be able to do the following: describe some properties of Q-learning, and explain some differences between ADP and Q-learning. Thank you very much for watching. I will see you in the next video. Bye for now.