In this video, we're going to talk about how we can solve problems with reinforcement learning. Machine learning is the ability of machines to learn to perform a certain task. There are primarily three ways we can have machines learn, called machine learning paradigms: supervised learning, unsupervised learning, and reinforcement learning.

In supervised learning, we have data, we have labels, and we have an untrained model that learns to map from the input data to the output label. This is the foundation of classification and regression tasks. In unsupervised learning, we have data but no labels. This is primarily used to understand patterns within data, as in clustering and dimensionality reduction.

And then we have reinforcement learning. Reinforcement learning is learning what to do, that is, how to map situations to actions so as to maximize a numerical reward signal. Now, this definition is great, but it's kind of vague. How exactly do I go about building a robot that knows how to navigate the world, or building some game AI? We can do so with a theoretical framework called a Markov decision process.

Let's start with a simple definition. A Markov decision process, or MDP, is a theoretical framework that you can use to solve many problems with reinforcement learning. And here is a visual of that MDP. In a Markov decision process, we have two main characters: an agent and an environment. The agent performs some action in the environment, and the environment in turn emits a state and a reward. These are consumed by the agent and influence the agent's next action, and so on. So this is a schematic of the Markov decision process.

It is Markov because the next state emitted by the environment depends only on the current state and action.
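To make that loop concrete, here is a minimal Python sketch of an agent and an environment taking turns. Everything in it is made up for illustration: the `ToyEnvironment` corridor, the `RandomAgent`, and the `run_episode` helper are hypothetical names and are not taken from any particular library.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D corridor with positions 0..4; the goal is position 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # The next state is computed from the current state and the action only.
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class RandomAgent:
    """Hypothetical agent that ignores the state and moves left or right at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice([-1, 1])


def run_episode(env, agent, max_steps=100):
    """Run the agent-environment loop and return total reward plus the transitions."""
    state = env.state
    total_reward = 0.0
    history = []
    for _ in range(max_steps):
        action = agent.act(state)                    # agent acts on the current state
        next_state, reward, done = env.step(action)  # environment emits state and reward
        history.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, history


total, transitions = run_episode(ToyEnvironment(), RandomAgent(seed=0))
```

Notice that `step` only ever looks at the environment's current state and the action it was just given; it keeps no record of how the agent got there.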
In other words, it's memoryless: states before the current one don't influence the future state. Decision in Markov decision process refers to the fact that we select an action based on the current state, and this leads to a state transition and the accumulation of rewards. And then we have process. Process emphasizes the interaction between the agent and the environment as a sequence of steps. The agent's decisions and their consequences unfold as a process where each step is influenced by the current state and action.

This setup gives us a more concrete starting point for solving problems with reinforcement learning. When we approach a problem with the MDP framework in mind, we need to define the agent, the environment, the action space, the state space, and the rewards. So let's do that for a few distinct cases.

Let's say in this first case we want to create a robot that can clean a room, and let's define the components of the Markov decision process. The agent is the Roomba itself, and the environment is the room. The actions the Roomba can perform could be turning one degree, two degrees, three degrees, four degrees, and so on up to maybe 179 degrees. It can also turn in the opposite direction: negative one degree, negative two degrees, and so on up to negative 179 degrees. Another action could be just to move forward.

Next, we want to define the state. A state is a snapshot of the environment, and in this case it will be a snapshot of the room itself. This could be a list that says, for every single degree, whether there is an obstacle within six inches: at one degree, at two degrees, at three degrees, and so on. It could also include how much charge the Roomba has, or how far the Roomba is from the charging port.
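The Roomba's actions and state described so far can be written down in a few lines of Python. This is only a sketch: the names `ACTIONS` and `RoombaState` are hypothetical, and the 360 obstacle flags (one per degree around the robot), the battery fraction, and the distance in inches are assumptions about how to encode what was just described.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Action = Tuple[str, int]

# Turn by 1..179 degrees in either direction, or move forward.
TURN_ACTIONS: List[Action] = [("turn", d) for d in range(-179, 180) if d != 0]
ACTIONS: List[Action] = TURN_ACTIONS + [("forward", 0)]


@dataclass
class RoombaState:
    # Assumption: one flag per degree around the robot (360 in total),
    # True if an obstacle is within six inches in that direction.
    obstacle_within_6in: List[bool] = field(
        default_factory=lambda: [False] * 360
    )
    battery_fraction: float = 1.0     # assumed encoding: 0.0 is empty, 1.0 is full
    distance_to_dock_in: float = 0.0  # assumed unit: inches


state = RoombaState()
state.obstacle_within_6in[90] = True  # e.g. an obstacle at the 90 degree mark
print(len(ACTIONS))  # 359 discrete actions
```

So even this simple robot has a few hundred discrete actions and a state with several hundred entries.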
And the reward could be the volume of dust collected in the container. The idea here is that the more dust there is in the container, the more of the room has been cleaned, so the reward goes in tandem with the goal we set out for this Roomba. Now that we've defined all of these components, it's a matter of collecting data and coding things out from here.

Now, it's important to note that we don't need to think only in the realm of AI. The Markov decision process goes well beyond it. For example, consider a case where a human is learning how to ride a bike. The agent is the human, and the environment is the bike, which is on a road. The actions could be to pedal, brake, lean left, lean right, turn left, turn right, or continue straight. The state is a snapshot of the human on the bike on the road: is there an obstacle 12 feet ahead, or immediately to the left or right? It could also include the direction of our destination. Next, the reward. This could be zero for every time step that the bike is upright, negative 50 if the bike slants a lot, and negative 100 if we crash or fall down. All of this incentivizes our agent to stay upright so that it can maximize the reward.

Another important thing to note is that this is just one way to frame learning to ride a bike. It's not the be-all and end-all. An extreme example would frame it like this: the agent is the brain, and the environment is the body, which is on the bike. The actions are to send a signal to the left hand, the right hand, the left leg, or the right leg. And the state, which is a snapshot of the body, could be how elevated the legs and hands are, whether there is any tension in the muscles, and how clear our vision is.
And the reward would be based on the absolute difference between the current heart rate and an acceptable heart rate for riding a bike, with a smaller difference earning a higher reward. So clearly, how we define the agent and the environment is very fluid, even for the same problem.

Once we have defined these, it becomes much easier to understand what strategy we need to code this out, and to answer questions like: what architecture should the agent have? We have discrete actions, so what works best for that? But like I mentioned, before building any of this out, we need to phrase the problem within this framework, and the rest will fall into place.

So that's going to do it for this short video. Thank you all so much for watching, and subscribe for more content on artificial intelligence and machine learning. I'll see you very soon. Bye-bye.