Hey, so today I'll be talking about how I built the dinosaur game AI, and how you can do the same. Unlike a lot of other approaches, where people either rebuilt the game from scratch or hacked through the source code to make it compatible with their AI, this one stays as true to the real game as possible: it actually captures screenshots of the real game and learns from those. Furthermore, the coolest part of this AI is that it's only around 200 lines of code. So let's get started.

This AI is built on the principles of reinforcement learning, which, as the name suggests, is a form of AI learning where the AI learns from rewards given by the game. Here I used a very simple reward function: if the AI dies, it gets a reward of -10, and if it lives, it gets a reward of +1. The AI was designed around something called Q-learning, where it learns to estimate the expected reward of taking each action. In this case we have three actions: jump, duck, or do nothing. If the AI in a given situation decides to jump, it might learn that the expected reward is +1, while the expected reward of going straight, the "do nothing" action, would be -10.

Traditionally, Q-learning is done in a tabular manner, where each state-action pair (that is, a position in the game and a potential action) is mapped to its expected Q-value in a lookup table, and that table is updated with the Q-learning algorithm. However, tabular learning does not carry over well to the Chrome dinosaur game, because we're dealing with images here. Each image can take billions of potential values, so our Q-learning table would be far too large. Instead, we can borrow techniques from machine learning, such as neural nets and deep learning, to train the AI to estimate the expected value of each position. This is done using something called a convolutional neural net, or CNN. As the name suggests, function approximation is just an approximation of the various Q-values, so it's not technically as accurate as the tabular approach, but in this case it's the most practical way for the Chrome AI to learn. Function approximation does give us an advantage, though: it abstracts what the AI is learning, so it can be used in other cases.

Now let's move on to the actual AI implementation. The AI is divided into three broad parts: the agent, the environment, and the learning loop.

The agent is the actual AI that controls the dinosaur. It's responsible for choosing which actions to take, and for learning from those actions based on the rewards it gets. The agent is fed its inputs, the images, from the environment.

The environment stacks four consecutive frames, or images, together to allow the agent to learn the speed of the game, since the game does speed up as it progresses. The environment is also responsible for providing the agent's reward signal and for letting the agent know when the game is over. Traditionally, the environment is the game itself, but since we're using the Chrome dinosaur game that's already built, our environment simply needs to wrap it. So in this case, our environment is responsible for taking the screenshots, packaging them up into stacks of four images, determining from the pixel values when the game has ended, and giving the appropriate reward signal based on that. The environment also does a bit of preprocessing on the images: reducing the resolution, converting them to black and white, and boosting the contrast a bit to make them easier for the agent to learn from.

And lastly, we have the actual learning loop.
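The preprocessing and frame-stacking steps described above can be sketched roughly like this in Python with NumPy. This is a minimal illustration, not the actual implementation: the output size, contrast factor, and class names are all assumptions, and the real project's values may differ.

```python
import numpy as np
from collections import deque

def preprocess(frame, out_size=(84, 84), contrast=2.0):
    """Downscale an RGB screenshot, convert it to grayscale, and boost contrast.
    (The 84x84 size and contrast factor here are illustrative choices.)"""
    gray = frame.mean(axis=2)                         # average RGB channels -> grayscale
    h, w = gray.shape
    # crude nearest-neighbour downscale by striding over row/column indices
    rows = np.linspace(0, h - 1, out_size[0]).astype(int)
    cols = np.linspace(0, w - 1, out_size[1]).astype(int)
    small = gray[np.ix_(rows, cols)]
    # push pixel values away from the midpoint to increase contrast
    small = np.clip((small - 127.5) * contrast + 127.5, 0, 255)
    return small / 255.0                              # normalise to [0, 1]

class FrameStack:
    """Keeps the four most recent frames so the agent can infer the game's speed."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def push(self, frame):
        self.frames.append(frame)
        while len(self.frames) < self.frames.maxlen:  # pad with copies at episode start
            self.frames.append(frame)
        return np.stack(self.frames)                  # shape (4, H, W)
```

Stacking four frames matters because a single still image can't tell the agent how fast obstacles are approaching; the differences between consecutive frames encode that motion.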
This is where the magic happens: the AI actually plays the game and learns from its experiences. The loop is fairly straightforward, and no matter what reinforcement learning architecture you use, it stays pretty much the same. The loop starts by resetting the environment. From there, the agent picks an action to take; the environment executes that action and returns the appropriate reward, whether the game is done, and the next state the AI has ended up in. The AI then remembers the reward and the next state, together with the original state-action pair, in something called the experience replay. You can think of the experience replay as the memory of the AI. Once the game is finished, the AI goes back to its experience replay, randomly selects a batch of experiences, and learns from those.

And that's it for this video. That was a fairly high-level overview of how Q-learning, function approximation, and reinforcement learning work. I have all my code for this AI posted on GitHub with much more detailed annotations, and you can always leave your questions down in the comments. Thanks for watching, and here's some more AI gameplay!
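To make the learning loop concrete, here is a minimal Python sketch of the act/observe/remember/replay cycle described above. It uses a tabular agent as a simple stand-in for the CNN, and every name and hyperparameter here (the class names, epsilon, gamma, alpha, batch size) is an illustrative assumption rather than the video's actual code.

```python
import random
from collections import defaultdict, deque

ACTIONS = ["jump", "duck", "nothing"]
GAMMA, ALPHA = 0.99, 0.1         # discount factor and learning rate (illustrative values)

class Agent:
    """Tabular stand-in for the CNN agent, just to show the shape of the loop."""
    def __init__(self, epsilon=0.1):
        self.q = defaultdict(float)  # (state, action) -> estimated expected reward
        self.epsilon = epsilon

    def act(self, state):
        if random.random() < self.epsilon:             # explore occasionally
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, batch):
        # standard Q-learning update toward the Bellman target
        for s, a, r, s2, done in batch:
            target = r if done else r + GAMMA * max(self.q[(s2, b)] for b in ACTIONS)
            self.q[(s, a)] += ALPHA * (target - self.q[(s, a)])

memory = deque(maxlen=10_000)    # the experience replay: the AI's memory

def run_episode(env, agent, batch_size=32):
    state = env.reset()
    done = False
    while not done:
        action = agent.act(state)                      # jump, duck, or do nothing
        next_state, reward, done = env.step(action)    # +1 for living, -10 on death
        memory.append((state, action, reward, next_state, done))
        state = next_state
    # once the game is over, learn from a random sample of past experiences
    agent.learn(random.sample(memory, min(batch_size, len(memory))))
```

Sampling randomly from the replay, rather than learning from frames in order, breaks the strong correlation between consecutive screenshots, which is a big part of why this kind of training is stable at all.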