So the last thing I want to talk about, at least for now, is this idea of what I'm calling types of agents. In the past few videos I've used a number of different examples: Connect Four, chess, checkers, tic-tac-toe, a self-driving car, a tomato-harvesting plant. Each one of those is going to have a different environment, and so we have to map out the different elements of that environment. But we also have different kinds of agents, because we can have simple reflex agents or something a little more refined.

To start, as you can see from the slide, we're focusing on a very simple reflex agent. The environment gives off percepts: the state, the positions of different things in the environment, and so on. Those are received by our agent via its sensors, and that's what models out what we see in the world. But notice that all the agent has is a set of very simple condition-action rules. It's just if statements, ladies and gentlemen, a mountain of if statements. Given what the world looks like, the agent looks it up in its giant table of conditions and makes its decision off of that, and then the actuators update the world.

In fact, that's exactly what you're going to see in lecture exercise one. The whole point of that exercise is getting familiar and comfortable with time steps. What we're seeing here is: if I'm on a dirty tile, clean that tile; else, look around. Can I move right? Left? Down? Up? That's all it's doing. It's not even considering whether those neighboring tiles are dirty. It only sees that it's on a clean tile, and it just says, if I can move there, move me there. That's it.

Now, you could make it a little smarter. You could change the rules to check the neighbors first: if the tile to the right is dirty, move right; if the tile below is dirty, move down; and so on. But obviously this is still a very simple agent that doesn't really give you much of the intelligence we're thinking about.
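To make that concrete, here is a minimal sketch of that mountain of if statements in Python. The percept format, the function name, and the action strings here are all hypothetical, made up for illustration rather than taken from the exercise code.

```python
def reflex_vacuum_agent(percept):
    """A simple reflex agent: map the current percept straight to an
    action with condition-action rules (plain if statements), no memory.

    Assumed percept format (hypothetical):
      {"dirty": bool,
       "can_move": {"right": bool, "left": bool, "down": bool, "up": bool}}
    """
    if percept["dirty"]:
        return "clean"                      # on a dirty tile? clean it
    for direction in ("right", "left", "down", "up"):
        if percept["can_move"][direction]:  # else take the first legal move,
            return f"move_{direction}"      # ignoring whether it's dirty
    return "no_op"                          # boxed in: do nothing

# One time step: the environment hands over a percept, the agent hands
# back an action for the actuators to carry out.
action = reflex_vacuum_agent({"dirty": False,
                              "can_move": {"right": True, "left": False,
                                           "down": False, "up": False}})
print(action)  # -> "move_right"
```

The "smarter" variant described above would just swap the can_move checks for is-the-neighbor-dirty checks; it's still the same table of if statements.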
That's where we start adding some refinement to our agents. More specifically, this is where we move out of completely observable environments, and maybe all we have to work off of is what we've seen in the past. What that means is that our agent finally starts to store a model of what the world is doing. I used an example earlier: we have an agent right here, and it sees a dirty tile directly beside it, plus another dirty tile that isn't directly beside it but that it still perceives. Its action, we said, was to move right. All right, it moves to the right; now it's on a dirty tile, and that other tile is still dirty. And finally we said to clean right there.

So what I mean by this, as I drew it out: my agent moved to the right, my agent cleaned the tile it was on, now what? With each action it takes, the agent is refining its understanding of the world. It makes a note: dirty tile. It knows, hey, there was a tile I saw that was dirty; I'm going to put it in a little stack or queue in my mind, and if I ever find myself completely out of spots, let me go back to where I was so I can clean that tile. So maybe it moves left as it needs to, to get back to where it came from. As you can see, it's still modeling the environment. It's making assumptions about what happens if it takes a particular action: what's going to happen to the world, especially if our agent had more than just clean and move? But notice that all of that still boils back down to if statements, because each one of those actions may follow some sort of strategy, but the agent doesn't really have a goal.

That's where we start to get into ideas like pathfinding and optimization. What we're dealing with here is still handling and modeling the world, but now the agent is asking: what will happen if I do X? And what it's specifically looking at is a goal it's working towards. Will this action get me to my goal? Will moving up get me closer to, in this case, maybe a target location on my map or my environment? Or, thinking about our agents from a slightly different perspective, will this get me to a new maximum or minimum output if I'm working on an optimization problem? You're going to see this a lot: we're still asking, given the state I see the world in, if this is a good step, what do I do? And if this is a bad step (which we'll actually see), should I do it anyway?

Then maybe a goal isn't enough, and we get into utility. This plays right back into optimization, because not only do I want to find a goal solution, I want to make sure it's the best goal solution. That's the idea of happiness, if you like: not just a goal, but the best goal. Because, as we'll see, there are different ways I could potentially get to a particular target, and that doesn't mean each one is the most efficient. Utility-based agents are where that starts to come into play: we want to get there in the shortest amount of time, we want our linear programming assignment to have the best configuration and maximize output.
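Here is a hedged sketch that ties those last three ideas together: an internal model (the stack of dirty tiles the agent remembers), a goal test (am I on a dirty tile?), and a utility score that ranks candidate moves instead of taking the first legal one. Every name and representation here, the percept dict, the coordinates, the move set, is an assumption made for illustration, not the exercise's actual interface.

```python
import math

MOVES = {"move_right": (1, 0), "move_left": (-1, 0),
         "move_up": (0, 1), "move_down": (0, -1)}

class VacuumAgent:
    def __init__(self):
        self.remembered_dirty = []   # the model: "a little stack in my mind"
        self.position = (0, 0)

    def step(self, percept):
        # Assumed percept format (hypothetical):
        #   {"position": (x, y), "dirty_here": bool, "dirty_seen": [(x, y), ...]}
        self.position = percept["position"]
        for tile in percept["dirty_seen"]:      # fold new percepts into the model
            if tile not in self.remembered_dirty:
                self.remembered_dirty.append(tile)
        if percept["dirty_here"]:               # goal test for the current tile
            if self.position in self.remembered_dirty:
                self.remembered_dirty.remove(self.position)
            return "clean"
        if not self.remembered_dirty:
            return "no_op"                      # nothing left in the model
        target = self.remembered_dirty[-1]      # go back for a remembered tile

        def utility(move):
            # Utility-based choice: not just *a* move toward the goal, but
            # the move that gets us closest to it (walls ignored for brevity).
            dx, dy = MOVES[move]
            return -math.dist((self.position[0] + dx,
                               self.position[1] + dy), target)

        return max(MOVES, key=utility)
```

Drop the utility function and take any move that happens to reduce the distance, and you have a plain goal-based agent; the utility version adds the "best, not just any" part, the shortest route back to the remembered tile.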
One thing to note, though, as you've seen with all of these agents: you may be rolling your eyes, like, none of this is smart. This is not learning; this isn't reinforcement; there's none of that going on. Well, that doesn't make it not an intelligent agent. And as you can start to see, we are modeling more and more about what it means to make decisions. When we do get to the idea of learning, however, one of the things you can think about is this idea of what we're calling a critic.

Let's go super into the future for a second, not time-wise, but in terms of videos. If I'm thinking about something like a neural network, right, that big fancy word that everyone likes, it has a process known as training. The entire idea is: I give you an opportunity to refine yourself. That's the learning aspect. Well, what happens, let's say, when I'm correct or incorrect? Again, that's a signal the environment is giving off through the sensors. Oh no, you misclassified this as a face; it's not a face. You misclassified this as a stop sign; it's not a stop sign. The critic is effectively what receives that, and it judges how bad of a mistake it was, because that gives us the idea of feedback. Oh, this was a very bad misclassification, so you need to refine your state. If you're thinking about the agent as having a state, a model of the world, that same concept is going on here: it needs to update its state.

If we think about it from the incorrect perspective, that's a learning moment, and it actually changes the agent's performance, because updating the model changes how it's going to perform next time. That same concept comes into play when we think about it in neural network terms as training. It's not just one activity or one classification; the agent does multiple classifications, and so the next training iteration, the next time step, is going to have a different performance element acting through the actuators. And that's where we really start to think about learning: the critic kicks in with feedback, which changes the model, the numbers, and that in turn updates the performance. As the performance improves or decreases, we continue to modify that learning aspect of our agent. We'll see this again later on, but that's where the learning agent comes from when we start to think about intelligent agents.
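To make that loop concrete, here is a toy sketch of the critic and the learning element using a single-weight "classifier," so the mechanics stay visible. The data, the learning rate, and the update rule are all made up for illustration; this shows the shape of the training loop, not a real neural network.

```python
def critic(prediction, label):
    """Feedback: how bad was the mistake? A signed error, where a
    bigger magnitude means a worse misclassification."""
    return label - prediction

def train(samples, steps=50, lr=0.1):
    w = 0.0                                       # the agent's internal state/model
    for _ in range(steps):                        # each pass is one training time step
        for x, label in samples:
            prediction = w * x                    # performance element acts
            feedback = critic(prediction, label)  # critic judges the action
            w += lr * feedback * x                # learning element updates the model
    return w

# A hypothetical task: the agent should learn that outputs are about 2x inputs.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(train(samples))  # -> close to 2.0
```

Each pass through the loop is one time step: act, get judged by the critic, update the state, and perform differently on the next iteration.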