Well, thanks for inviting me to the TISIC talks. I'm very happy to talk about some recent findings on the challenges of active perception. So what is active perception? Let me start with an example. Multi-sensor networks are everywhere nowadays: if we walk in a public space, it is common to assume that we are being monitored by a multi-camera surveillance system. A key challenge in the design of such systems is the efficient allocation of scarce resources, for example computational power. In many cases it is not possible to process the signals collected by all the sensors, or, in the case of multi-camera systems, to apply sophisticated object detection algorithms to the images collected from all the cameras. In fact, even when there is only a single camera, as I show in the image on the right, it is often not possible to apply an object detection algorithm to every location covered in the image captured by that one camera.

This resource constraint gives rise to the sensor selection problem, where an agent must select the subset of sensors to which it should allocate the scarce resources. Sensor selection is an example of an active perception task, where an agent must take actions to reduce its uncertainty about its environment while reasoning about its own limitations and those imposed on it by the environment. In general, active perception encompasses a broad spectrum of concepts with multiple applications, such as visual attention, control of memory, active sensing, sensor selection, or even question answering systems, where an agent must ask a series of questions, get answers to them, and piece those answers together to come to a final conclusion.

So what are the common characteristics, or challenges, associated with these tasks? First, this is a sequential decision-making task: the final aim of the agent is to compute a policy, which is a sequence of actions. The agent should be able to model the stochasticity and partial observability of the environment in which it is acting. Furthermore, it should be able to reason long term, that is, about the future consequences of the actions it takes right now. In order to compare what might be a good action and what might not, it should be able to assign an objective value to its estimate of uncertainty or information. Moreover, in many cases the action space of the agent is itself a subset of the available set of sensors, so the agent must be able to deal with a combinatorial action space: as the number of sensors increases, the number of options the agent has to select from grows combinatorially. Finally, it may be desirable that the agent learns from its past mistakes, so it should also have the capability of learning to perceive actively.

We address these challenges using a combination of four different approaches: decision-theoretic planning, information theory, submodularity, and reinforcement learning, a few of which I'm going to talk about in the next few slides. So how do we model the world in order to capture the stochasticity and partial observability? Partially observable Markov decision processes (POMDPs) provide a decision-theoretic framework to do so. In a POMDP model of the world, an agent maintains a belief at every time step about the world it is trying to observe. Depending on this belief, the agent takes an action, and the world keeps evolving according to its dynamics.
As the agent takes an action, it receives a partial observation about what is going on in the world, and using this information it updates its belief about what is currently going on. In general, the observation is correlated with the true state of the world. The idea is that, using this model of the world, the agent can compute long-term policies that tell it which actions might be informative and which might not. More generally, the agent can use this model to compute a policy that maximizes a long-term notion of reward. In the case of active perception, this reward is defined as the information gain of the agent, which is an objective way to measure information that has its roots in information theory. (I give a small code sketch of this belief update and reward at the end of this section.)

However, I also talked about another challenge: the action space of the agent is itself combinatorial. So while the agent can select a sequence of actions, selecting each individual action can be a tedious task in itself. How do we tackle that? We tackle it by exploiting the property of submodularity. Submodularity formalizes the notion of diminishing returns. Loosely speaking, it means that as I add more and more elements to a set, the marginal value that each new element contributes to a function of that set decreases. Maybe I can explain that better with the example image on the right. Here I have a smaller subset containing the yellow sensor, and a bigger subset containing the yellow and purple sensors, and to both of them I add the blue sensor. Due to the overlap, the increase in covered area caused by adding the blue sensor to the smaller subset is clearly greater than the increase caused by adding it to the bigger subset. This is an illustration of the property of submodularity. The reason this property is helpful is that if a function is submodular, we can use greedy maximization instead of exact maximization, and greedy maximization is computationally much cheaper than full maximization while still coming with strong approximation guarantees. (A sketch of the greedy loop also follows at the end of this section.) Now, fortunately for us, information gain is submodular, so we can directly combine greedy maximization with a planning method based on POMDPs to compute a policy.

Just as a side note, submodularity in general is a very useful property: it has applications in text and data summarization, and in fact, recently a couple of my students used it to do grocery list planning for their master's thesis. As an example, here is a small demo of an algorithm we developed using the POMDP formulation and submodularity. What you see here is the agent trying to select 40 out of the 5000 candidate pixel boxes in this image, and it is trying to do so very quickly while tracking the five blue dots, which represent people in this scene. It makes its selections so that the red boxes track the people, but also so that it keeps looking for new people that may appear in the scene. As you can see, the selections are made very quickly and the agent is able to track these people. Quantitatively speaking, we showed that we can maintain 80% of the performance with 10% of the resources. Selecting 40 items out of 5000 is already a really large problem, and we were able to do it in less than a millisecond.
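To make the belief update and the information-gain reward concrete, here is a minimal sketch in Python for a discrete POMDP. The two-location example at the bottom, including the matrices T and Z and all the numbers, is hypothetical and chosen purely for illustration; it is not the model from our tracking system.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes-filter update of a discrete POMDP belief.
    b: belief over states, shape (S,)
    T[a, s, s2] = P(s2 | s, a);  Z[a, s2, o] = P(o | s2, a)."""
    predicted = b @ T[a]                    # predict: P(s' | b, a)
    unnormalized = predicted * Z[a, :, o]   # correct with the observation likelihood
    return unnormalized / unnormalized.sum()

def entropy(b):
    return -np.sum(b * np.log(b + 1e-12))

def expected_info_gain(b, a, T, Z):
    """Expected entropy reduction: H(predicted belief) - E_o[H(posterior belief)]."""
    predicted = b @ T[a]
    p_obs = predicted @ Z[a]                # P(o | b, a), shape (O,)
    gain = entropy(predicted)
    for o, p in enumerate(p_obs):
        if p > 0:
            gain -= p * entropy(predicted * Z[a, :, o] / p)
    return gain

# Hypothetical 2-location example: a target random-walks between locations 0 and 1;
# action a points a noisy detector at location a; observation o = 1 means "seen".
T = np.array([[[0.9, 0.1], [0.1, 0.9]]] * 2)   # same dynamics under both actions
Z = np.array([[[0.1, 0.9], [0.9, 0.1]],        # detector pointed at location 0
              [[0.9, 0.1], [0.1, 0.9]]])       # detector pointed at location 1
b = np.array([0.5, 0.5])
print([expected_info_gain(b, a, T, Z) for a in (0, 1)])
print(belief_update(b, a=0, o=1, T=T, Z=Z))    # saw the target at 0: belief shifts there
```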
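And since greedy maximization does the heavy lifting in our planner, here is an equally minimal sketch of it, using the overlapping-coverage picture from the slide. The sensor names and footprints are hypothetical; in our work the maximized function is information gain rather than covered area, but the greedy loop is the same.

```python
def greedy_max(candidates, k, value):
    """Pick k elements greedily by marginal gain. For monotone submodular
    functions this achieves at least (1 - 1/e) of the optimal value
    (Nemhauser et al., 1978), using O(k * |candidates|) evaluations
    instead of an exponential exact search."""
    selected = set()
    for _ in range(k):
        best = max((c for c in candidates if c not in selected),
                   key=lambda c: value(selected | {c}) - value(selected))
        selected.add(best)
    return selected

# Hypothetical footprints: the grid cells each sensor covers.
footprints = {"yellow": {1, 2, 3}, "purple": {3, 4, 5},
              "blue": {2, 3, 4}, "green": {6, 7}}

def coverage(chosen):   # submodular: overlapping cells are only counted once
    covered = set()
    for s in chosen:
        covered |= footprints[s]
    return len(covered)

print(greedy_max(footprints, k=2, value=coverage))
# "blue" would add 3 new cells to the empty set, but 0 to {yellow, purple}:
# exactly the diminishing-returns effect illustrated on the slide.
```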
On the x-axis you see the time taken to compute a solution, and on the y-axis the total performance of the algorithm. The flat line labelled "BD performance" stands for brute-force detection, that is, the performance that would have resulted if we had applied the detector at all possible locations in the image. The blue dots are the PartiMax algorithm that I presented, and ideally we want the blue dots to be as far to the top left as possible.

Okay, so now to the final challenge I talked about, which is learning. Recently we also proposed a deep RL based method for solving the problem I just formulated. It is called a deep anticipatory network (DAN), and our idea here is to use deep reinforcement learning to directly learn what might be a good or bad policy without having to model the world. A deep anticipatory network consists of two neural networks: one is called the Q network and the other the M network. The role of the Q network is to take actions and collect observations; for example, it can select a camera, and then the observation from that camera is collected. This observation and action are passed to the M network, whose job is to predict what is going on in the world. As time goes on, the Q network collects more and more observations, and the M network is trained to make its predictions better and better. If the M network correctly predicts what is going on, then the Q network in turn is rewarded. Ideally, both of them are trying to accomplish a joint task, and they are able to help and improve each other. We found that the Q network learns to select the observations that are most useful, or most relevant, for the M network to predict the hidden state of the world. (A minimal sketch of this training loop appears after the experiments below.)

We applied this algorithm to a multi-person tracking task. Here, in green, we show our algorithm, DAN, and we compare it to coverage-based and random selection. In this case, both baselines are quite strong. Random selection here corresponds to most machine learning systems we see nowadays, where data is sampled randomly, that is, i.i.d.; that is already a very strong baseline, and we show that DAN actually outperforms it by a bit. We also compare to a coverage-based baseline, the idea behind which is to select cameras that cover as many people as possible. The reason it does not necessarily perform best is that in many cases we do not even need an observation to find out where people are, and so for a very long time it can be better to look at a more dynamic place, where there is a constant crowd of people moving; DAN is able to exploit such insights.

Finally, we also applied DAN to an MNIST digit recognition task. Here on the right I show, for two separate instances, the glimpses selected by the algorithm in order to predict the underlying digit. You can already see that the algorithm starts to select glimpses that turn out to be more informative for predicting the underlying digit.
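To make the two-network setup concrete, here is a minimal sketch of the DAN training idea, assuming PyTorch, a fixed-size summary vector of the observations gathered so far, and a hypothetical environment interface (env.reset/env.step returning that summary, the true hidden state, and a done flag). The real system uses recurrent networks over the observation history and standard DQN machinery such as replay buffers and target networks, all of which I omit here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# All sizes are hypothetical placeholders.
OBS_DIM, N_SENSORS, N_STATES, HID = 16, 5, 10, 64

q_net = nn.Sequential(nn.Linear(OBS_DIM, HID), nn.ReLU(), nn.Linear(HID, N_SENSORS))
m_net = nn.Sequential(nn.Linear(OBS_DIM, HID), nn.ReLU(), nn.Linear(HID, N_STATES))
q_opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
m_opt = torch.optim.Adam(m_net.parameters(), lr=1e-3)

def train_episode(env, gamma=0.99, eps=0.1):
    """One episode: the Q network selects sensors, the M network predicts
    the hidden state, and Q is rewarded whenever M predicts correctly."""
    summary = env.reset()               # summary of observations gathered so far
    done = False
    while not done:
        # Q network picks the next sensor, epsilon-greedily.
        q_values = q_net(summary)
        action = (torch.randint(N_SENSORS, ()).item() if torch.rand(()) < eps
                  else q_values.argmax().item())
        # Hypothetical interface; true_state is a 0-dim long tensor.
        next_summary, true_state, done = env.step(action)

        # M network is trained to predict the hidden state from what was observed.
        logits = m_net(next_summary)
        m_loss = F.cross_entropy(logits.unsqueeze(0), true_state.view(1))
        m_opt.zero_grad(); m_loss.backward(); m_opt.step()

        # Q network is rewarded iff M's prediction is correct (one-step DQN target).
        reward = float(logits.argmax() == true_state)
        with torch.no_grad():
            target = reward + (0.0 if done else gamma * q_net(next_summary).max().item())
        q_loss = F.mse_loss(q_values[action], torch.tensor(target))
        q_opt.zero_grad(); q_loss.backward(); q_opt.step()

        summary = next_summary
```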
So I would like to draw my talk to a conclusion. What I tried to present was an introduction to active perception and the challenges associated with it, together with a planning algorithm and a learning algorithm that address some of those challenges. However, I also think this work raises more questions than it answers.

I presented results on a sensor selection task, but what about navigation? What if we have a mobile robot, and what happens to the property of submodularity once we have that mobility? How can we answer questions like those? Can navigation help an agent with active perception: can learning to navigate, or knowing how to navigate better, help an agent perceive its environment better? Perceiving better should in turn help the agent navigate, so that again introduces a cyclic structure, and it is a question that I haven't answered in this talk. There are answers to this question in specific settings, but it would be nice to know whether there are general answers. And what can we say about reinforcement learning: does active perception present a harder or easier set of problems for RL, and what about sample efficiency? So I would like to conclude my talk here with these questions.