Hello, my name is Brandon Rohrer. I'm a senior data scientist at Microsoft, and I'm going to teach you how to turn your house into a robot. I'm going to talk about an adaptive learning algorithm for the Internet of Things. So picture a house that does a lot of things on its own, a robot house. It could save you money by automatically adjusting the temperature and humidity to keep you comfortable, but doing it in a way that keeps energy costs down. It could save you electricity and gas. Imagine a house where the blinds automatically adjust with changes in the weather to keep the house full of natural light, but without losing any more heat than necessary. And it would save water. Imagine faucets and plumbing that deliver water at the right flow rates and the right temperature, whether you're bathing or showering or washing a dish or filling a glass. And a robot house would save you time. Picture notifications that keep you up to date on the need for routine maintenance. When things break, it could schedule repairs automatically. And a robot house could keep you and your loved ones safe. You're at work and you get a text with an image when an unfamiliar postal worker slides a package through your mail slot, or when your cat knocks a crystal vase onto the tile floor. So what makes a robot? A house is not going to stand up and walk around the neighborhood. But like a robot, it has sensors and actuators, it lives in a world, and it can be given a brain. Now a traditional robot has a variety of sensors: joint position sensors, or wheel position sensors if it's a rolling robot, to determine how far it's gone; cameras; proximity detectors, to tell how close it is to walls or obstacles; contact sensors, to tell if it's touching anything; pressure sensors, to tell how hard it's touching things; laser scanners, to tell very accurately how far away things are. Now a house also has sensors. It can measure a lot of things.
But they tend to be different types of things. Thermometers measure air temperature. Security cameras can take pictures in rooms. Smoke detectors, of course. Microphones can measure sound. Window and door position, and latched-unlatched sensors. And the flow of electricity, natural gas, and water. All things that you can measure in a house, and often are measured. Actuators are ways to move things or take actions. In traditional robots, these include joint flexors, drives that spin wheels, grippers that grasp and release things, or speakers and monitors for playing sound or showing a visual display to interact with a human. Similarly, in a house there are other types of actions that can be taken. Heaters and coolers can be switched on and off. Light switches can be flipped. Blinds can be opened and closed, raised and lowered. Cameras can pan, tilt, and zoom. Fire alarms can be turned on. Emails and texts can be sent. Locks can be locked and unlocked. All things that a house can do, actions that it can take. The brain is what connects these two. Now traditionally in a robot, this brain is a computer, and it uses rules that can be as simple as: if the temperature is lower than 70 degrees, turn on the heater; if it's higher than 70 degrees, turn off the heater. It can also use fancier rules, or recipes called algorithms, to learn appropriate behavior. And that's what we're going to talk about here. The same is true of a house. It can learn the right thing to do. Digging deeper into this idea of learning: typically at this point in the conversation, sensors and actuators are connected by a magic black box called machine learning. And unless it's inspected a little more closely, it's natural to assume that this magic black box can do anything, can do anything that a person can do. In reality, that's not yet the case. Despite its many notable achievements, machine learning can really only answer five questions. It's worthwhile to take a minute to walk through these.
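As a minimal sketch, the simple 70-degree rule just described could be written in a few lines. The function name and action labels here are illustrative, not from any real thermostat product:

```python
SETPOINT = 70  # target temperature in degrees Fahrenheit

def decide(temperature):
    """Fixed-rule thermostat: compare the current reading to the setpoint."""
    if temperature < SETPOINT:
        return "heater_on"
    return "heater_off"

print(decide(68))  # below the setpoint, so "heater_on"
print(decide(72))  # above the setpoint, so "heater_off"
```

This is exactly the kind of hand-written rule the talk contrasts with learning: it never adapts, and someone had to pick the 70-degree threshold by hand.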
The first of these questions is: how much, or how many? For instance, if you'd like to predict what the temperature will be next Tuesday because you're making some plans, how much or how many is a reasonable question to ask. Or if you're running a business and you want to know what your sales are going to be in a certain region next week, that's another question that can be answered with a number. In the context of a house, if you want to know what the temperature is likely to be in two hours, or you want to know how many people are likely to enter between the hours of 1 and 4 a.m., those are also questions that can be answered with a number. Machine learning is pretty good at answering these, especially if you have lots and lots of previous examples, and the family of algorithms that do this are called regression algorithms. The second category of questions is: which category? So if you have something and you want to know whether it's a true or a false, a yes or a no, an A or a B, is this an image of a cat or a dog, a classification algorithm will help you do this. And it can decide between two classes or between many classes. You can imagine an aircraft detection algorithm that takes radar signatures in and assigns each one to one of the many known types of aircraft. This is also a categorization, or a classification, and there's a popular set of algorithms that do this as well. The third type of question that machine learning can answer is: which groups do these data points fall into naturally? This is like being handed a bag full of M&Ms and being asked to break them into several different groups. There's no one right way to do it. You could do it by size. You could do it by deviation from perfect circularity. Most of us would probably go through and sort them by color. But there are lots of fine ways to do this. The reason this is useful is that when things are sorted into groups, what you learn about one member of the group you can infer about the others.
So answering the question, which shoppers have similar tastes in produce, helps you to learn about some members of the group and apply that learning to the rest. If you are a consumer of online streaming video services, you've probably been subjected to one of these algorithms; these video services like to make recommendations to you. One of the ways to do this is to group their viewers and look at movies that other people in your group have seen and rated highly that you have not. These algorithms are called clustering algorithms, and they're often used for recommendation. The fourth type of question that machine learning can answer is: is this weird? On its surface, it's not an obviously relevant question. But if you picture an automobile with tire pressure sensors, any deviation from predicted tire pressure can be a sign of a problem. If the pressure is too high, if it's too low, if it's too high for that temperature, if it's changing too rapidly: all of these things may not necessarily be problems, but they indicate that something is going on that might need attention. This is also used in internet security. If an internet message is identified as atypical, it might be an attempt to defeat the security of the system. This is an algorithm I've personally benefited from. My credit card company noticed that several purchases were made of a large amount, multiples of the same item, which is unusual, from a location I'd never purchased from before, and from a vendor I'd never purchased from before, and flagged this as weird. Based on that, they immediately froze the activity of my card and reported it to me, and I was able to correct the fraud before it went too far. This family of algorithms is called anomaly detection. Finally, and this will be most relevant for us today: algorithms that choose actions. So, in a house, for instance, should I raise or lower the temperature?
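To make the "is this weird?" idea concrete, here is a sketch of one of the simplest possible anomaly detectors: flag any reading that falls too many standard deviations from its own history. Real tire-pressure monitors and fraud systems use far richer models; the threshold, function name, and sample numbers here are assumptions purely for illustration.

```python
import statistics

def is_weird(history, reading, k=3.0):
    """Flag a reading more than k standard deviations from the historical mean."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    return abs(reading - mean) > k * spread

pressures = [32.0, 32.2, 31.9, 32.1, 32.0, 31.8, 32.3]  # recent psi readings
print(is_weird(pressures, 32.1))  # a typical reading: False
print(is_weird(pressures, 25.0))  # a sudden drop: True
```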
If I'm a robot vacuum, should I visit the living room and clean it again, or stay plugged into my charging station? If I'm the Mars rover, should I turn right? Should I turn left? Should I move forward? These are all decisions that, when made, change how the system, the decider, interacts with the world around it. This is different from other machine learning algorithms, and we'll see in a second why these are so important for use in a robot house. Most commonly, this family of algorithms is called reinforcement learning. They mimic the psychological theory of reinforcement learning, where rewards and punishments shape the decisions we make. I'll show you how this is done. So, if we want to take a house with sensors and actuators and make it smart, if we want to give it a brain, we take a reinforcement learning algorithm, plug it in between the sensors and actuators, and press go. Now, there are a number of challenges that actually make this very hard to do out of the box. The first is that nearly all machine learning algorithms assume that the world doesn't change. If you have a robot vacuum cleaner that makes a very accurate map of your house so that it can clean it better, and then you rearrange your furniture, you break the algorithm, and it has to start over. The second challenge is related. Reinforcement learning algorithms don't handle changes in sensors and actuators. For instance, let's say you have an outlet that's controlled by a light switch, and you plug a lamp into it. So each time the switch is flipped, a light turns on or off. Now, after spending some time learning that, you unplug the lamp and plug in a space heater. When the switch is flipped, the heater turns on or off. It has a completely different effect. You've changed out the actuator. You've changed what that switch does. This would break most reinforcement learning algorithms, and they would not be able to account for it very rapidly.
Another thing that is hard for reinforcement learning algorithms to do is to handle changing goals. For instance, you live in a house. You keep the house on average at about 69 degrees as a base temperature. Then you bring home a newborn baby and decide that having the house at 73 degrees, keeping it nice and toasty, would be the best thing to do. A reinforcement learning algorithm that was trained to keep it at 69 would not be able to adapt to that very quickly, and it would fight against keeping the house warmer. So the first three challenges all deal with resistance to change and the inability to adapt very quickly. The next two follow a similar theme. Most machine learning algorithms take a lot of time to learn, and by the time they've learned, it's too late. Imagine a house: a reinforcement learning algorithm may typically take 10,000 samples, or data points, to learn what to do. If, in the case of the house, one data point is one 24-hour cycle, 10,000 of those becomes 27 years. By that time, the family living there has moved out, renters have moved in and moved out, the house has gone through several owners, and it's much too late for the original pattern to be useful. The final challenge is related. Most reinforcement learning algorithms don't scale well. What this means is, I start with the house and then I increase the number of sensors. I go and install security cameras and door and window sensors all through the house and double the number of sensors I have. Now, there are different ways that this can affect the training time. Let's say initially it took 1,000 data points to train: 1,000 days. A linear algorithm, if you double the number of sensors, doubles the training time, so that would become 2,000 days. It's a cost, but it might be reasonable. The next step up is a polynomial algorithm, which, instead of multiplying by the factor of 2 that the sensors were increased by, takes that factor of 2 and raises it to a power.
Say squared, the second power, or cubed, the third power. If it's a polynomial algorithm, a cubic algorithm, then doubling the number of sensors will make it take 8 times as long, so 8,000 days to learn. That's starting to get very long, but it gets worse. Many reinforcement learning algorithms are exponential. What that means is that the factor of 2 you increased the sensors by becomes the exponent on your training time. So 1,000 raised to the power of 2 is a million days, or 1,000 times the training time, which is obviously way too long to do anything useful. Now, I wouldn't have brought all these up if I didn't have a solution. So to answer all of these challenges, I propose a new reinforcement learning algorithm. I would like to underscore that this is a prototype, a toy algorithm, to illustrate the types of things that can be done, the feasibility. So it's not something to go out and plug into a real house tomorrow. But it does address the problems I just mentioned. If you look at the picture here, this is what reinforcement learning looks like. There's a world. There are sensors that measure parts of that world and report it to an agent: a decision maker, an algorithm in our case. And that agent takes actions. It does things. It decides. It turns things on and off, spins things around. That affects the world, which affects what the sensors read, and the cycle continues. There's also a reward. Depending on the state of the world, the agent perceives a certain reward. So in the case of a house, perhaps the reward is set to be maximum when the temperature is at 70 degrees. The agent, a reinforcement learning agent, will then take actions to drive the sensors to read 70 degrees. And that's what reinforcement learning does. So let's consider this example of a smart thermostat. The sensor is a thermometer, and the actuator is a heater switch. It can turn the heater on or off.
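The training-time arithmetic from the scaling discussion can be checked in a few lines, using the talk's own numbers: a 1,000-day baseline and a doubling of the sensor count.

```python
base_days = 1000  # initial training time in days
factor = 2        # the sensor count was doubled

linear = base_days * factor        # linear: time doubles along with the sensors
cubic = base_days * factor ** 3    # cubic polynomial: the factor is raised to the 3rd power
exponential = base_days ** factor  # exponential: the factor becomes the exponent

print(linear)       # 2000 days
print(cubic)        # 8000 days
print(exponential)  # 1000000 days, roughly 2,700 years
```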
The world consists of the heater itself, the ducts, the rooms, the natural gas supply line, the filters, the walls, the weather, the down comforter that burst and put feathers in the air that clog the filter vent, and everything else. And especially the people. We're going to assume that going through this cycle takes 10 minutes. Every 10 minutes, there's a new signal to the agent, and the agent makes a new decision. And the brain will be our new reinforcement learning algorithm. So, to step once through this loop: we measure the temperature at 68 degrees, and the agent has to make a decision, so it arbitrarily chooses to turn the heater on for 10 minutes. And then, after that, the temperature is measured at 70 degrees. That's once through this cycle. So we take that information and put it into a table. We've recorded the sensor value before, the action that was taken, and then the sensor value the next time around. We also add a column that shows the number of times that we've observed this. Here we can see this has just happened once. We'll go ahead and add another column that shows the number of times this has had the opportunity to occur. So, for instance, each time the temperature is 68 degrees and the heater is turned on, there are any number of outcomes that have the opportunity to occur. The temperature could end up being 68, 69, or 71 degrees. But only one of them actually occurs. So they all have the opportunity, but there's only one observation for going through this loop. We can also add to that an estimate of the probability that this will happen in the future. Because we have one observation and one opportunity of 68 degrees, heater on, 70 degrees, we'll say that 100% of the time we've tried, this has happened; with the others, 0% of the time. Obviously a very crude probability estimate, but correct in the frequentist sense. And we can add rows to our table to account for what would happen if we had left the heater off. Again, there are lots of possible outcomes.
So far, none of these have had the opportunity to occur, and there have been no observations. So let's go through the loop again. At a later time, the thermometer again measures 68 degrees, and the agent arbitrarily decides to turn the heater off for the next 10 minutes. After that time, the temperature is measured at 65 degrees. We can go back to our table, find the appropriate row, 68 degrees, heater off, 65 degrees, increment the observations to 1, and then increase the opportunities by 1 for all of the heater-off sequences. The probability reflects this: we estimate a 100% chance of ending at 65 degrees if we leave the heater off at 68 degrees. And then we go through the cycle again. We go through it again and again for a very long time, and say that after a while we've had lots of opportunities at 68 degrees to turn the heater on and turn the heater off and to see what the result was, and these get compiled in a table like this. You'll see the total number of opportunities for turning the heater on is 385. Most of the observations, the majority, fall at 70 degrees, but some at 69, some at 71, and some outside of that, and the probabilities reflect that: a 51% probability that it'll end at 70 degrees, and then reduced probabilities for the other outcomes. Similarly, starting at 68 degrees and leaving the heater off, we see that out of 373 opportunities, 158, a plurality of those, ended at 67 degrees, and then there were somewhat lesser probabilities for the other outcomes. This is important. This is a model. What we see here is an estimate of, given a state and an action, what's likely to occur. This lets us do a couple of things. If we know a state and we know what action we took, we know what outcome to expect. We can also work it backward. If we know our state and we know where we want to end up, we can take the action that's most likely to get us there. Now, we've been neglecting the reward so far, but let's bring that in.
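The table-building procedure above can be sketched in a few lines of Python. This is not the project's actual code, just a toy version of the same idea: count observations and opportunities for each (reading, action) pair, and estimate probabilities as their ratio.

```python
from collections import defaultdict

observations = defaultdict(int)   # (before, action, after) -> times this outcome was seen
opportunities = defaultdict(int)  # (before, action) -> times this pair was tried at all

def record(before, action, after):
    """Update the table after one 10-minute pass through the sense-act loop."""
    observations[(before, action, after)] += 1
    opportunities[(before, action)] += 1

def probability(before, action, after):
    """Crude frequentist estimate: observations divided by opportunities."""
    tried = opportunities[(before, action)]
    return observations[(before, action, after)] / tried if tried else 0.0

record(68, "on", 70)   # first pass: heater on at 68 degrees, ended at 70
record(68, "off", 65)  # second pass: heater off at 68 degrees, ended at 65
print(probability(68, "on", 70))   # 1.0, one observation in one opportunity
print(probability(68, "off", 67))  # 0.0, never observed yet
```

After many thousands of passes, the same counters would converge toward the 51% and 42% figures in the talk's table.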
And let's say that this agent gets a reward every time the temperature is 70 degrees. Right now the thermometer reads 68. So the agent, the reinforcement learning algorithm, has to decide whether to turn the heater on or off. Now, the way it does this is it goes to its model and says: well, I'm at 68 degrees. Let me identify all of the outcomes that end in 70 degrees, all of the rows that have an outcome of 70 degrees. And we see that both turning the heater on and leaving it off might end at 70 degrees. So to choose between them, we look at the relative probabilities of each, and we see that turning the heater on actually has a much higher probability: 51%, more than 50 times the probability for leaving the heater off. It's a much better choice. So the algorithm chooses to turn the heater on. This has been a simple illustration of this algorithm. It's part of a bigger project called BECCA, which is intended to be a general-purpose brain for houses and robots and other things. If you're interested in diving into the technical details, you are more than welcome. The Python code and the documentation are available at this website. It is released free under a permissive open source license. You are free to reuse it however you want. Also, feel free to ask me questions about it later. As a side note, this is not a Microsoft-sponsored product or project or activity. As I mentioned, it's just an example to show this process of turning something with sensors and actuators, a house, into a robot, and it illustrates the fact that these challenges in traditional machine learning approaches are addressable. Because it's an incremental algorithm, it learns a little bit each time through the cycle. Even with no data at all, it starts learning, and it never stops. It's meant to mimic the learning of a puppy or a human child. It doesn't require any ongoing human participation.
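The decision step can be sketched the same way. The probabilities below are the worked numbers from the talk (51% for heater on reaching 70 degrees versus roughly 1% for heater off); the other entries are made up for illustration.

```python
# Estimated transition probabilities: (reading, action) -> {next reading: probability}
model = {
    (68, "on"):  {69: 0.20, 70: 0.51, 71: 0.18},
    (68, "off"): {66: 0.25, 67: 0.42, 70: 0.01},
}

def choose_action(state, goal, actions=("on", "off")):
    """Pick the action whose estimated chance of reaching the goal state is highest."""
    return max(actions, key=lambda a: model[(state, a)].get(goal, 0.0))

print(choose_action(68, 70))  # "on": a 0.51 chance beats a 0.01 chance
```

Working the model backward like this, from desired outcome to best action, is the whole decision rule in this toy version.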
Once it's set up, it will learn from all of its decisions: its good ones and, actually even more informative, its bad ones. It constantly adapts to changes. I haven't highlighted how, but this algorithm handles changes in the world, changes in reward, and changes in sensors and actuators on an ongoing basis, and it learns the best decisions to make, or at least good decisions, in any given situation. Now I'd like to include a caveat here. For those of you looking at the table before, you'll notice that even for one sensor, for one initial degree setting, for just two actions, we had a whole page full of rows. Imagine if we included starting points of not just 68 degrees but all of the possible temperatures that we might start from in a house. Multiply that out by not just one thermometer but all of the sensors in the house, which, as we saw on a previous slide, there can be many of, and many of them much more complex than a thermometer. Now multiply that out, square it essentially, to get the number of rows for all of those inputs and all of those outputs: for every sensor input, we have to look at the probability of every possible sensor output occurring. And multiply that again by the number of actions that you can possibly take across all of your actuators. The number of rows gets very large very quickly. This is a real issue, but it's still fast. It still works, and here's why. It gets big, but it's polynomial, not exponential. We're taking the number of sensors and actuators and multiplying it by itself a couple of times, so it grows polynomially. We're not raising anything to the power of the number of sensors or actuators, so we're not subject to the curse of dimensionality in that way. Also, and this is a critical difference, we're talking about storage, about disk space, not about processing time.
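The caveat about table size can be illustrated with rough arithmetic. With one row per (reading before, action, reading after) combination, the row count grows with the square of the number of distinct sensor readings times the number of actions, which is polynomial, not a power of the sensor count. The specific numbers here are invented purely to show the shape of the growth.

```python
def table_rows(readings, actions):
    """One row per (before, action, after) combination."""
    return readings * actions * readings

small = table_rows(40, 2)  # one coarse thermometer, two actions
large = table_rows(80, 2)  # double the number of distinct readings
print(small, large, large // small)  # doubling the readings quadruples the rows
```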
The number of rows can be stored on huge hard disks, and in fact, in database form, there are some pretty efficient ways to reach in and pull that information out if we know what we're looking for. The processing is very quick. The processing is also very simple, which makes it even quicker. The computation is mostly comparisons, greater than, less than, equal to, and some simple arithmetic. Now, if you're an architect and you're thinking about how you'd build such a thing, this is suited really well for sitting in a large database. Most of these rows will be either empty or nearly empty; filled rows will be very rare, so the table can be compressed, or the empty rows neglected altogether. It's also suited very well for cloud processing, where having a lot of storage, and having it cheaply accessible, is a bonus. And it's particularly well suited to a data lake setup, where your data is stored and the computation actually takes place at the point of storage; it doesn't have to get pulled and transmitted anywhere. Now, if you happen to be looking for ways to implement this, I happen to know a very reputable vendor of world-class products in each of these areas, which I would be more than happy to connect you with after the talk. So, just to show you a bit of an example here, I want to show you Rixel the Robot chasing a ball. Rixel's a robot puppy, and his job is to catch that ball. What that means is that his brain is a reinforcement learning algorithm, and he gets a reward every time he hits that ball right between his eyes. When he does, another ball appears somewhere else. He starts off life knowing nothing about what or who he is, what he's sensing, how he moves, or what his world looks like, but he starts experimenting. He starts filling in a table of sensor-before, action, sensor-after rows, and he starts to learn what states result from what decisions, and more importantly, which of those tend to result in reward.
The longer he runs, the longer he trains, the better he gets at getting the ball, and you can see that as he matures he's able to move toward the ball more directly: not perfectly, but more smoothly, with fewer wasted motions and fewer collisions. This is similar to how, over time, a robot house would learn to do things that result in greater reward. As a resident of the house gave the house a thumbs up, either explicitly or just an attaboy, for doing things that were desired, it would get better and better at doing those things. So notice that everything I've said applies to, yes, houses, but to other things as well. Imagine not just a robot house but an entire neighborhood, with coordinated traffic lights, street lights, security systems, and infrastructure, or an office complex. Each would have its own unique set of sensors and actuators, and each would have its own unique reward that it needed to optimize, but a brain that could learn to piece all those together would be able to serve both of them equally well. Road traffic and air traffic could be helped out too. There are also applications in self-driving cars, or in fleets of cars, coordinating and caravanning them. In consumer industries, entertainment, and advertisement, there's a lot to be learned, and it's tough to write rules for those, because the people involved are so hard to predict. In the cyber realm, security is constantly changing, and the need for adaptation is huge there, but so is the ability to interact with social networks and internet trends. In infrastructure, there's helping to control and moderate natural gas, electricity, water, and petroleum, and disaster response. And in production industries, assembly-line manufacturing and agriculture can also be turned into robots. Robots can save money, natural resources, time, and even lives when implemented well and carefully, and we're very close to being able to do this. We have the sensors, we have the actuators, and we're just missing the brain.
Thanks for your attention. I'd love to continue the conversation: feel free to contact me by email, look me up on LinkedIn or Twitter, and check out the publications and examples that I've posted at github.io. And if you'd like to check out the source code for the algorithm I showed, the link is there as well. Thank you.