Hello, and welcome to our presentation, which is focused on the optimization of wireless networks by means of reinforcement learning. We are going to start by providing a technology overview, and then we are going to focus on the results that we have achieved when applying this technology in the field. But before that, let us say a few words about us. My name is Juan Ramiro and I work for Ericsson. Inside Ericsson I work for Business Area Managed Services, Automation and Artificial Intelligence, and within this team I am heading up an innovation unit that is focused on network design and optimization.

Hello, my name is Jose Outes. I also work for Ericsson, in the same unit as Juan, and I contribute to his team as a research specialist.

Okay, so let's start by providing an outline of our talk today. First we are going to set up the context, basically explaining what we mean when we talk about network optimization. Then we are going to review very quickly the basics of reinforcement learning, and contextualize that within the scope of network optimization. At the end we are going to show a real case study in which we have applied this in a live network, together with the conclusions and the learnings we have obtained from this experience.

Okay, so what do we mean when we say network design and optimization? Basically, we are talking about the four stages that you can see on the slide. First we have network planning, in which we forecast the traffic and tell the operator where and when investments are required in order to expand the capacity of the system, so that we can cope with all the expected traffic with the desired quality. Then there is design: once you know the areas in which you have to carry out an expansion, you determine where exactly the new equipment needs to be installed and which configuration it should have, where the antennas should be pointing, and so on. Then, once you roll out the site or the infrastructure, comes the tuning part: before launching that equipment and making it available to live traffic, you have to adjust the different settings so that you provide the expected quality of experience. And finally, once you have launched the system and it is running with live traffic, you have optimization, which is tuning the different parameters and settings of the current infrastructure so that you can make the most of it, maximizing the quality of experience and the capacity with the current level of investment deployed in your network.

As I said before, we are going to be focusing on reinforcement learning, which is a technology that leverages learnings from behavioral psychology and teaches a system how to interact with an environment by actually interacting with it and learning from the outcome. It is basically the same way that we believe children learn, for example, how to walk: based on every interaction you take some learnings, with a long-term goal in mind, so that the way you modify your decision logic is always guided by the maximization of the long-term reward that you are going to get.
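For readers of this transcript, that long-term reward is usually formalized as the discounted return the agent tries to maximize; this standard textbook formulation is not on the slides, but it captures the idea:

```latex
% Discounted return maximized by the agent: r_{t+k+1} is the reward received
% k steps after time t, and the discount factor gamma trades off immediate
% rewards against long-term ones.
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1
```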
Here on the slide you can see the agent, which is the decision-making entity, and the environment, which can be anything; in our case it will be a wireless network. The way it works is the following. From the environment you measure the state, basically how the environment is behaving; in our case these will be the performance indicators of the network. Then, based on the current knowledge of the agent and this input, which is the state, the agent makes a decision about an interaction to be executed upon the network; we typically refer to this as the action. The action modifies something in the system, and as a consequence there is a reward, which is basically the outcome of what happens when you implement that action in the system: has the system improved, has it degraded? It is always measured within the context of your long-term goal, so if your long-term goal is to improve capacity, the reward will tell you to which extent that change has impacted capacity. Now, in the agent, once you know the state that you measured, the action that you took and the consequence of that action, you are able to extract sound knowledge in order to refine your decision logic. This is the learning step that happens in every interaction cycle, and then you continue doing this further and further until you have fully learned how to interact with the system. At the beginning of this process, the knowledge that you have accumulated in the system is very small, and therefore you will have some kind of erratic behavior.

Let's look at this within the context of learning how to interact with a video game. This is the typical Breakout video game, and in this context we have the video game here, then we have the agent, and then we have the state, which is basically what we see happening in the video game; we are going to observe it by looking at the screen. Then, based on the available knowledge at that moment, you take an action, which is basically how you move your paddle in the video game, and that generates a reward, which is the consequence of your action, typically measured here as the amount of points that you have in the video game, the amount of lives that you have left, and so on. By the way, this example is extracted from a publication by DeepMind, who have been working a lot on deep reinforcement learning.

Let's look at how this behaves in reality. This is the video game, and as more episodes go by, the agent learns more and more. You can see that at the beginning the proficiency of the agent at manipulating the video game is limited, and very often it loses very soon; but then it gets more and more training cycles, and you can see how the agent is able to play better and better every time. Let's see how this evolves: you can see that now it takes longer until the ball is dropped, and then, as we will see in a few seconds, when more and more learning is accumulated, the agent ends up coming up with sophisticated strategies that allow it, for example, to open these tunnels that send the ball to the top of the screen, so that it destroys a lot of bricks over there. You can see what is happening now: at the end, all these strategies that are developed through interaction with the system are really able to outperform whatever a human expert can achieve with this game.
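As a companion to this transcript, the interaction cycle described above (measure the state, take an action, observe the reward, update the decision logic) can be sketched in a few lines of Python. This is a minimal tabular Q-learning loop on a toy environment, not the agent used in the talk; the environment, reward and hyperparameters are illustrative assumptions:

```python
# Minimal sketch of the interaction cycle described above: tabular Q-learning
# on a toy chain environment. The environment, reward and hyperparameters are
# illustrative assumptions, not the network optimization agent from the talk.
import random

N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA = 0.1, 0.9      # learning rate and discount factor
epsilon = 1.0                # exploration probability, decayed over time

def step(state, action):
    """Toy environment: action 1 moves toward the goal state, action 0 away."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # the agent's knowledge

state = 0
for _ in range(1000):                              # interaction cycles
    # Early on, epsilon is high and actions are mostly random (the erratic
    # phase); as knowledge accumulates, the agent exploits what it learned.
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: q[state][a])
    next_state, reward = step(state, action)       # act on the environment
    # Learning step: refine the decision logic from (state, action, reward).
    q[state][action] += ALPHA * (reward + GAMMA * max(q[next_state]) - q[state][action])
    state, epsilon = next_state, max(0.05, epsilon * 0.995)
```

The decaying epsilon mirrors the story told above: almost random exploration at the start, increasingly confident exploitation of the accumulated knowledge later on.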
But of course, as you have seen at the beginning, this technique has a major drawback: since initially you have not accumulated any knowledge about how to interact with the system, you will do things that are erratic. This erratic behavior does not really matter when you are interacting with a video game, but if you are interacting with the settings of a wireless network, it is not acceptable, so we need to find a way to overcome that challenge.

Let's talk a little further about this. In this slide, on the vertical axis you have the optimization capability, which we have also described before as the knowledge that you have accumulated in the system about how to behave in order to achieve your goals, and on the horizontal axis you have the different learning cycles, so as time goes by you move to the right. If you have an expert system, which is the typical rule-based SON system, you have just encoded the knowledge that is stored in the mind of an expert optimizer and translated it into a set of rules. So at the beginning you already have a very proficient agent, because you don't need to accumulate knowledge through interactions; that knowledge was already there because the expert wrote it into a program. You have a very good optimization capability from the beginning, but then you don't improve it over time, because you are always applying the same knowledge again and again. On the other side, if you look at the reinforcement learning curve, which is the blue one here, at the beginning you know nothing about how to interact with the system, so your optimization capability is very low and you are doing almost random things in order to explore the system and start learning the fundamental trade-offs that you need to consider. But as time goes by, you start learning from the consequences of your actions, the optimization capability grows over time, and there is a moment in which this learning from scratch, without any preconception of how the system should behave, ends up in an agent that is able to outperform the expert system. This is what we have here: a reinforcement learning agent will typically, as we have seen in the field, outperform an expert system. However, as we said before, we have a challenge in the initial stages of the training, because the behavior is erratic, and that is not acceptable in a real network, even though it is acceptable in a video game.

So what do we do in order to close this gap? You can see here different strategies to learn from an environment. At the bottom left you can see the live network, which is what we were describing at the beginning. That is technically possible, but you have to live with some interactions that will be erratic; the characterization that you will get of the system will be very good, but it will have an impact that most operators will not want to have. Then, on top of that, we have the network simulator. Think of it this way: imagine that you want to learn how to fly an airplane, and you begin with a flight simulator. Of course it is a limited representation of reality, it is not exactly reality, but at least you can very quickly learn the fundamental mechanisms and trade-offs of the activity that you want to carry out, without any real consequence in case of a decision that drives the system down. Then there are other ways, for example training a system to mimic an expert algorithm at the beginning, or recording the consequences of past changes in the network and then replaying that sequence of events in order to learn from it.
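To illustrate that last option, mimicking an expert can be framed as plain supervised learning over logged (state, action) pairs; the agent's policy is then warm-started from the fitted model. A minimal sketch, with made-up data and scikit-learn as an assumed tool choice:

```python
# Minimal sketch of behavior cloning: learn an expert's state -> action mapping
# from logged decisions. Data, features and model choice are all illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
states = rng.uniform(size=(500, 3))                # logged network KPIs (toy)
expert_actions = (states[:, 0] > 0.5).astype(int)  # the expert's logged choices

policy = DecisionTreeClassifier(max_depth=3).fit(states, expert_actions)
print(policy.predict(states[:5]))  # cloned policy, usable before any live step
```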
In our case we have decided to go ahead with the following. First we connect our agent to a network simulator and carry out the initial learning there: basically we learn the fundamental trade-offs and mechanisms of the optimization of the parameters that we want to optimize, taking into account what our main long-term goal is. Then, once we have achieved a certain degree of proficiency, we switch to the live network, so that we start with an agent that is not empty; it is an agent that already knows the fundamental trade-offs of optimizing the system, and we take it from there. We start interacting with the system, and now everything we do is no longer erratic, but we still learn from the consequences of our actions, so we continue fine-tuning our logic as time goes by.

Now we will focus on a real case study of how reinforcement learning can be applied to the optimization of one or more parameters in a wireless network. In Switzerland there are very strict regulations in terms of the maximum power that can be radiated through the air interface. We received a request from one of our customers, Swisscom, to reduce the power transmitted by the 4G base stations, with the idea of using the saved power in future deployments of 5G base stations. However, reducing the power in 4G can potentially cause coverage and throughput degradation. The good news is that coverage and throughput can be improved with the right configuration of the antenna tilt. What is the antenna tilt? It is the vertical inclination of the antenna, or to be more precise, what we used was the remote electrical tilt. The remote electrical tilt is a device in the antenna that permits modification of the vertical antenna pattern, that is, the pointing direction of the antenna, and this can be done from a remote computer terminal, which means that it does not require a visit to the site. This is also the case when changing the transmission power of the base station, which can also be done remotely. Therefore, our proposal was to decrease the transmission power of the 4G base stations in a controlled manner, so that any performance degradation is minimized and can be compensated later with an optimal configuration of the remote electrical tilt.

In this slide we can see the complicated scenario that we used to test our solution. It is a hilly terrain in the Alps. On top of this, the area had been previously maintained and optimized by expert engineers, which means that the performance of the baseline scenario was very good, and this makes the task of improving the performance very challenging. We selected a group of 79 base stations, also known as cells, to optimize; that is the core area, and we can see these cells on the map as the little sectors in green. But we also monitored a number of cells in what we call the buffer area: these are the 82 surrounding cells shown in red, which are used to make sure that there is no impact on them when changing parameters in the core area.

Here we can see the reinforcement learning approach that we have followed to optimize the remote electrical tilt of the antennas. We propose an iterative approach in which the tilt value of the antennas is adjusted incrementally, just one degree per step, so that we can minimize the potential negative impact of any individual decision.
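In code, the iterative scheme just described might look like the following sketch, where the policy, the KPI feedback and the tilt limits are hypothetical stand-ins for the real agent and network interface:

```python
# Minimal sketch of the one-degree-per-step remote electrical tilt loop. The
# propose_action policy and the read_kpis/apply_tilt hooks are assumptions.
TILT_MIN, TILT_MAX = 0, 12   # allowed electrical tilt range in degrees (assumed)

def propose_action(kpis):
    """Hypothetical pre-trained policy: proposes -1, 0 or +1 degree."""
    if kpis["edge_throughput"] < kpis["target"]:
        return -1            # e.g. up-tilt one degree to extend coverage
    return 0                 # KPIs on target: leave the tilt as it is

def optimize_tilt(cell, read_kpis, apply_tilt, n_steps=10):
    tilt = cell["tilt"]
    for _ in range(n_steps):
        delta = propose_action(read_kpis(cell))    # one-degree increments only
        tilt = min(max(tilt + delta, TILT_MIN), TILT_MAX)
        apply_tilt(cell, tilt)                     # remote change, no site visit
    return tilt

# Stub hooks standing in for the live network interface:
final_tilt = optimize_tilt({"tilt": 4},
                           read_kpis=lambda c: {"edge_throughput": 0.8, "target": 1.0},
                           apply_tilt=lambda c, t: None)
```

The one-degree cap is the safety mechanism: each live change is small enough that a bad proposal can be observed and reverted before it does real harm.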
With this approach, the target is to pre-train the agent as much as possible with a network simulator, so that the agent learns the best strategy to optimize the remote electrical tilt. This is what we call the offline learning phase. The network simulator lets the agent explore the vast space of possible states and actions without the inconvenience of degrading the real network due to unfortunate decisions. Although the network simulator models the basic mechanisms of a real wireless network, it cannot behave exactly like a real network; but this is enough for our agent, since we have to remember that the target at this stage is not to come up with the optimal parameter values, but to capture the best logic that permits obtaining those values once connected to the real network, which is the second stage of the proposed approach. Once the agent is connected to the real network, it can use the knowledge acquired from the simulator to propose the best remote electrical tilt values, and at the same time continue learning from the feedback provided by the real network after every step.

For the power optimization we have followed a different approach. Unlike the case of the remote electrical tilt, with the transmitted power it is easier to estimate the expected impact of a change on the performance of the wireless network in an offline manner. To give a simple example, doubling the transmitted power implies doubling the received power: in logarithmic terms, a 3 dB increase in transmit power yields a 3 dB increase in received power, since the path loss in between does not change. For this reason, it is possible to use an emulator instead of a simulator. The emulator mimics the behavior of the exact network scenario that we are optimizing, which is not the case with the simulator, where we were looking into many random scenarios to extract the key trade-offs that drive the optimization. The performance predicted by the emulator after a change in the transmitted power of one or more base stations will be very similar to the performance in the real network. Thanks to this, we don't need to connect our agent directly to the real network; we can interact with the emulator, which works as a digital twin, and then obtain the final optimal values of the transmitted power per base station. This has two main benefits. First, like in the case of the simulator, the agent can learn the optimization logic from the digital twin in a safe way, without interacting with the real network. Second, the procedure does not need to be incremental, since we can directly come up with the final optimal values in just one step: one step in the real network, but multiple iterations with the digital twin. As we can see in the slide, there are two points at which there is interaction with the real network: one here, to retrieve the network configuration into the digital twin, and another one here, to load the final configuration of optimal transmitted power values into the real network.

In the end, what we did was one power optimization step to reduce the power in a controlled manner, followed by several iterations of the remote electrical tilt optimization; then we repeated the procedure with another round of power optimization to further reduce the transmission power, followed by several more remote electrical tilt optimization steps to correct the network performance. Here we can see how every base station reached a different final optimal value, both for power and for tilt. While most base stations ended up with a lower transmission power, some of them still required a power increase.
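The workflow around the digital twin can be sketched as follows. Here a simple random local search stands in for the learned agent, just to make the interaction pattern concrete: every trial configuration is scored offline against the twin, and only the final result is written back to the live network. The names and the toy objective are assumptions:

```python
# Minimal sketch of the digital-twin power optimization workflow. The emulator
# objective and the search loop are illustrative; the real system used a
# learned agent rather than random search.
import random

def twin_score(powers):
    """Stand-in digital twin: predicts network quality for a power config."""
    return -sum((p - 40.0) ** 2 for p in powers)   # toy objective, dBm scale

def optimize_powers(live_config, n_iters=5000):
    best, best_score = list(live_config), twin_score(live_config)
    for _ in range(n_iters):                       # many iterations, all offline
        candidate = [p + random.uniform(-1.0, 1.0) for p in best]
        score = twin_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

live_config = [43.0, 46.0, 44.0]   # step 1: read configuration from the network
new_config = optimize_powers(live_config)
print(new_config)                  # step 2: single write-back to the network
```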
In the case of the remote electrical tilt, some cells were down-tilted and some were up-tilted to compensate for the power decrease applied to most base stations in the scenario. In the following slide we will focus on the central area. As we can see here, most of the base stations have a green value, which means that they reduced their transmit power, and this was compensated with antenna up-tilts, as we can see here; but not all the base stations needed this up-tilt. The final solution is the result of an iterative search by our reinforcement learning agents, which gradually steered the configuration towards the optimal values using an optimization logic learned from data, not designed by humans; that optimization logic is contained in a deep neural network.

It is interesting to see how the combination of both approaches, the remote electrical tilt approach and the power approach, permits going beyond what a regular optimization would achieve. After making the most of one of the agents, new room for improvement appears when using the other agent, and vice versa: when the second agent leads the network to a level at which no more gain is possible, the first agent finds extra room for improvement. As you can see here, a sequence of the different parameter changes is shown; some of them are power changes and some are RET (remote electrical tilt) changes, and they are interleaved over time.

Let's summarize the results of the trial. In a first iteration, a combined power and remote electrical tilt optimization was used, and we managed to reduce the average transmit power by 10 percent with no quality degradation; actually it was just the opposite, as we even achieved a 12 percent improvement in user throughput. Then we executed a second round of power plus remote electrical tilt optimization, with the target of exploring the limits of our solution. This time we reached a 20 percent decrease in the average transmission power compared to the baseline, and we still got a user throughput improvement of 5.5 percent. This power decrease translated into a 3.4 percent reduction in energy consumption at the base stations.

We can conclude that we have come up with a useful combined solution for power and remote electrical tilt optimization based on reinforcement learning. This solution uses the data from digital twins and from the live network to build the optimization logic automatically, that is, with no human intervention, and this has been done thanks to artificial intelligence. This combined solution is able to reduce the transmission power while still improving the service level experienced by the customers. If you are interested in knowing a little more about this topic, you can read the article that we have written, which is available online through the link shown on the screen. And here concludes our presentation; we are very happy to answer any questions you may have. Thank you very much for your attention.