Please tell us a little bit more about deep reinforcement learning for AFM.

Yeah, so today I would like to tell you about how we trained a deep reinforcement learning agent to manipulate individual atoms. This work is a collaboration between myself and several of my colleagues from Aalto University and the Finnish Center for Artificial Intelligence. In this project, we tried to explore how machine learning can help us do experiments; in particular, we focused on deep reinforcement learning techniques.

In reinforcement learning, we usually have an agent, which in our case is represented by a deep neural network, and we want it to achieve a certain goal. For example, the goal could be to control a robot arm to manipulate an object to a target position. This is how the training goes: at each time step, the agent receives input information about the environment, which is called the state. In the robot arm example, this information could include the position of the object, the target position, and maybe the position of the arm. Based on this information, the agent takes an action; in the robot arm example, this might mean moving the robot arm around. This action then has some effect on the environment, and based on that effect and the goal of the problem, we give the agent a reward. This reward is very important, because it is how we encode the goal and communicate it to the agent. Agents in reinforcement learning are designed to maximize the overall reward, so after collecting enough training data and reward signal, the agent should learn how to achieve the goal.

Up to now, deep reinforcement learning has mostly been used in games or simulated environments, because that is where it is easiest to get training data. But recently, as deep reinforcement learning has become more efficient and stable, it has also been used in some real-world scientific experiments: for example, to control superpressure balloons, and in recent work, to control nuclear fusion experiments. In our work, we want to expand the application to atomic-scale experiments, and the question we want to answer is: can we train a deep reinforcement learning agent to manipulate individual atoms?

Maybe a lot of you are familiar with this already. The experiments people usually do to manipulate atoms are performed in a scanning tunneling microscope, and this is what one looks like in the lab. Inside this big vacuum chamber, we have a very sharp tip like this one, and when it gets very close to a substrate, a quantum tunneling electrical current flows between them. This current is very sensitive to the materials of the tip and the substrate, and also to their distance, and this sensitivity allows us to take atomic-resolution images of the substrate. Besides imaging, it also allows us to manipulate individual atoms and create different types of structures, like this one here.
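As a brief aside on why the current is so sensitive (this is a standard textbook result, not something covered in the talk): in the low-bias limit, the tunneling current depends roughly exponentially on the tip-sample gap,

$$ I \;\propto\; V\, e^{-2\kappa d}, \qquad \kappa = \frac{\sqrt{2 m \phi}}{\hbar}, $$

where $d$ is the gap, $V$ the bias, $\phi$ the effective barrier height, and $m$ the electron mass. For a typical metal work function of a few eV, a gap change of about 1 Å changes the current by roughly an order of magnitude, which is what makes atomic-resolution imaging possible.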
After this technique was invented in the 1990s, researchers have been using it to do all sorts of different things. For example, it is a very good way to express creativity, like making a movie or some art. In condensed matter physics, researchers have been using this technique to assemble new types of materials with properties that don't exist in nature, and they can do it atom by atom, by precisely placing atoms where they want them. It is also maybe a unique way to push the boundary of how small we can make computational devices: for example, in these three works, researchers used this technique to make small memories, logic gates, and a Boltzmann machine based on moving individual atoms around. As you can imagine, building all these structures, and maybe operating these devices, requires thousands or even more manipulation steps, and that is extremely labor intensive in the lab. So one important motivation for our work is that if a deep reinforcement learning agent can take over the atom manipulation part, we will be able to do this atomic assembly autonomously.

Before jumping into the details of our experiments, you might be wondering whether it is difficult to manipulate atoms, and what challenges the deep reinforcement learning agent needs to solve. In fact, it is not difficult to move atoms; we do it unintentionally in the lab all the time, and also in everyday life. What is difficult is to move an atom precisely, to put it exactly where we want it. This is difficult because it requires very good control of the interaction between the tip and the atom, and for that we need to set the correct manipulation parameters, which depend on the exact atomic configuration of the tip. That configuration is something we usually don't know, and it also changes spontaneously over time. Without this knowledge, it becomes quite challenging to place the atom precisely.

Okay, so here I'll start talking about our experiments. Before we can use any reinforcement learning technique, we have to put atom manipulation into the reinforcement learning framework. First, we need to define the goal the agent should try to achieve; here, the goal is that the agent should learn to manipulate atoms precisely in this stochastic environment. To learn to achieve this goal, we need to design a learning environment, which is a bit like designing a game the agent can play, and there are several standard parts we need to set. We decided that in each training episode, the agent can make up to five manipulation steps. At each time step, the agent receives some information about the environment, the state, which includes the position of the atom and the position of the target. Based on this information, the agent can take an action, which involves moving the tip around while applying a current and a bias. You might notice this is a bit similar to the robot arm example: it is a bit like we are designing a robotic arm that is capable of moving atoms. Then we need to decide on the reward the agent will receive. We designed the reward in two parts: the first part depends on how much the manipulation error improved due to the specific action, and the second part depends on whether the agent was able to move the atom to the target position within some tolerance range. Okay, so these are the rules.
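To make the episode structure and the two-part reward concrete, here is a minimal sketch of such a learning environment in Python. Everything here is an illustrative assumption: the class name, the tolerance, the noise level, and the reward weights are placeholders, not the values used in the actual experiments.

```python
import numpy as np

class AtomManipulationEnv:
    """Toy sketch of the atom-manipulation learning environment.

    State: atom position and target position. Action: a tip move plus
    bias and current setpoints. All constants are illustrative
    placeholders, not the values used in the real experiment.
    """

    def __init__(self, tolerance=0.1, max_steps=5):
        self.tolerance = tolerance  # success threshold on the manipulation error
        self.max_steps = max_steps  # up to five manipulation steps per episode

    def reset(self):
        self.atom = np.random.uniform(-1.0, 1.0, size=2)    # atom position
        self.target = np.random.uniform(-1.0, 1.0, size=2)  # random target
        self.steps = 0
        return np.concatenate([self.atom, self.target])     # the state

    def step(self, action):
        # action = [tip_start_x, tip_start_y, tip_end_x, tip_end_y, bias, current];
        # in this toy model only the net tip displacement moves the atom, with
        # added noise standing in for the stochastic tip-atom interaction.
        old_error = np.linalg.norm(self.atom - self.target)
        self.atom = self.atom + (action[2:4] - action[0:2]) \
                    + np.random.normal(0.0, 0.05, size=2)
        new_error = np.linalg.norm(self.atom - self.target)

        self.steps += 1
        success = new_error < self.tolerance
        # Two-part reward: improvement in the manipulation error, plus a
        # bonus for placing the atom within the tolerance of the target.
        reward = (old_error - new_error) + (10.0 if success else 0.0)
        done = success or self.steps >= self.max_steps
        return np.concatenate([self.atom, self.target]), reward, done

# Usage: one manipulation step that drags the tip from the atom toward the target.
env = AtomManipulationEnv()
state = env.reset()
move = state[2:4] - state[0:2]                       # vector from atom to target
action = np.concatenate([state[0:2], state[0:2] + move, [0.1, 1.0]])
state, reward, done = env.step(action)
```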
Yeah, so this was our first experiment, and the material system we chose is silver: everything here is silver, including the tip, which is coated in silver, and the substrate and the atoms we manipulate are also silver. Then we were ready to apply a deep reinforcement learning model. The model we chose is called soft actor-critic. The word "soft" here refers to the fact that the agent is not only encouraged to maximize the reward; it is also encouraged to maximize the entropy of its actions, which means it should try to achieve the goal, but in the most diverse way possible. It should try to find all the possible solutions to the problem. On top of this model, we also used some experience replay techniques that help improve data efficiency. One is hindsight experience replay, which helps the agent learn from experiences where it actually moved the atom to the wrong position. Another technique we used is emphasizing recent experience replay, which encourages the agent to learn more heavily from recent experiences than from older ones.

And here are our training results. During training, we monitored these four values. As you can see, at the beginning of training, the agent received very low reward, had a very low success rate, and had a very high manipulation error. But towards the end of training, the reward improved, and the agent reached a 100% success rate, which means that 100% of the time it was able to move the atom to the target position within the tolerance range. Here we can also see that it reached a very low manipulation error, close to the lowest error theoretically possible, which is determined by the geometry of the silver(111) surface. The episode length also decreased, which means the agent was trying to complete the task with as few manipulation steps as possible. We terminated the training when the agent had maintained a 100% success rate for an extended amount of time; that took about 6,000 manipulations, which means about 40 hours in the lab.

Here we also mark the three times when we observed that the manipulation tip changed significantly. You can see that every time the tip changes significantly, the performance decreases, but after some retraining, the performance improves again. This indicates that the ability of deep reinforcement learning agents to continue learning makes them quite adaptable to environmental changes, and that is something that is maybe better than any sort of fixed manipulation parameters we would usually use. Here we show that for different tip conditions: we compare the reinforcement learning agent's performance against a set of hand-tuned manipulation parameters, which are the horizontal dashed lines here. You can see that these parameters, tuned by experimentalists, don't always do well; every time the tip changes, their performance can deteriorate. But the deep reinforcement learning agent, after some retraining, can always reach near-optimal performance.
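For reference, the entropy bonus that makes the actor-critic "soft" can be written schematically, following the standard soft actor-critic formulation, where the temperature $\alpha$ trades reward against action diversity:

$$ J(\pi) \;=\; \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[\, r(s_t, a_t) + \alpha\, \mathcal{H}\!\big(\pi(\cdot \mid s_t)\big) \,\right]. $$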
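And to make the two replay techniques mentioned above concrete, here is a minimal sketch, assuming a simple list-of-dicts buffer. The transition keys, the success bonus, and the decay constant eta are illustrative placeholders rather than the experimental settings; the recency schedule follows the published "emphasizing recent experience" formula.

```python
import random
import numpy as np

def hindsight_relabel(episode, tolerance=0.1, bonus=10.0):
    """Hindsight experience replay: pretend the position the atom actually
    reached was the goal all along, so even a 'failed' episode teaches the
    agent how to reach *some* target."""
    achieved = episode[-1]["atom_after"]           # where the atom ended up
    relabeled = []
    for t in episode:
        err = np.linalg.norm(t["atom_after"] - achieved)
        relabeled.append({**t, "goal": achieved,
                          "reward": bonus if err < tolerance else 0.0})
    return relabeled

def sample_recent(buffer, batch_size, k, num_updates, eta=0.996):
    """Emphasizing recent experience: for the k-th of num_updates gradient
    steps after an episode, sample uniformly from only the most recent c_k
    transitions, so later updates lean more heavily on fresh data."""
    c_k = max(int(len(buffer) * eta ** (k * 1000 / num_updates)), batch_size)
    return random.sample(buffer[-c_k:], batch_size)
```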
Finally, here is a video that shows what a trained agent does in this learning environment. In every episode, we give a new random target to the agent, and it should try to move the atom, which is the blue dot here, towards that position. And here you can see that it is capable of doing that.

Once we have this trained agent, we are ready to combine it with some other classical algorithms to make fully autonomous atomic assembly software. We combined it with two path planning algorithms: the Hungarian algorithm and the rapidly-exploring random tree algorithm. As a demonstration, we used them to build this 42-atom structure, and it was done autonomously and with atomic precision.

So, in summary: we showed that yes, a deep reinforcement learning agent is able to learn to manipulate atoms, and it can do so with near-optimal precision and adapt to changes in the environment, while the training duration and the amount of data required were reasonable. Going beyond this work, we are interested in using similar techniques for more challenging tasks in the lab. For manipulation tasks, that could mean manipulating molecules or atoms that are more difficult to manipulate, and maybe even things that we as experimentalists don't yet know how to manipulate. Here is the paper that describes our work, and thank you very much.

Okay, we have questions already. Thank you very much for this amazing talk. Could you elaborate a little bit more on the multiple-atom case? Because I imagine that if you want to build this 42-atom structure, there are a few more decisions to make, right? The agent has to know where to take the atoms from and in which order to build up the structure. So is this given sequentially, as 42 single-atom manipulations that the agent is supposed to do, or does it decide how to build the structure from scratch?

Yeah, so here the deep reinforcement learning agent only knows how to move one atom to one position. To build a bigger structure, we outsource the rest to the two path planning techniques. The Hungarian algorithm handles the assignment: we have 42 atoms and 42 target positions, and it tries to minimize the overall distance that all the atoms need to travel. The rapidly-exploring random tree algorithm is not an optimal algorithm; after we decide that a given atom goes to a given position, it tries to find a path that keeps the atom from colliding with other atoms. So collisions are also a problem here, and this algorithm doesn't find the shortest path, just a safe one.
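The assignment step described in this answer can be sketched in a few lines with SciPy's Hungarian-style solver; the coordinates below are random placeholders, and the collision-avoiding RRT path planning is omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian-style assignment solver

# Placeholder coordinates: 42 current atom positions and 42 target sites.
rng = np.random.default_rng(seed=0)
atoms = rng.uniform(0.0, 20.0, size=(42, 2))
targets = rng.uniform(0.0, 20.0, size=(42, 2))

# Cost matrix: distance each atom would have to travel to each target site.
cost = np.linalg.norm(atoms[:, None, :] - targets[None, :, :], axis=-1)

# Assignment minimizing the total travel distance over all atoms.
atom_idx, target_idx = linear_sum_assignment(cost)
print("total travel distance:", cost[atom_idx, target_idx].sum())
```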
Thank you for the really great talk. Maybe one silly question, and maybe I missed it, but this is in a simulated environment, right? This is not actually in the lab.

This is actually in the lab.

And you did this in one go, without doing it in a simulated environment first?

Um, what we did in the simulated environment was just to make sure that everything works as it should, but otherwise we started with a randomly initialized agent in the lab that hadn't seen any simulated data.

Okay, pretty cool. Thank you. Okay, and a last question from Kevin.

Thank you for the nice talk. I was wondering whether, if we changed the chemistry of the substrate, we would need to restart the machine learning from scratch, or, yeah, what is your opinion on this?

Yeah, so, um, we didn't try that, but I would think that some level of retraining would be required. I didn't mention this, but the actions the agent is allowed to take have certain ranges; for example, it can only apply a certain range of bias, so if what we want to do is out of that range, it will not be able to handle it. It doesn't have that kind of general intelligence; it just solves this particular optimization problem.

Okay, then let's move on. Thank you again. And our next talk comes from nearby, from Cynthia Estebanowska from Ljubljana.