So, we are going to debug a LibreOffice Writer bug, and we are going to do this by recording the execution of Writer while we first reproduce the bug. This is a video that I recorded a couple of months ago, and now I'm trying to remember what was going on.

To reproduce the bug, we select the entire document, copy it, and paste it while everything is still selected, which replaces the entire content. Then we push the undo button and it crashes.

Now we use the rr replay command to replay what we have recorded, and this puts us at the gdb prompt. Since it was crashing in undo, we set a breakpoint on the place where undo actions are created in Writer, and then we just continue the replay, which is going to take a while. By the way, I've cut out quite a few chunks of the video where nothing interesting was happening, so this sort of thing probably took longer.

Now we are at the point in time where the first undo action was created. We turn on logging, about which more later, and the backtrace at the breakpoint indicates that this is a delete undo action. That makes sense, as the first thing that actually modifies the document is that the selection we paste into is deleted.

For the next undo action, it turns out that there are some flys, that is, floating objects, inside the document, and for every one of these we get its own undo action that is stored inside of the delete action, which you can still see on the stack at frame 6. There are several of these, and they will turn out to be quite uninteresting for this bug, but we'll have to wait through them.

Now we are at another interesting point, where the undo action for the paste is being created. The document is basically empty, and the content from the clipboard document is being pasted. Then we arrive at another undo action, which is again about fly objects, which are being pasted this time.
At this point, we express our disinterest in floating objects by disabling this breakpoint. Now we continue, and we arrive at the position where the crash happens, which in this case is an assertion failure.

So now we look at the backtrace to see where we crashed. If you look at frame number four, we are currently trying to set a field mark into the document. A field mark is a kind of bookmark that represents a Word-compatible text field. Now we look at this object in detail. The most interesting bits here are the m_nSttNode and m_nSttContent members, which indicate the position where the field mark is being inserted, and also m_nEndNode and m_nEndContent. We have a look at what is actually there in the document at this position, and it turns out that the node there is an end node, which is obviously incorrect, because an end node cannot contain a field mark or any text at all.

So now the question is, where is this object being created, and what was the state at that point? To find out, we are going to use a very interesting feature of rr: we set a breakpoint conditional on the address of the object, and then we reverse-continue, which is extremely useful. This basically runs the program in reverse. Every time we switch the direction in which the program is executed, it will at first just stop immediately at the position where we already are, so we have to do it twice in that case.

We end up in the constructor of the object that was later causing the crash. In frame number three we see SwUndoDelete, so this object is being created as part of the delete undo object. Now we switch to gdb's text user interface to step through the code a little bit. You can switch with Ctrl-X A. And now that the object is initialized, we print its members.
We see that they have exactly the same values that they had at the point when the crash happened. Then we look at the node that is at this index, and for sure it's not an end node: it turns out it is a text node. So at this point in time, the document looked different than at the time when the crash happened.

Now we try to look at the actual text in the text node. Unfortunately, this is not as convenient as it could be. We see here the three magic characters, Ctrl-A, Ctrl-whatever-that-is, and Ctrl-B at the end, that indicate the start position, separator position, and end position of a field mark in the text.

Now we are going to take a look at the most important data structure of a Writer document, the nodes array. Because this is a very large document, it's going to be quite inconvenient to look at it in the terminal. That's why we turned on the logging in gdb, so we can now look at the log in a text editor, which has far more useful search features. We quickly find the node that contains the field mark by searching for its index, and here we can see that it's surrounded by start nodes and end nodes. The nodes array contains a tree structure that is essentially encoded into the array via start nodes and end nodes that indicate the nesting level. In this case, what you're looking at is a table, and the start nodes and end nodes are table cells.

Now we have continued to the place where the crash happens and printed the nodes array at that time, and we can compare these two nodes arrays from the log file in the text editor. We can see that yes, indeed, the index of this text node is different from what it used to be. Now we take a look to see what other differences there might be. The first question is, what about this table? Does it still start at the same position? And we can see that the table node is also shifted by one position.
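The side-by-side comparison of two logged nodes arrays can also be done mechanically. The following is only a minimal sketch under stated assumptions: the dump format (one "index address kind" triple per line), the sample data, and the function names parse_dump and compare_dumps are all made up for illustration and are not the format the real gdb helper prints.

```python
def parse_dump(text):
    """Map each node address to (index, kind) for one logged dump."""
    nodes = {}
    for line in text.strip().splitlines():
        index, address, kind = line.split()
        nodes[address] = (int(index), kind)
    return nodes


def compare_dumps(before, after):
    """Report nodes whose index changed and nodes present in only one dump."""
    old, new = parse_dump(before), parse_dump(after)
    shifted = {
        address: (old[address][0], new[address][0])
        for address in old
        if address in new and old[address][0] != new[address][0]
    }
    only_before = sorted(set(old) - set(new))
    only_after = sorted(set(new) - set(old))
    return shifted, only_before, only_after


# Invented sample data: one extra node pushes everything after it down by one.
BEFORE = """
10 0x5000 text
11 0x5100 table
12 0x5200 text
"""
AFTER = """
10 0x5000 text
11 0x5300 text
12 0x5100 table
13 0x5200 text
"""

shifted, only_before, only_after = compare_dumps(BEFORE, AFTER)
print("shifted:", shifted)
print("only in first dump:", only_before)
print("only in second dump:", only_after)
```

Keying the comparison on the address rather than the index is the point here: the index of a node changes whenever something is inserted in front of it, while the address stays the same for as long as the node object lives.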
Apparently the previous table, which you can see up here, has its last text node still at the same position as before, and it's followed by three end nodes that indicate the end of a table. Then we have this section node here, which existed previously, but now, when the crash happens, there is no section node anymore. This difference is why it's crashing.

So now the question is, why is there no section node anymore? And if there is a section node, there should also be a matching end node somewhere. So we have a look at the end of the document to see what the situation is like there. We see that the size of the document differs by just one node, and there is actually a section node there.

Another interesting question is what the document looked like at the start of the delete undo action's undo, so we take a look at that next. By the way, this nodes array is printed by some hundred lines of custom Python code that is loaded by gdb and creates this nice indentation to indicate the nesting level of the nodes.

So we reverse-continue to the start of the undo execution, and we see a section node here at index 19. We check whether this is the same section node that we can see at the other points in time, and yes, indeed it is. The difference is that it used to be before the table, but at the time of the crash it is after the table. What else is different between these two? If you look at the lower document, it has an additional text node just before the section node. If we search for this node pointer, we don't find it anywhere except here. So it looks like this is indeed the extra node that is also part of the problem.

A few general points about the nodes array. If you look at the one that is visible in the terminal, the body text starts at node 17 and goes until node 22.
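The indentation that shows the nesting level can be sketched in a few lines. This is not the actual gdb helper, just a toy model of the idea: the Node class, its kind labels ("start", "end", "text"), and dump_nodes are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str        # "start", "end" or "text" (hypothetical labels)
    label: str = ""  # free-form description, for display only


def dump_nodes(nodes):
    """Render a flat node list, indenting by the current nesting depth."""
    lines = []
    depth = 0
    for index, node in enumerate(nodes):
        if node.kind == "end":
            depth -= 1  # an end node closes the innermost open start node
        lines.append(f"{'  ' * depth}[{index}] {node.kind} {node.label}".rstrip())
        if node.kind == "start":
            depth += 1  # everything up to the matching end node is nested
    return lines


# Invented example: a table containing one cell with one text node.
nodes = [
    Node("start", "table"),
    Node("start", "cell"),
    Node("text", "field mark here"),
    Node("end", "cell"),
    Node("end", "table"),
]

for line in dump_nodes(nodes):
    print(line)
```

The array itself stays flat and is addressed by a plain index; the tree only exists through the pairing of start and end nodes, which is why a single missing or extra node shifts every index after it.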
There are just two text nodes and a section in the body of the document. Everything preceding that is special things like text frames. There is an OLE node in there, so that means there is some embedded object in the document. The other special top-level sections might contain things like footnotes or tracked changes, but there are no such things in this document.

So what are we going to do next? We are going to have a bit of a look at the code inside the undo implementation for SwUndoDelete and see the many interesting things that it does. This is the point at which it restores the SwHistory, which contains various things such as deleted flys and bookmarks and whatnot. Here is the code that handles the start node. Here is some code handling special cases where sections are being inserted or deleted, and here is yet another, different case that handles section nodes. This is the part that handles the end node, and there are a couple of different members in the SwUndoDelete object that are for these various special cases. Here, once the model has been updated, the layout is being recreated. But we didn't even get to that point, so that's not where the bug is anyway.

Now we look at the object itself. We see that this would be one of these special cases, but we don't have a valid pointer here. We see there are these members like m_nSectDiff, m_nReplaceDummy and so on, and m_bTableDelLastNd and m_bDelFullPara, and none of these special cases are active. So it looks like a plain, ordinary delete.

The delete undo has lots of individual steps, and they all have to be done in a particular order, because some of them depend on previous steps. If the order is not the right one, then you have problems. An interesting question here is about this nodes array at the start of the undo. Is it actually looking plausible or not?
Because that would indicate whether the delete undo action is the one that has the problem or whether the paste undo action is. If we look at it now in the lower pane of the text editor, we see that there are two text nodes in the body of the document: one is inside of a section and the other one is outside of the section. This is maybe a bit suspicious, because the paste would have been executed after the document had been deleted completely, and at that point there should really be only one text node left over in the document. If the undo of the paste has created a nodes array that has two text nodes, then that is a bit suspicious.

So now we are going to look at the situation when the undo action for the paste was created. We go a bit up the stack to see the code in SwDoc that creates this undo action. It is created before the paste is executed, so before anything is inserted into the document, and it is passed just the insert position. The way it works is that there is a separate update function for this undo object that is called later, after the content has been inserted. At this point in time, before anything is inserted, the nodes array looks like this: in the body there is just one text node, inside of a section node. So there is no text node that precedes the section node but is still inside the body. The only other text node is node number six, which is in some sort of text frame.

Now we have continued until the paste has been executed and the content has been inserted into the document. We have the situation before the content was inserted in the bottom viewport of the editor, and we want to look at the situation now that the content has been inserted, so we print the nodes array again. We try to find our node there and, well, it's not there somehow. We find it only in the previous nodes arrays. This is weird.
So somehow there are lots of nodes in this document, but the node addresses are not the same as previously. Oh well. The problem was that we were looking at the wrong document's nodes array. This is the nodes array of the temporary clipboard document and not of the real document. Sorry for wasting a bit of time, but this sort of thing tends to happen when debugging.

So now we need to look where we can get the real nodes array from. Apparently this copy function is called on the SwDoc of the source document, but the insert position parameter must of course know the target document. Now we take a look at what is going on, and we see that inside the body, the very first node is a text node preceding a section node. Then, at the end of the document, there is a text node followed by a section node, followed by another text node inside the section. If you look closely at the address of this text node with index 3760, it also exists in the middle nodes array, but it does not exist in the one that is second from the bottom. So it was inserted by this paste operation. The middle viewport is from the time when the undo of SwUndoInserts had finished, and the node is still there. So it was not deleted by the undo of SwUndoInserts.

Now we continue until the time when the undo of the paste is executed, so that we can have a look at what is going on there, and we step through the code. We stop to take a look at the member variables of the object. We see the node index 3760 as the end position of the insert. So this would presumably be the last node that has been inserted, but it is also a node that has been inserted: it did not exist before the insertion.

Here the nodes are moved from the document's nodes array into a separate nodes array that stores content that is preserved for undo. And here one of the nodes is being deleted, and this would be a text node. Now we want to step backward a bit to see which node it was.
reverse-next is a bit slower than next, but it does work. The node that is being deleted is the one with index 18. So it is the one at the start of the undo range that we saw printed earlier, if you remember the SwUndRng base class of SwUndoInserts and its members.

Now we try to find the node in our previous printout. It does exist despite us not finding it, and the explanation for this is multiple inheritance: the top-level node class is not the first base class of SwTextNode. Here we want to set a breakpoint in the constructor, conditional on the node address. It is important to set this on the SwNode constructor and not on its subclasses, because the addresses that are printed here from the SwNodes array are SwNode addresses, and these types use multiple inheritance, so the subclasses don't necessarily have the same this pointer.

Now we reverse-continue to where this node was created, and we see that, unsurprisingly, it was created while inserting the clipboard content, inside the CopyImplImpl function, in this piece of code. If there is no text node at the destination, the pDestTextNd variable is null, and then a text node is created. What kind of node is at the insert position, at index 18? We expect to find the section node there. So the clipboard content is inserted before the section node, and as we have seen, SwUndoInserts does not delete this text node that was inserted here.

So now we investigate in a bit more detail how the code undoes the insertion, to see if we find anything that looks like it would delete inserted nodes. There is this bit of code that does indeed delete an inserted node, which is a text node. But we did see this code being executed in the debugger, and it did not delete this text node; it deleted a different text node. The else branch in this case wouldn't do anything much different; it would just delete the text node in a different way.
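What we are checking by hand here is a round-trip property: after a paste followed by its undo, the nodes array should equal the snapshot taken before the paste. The following toy model only illustrates that check. Nothing in it is LibreOffice code: the tuple representation of nodes, the functions paste, buggy_undo and leftovers, and the rule for which node the undo forgets are all simplifying assumptions.

```python
def paste(body, position, clipboard):
    """Toy paste; returns the list of nodes it inserted, in order."""
    inserted = []
    if body[position][0] != "text":
        # Toy rule modelled on the narration: there is no text node at the
        # destination, so one is created at the insert position first.
        helper = ("text", "created-for-paste")
        body.insert(position, helper)
        inserted.append(helper)
        position += 1
    for node in clipboard:
        body.insert(position, node)
        inserted.append(node)
        position += 1
    return inserted


def buggy_undo(body, inserted):
    """Toy undo that removes the pasted content but forgets the extra node."""
    for node in inserted[1:]:  # assumption: the first inserted node is missed
        body.remove(node)


def leftovers(snapshot, body):
    """Nodes that survived the undo although they were not there before."""
    return [node for node in body if node not in snapshot]


# Invented document body: a section containing a single text node.
body = [("section", "S"), ("text", "in-section"), ("end", "S")]
snapshot = list(body)

inserted = paste(body, 0, [("text", "pasted-1"), ("text", "pasted-2")])
buggy_undo(body, inserted)
print(leftovers(snapshot, body))
```

A leftover node in front of the section is the kind of damage that stays invisible until a later undo step restores something by index and lands on the wrong node.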
So now we want to find out more about this variable m_pTextFormatColl, like how it is initialized. We see that it gets a value if the point of the rPam parameter is on a text node. That means the start node, the start of the insertion range, is a text node, and a text node always has a paragraph style associated with it, so in that case it's not null if you manage to get into this branch and assign it.

Now we look a bit at the code that inserts the clipboard content. Here the undo object is being created, and we see that this variable is even initialized depending on the source text node and not on anything that's at the target insert location. So, still nothing terribly interesting. This SetInsertRange function is called after the content has been inserted, and it updates various member variables of the undo object, such as the end position.

Now we take a look at the copy function. It does not do a whole lot before it creates the undo object. It is passed the parameters rPam, which is the source range that is being copied, and rPos, which is the insert position, and the variables pStt and pEnd point to the start and end position of the source range. There is some code to handle special cases, like if there's no text node that the cursor can move forward or backward into. Here we have one of Writer's favorite code patterns, the do-while-false loop with a break in the middle.

But what we are actually interested in is places that use or manipulate the aInsPos variable, which is the insert position. We can see here that a temporary text node is being inserted in case the destination position does not have a text node but some other node. As we can see, the source range did have a text node at the start, but the pDestTextNd variable is null in our case, so that's why we create a text node at the insert position.

Now we want to actually step through this code. Let's see, why did we skip that previous block? We certainly don't have bColumnSel true, so maybe it's
because the start position is at the start of a paragraph; that's the third part of the condition. Now the end node of the source range is a text node, and that's why we get into this branch, so that's the branch we have taken. If we hadn't taken that branch, we would have split the text node at the insert position and so on, but that is entirely hypothetical. And this is another branch not taken.

Now here the source end text node copies the text to the destination text node that was just created, and there is this very interesting comment before that, which says something about the insert node being deleted during undo, which kind of sounds like something we would like to happen. There is this variable that is indicated here, and it is indeed passed on to the undo object later. And what does the undo object do with it? It sets this boolean variable to false and increments one of its positions, and it says something about a table selection. Hmm, clearly we don't have a table selection in this case, so it is kind of curious why this code exists and why it was added. The rest of what this function does is not interesting at this point.

Now we try to look for places where this variable would be set to true. The first place is where it's being declared: it's set to true if the start node of the source range is a text node, which is the case here. The second place we don't hit. And we hit the third place, this one here, if the destination position does not have a text node. Here we even have this OSL_ENSURE assertion to check that the variable is not set to true twice, basically.

So now we take a look at the git log of the undo implementation to see if there is anything interesting there about why this code was added. We see one commit where it was translated from the Klingon original to English, and the original comment was added in the initial CVS import commit from the year 2000, so that code was always like this. So we found out not very much here, but clearly
it's very suspicious that we hit two places in the code that want to set this boolean variable to true, if there is this assertion about that causing problems in undo, because in every place where this variable is set to true, a text node is being inserted or an existing text node is being split into two.

So now we try to look at the git log of the copy code, and we see that this was actually not added in the initial import; it was added at some point in the year 2003, at a time when StarOffice releases had some code name from Star Wars or something. There are two bugs cited here, but unfortunately these IDs are for the StarOffice internal bug tracker, which has been lost forever now, so they are useless to us. We won't find out what the scenario was that was fixed by this commit. One often finds interesting things in the git log, but unfortunately not this time.

So now we take a bit of a look at the undo code to see if we could learn anything from there, given the improved understanding of the copy code that we have now, and in particular whether this condition would have any effect there. Here we see that this boolean member m_bStartWasTextNode is checked, but it does not result in deleting any additional nodes or anything like that; it just moves the cursor. Here again is the code that deletes one text node, but clearly there's no way for this undo code to delete more than one text node, which is what we would need here. And here we actually saw that the first line of this debug output was the OSL_ENSURE being printed.

So now we step through the undo code and check what exactly is being moved here. By the way, I'm positively surprised that this OSL_ENSURE was actually an indicator of a real problem, because a lot of these old assertions are just noise based on basically wishful thinking. We are again at the place where the text node is being deleted, and we see there are two text nodes before the section node. And then we mess up with the keyboard and accidentally suspend
our debugger, but that can be fixed. So clearly the problem here is that in this situation, where the paste inserts two text nodes, the undo must delete two text nodes and not only one. If you want to find out how we fixed this bug, you can just search for the issue ID in the git log and look at the patch.

Thank you for your attention if you made it this far. I hope I was able to convince you that record-and-replay debugging and the reverse-continue feature are really quite useful when figuring out tricky problems such as this, and we even managed to do it in a single gdb session this time. There is one more very useful trick that we didn't get to use this time, which is that you can set a watchpoint on a variable with gdb and then reverse-continue, and it will stop where this variable got its current value.