This talk is about change tracking in Writer and its performance issues. I'm Michael Stahl, and I work for CIB. First we are going to look at the problem we are trying to solve. Then we will discuss the general idea of the solution. The last part is the implementation, and at the end we will look at the final status.

Of course, change tracking in Writer doesn't have only one problem; this is just the most obvious one. Unfortunately, the other problems will still remain.

In the menu we have this wonderful setting where the user can set the tracked changes to be either visible or hidden. It's such a popular setting that we even have it twice in the menu.

Now this is what your document looks like if you have a tracked change. It's a very simple document. We have two paragraphs at the bottom; those are the text nodes marked in red. The three letters marked in blue are a tracked deletion, so we have deleted these three letters. If you set the menu item to show the tracked changes, your document model looks like this. Then you toggle it to hide them, and what happens is that your document model actually looks different. The tracked change is moved out of the document body and into a special section in this array data structure, which then contains the deleted text. The body text now contains only the part that is not deleted. So in order to implement this view option we actually have to modify the model.

This gives rise to the performance issue: we have a couple of editing operations, such as deleting text, where the implementation looks like this. It first saves the current setting of the document model, show or hide.
And then it moves the redlines in the document model into the body text, because what comes next, the implementation of the editing operation, can only be performed if the redlines are actually in the body text. At the end, all of the redlines in the document are moved back to where they were previously. So we have to move everything twice. Of course, if you have a thousand redlines in your document, editing is going to be very slow. And then we have this ridiculous dialog that pops up after, I think, 250 redlines, where the user is advised that hiding the redlines is going to be slow and asked whether they would rather show them instead. What this doesn't tell you is that showing the redlines is also slow, but it's less obvious why: in that case the AutoFormat will move them around every time you split a paragraph.

So that is the problem, and the solution is conceptually quite simple. Essentially we want to get rid of this moving the redlines around in the document model. Instead we want to teach the layout, the document view, to hide the deletions. To do this, we extend the layout frame that is associated with the paragraph, the SwTextFrame, with a simple new data type. It contains a listener, a multi-listener that allows the text frame to listen at multiple model text nodes. It contains extents information, that is, the text ranges inside each text node that are actually visible. It contains the merged text, that is, the visible text of all the nodes in this text frame. And it has a couple of pointers to specific text nodes, like the first and the last one.

So this is quite simple, but the problem is making use of this data structure and maintaining it during all of the editing operations. The reason why this takes some effort to change is that there were previously two invariants that are now invalid.
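As a rough illustration of the data type just described, here is a minimal C++ sketch of how visible extents and merged text could be computed without touching the model. All names in it (`TextNode`, `Extent`, `MergedPara`, `buildMergedPara`) are hypothetical and heavily simplified; it leaves out the multi-listener, and it assumes the caller already knows which nodes share one text frame.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a model text node: its full text plus the
// half-open ranges [first, second) that are tracked deletions.
// Assumption: the deletions are sorted and do not overlap.
struct TextNode {
    std::string text;
    std::vector<std::pair<std::size_t, std::size_t>> deletions;
};

// One visible range [begin, end) inside one node.
struct Extent {
    const TextNode* node;
    std::size_t begin;
    std::size_t end;
};

// Illustrative analogue of the merged paragraph data: the visible extents,
// the concatenated visible text, and pointers to the first and last node.
struct MergedPara {
    std::vector<Extent> extents;
    std::string mergedText;
    const TextNode* firstNode = nullptr;
    const TextNode* lastNode = nullptr;
};

// Build the merged view for a run of nodes that share one text frame.
// The nodes are only read, never modified.
inline MergedPara buildMergedPara(const std::vector<const TextNode*>& nodes) {
    MergedPara merged;
    for (const TextNode* node : nodes) {
        std::size_t pos = 0;
        for (const auto& del : node->deletions) {
            if (del.first > pos) {
                // Visible text between the previous deletion and this one.
                merged.extents.push_back({node, pos, del.first});
                merged.mergedText.append(node->text, pos, del.first - pos);
            }
            pos = del.second; // skip the deleted range
        }
        if (pos < node->text.size()) {
            // Visible tail after the last deletion.
            merged.extents.push_back({node, pos, node->text.size()});
            merged.mergedText.append(node->text, pos, node->text.size() - pos);
        }
    }
    if (!nodes.empty()) {
        merged.firstNode = nodes.front();
        merged.lastNode = nodes.back();
    }
    return merged;
}
```

For example, a node "Hello wonderful" with the range 5 to 15 deleted, followed by a node "big world" with the range 0 to 3 deleted, yields two extents and the merged text "Hello world", while both nodes keep their full text in the model.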
Previously, an index into the text frame corresponded one-to-one to an index into the document model's SwTextNode. So all of the formatting information in the text frame referred directly to indexes into the string that is a member of the SwTextNode. With the new merged paragraph data type we now have two different indexes: a model index, the SwPosition, and a text frame index. We have to use the extents information in the merged paragraph to map between the two.

The other invariant is that previously we had a one-to-N mapping between nodes and frames: one text node could have multiple frames. For example, if your paragraph is so large that it is split across multiple pages, we have one text frame per page. There is this listener mechanism, SwModify and SwClient, that is used all over Writer, and it is also used here. This mechanism has the limitation that an SwClient can only listen at one SwModify. With the merged paragraph, we now have this additional writer multi-listener in the text frame, which allows us to connect one text frame with multiple text nodes. It gets notifications on every editing operation on every connected text node and can then update itself.

Now for the implementation. We had four milestones. The first one was that we needed to adapt the text formatting itself. We started by adding a new flag to the root frame, the root class of the layout, that stores whether the layout will hide or show the redlines. We had to add the merged paragraph data type to the SwTextFrame, and lots of classes involved there needed to be changed. A particularly difficult one is the attribute iterator, the class that splits the text into portions such that within a portion everything has the same formatting.
This has to be done in such a way that it doesn't stop at tracked change boundaries but merges the text before and after them into one portion if it has the same formatting; otherwise you get kerning problems.

Then of course we have the problem that we now have two different indexes. How do we find all of the places that need to convert between the two? What really helped a lot here is that we leveraged the type system to find them. We use the strong_int template, which is a wrapper around an integer that only has explicit conversions between itself and the underlying integer type. We can start by annotating the obvious functions in the text frame, converting them to the strong_int type. Then every operation where sal_Int32 and the strong_int are mixed, in a function call or an addition or whatever, is flagged by the compiler. So we can just go through all of the compiler errors and adapt, it was something like 3000 lines of code, to explicitly convert in the right places or change the variable types. Then we have quite high confidence that we found all of the places that need to do this conversion. And of course there are mapping functions on the text frame that take an SwPosition and map it to the text frame index, and vice versa.

The nice thing about this is that we can do it incrementally, so we get a long series of small patches, and if we made a mistake we can use git bisect to quickly find where the problem is.
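To make the type-system trick concrete, here is a small self-contained C++ sketch under stated assumptions. `StrongIndex`, `ModelIndex`, `FrameIndex`, `VisibleExtent`, `ModelPosition` and the two mapping functions are hypothetical names for illustration only; they are not the actual Writer declarations, and the real code uses the strong_int template mentioned above. Nodes are identified here by their ordinal within the frame.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Minimal strong integer wrapper: conversions to and from the raw integer
// are explicit, so accidentally mixing index kinds fails to compile.
template <typename Tag>
class StrongIndex {
public:
    explicit constexpr StrongIndex(std::int32_t value) : m_value(value) {}
    explicit constexpr operator std::int32_t() const { return m_value; }
    constexpr bool operator==(StrongIndex other) const { return m_value == other.m_value; }
private:
    std::int32_t m_value;
};

struct ModelTag {};
struct FrameTag {};
using ModelIndex = StrongIndex<ModelTag>; // index into one node's string
using FrameIndex = StrongIndex<FrameTag>; // index into the merged text

// The compiler rejects implicit mixing of the index kinds.
static_assert(!std::is_convertible<std::int32_t, FrameIndex>::value, "no implicit wrap");
static_assert(!std::is_convertible<ModelIndex, FrameIndex>::value, "no implicit mix");

// A visible range [begin, end) of the node with ordinal nodeId.
// Assumption: extents are sorted by nodeId, then by begin.
struct VisibleExtent { int nodeId; std::int32_t begin; std::int32_t end; };

struct ModelPosition { int nodeId; ModelIndex index; };

// Model -> frame: sum the visible lengths before the position. A position
// inside hidden text maps to the start of the next visible text.
inline FrameIndex mapModelToFrame(const std::vector<VisibleExtent>& extents,
                                  ModelPosition pos) {
    const std::int32_t raw = static_cast<std::int32_t>(pos.index);
    std::int32_t offset = 0;
    for (const VisibleExtent& e : extents) {
        if (e.nodeId > pos.nodeId || (e.nodeId == pos.nodeId && raw < e.begin))
            break; // this and all later extents lie after the position
        if (e.nodeId == pos.nodeId && raw <= e.end)
            return FrameIndex(offset + (raw - e.begin));
        offset += e.end - e.begin;
    }
    return FrameIndex(offset);
}

// Frame -> model: walk the extents until the index falls inside one.
inline ModelPosition mapFrameToModel(const std::vector<VisibleExtent>& extents,
                                     FrameIndex index) {
    std::int32_t remaining = static_cast<std::int32_t>(index);
    for (const VisibleExtent& e : extents) {
        const std::int32_t len = e.end - e.begin;
        if (remaining < len)
            return {e.nodeId, ModelIndex(e.begin + remaining)};
        remaining -= len;
    }
    // Past the end: clamp to the end of the last extent.
    if (extents.empty())
        return {0, ModelIndex(0)};
    return {extents.back().nodeId, ModelIndex(extents.back().end)};
}
```

With the extents `{0, 0, 5}` and `{1, 3, 9}`, model position (node 1, index 3) maps to frame index 5, and frame index 7 maps back to (node 1, index 5). Writing something like `FrameIndex f = 5;` or passing a `ModelIndex` where a `FrameIndex` is expected is a compile error, which is the property that lets the compiler list every place needing a conversion.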
Then we have the other problem, that we now have multiple nodes connected with the text frame instead of just one. We had to remove something like five different functions from the text frame that all returned this one node, and replace them with a couple of new functions: one that returns the text node used for the paragraph properties, one for the first text node, and one for the text itself, which is now the merged text and previously also came from the one text node. Then there is this SwIterator thing, which I am not going to go into in detail now.

So that was the scope of the first step. The second step was to adapt the connection between the core model code and the core view code, so that whenever editing operations occur that merge two paragraphs or split them or something like that, the proper number of text frames, merged or not, is created for them. Once that was done, the feature was in a good enough state that we could turn it on in master if you have the experimental mode setting enabled.

That, by the way, was also the state at the time of the previous talk on this topic at the LibreOffice conference. What's coming now is the new stuff that we have actually done since, where previously it was just planned.

The third milestone was firstly about accessibility. There is this accessible paragraph class that used to have connections with both the text frame and the text node and would notify any listening accessibility infrastructure about changes there. It turned out that this accessible paragraph had yet a third idea of what is actually in the paragraph, and a third way of indexing into it. So there was already another way to map indexes in there, and we had to adapt that.
We also had to refactor it so that it is now a client of the SwTextFrame, where previously it listened at the SwTextNode, which was unique then because, just as for the text frame, there was a one-to-N connection.

The other part of this milestone was the various things in the document that can be numbered. The general idea was to store two numbers for each of them: in one numbering all of the document content is counted, and in the other, for when the redlines are hidden, we skip everything that is deleted. This had to be done for footnotes, numbered lists and outline numbering, and this part took, I think, three times longer than I expected, because unfortunately a lot of code is interested in numbered lists. Then there are various kinds of text fields that reference other numbered things; the chapter field, for example, displays the outline numbering.

So much for that. The fourth milestone at the end was just a grab bag of miscellaneous things, like the paragraph properties. We had to make sure that if paragraph properties are requested or set, this happens on the one SwTextNode that determines the properties of the layout frame and not on any other merged one. So the cursor may be positioned in the second merged text node, but the properties should be applied to the first one, where the cursor isn't. That took some time.

The AutoFormat was surprisingly important, because it turned out that previously the AutoFormat would only work if the redlines were moved out of the document body into the hidden section in the nodes array, and that is exactly the situation we wanted to get rid of.
So we had to change the AutoFormat code to no longer work on the document model alone: it still iterates over the document model, but it gets the text that it operates on from the layout frames, and then iterates to the next text node that is no longer in the same layout frame, and so on. The AutoCorrect is actually not implemented in Writer, but it can be triggered from the AutoFormat in various interesting ways. Because it is not implemented in Writer and could not directly access the text nodes, there was already an abstraction layer for it, so this was relatively easy.

Then there were the comments in the margin of the page. Those needed to get updates whenever things are deleted, so that a comment whose anchor you just deleted also goes away.

Then we have the PDF export, which contains a function to export links; everything can link to bookmarks, outline nodes and so on. If you export your PDF while the tracked changes are hidden, you don't want links to hidden elements that don't actually show up in the PDF, because those links would point nowhere. So we had to make sure the deleted things are skipped.

Then there was find and replace, which we changed so that it finds strings in the layout frames and no longer in the text nodes; that was also somewhat difficult. Then there are the table of contents, indexes and so on. These also needed changing so that the generated index only contains the things that are visible.

The one point we didn't really solve ideally yet is the linguistic features. Ideally you might want them to operate on the text frames as well, but currently they still work on the text nodes.
I'm not sure how big a problem that is. If you don't have a tracked change that starts in the middle of one word and ends in the middle of another word, you might not notice it, but we just ran out of time there.

Once we were done with all of this, we finally removed the silly warning dialog that I showed you earlier, because we no longer need it. Then we turned this new way of hiding the tracked changes on by default in master, and it's on by default in the 6.2 release. So this is your opportunity to test it and file bugs if something still doesn't work.

So what is the status at the end? Quite a lot of changes were needed all over Writer. If you want to toggle the new layout-based show and hide mode, you can use the dispatch command .uno:ShowTrackedChanges, which already existed previously, but I have changed it so that it now works on this view setting. The document model's redline flags are now, while you're working with the document, almost always set to show everything, which means everything is in the document body.

There are just a few features where we still move things around in the document model between the body and the special section for redlines. Those are the document compare and merge features, which are relatively obscure. And then there is the ODF filter in particular: ODF is defined in such a way that the tracked changes are not in the document body but in a separate XML element that precedes it. Because the import and export filters for ODF work via the UNO API, it turns out that we have to move the redlines into the special section of the nodes array for that to continue to work.
Then there are two properties on the UNO component for the Writer document through which you can still call SetRedlineFlags internally and move things around in the document model. Those are the ShowChanges and RedlineDisplayType properties. The reason I haven't changed these is that there may be macros, extensions or whatever that rely on the current implementation. I'm not sure if that is really the case, so I just left it like that for now.

In every other case where you do some editing operation in Writer, if SetRedlineFlags is called, it does nothing, because the flags are already set to have the redlines in the body and the call wants to move them to the body. So we can basically conclude that this performance issue that occurs when you edit your Writer document is now fixed.

This is good, because we are at the end, and all that remains is to thank our sponsor, the city of Munich, without whom this work would not have been possible. Thank you for your attention.

No, it's the same.

No, what that does is set a character automatic style property and a paragraph property. The purpose of these properties is that you can compare documents. They are called RSIDs. If you start with one document and edit it on one side and edit it on another side, the initial text has one RSID, or a common set of RSIDs. In one branch you have a different RSID, and in the other branch yet another, so you can see that the parts with different RSIDs come from different editing operations on the two branches, and the parts that have the same RSID come from the common ancestor. Then the compare feature knows what was deleted and what was inserted on each of these branches.
So I think that's what those properties are useful for, but I don't actually have anything to do with their implementation, so I'm just trying to remember.

So are there any measurements, or is that hard to come by, of the computational cost that was there each time you press a key?

Yeah, I think you could wait for several seconds if you have a thousand redlines or something like that and just hit backspace. I haven't really done any extensive timing there, but at least our customer said that with their test documents the editing operation that was previously slow is now basically instantaneous.

What about extending change tracking to other types of objects than plain text?

I think it's sort of orthogonal to this work, so it shouldn't have much of an effect.

No more questions, I think. Oh, we are over time; we should stop.