I think we should start already; time is okay. Okay. I wanted to talk about what has happened on the LibreOffice side in version 7.0 on the interoperability topic (I hope I can pronounce this word correctly). This was expected to be a joint talk together with Michael Stahl: in one part of the talk I will talk about what I achieved, and the other part is by Michael. Initially we were planning to spend more time on interesting cases found during this work, but, especially after a chat yesterday with Marina, I've decided to put more focus on how this is actually done, in my case. Probably it will be of some use to people trying to do these things; for people who have already been doing this for many years, it is very unlikely they will find something new.

Okay. In this case I'm practically talking about Microsoft formats and how LibreOffice opens them and saves them back, since they are the most popular among people and organizations. We also should not forget about the old Microsoft binary formats: DOC, XLS and PPT are still in use, and also RTF. For example, in my country official government bodies still operate with these old binary formats, many, many years later. So the obvious goal is that LibreOffice should handle these formats correctly, I mean loading them and saving them back.

So how, in general, does this work when LibreOffice does something wrong with a DOCX file? At first, the goal is usually to minimize and reduce the test case received from Bugzilla, or by any other means, from a customer or elsewhere. Usually such a case can be a quite huge document, very uncomfortable to edit, especially when LibreOffice is built in a debug configuration: it is really painful to load a document with many images and many pages, it is really slow, and because it is done many times, that is not nice. The other purpose of this step is to make it easier to locate the root of the problem, and to find where it actually is. Then comes the
magic step, and then profit is expected. But there is a classical problem that I have talked about many times, and many other people say the same: minimizing the test case does not guarantee that we really fixed the original problem. So the most important thing is to ensure that the original bug document is also fixed, and that's probably all at this level. I will not talk about the secret step number two; it is very specific to each case, and it's obvious. But step one, minimizing and reducing the test case, is one of the most interesting areas, and as far as I see, nobody has invented a better approach than this algorithm. First, on every step we make a backup of the file. Why is this important? Because if we ruin the document somehow, or the bug is no longer reproducible, we can easily fall back to the previous phase. Pretty obvious. Then we remove whatever looks like it is not used in this bug, verify that the document is still okay in Word and still broken in Writer, and if so, repeat this cycle.

So it's pretty obvious, but if we go into the details, my personal opinion is that reducing a document should happen on the XML level, not by editing the document directly in Writer or in Word. The reason is that, especially for interoperability bugs (not for ordinary bugs), there is a huge chance that after saving the document in Writer, or once again in Word, the bug will disappear and no longer be reproducible. For Word, it depends; for Writer it is mostly true, because if it is an interoperability bug, it will most likely be inside the import filter, and if the document is imported incorrectly and then saved back, well, we will find nothing. The serious problem here is that we need to edit XML files which are basically not expected to be human-readable, and which are located inside an archive, which is
actually our document package. This is a relatively easy task for users who are skilled with Vim or Emacs, because they can do these things directly in the archive and format the XML pretty well. But I'm not such a person: as you've seen, I mostly use Windows, I'm forced to use Windows, and these programs are not so convenient there. So I had to invent my own ways to do these things.

Well, one of the most important phases is that we need to format this document somehow. LibreOffice has a nice feature to enable pretty printing for XML, which can be enabled in the Expert Configuration, but unfortunately it works only for ODF formats, and only when saving the document, so that's not our case. One of the easiest ways is just to use the xmllint tool from the libxml2 package, or, when I was too lazy, just to google some online XML formatter (there are lots of such sites), copy-paste something and put it back into the document. This way actually works, but it's not very convenient. So I practically invented a stupid, primitive shell script that does simple magic: unpack the whole package into a temporary folder, run xmllint with formatting on every XML file, pack it back and replace the file. It works okay on Cygwin; I did not check it on native Linux, but there it is most likely not so important. Feel free to use it if somebody needs it. One moment, moment, moment... that's practically all I can mention about this pretty printing.

Here is a missing slide, for me, about actually reducing the document on the XML level; I'd say it is much easier to do these things there. As for me, first I try to remove all the images and all the media data, because this greatly decreases the file size and also greatly improves the loading speed of the document. The important thing here is not to forget to remove the references inside the relationships file, because Word is quite strict about this: if there is a
reference to a file and the file doesn't exist in the package, Word will fail and be unable to open such files. Then, step by step, we try to remove headers and footers, probably stylesheets. No actual miracle, just step by step with the algorithm I talked about before. As I said, I see no silver bullet invented here; as far as I know, almost everybody does it about the same way.

And now I wanted to recall several interesting interoperability bugs which were resolved and fixed. Just before going there, some introductory info about how lists are represented in different documents. Inside an ODT document, lists are pretty simple: any list has a reference to the actual list style inside the stylesheet, which defines how each level is represented, how it is formatted, what the bullet or symbol is, and so on. A pretty nice and simple idea, and it works. What about DOCX? Well, the situation is much harder. Each numbered paragraph has a marking with the corresponding number ID, which refers to the list override table, and inside this list override table there are the actual references to the real list table. And of course, like in ODT, a list definition inside the list table can also contain references to styles. What's most interesting here, as displayed in this diagram, is that two different lists actually refer to one and the same list definition, and inside the list override table there are some special properties defined: well, this is list number two, but we are using list number one; let's use exactly the same list, but start, for example, from one, or for this level start from another value. So we are really overriding the list. Nowadays this approach looks very strange, at least to me, but it comes from the dark ages when 640 kilobytes of memory were sufficient for everyone, and at that time this way of storing lists was quite helpful
because it greatly reduced disk usage and memory consumption. The classical story is that a document most likely has almost identical lists, but there can be many of them, and at that time this approach was pretty good: we used less disk space and memory. Nowadays it exists only for historical reasons, in my opinion, and such an architecture really is the reason for many problems, misunderstandings, misimplementations and so on. So lists are one of the important topics, and let's look at how this works in practice.

One of the issues I was fighting with some time ago is this interesting case of right-to-left order in lists, because the text is RTL. I also put the screenshots here: on the left is what is expected, and on the right is Writer. We don't care in this case about the different paddings, only one small issue: for the different headings we see that Writer puts dots where dashes are expected. Sounds easy, but actually it is not. Investigation shows that inside the DOCX format we have list format strings: for each level we define such a string, where the percent signs are substituted by the corresponding numbers or other characters. In the document I was showing here, the format string looked like this. In complex cases, Word theoretically supports some crazy things, whereas Writer, well, Writer supported a prefix, which goes before the number, a suffix, which goes after it, and a dot symbol hardcoded inside the code as a separator. That's all. So all of this required a redesign. Nowadays LibreOffice internally uses the same format string, the same idea as inside Word; all other representations are converted to this format, and the prefix and suffix are used only as fallback cases. This feature is not exposed in the UI; as far as I've seen, somebody was trying to do this, and there is a separator field, but for me it doesn't work as expected. To me it looks like it is a
good topic to include in the standard, because such a format string is much more flexible than a prefix and a suffix. Initially, the problem in this bug document was some Arabic text, I suppose, and the delimiters were absolutely different: not minus signs like here, and not dots, but some specific Arabic characters. I don't know, I'm not an expert in that area.

Another case looks simple, but it cost me a lot of blood: we see unexpectedly formatted level numbers in different places. Again, it looks relatively easy, but after debugging it turns out that the tables I showed you initially, the list table and the list override table, are not the only feature through which lists can interact with each other. There are also special style links, which can refer from one list to another; they add extra complexity to how lists behave and how we can combine them, and are another source of terrible regressions and interop bugs. So, in practice, while I was fighting with these lists I made about 16 commits mentioning lists among my latest commits (I didn't count them exactly; just lazy). Unfortunately, many of them fix regressions I introduced myself during previous list fixes. From one point of view it looks like moving one step forward and two steps back, but I'm not quite sure that this is the case, because right now all these cases are covered by test cases and are tested, so from my point of view it is progress anyway. And I especially want to thank Xisco for his patience in finding all the regressions and for all kinds of assistance here. So that's, in brief, what I wanted to talk about. Michael, your turn.

Okay, thanks. So... oh, the slides went away. Did you turn off the screen sharing? No. Oh, no, no, they are back. Okay, cool. Yeah, so one peculiarity of the Word document model is that the paragraph mark is an actual character in Word, so you can apply formatting to that character, and Word will paint the
formatting then on the paragraph mark. If you... oh, I'm sorry, I have an issue here because the shared screen always goes away; I will look at the other screen. Okay, sorry. Right, so Word paints this formatting on the paragraph mark, which you can see if you turn on the formatting marks in the toolbar, and it also paints the formatting on the list label if the paragraph is in a list. In Writer the situation is different, because there is no paragraph mark; there are only paragraphs, and paragraphs are basically objects. So this presents an obvious interop problem. What we found is that the import of this paragraph mark formatting was implemented in the DOCX filter by creating a text attribute at the end of the paragraph that has no extent (it's zero length); this could then be painted as the list label, and it could also be recognized in the DOCX export filter and exported to DOCX again. Yeah, this was a bit of a hack, and it could only handle the simplest, most basic cases. For example, if the characters at the end of the paragraph happened to have the same formatting as the paragraph mark, Writer would merge the empty attribute into the previous attribute, and the export filter wouldn't find it anymore, so it was lost. It could also happen that, if the user edited the document, the empty text attribute would no longer be at the end of the paragraph, and it would also be lost. Next slide, please.

So what have we done to improve this situation? Well, since we don't have a paragraph mark in Writer, we have added a paragraph formatting item that is now a property of the paragraph object, and we called it list autoformat, because we think the more important aspect of this is that it applies to the list label, and we can now paint it as the list label. The list label can already have a character style applied from the list formatting itself, and that would then override the list autoformat, because
that's basically the way it works in Word. We can also import and export this paragraph mark formatting in the DOCX filters, and this works with hard formatting attributes already in LibreOffice version 6.4.0; it turns out that you can also apply a character style to the paragraph mark, and that is working since version 6.4.7. But this is not entirely complete yet. Basically, at that point we had solved the problem that our customer had, and there are some obvious follow-up issues: the RTF filters; it would be useful to be able to export this paragraph item as an ODF extension; there's no user interface to edit it; and we currently cannot paint it as the paragraph mark yet, only as the list label. Next slide, please.

So here's an example. On the left you can see the reference document in Word: the first half of the examples have a character style applied, and the second half have both a character style and hard formatting on top of that. On the top right you can see what it looked like in Writer version 6.3: most of the list labels were not shown correctly, but a few things did work. In the lower right you see the current status, where all of the list labels look the same as they do in Word, but you can see that the paragraph end marker looks the same as before, so there was no improvement there yet. Okay, that's all I have for this, and we would like to thank our customer, LHM, the city of Munich, for sponsoring these improvements. Okay, now back to you, Vasili.

That's practically all we wanted to talk about. The other tasks and interoperability fixes are quite boring, I think, and not worth mentioning: just ordinary work, nothing to share. These were just the most interesting cases from our point of view. Thank you for watching.
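The test-case reduction cycle from the talk (back up the file, remove something that looks unrelated, check that Word still renders the document correctly and Writer still shows the bug, otherwise fall back) can be sketched as a greedy loop. This is a minimal sketch, not the speaker's tooling: it assumes the package has already been unpacked into a dict mapping part names to bytes, and `still_reproduces` is a hypothetical predicate standing in for the manual check in Word and Writer.

```python
from typing import Callable, Dict, List

Parts = Dict[str, bytes]  # part name inside the package -> raw content


def reduce_parts(parts: Parts,
                 candidates: List[str],
                 still_reproduces: Callable[[Parts], bool]) -> Parts:
    """One greedy reduction pass over the package parts.

    `still_reproduces` is a hypothetical predicate: it should return True when
    Word still renders the trial document correctly AND Writer still shows the bug.
    """
    current = dict(parts)  # the last known-good state acts as the backup
    for name in candidates:
        if name not in current:
            continue
        # Try the removal on a copy, so the backup stays untouched.
        trial = {k: v for k, v in current.items() if k != name}
        if still_reproduces(trial):
            current = trial  # keep the smaller document and continue the cycle
        # Otherwise the bug disappeared: fall back to `current` and move on.
    return current
```

A single pass like this is only one round of the cycle; in practice it would be repeated until nothing more can be removed, and, as the talk stresses, the fix still has to be verified against the original bug document.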
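The pretty-printing helper mentioned in the talk is a shell script that unpacks the package into a temporary folder, runs xmllint with formatting on every XML file and packs everything back. Below is a rough Python equivalent that swaps xmllint for the standard library's `xml.dom.minidom`; it is an illustration under that substitution, not the script from the talk. It treats `.xml` and `.rels` entries as XML and copies everything else (images, an ODF `mimetype` entry) unchanged.

```python
import io
import zipfile
from xml.dom import minidom

XML_SUFFIXES = (".xml", ".rels")


def pretty_print_package(data: bytes) -> bytes:
    """Return a copy of a zip-based document package with its XML parts indented."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():  # keep the original part order
            payload = src.read(info)
            if info.filename.endswith(XML_SUFFIXES):
                # Stand-in for xmllint's formatting: re-indent the XML part.
                payload = minidom.parseString(payload).toprettyxml(
                    indent="  ", encoding="UTF-8")
            # Passing the ZipInfo keeps each entry's name and compression method.
            dst.writestr(info, payload)
    return out.getvalue()
```

Reading the `.docx` file as bytes, passing it through `pretty_print_package` and writing the result back over the file mirrors the script's unpack, format, repack and replace steps. Reusing each entry's `ZipInfo` keeps the part order and compression method, which matters for ODF packages whose `mimetype` entry must come first and uncompressed. Like any formatter, this changes the whitespace between elements, so a bug that depends on whitespace may not survive it.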
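The talk's advice for the first reduction step is to drop images and other media and to remove the matching entries from the relationships file, because Word refuses to open a package whose relationships point at missing parts. A minimal sketch of that bookkeeping follows; it assumes the package is held as a dict of part names to bytes and that media lives under `word/media/`, which is the usual location in Word-created files but an assumption here. The function name `strip_media` is hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
ET.register_namespace("", REL_NS)  # serialize relationships without an ns0: prefix


def strip_media(parts: Dict[str, bytes], media_dir: str = "word/media/"
                ) -> Tuple[Dict[str, bytes], List[Tuple[str, str]]]:
    """Drop media parts and every Relationship entry that targets them.

    Returns the new parts and (rels part, removed Id) pairs, so the caller can
    also remove the drawings in the XML that use those ids."""
    result = {n: d for n, d in parts.items() if not n.startswith(media_dir)}
    removed: List[Tuple[str, str]] = []
    for name, data in parts.items():
        if not name.endswith(".rels") or name not in result:
            continue
        # "<dir>/_rels/<part>.rels" resolves relative targets against <dir>.
        base = posixpath.dirname(posixpath.dirname(name))
        root = ET.fromstring(data)
        for rel in list(root):
            if rel.get("TargetMode") == "External":
                continue  # hyperlinks etc. do not point into the package
            target = rel.get("Target", "")
            if target.startswith("/"):
                resolved = target.lstrip("/")
            else:
                resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved.startswith(media_dir):
                root.remove(rel)
                removed.append((name, rel.get("Id")))
        result[name] = ET.tostring(root, xml_declaration=True, encoding="UTF-8")
    return result, removed
```

The sketch returns the removed relationship ids instead of editing `word/document.xml`, because the drawings that refer to them (for example via `r:embed`) presumably need to go as well, and deciding how much of each drawing to cut is easier to do by hand.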
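The DOCX list indirection described in the talk, where a paragraph's number id points into what the speaker calls the list override table (the `w:num` entries in `numbering.xml`), which in turn points at a shared list definition (`w:abstractNum`) and may override its start value, can be modelled roughly as below. This is a simplified sketch with hypothetical dataclasses: it models only `w:startOverride`, not a `w:lvlOverride` that replaces a whole level, and only a few level properties.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Level:                 # one w:lvl inside a w:abstractNum
    start: int = 1
    num_fmt: str = "decimal"
    lvl_text: str = "%1."


@dataclass
class AbstractNum:           # the shared list definition
    levels: Dict[int, Level]


@dataclass
class Num:                   # what a paragraph's w:numPr/w:numId refers to
    abstract_num_id: int
    # w:lvlOverride/w:startOverride values, keyed by ilvl
    start_overrides: Dict[int, int] = field(default_factory=dict)


def effective_level(nums: Dict[int, Num], abstract_nums: Dict[int, AbstractNum],
                    num_id: int, ilvl: int) -> Level:
    """Resolve a paragraph's (numId, ilvl) pair to the level it actually uses."""
    num = nums[num_id]
    base = abstract_nums[num.abstract_num_id].levels[ilvl]
    start = num.start_overrides.get(ilvl, base.start)
    # Return a new Level so the shared definition is never modified.
    return Level(start=start, num_fmt=base.num_fmt, lvl_text=base.lvl_text)
```

For example, with numbering instances 1 and 2 both pointing at abstract definition 0, and instance 2 overriding level 0 to start at 5, both lists share one definition, yet `effective_level(nums, abstract, 2, 0).start` is 5 while instance 1 still starts at 1.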
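The RTL bug came down to DOCX level format strings (`w:lvlText`), in which `%1`, `%2` and so on are replaced by the counters of the corresponding levels, versus Writer's older model of a prefix, the numbers joined with a hard-coded dot, and a suffix. The hypothetical helpers below contrast the two. They handle decimal counters only; real Word also applies each level's number format (`w:numFmt`), legal numbering and more.

```python
import re
from typing import Sequence


def format_label(lvl_text: str, counters: Sequence[int]) -> str:
    """Expand a lvlText-style format string: %1 is the level-1 counter,
    %2 the level-2 counter, and so on (decimal only in this sketch)."""
    def substitute(match) -> str:
        level = int(match.group(1))
        # Assumption: a placeholder without a counter expands to nothing.
        return str(counters[level - 1]) if level <= len(counters) else ""
    return re.sub(r"%([1-9])", substitute, lvl_text)


def legacy_label(prefix: str, counters: Sequence[int], suffix: str) -> str:
    """The older prefix/suffix model with a hard-coded '.' separator."""
    return prefix + ".".join(str(c) for c in counters) + suffix
```

With dash separators the format string `%1-%2-` gives `3-1-` for counters 3 and 1, while the prefix/suffix model can at best produce `3.1-`, which is essentially the dots-instead-of-dashes symptom from the bug.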
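The style links mentioned in the talk presumably correspond to `w:numStyleLink` and `w:styleLink` in OOXML: an abstract list definition carrying `w:numStyleLink` takes its levels from a numbering style, whose own `w:pPr/w:numPr/w:numId` leads to another abstract definition (the one marked with `w:styleLink`). The talk does not name these elements, so treat that mapping as an assumption. The sketch below follows the chain, with plain dicts standing in for the parsed `numbering.xml` and `styles.xml`, and guards against cycles; the function name is hypothetical.

```python
from typing import Dict, Set


def resolve_abstract_num(abstract_id: int,
                         num_style_link: Dict[int, str],
                         style_num_id: Dict[str, int],
                         num_to_abstract: Dict[int, int]) -> int:
    """Follow numStyleLink indirections until reaching the abstract definition
    that actually carries the levels.

    num_style_link:  abstractNumId -> numbering style id (w:numStyleLink)
    style_num_id:    numbering style id -> numId in that style's w:numPr
    num_to_abstract: numId -> abstractNumId
    """
    seen: Set[int] = set()
    current = abstract_id
    while current in num_style_link:
        if current in seen:
            raise ValueError(f"numStyleLink cycle at abstractNum {current}")
        seen.add(current)
        style_id = num_style_link[current]
        current = num_to_abstract[style_num_id[style_id]]
    return current
```

An extra hop like this is one way two seemingly independent lists can end up sharing levels, which fits the talk's point that this linking adds another source of regressions.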
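In DOCX, the paragraph mark's formatting is stored as run properties inside the paragraph properties (`w:pPr/w:rPr`), and this is what the new Writer paragraph item (list autoformat) represents. The sketch below reads those properties and applies the precedence Michael describes, where character formatting coming from the list level wins over the paragraph-mark formatting. It is a simplified illustration with hypothetical function names: it resolves no styles (including a character style set on the mark via `w:rStyle`) and ignores all other property sources.

```python
import xml.etree.ElementTree as ET
from typing import Dict

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def paragraph_mark_props(p_xml: str) -> Dict[str, str]:
    """Read the run properties stored on a w:p's paragraph mark (w:pPr/w:rPr)."""
    p = ET.fromstring(p_xml)
    rpr = p.find(f"{W}pPr/{W}rPr")
    props: Dict[str, str] = {}
    if rpr is None:
        return props
    for child in rpr:
        name = child.tag.replace(W, "")
        # Toggle properties such as <w:b/> have no w:val and mean "on".
        props[name] = child.get(f"{W}val", "true")
    return props


def list_label_props(mark: Dict[str, str], level_char: Dict[str, str]) -> Dict[str, str]:
    """Formatting from the list level overrides the paragraph-mark formatting."""
    return {**mark, **level_char}
```

Because the properties live on the paragraph rather than on a zero-length attribute at its end, they no longer get lost when the last characters happen to share the same formatting or when the user edits the paragraph, which is the motivation given in the talk.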