[unintelligible introduction] ... Microsoft documents. It's obviously just an excuse; there are completely different reasons why they use Microsoft stuff.

So that is how I started my activity in this area. I started already some time ago: the first time I presented this topic was at a Plugfest in Berlin. At that time I did something very primitive: I just overlaid two documents, with the differences in color and the common parts in black. This works for documents which are nearly identical; otherwise we get nonsense. I also computed some measure there and presented it alongside the rendered document. Then later in Milan I presented something different: I extended that with different kinds of measures and different views and tried to somehow evaluate the documents. In both cases I chose a set of applications and tested each against each. So if I selected 5 applications, I had 25 tests, times a certain number of documents, so the number of tests grew quite rapidly. In Milan there was a talk by Adam Fine, at that time from CloudOn, who talked about round-trip tests, where we check a certain set of applications against one reference application. I have a feeling that this is more suitable, because in certain cases we can choose such a reference application. Here I have chosen Microsoft Office, and we can then compare a set of other applications with that.

So, what will this talk be about? It will be about automatic difference grading of documents, based on rendering into PDF; then about visual inspection of such differences; then about the round-trip and print tests; then about document relevance. I will show some results, and at the end I will show how we can automatically bisect interoperability errors based on these measures.

So, differences. We take a document, render it into PDF in one application and into PDF in another application, and compare the results as bitmaps. We have four measures. Two of them are computed per page. The first is the number of lines of text: sometimes it happens that a line of text is missing. I have not observed it recently, but a few years ago I had some documents which were not completely shown in LibreOffice. So this is one error measure; zero is perfect and five is bad. Then there is the text height error: interline spacing is sometimes different in different programs, and this number, in millimeters, shows how good or bad it is. There may also be other reasons, for example an object somewhere where it should not be, which then changes the height of a page. Then I segment the page into lines, text lines or any kind of lines, according to spacing, and compute two measures for those line pairs. One is the vertical position: I simply align the lines and check whether they are in the same place or shifted. The second is that once I have them aligned, I compute the differences between details. I will show the details in a while.
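The talk does not show the scripts themselves, so as an illustration, here is a minimal sketch of how such per-page measures could be computed, assuming both PDF pages are already rasterized to grayscale numpy arrays at a known resolution; the function names and the ink threshold are my own, not from the talk:

    import numpy as np

    def segment_lines(page, ink_threshold=128):
        """Split a grayscale page bitmap into text-line bands.

        A row belongs to a line if it contains any 'ink' (dark pixels);
        consecutive inked rows form one line. Returns (top, bottom) pairs.
        """
        inked = (page < ink_threshold).any(axis=1)
        lines, start = [], None
        for y, has_ink in enumerate(inked):
            if has_ink and start is None:
                start = y
            elif not has_ink and start is not None:
                lines.append((start, y))
                start = None
        if start is not None:
            lines.append((start, len(inked)))
        return lines

    def per_page_measures(page_a, page_b, mm_per_px):
        """Raw line-count difference and text-height error in millimeters
        (the grading described in the talk would be applied on top)."""
        lines_a, lines_b = segment_lines(page_a), segment_lines(page_b)
        count_error = abs(len(lines_a) - len(lines_b))
        # Text height: top of the first line to the bottom of the last.
        height = lambda ls: (ls[-1][1] - ls[0][0]) if ls else 0
        height_error_mm = abs(height(lines_a) - height(lines_b)) * mm_per_px
        return count_error, height_error_mm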
These measures are in millimeters, but somehow we cannot compare millimeters in one error measure to millimeters in another error measure. Therefore I grade this on a certain scale, which is more or less arbitrary, according to a really personal feeling of what is perfect and what is bad. We have eight degrees. Zero is bit-identical, so absolutely perfect. One is perfect, but not absolutely: there is a difference, but it is not visible. Then five is really bad. Sometimes it happens that the rendered document is empty, and that is graded six; and sometimes it happens that the document cannot be opened, and that is graded seven. Once I run my scripts and get the numbers, I generate a report: spreadsheets with all these measures and links and everything.

Most of these error measures are quite simple and intuitive, but this one may not be: it's called the feature distance error, and it's about this case. We have here a detail: a bulleted list rendered in two applications. The correct one is this bullet, and in some application this other bullet is rendered instead. So how do we find this difference automatically? The idea is quite simple. This is an image: black foreground, white background. I fill the background with distances to the features. Here you can see a distance field; it is simply a value which grows with the distance from the foreground. Here it is bigger, here it is also larger, here it is larger. This is the other one. Then I subtract the two fields, and where the details differ there is a difference. I verified visually that the maximum of this difference is loosely related to the size of the difference. So this way we can grade tiny differences in rendering, even down to the sub-millimeter level.

So we have our measures. Then we run two kinds of tests: a print test and a round-trip test. Again, very simple. We have a document, in this case a .docx. It is opened in Microsoft Word and printed to PDF; then it is opened in, say, LibreOffice and printed to PDF. These two PDFs are compared, a report is generated, and views for visual inspection are generated; they will be introduced in a while. The round-trip test is nearly the same, just that here we open the document in LibreOffice, save it back to the same format, .docx, open that in Word, not in LibreOffice, print it, and again compare the two PDFs. These two results, the print test and the round-trip test, are in general different, because there is this thing called... grab bag, you know the name. Simply, there are some features in documents which cannot be handled by LibreOffice but are kept unchanged, and therefore the round-trip test should be better than the print test, and it really is so. Grab bag is the feature, do I remember it correctly? OK.

So, we had four numeric errors, and we have four views. They are not really equivalent, but somehow they are related. I generate a side-by-side view, a very simple, trivial, intuitive one. Then I just overlay the two renderings, as before, with these colors. Then, after segmentation into lines, I align the lines vertically and overlay the aligned lines. And finally I also align the lines horizontally and show the differences in details.
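Coming back to the feature distance error: a sketch of that computation as I understand the description, assuming the two details are boolean numpy arrays (True = black foreground) and using SciPy's Euclidean distance transform; the names and the millimeter conversion parameter are mine:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def feature_distance_error(detail_a, detail_b, mm_per_px):
        """Fill the background of each detail with the distance to the
        nearest foreground pixel, subtract the two distance fields, and
        take the maximum of the absolute difference as a loose measure
        of the size of the rendering difference."""
        # distance_transform_edt computes, for each nonzero element, the
        # distance to the nearest zero element, so invert the foreground
        # mask: foreground pixels become the zeros the distances grow from.
        field_a = distance_transform_edt(~detail_a)
        field_b = distance_transform_edt(~detail_b)
        return np.abs(field_a - field_b).max() * mm_per_px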
So let's see what's here. I will use my glasses in order to see what's on my screen. Yes, it's completely different now. An example: I have a set of documents, and this one was tested with LibreOffice 4.3, and these numbers were computed. The height error was nearly 5 mm, so there were some differences between lines; the feature distance on the details was 11 mm, so quite a lot; the line position error was similar. In my reports it looks like this: the first one is zero, the same number of lines; then these two grades of three, and then a grade of five. They are encoded in color, next to the numbers, so that it's visually easy to perceive what's there.

In the side-by-side view we see that they are pretty corrupt: letters have different sizes, positions are different. The next one is the overlay; as expected, we see nothing special here, it's simple and clear, but for a different document this will show something which is not visible in the side-by-side view. Then we align vertically: in this view the line differences vanish and we have nicely aligned lines, so we see that the position of these three is correct, the position of these two is not correct, and we clearly see differences in these letters. And the last view is aligned according to the dominating part of the document: the correlation of these lines is computed. Here we see something new which was not visible before: even within the line there are some problems with position. The word "alignment" fits, but the word "right" is shifted a little bit. So these are those four views. Just if you are interested: that was LibreOffice 4.3; in LibreOffice 4.4 it was better. This is 4.3, and you can see here that this part is bad; in 4.4 it got better, so maybe somebody corrected it; it's clear to whoever did that. These positions were corrected, but the line positions are still the same as before.

So, a demo now. I will show one such report, how these results look. I click on the link and open it. This is my report. Here we have a few test cases; these were .doc documents. Each column is for one application: AbiWord, OpenOffice.org 3.3, Apache OpenOffice 4.1 and LibreOffice 5.2; the reference application was Word 2010. So we see here what is good and what is bad: clearly AbiWord performs the worst. Here, for example, is this insert-image case, which is good for the remaining applications. This is the link to the original, and right by the number there is this sign, a link to the rendered pair of images, those four views. The line distance error, the text error and so on: it is explained here what is behind them. For example, we can have a look here at how it looks. This is such a view; it is written here which kind of view it is and which file. This is the source, meaning rendered by Microsoft Word, and this one was rendered by LibreOffice 5.2, and we don't see anything bad; great. Here we have another case, let's see what's here. This is the side-by-side view: there are differences, but maybe not visible at first sight; if you look more closely, this position is different. In this one we see the line spacing is perfect, just these are the differences. Maybe we can look at the details: here one can see that the only problem is the space between the number and the text in those headings. So in this way we have an overview. Sorry, I forgot to say: those in blue are results which are good in all four measures, in order to somehow show which is good and which is not; those where all grades are below two are in blue, so we can clearly separate the things which are really good from those which may be poor. So this is the report.
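For illustration, one possible way to build the overlay view mentioned above (common parts in black, differences in color); the talk only says the differences are shown in color, so the particular color assignment here is my own:

    import numpy as np

    def overlay_view(ink_a, ink_b):
        """Overlay two binary page bitmaps (True = ink).

        Pixels inked in both renderings stay black; ink present only in
        rendering A is drawn red, ink present only in B is drawn blue.
        """
        h, w = ink_a.shape
        rgb = np.full((h, w, 3), 255, dtype=np.uint8)   # white page
        rgb[ink_a & ink_b] = (0, 0, 0)                  # common: black
        rgb[ink_a & ~ink_b] = (255, 0, 0)               # only in A: red
        rgb[~ink_a & ink_b] = (0, 0, 255)               # only in B: blue
        return rgb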
Let's go further. We may have lots of documents to test: my batch consists of about 1600 documents, which were also obtained from Adam Fine; he collected them for the purpose of testing. Now we have lots of documents, and we see that really many of them are quite poorly graded, so it's just a big mess, and it's not bad to introduce some order. The idea of document relevance is to order the documents according to their complexity. How do we measure document complexity? Simply, we extract the tags from the documents. We need a large number of documents, so I took maybe 2000 documents from the internet, extracted the tags, counted them, and assigned each tag a numeric value which corresponds to its rank. If a tag has rank 1, it is in every document; if it has rank 0, it is in none of them; and rank 0.5 means that the tag is present in 50% of the tested documents. Once we have these numbers, we can grade relevance as the relative frequency of the least frequently used tag in a file (see the sketch below). Usually a file has lots of tags, and there will definitely be something which is in nearly every document, but that is not important for us; we cannot grade anything according to it. The least frequent tag usually indicates the most complex feature which is there. Therefore we take the frequency of the least frequently used tag in a file as the measure of importance, and then we sort the files according to this relevance.

So, another batch of results. Thousands of documents were tested; the test was between Microsoft Office 2010 and different LibreOffice versions. I took versions from the bibisect repositories; I took, I think, 6 or 7 repositories, not all of them, there are more: the dbgutil daily one, the one until 4.2, the one until 4.3, up to the latest repository. From each repository I took the oldest and the latest version and used them for testing, and I performed those two tests, print and round-trip. So it was quite some work: there were 32,000 comparisons, and it took 2 to 3 days, so not real-time stuff.

Here I have a table with results. If you would like to open it, use LibreOffice 5.2 or more recent; I think there are people here who know the reason. There was a bug which made opening files with hyperlinks very slow, and there are 50,000 links in such a document, a lot, and opening it with, say, 5.0 takes several hours, I think. So let's see whether this works; I hope I have 5.2. So this is it, and I have such a document, you see, pretty long; maybe I should make it smaller in order to fit. Here we have the rank, this relevance.
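A sketch of the relevance computation described a moment ago, under my own assumptions about the mechanics: a .docx file is a zip archive, so the element names can be collected with the standard library; here only the main document part is scanned, and all names are hypothetical:

    import zipfile
    import xml.etree.ElementTree as ET
    from collections import Counter

    def doc_tags(path):
        """Set of XML element tags used in word/document.xml of a .docx."""
        with zipfile.ZipFile(path) as z:
            root = ET.fromstring(z.read("word/document.xml"))
        return {element.tag for element in root.iter()}

    def tag_ranks(corpus_paths):
        """Rank of a tag = fraction of corpus documents containing it."""
        counts = Counter()
        for path in corpus_paths:
            counts.update(doc_tags(path))
        return {tag: n / len(corpus_paths) for tag, n in counts.items()}

    def relevance(path, ranks):
        """Relevance = rank of the least frequently used tag in the file;
        a tag never seen in the corpus gets rank 0 (most complex)."""
        return min(ranks.get(tag, 0.0) for tag in doc_tags(path))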
Can you see anything there? No? Somehow... The rank of this one is 0.88, so it has only very frequently used tags; these are the tags within the document. Then here we have the file name, and then there is a measure which says whether there is a progression or a regression: if the number is negative, there is a progression, the grade decreased when comparing the most recent version to the remaining ones; if the number is positive, there is a regression. And there is a column with the worst grade of the last, most recent, version. So we can have a look at it, but I am running out of time.

What can we see? Even these are simple documents. For example, I will show this one; I have no idea which one that is. So this is a simple document, there is nothing in it, just new lines and so on, but even here we see that there are problems. For example... oops, I touched something bad... for example, let's see here: there are differently wrapped lines. I asked about possibilities on the developers' mailing list, and I currently think that the reason is that LibreOffice ignores the setting of kerning in .docx documents completely. Simply, there is this styles.xml file where kerning for the whole document is set, and LibreOffice, I think, and I am not quite sure because I am not really good in this area, completely ignores that. Maybe it is not a problem for a user, but it is a problem for me, because even for documents that are otherwise perfect it results in a very bad grade, and it overshadows other bugs and makes my grading look worse than it is.

So let's go further. You can download everything and try it, and if you want to talk to me, I can explain it more closely. These are the results: for example, in the round-trip test I see there are 300 regressions and nearly 1000 progressions; in the print test the results are really poor, 700 regressions. They may be really tiny; a regression does not mean that something is completely bad, it may just be a tiny little bit worse. So we can see there are problems even with simple documents, like this kerning handling and incorrect line spacing, and this evaluation is not perfect; there are still lots of problems, because a rendered document can be bad in so many ways that these four measures are simply not able to take everything into account.

The last thing is automated bibisection. If we see in the results that there is a regression, we can bisect: we simply find the document and the bibisect repository for the result, run the bibisection script, it does some magic, and we submit a patch. So let's see how it looks; again a demo. It is here. I have everything prepared in a directory of a bibisect repository. I have extracted the latest and the oldest version here, in order to check whether there really is a regression. I have a file to test, it is this one, and then I run a script, bibisect-git or something. It will take three minutes and I don't have time for that, but then I get a progress report: you see it running, and it says here that there are 22 revisions, 11 revisions, 5, 2. The bisection runs automatically, because git supports that; I just wrote a kind of wrapper. Once it is finished, I get a few files here: the last good rendition, the view of the last good display and the view of the last bad display, plus some log. This one... oh, sorry, this one... the log... oh, I forgot to look in this directory. So there is the log; you are probably more familiar with these logs. You just get the revision where the bug is.
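The wrapper itself is not shown in the talk; here is a minimal sketch of the idea, assuming a bibisect repository where every commit ships a prebuilt office in instdir/, a hypothetical grading module based on the measures above, and a made-up failure threshold. git bisect run treats exit code 0 as a good revision and 1-124 as a bad one:

    #!/usr/bin/env python3
    """Test script for 'git bisect run' inside a bibisect repository.

    Hypothetical usage, after marking good and bad commits:
        git bisect start <bad-commit> <good-commit>
        git bisect run ./bisect_test.py problem.docx reference.pdf
    """
    import subprocess
    import sys

    BAD_GRADE = 3   # assumed threshold: worst grade >= 3 means "bad"

    def main():
        doc, reference_pdf = sys.argv[1], sys.argv[2]
        # Render the test document with the build checked out at the
        # current bisect revision (the path varies between repositories).
        subprocess.run(["instdir/program/soffice", "--headless",
                        "--convert-to", "pdf", doc], check=True)
        rendered = doc.rsplit(".", 1)[0] + ".pdf"
        # Hypothetical stand-in for the four-measure grading described
        # in the talk: rasterize both PDFs, compute the measures, and
        # return the worst grade on the 0-7 scale.
        from grading import grade_pdfs
        worst = grade_pdfs(rendered, reference_pdf)
        sys.exit(1 if worst >= BAD_GRADE else 0)

    if __name__ == "__main__":
        main()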
So Miklos was probably not the author of the bug, but he is the author of the repository, I think, so don't blame him, don't blame him. Good. To summarize: we can test any office application that can be driven from the command line, and we can test any format; it was supposed to say PPT and PPTX here, not DOC and DOCX. Maybe these 1600 documents should be somehow classified. If you are interested, I would be glad if you can use this somehow to improve LibreOffice. So thank you, that was it.