Welcome, all of you. This is a little update on some work we did around improving PDF export. There is actually customer work here, funded by Blackboard and BRZ, and a number of nice things coincidentally came together, so I thought I'd talk about that. A quick recap of the history: the initial work was from Sun, I think around 2003, implemented in OpenOffice.org 1.1. It was already fairly complete, and at that time a really unique feature. It had hyperlinks (not only external hyperlinks, but also cross-reference links inside the document itself), some initial form support and other things, and it supported PDF up to version 1.5. It also had nice touches like shrinking images: when you knew a massive JPEG was being embedded and scaled down small on the Writer page, you could reduce it while exporting. That was funded by Sun. Then there was community work during the OpenOffice.org days, most notably from Beppe, Giuseppe Castagno. Most of that landed in 2.4, and it was the initial implementation of PDF/A, the archival standard for PDF. Multiple jurisdictions require it if you want to archive documents properly, so that when you get an audit, for example, the file is recognized as the proper document. PDF/A cuts down on a number of features. The intention is that 20 years down the road, when you no longer have the funds or the original system available, you can still faithfully render the document. There were tons of fixes there. When I looked at it, there were a number of branches, or CWSs as they were called back in the day, so quite some work. I don't know whether that was purely volunteer work or whether there was funding from the Italian government or from somebody else; maybe people in the audience know.
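The image-shrinking option mentioned above boils down to comparing an image's effective resolution on the page with a target resolution. Here is a minimal Python sketch of that arithmetic. The function names and the 300 DPI default are assumptions made for illustration; this is not LibreOffice's actual code or its default setting.

```python
def effective_dpi(pixel_width: int, displayed_width_in: float) -> float:
    """Pixels per inch the image actually gets at its size on the page."""
    return pixel_width / displayed_width_in


def downsampled_size(pixel_width: int, pixel_height: int,
                     displayed_width_in: float,
                     target_dpi: int = 300) -> tuple[int, int]:
    """Pixel size to embed; only ever shrinks, never upscales.

    target_dpi is a hypothetical default chosen for this example.
    """
    dpi = effective_dpi(pixel_width, displayed_width_in)
    if dpi <= target_dpi:
        return pixel_width, pixel_height  # already small enough, keep as-is
    scale = target_dpi / dpi
    # round() avoids off-by-one results from floating-point noise
    return max(1, round(pixel_width * scale)), max(1, round(pixel_height * scale))


# A 6000x4000 photo scaled to 2 inches wide has an effective 3000 DPI,
# so at a 300 DPI target it is embedded at one tenth of its size.
print(downsampled_size(6000, 4000, 2.0))  # -> (600, 400)
```

Only images shown at more than the target resolution are touched. Smaller ones pass through unchanged, so the option never degrades an image that is already appropriately sized.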
Then came LibreOffice work, with some great things around PDF signing. That is really useful for an electronic workflow. You write your document in LibreOffice, and you certainly do not want an extra signing step in a separate application. You want to sign it right as it comes out of Writer, so you can be sure there's no interference with the document, and it's also much nicer and much more usable. That was implemented by Gökcen Eraslan as a GSoC project. It was an experimental feature for a few years, and then Tor finished it and got it into a non-experimental state for LibreOffice 4.4, if I'm not mistaken. There were also a number of fixes between 4.0 and 5.4, mostly I think from Miklos, but perhaps also from Tor. They covered using PDFium, being able to put PDFs inside PDFs, timestamps for the whole signing workflow, and getting hash support up to par with SHA-256 like elsewhere. Then, at least judging from the commit history, there was again a bit of a hiatus with nothing much happening, and then a customer came along. The trigger for the PDF/A update was a customer who wanted to export a long Writer document with the PDF/A option. Unfortunately, that document had a transparent image strategically placed in a frame, so the entire Writer page would have been rendered as a bitmap. That really blew up the document size and also reduced the quality. The only real way to fix it was to bump PDF/A support to version 2, which allows transparency, and that triggered this part of the work. Then there was another customer who really needed better accessibility support, because their own customers were required by law to make whatever they produce accessible.
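Moving from PDF/A-1 to PDF/A-2 mostly changes which features the exporter may use. Here is a minimal Python sketch of such feature gating. It relies on a few facts: transparency is forbidden in PDF/A-1 but allowed in PDF/A-2, every PDF/A part requires all fonts to be embedded, and the part is declared in XMP via the `pdfaid` namespace. The `ExportPolicy` class and its method names are hypothetical and not LibreOffice's API.

```python
from dataclasses import dataclass
from typing import Optional

# The 14 standard Type 1 fonts a plain-PDF viewer must provide itself.
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


@dataclass(frozen=True)
class ExportPolicy:
    """Hypothetical export settings; pdfa_part is None for plain PDF."""
    pdfa_part: Optional[int] = None
    conformance: str = "B"

    def allows_transparency(self) -> bool:
        # PDF/A-1 forbids transparency, so it has to be emulated
        # (e.g. by flattening to a bitmap); PDF/A-2 and later permit it.
        return self.pdfa_part is None or self.pdfa_part >= 2

    def must_embed_font(self, font_name: str) -> bool:
        # Every PDF/A part requires embedded fonts; plain PDF may rely
        # on the viewer for the standard 14 fonts.
        return self.pdfa_part is not None or font_name not in STANDARD_14

    def pdfaid_xmp(self) -> str:
        """XMP fragment declaring the PDF/A part and conformance level."""
        if self.pdfa_part is None:
            return ""
        return (
            '<rdf:Description rdf:about=""\n'
            '    xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">\n'
            f"  <pdfaid:part>{self.pdfa_part}</pdfaid:part>\n"
            f"  <pdfaid:conformance>{self.conformance}</pdfaid:conformance>\n"
            "</rdf:Description>"
        )


if __name__ == "__main__":
    for policy in (ExportPolicy(), ExportPolicy(1), ExportPolicy(2)):
        print(policy.pdfa_part, policy.allows_transparency(),
              policy.must_embed_font("ZapfDingbats"))
```

Keeping every such decision behind one policy object means bumping the version is a one-line change. The behaviour that actually differs, such as whether transparency is emulated, follows from it.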
To enable that: it was already mostly working in Writer, and just Impress was a bit of a mess. There's a meta bug that I have linked for you. The slides will be up on the conference page; at least I will mail them out tonight, and whenever that page comes up they will be linked from there. The meta bug has something like 100-ish bugs linked from it, but I don't think all of those are really about the actual export. There are also lots of things that are not as nice as they could be, or where upper layers in the application don't properly support a feature that PDF could use. PDF/A-2 has a number of other upsides. Right now the only one LibreOffice makes use of is transparency: it stops disabling transparency and emulating it by putting everything into one large bitmap. In theory we could also support JPEG 2000 image compression, which can make images quite a bit smaller than JPEG. PDF/A-2 also has layers and better OpenType font support. Probably pretty interesting, but not implemented, is that you can use the newer signature features with PDF/A-2; that needs a bit of review, validation and work that I didn't get to. Okay, so how is accessibility done in PDF? It's largely about tagging. You have your graphical content, and then a separate structure in the PDF that says "this object, with a bounding box over here, is an image", or "this is a paragraph at enumeration level such-and-such", or carries a further annotation. For text you get a Unicode character stream that is in reading order, not random graphical order, so you can easily extract the text and give it to the screen reader. It also has positions. So when the user says "I can't quite figure out what this is" and points there, or wants to copy some text to the clipboard, that's handled by the tagging.
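Here is a simplified model of why tagging gives reading order. Marked content on the page is numbered (by MCID) in the order it is painted, while the structure tree lists it in logical order. The sketch below uses the standard structure types `Document`, `H1`, `L`, `LI`, `Lbl` and `LBody`. `StructElem` and `walk` are hypothetical helpers, not LibreOffice or PDF library code, and a real structure tree carries more entries than shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StructElem:
    """Node of a simplified PDF structure tree.

    kids holds child elements or integer MCIDs that point at marked
    content on the page.
    """
    tag: str
    kids: list[StructElem | int] = field(default_factory=list)


def walk(elem: StructElem, content: dict[int, str], path: tuple[str, ...] = ()):
    """Yield (tag path, text) pairs in logical (tree) order."""
    path = path + (elem.tag,)
    for kid in elem.kids:
        if isinstance(kid, int):
            yield "/".join(path), content[kid]
        else:
            yield from walk(kid, content, path)


# MCIDs follow paint order: the footer was drawn first, the heading last.
painted = {0: "Page 1", 1: "1.", 2: "Intro", 3: "2.", 4: "Demo", 5: "Agenda"}

doc = StructElem("Document", [
    StructElem("H1", [5]),
    StructElem("L", [
        StructElem("LI", [StructElem("Lbl", [1]), StructElem("LBody", [2])]),
        StructElem("LI", [StructElem("Lbl", [3]), StructElem("LBody", [4])]),
    ]),
])
# MCID 0 (the footer) is not in the tree; it would be tagged as an Artifact.

for tag_path, text in walk(doc, painted):
    print(f"{tag_path:22} {text}")
```

A screen reader following the tree announces the heading first and then the list with its hierarchy, even though the heading was painted last. The footer is skipped entirely.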
That's in the original PDF ISO standard: the 2008 version, section 14.9 (or 14.8, I think), has all the details. As I said, it was available in Writer, so the underlying technology in the VCL layer to write it out was already there. The information from the application side was just only implemented for Writer. In Impress, the export was simply not told that this particular paragraph is an enumeration, or that this image is a foreground image and not some master page background. That work was sponsored by Blackboard; thanks for that, it wouldn't have been possible without them. There's a very old Impress bug, 34135: images do not get any description. You can set a description on an image in the UI, but it just doesn't get exported. So it's useful for the screen reader in LibreOffice on your desktop, but it was not passed down to PDF. Again, that was working for Writer, just not for Impress. It's the same story for labelling, which is not in that bug. A foreground image is obviously interesting to a user with impaired vision, so it gets the Figure tag, and the screen reader highlights it as an object in the foreground. Then there's background content that's usually not important, like a background image or some decorative content. That gets the Artifact tag and is usually not exposed to a screen reader.
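Here is a minimal sketch of what that labelling means at the content-stream level. A foreground image is wrapped in a `/Figure` marked-content sequence, and its structure element carries the description as `/Alt`. A background or decorative object is wrapped as `/Artifact` and kept out of the structure tree. The `Shape`, `tag_shape` and `pdf_string` names are hypothetical. The structure element is abbreviated: a real one also needs a `/P` parent entry and usually `/Pg`. Non-ASCII text would need UTF-16BE encoding, which is not handled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shape:
    description: str      # alt text set in the UI; may be empty
    is_background: bool   # e.g. from the master page, or purely decorative


def pdf_string(text: str) -> str:
    """Literal PDF string; escapes backslash and parentheses (ASCII only)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def tag_shape(shape: Shape, mcid: int, paint_ops: str) -> tuple[str, str | None]:
    """Return (content-stream fragment, abbreviated structure element or None)."""
    if shape.is_background:
        # Artifacts are not part of the structure tree; screen readers skip them.
        return f"/Artifact BMC\n{paint_ops}\nEMC", None
    content = f"/Figure <</MCID {mcid}>> BDC\n{paint_ops}\nEMC"
    # Abbreviated: a real StructElem also needs /P (and usually /Pg).
    struct = (f"<</Type /StructElem /S /Figure /K {mcid} "
              f"/Alt {pdf_string(shape.description)}>>")
    return content, struct


logo = Shape("Company logo", is_background=False)
print(tag_shape(logo, 0, "q /Im1 Do Q")[1])
# -> <</Type /StructElem /S /Figure /K 0 /Alt (Company logo)>>
```

The missing-description bug is the case where `description` never reaches this point, so the `/Alt` entry cannot be written.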
The second point had no upstream bug; it was simply broken. There was no bulleted list and enumeration tagging, so all you got in the screen reader was a flat list of paragraphs. They had no relation to each other and no hierarchy, nothing saying "this is the third, the fourth, the fifth line of an enumeration". That is now also fixed. When you look at the unpacked stream in the PDF, it's very self-describing. You have L for list, then LI for list item, then Lbl for the list label, which holds the number, the bullet character or some icon. Then there's LBody for the list body, which is the actual text behind the label. A proper screen reader with PDF support displays that nicely, and it works exactly the same for bulleted and numbered lists. There was a feature branch for that, with some plumbing done by Armin in the drawing layer. That is where the Impress and Draw text gets prepared and passed down into a metafile with some extra metadata, which the PDF export in VCL then unpacks again and writes out. It's a little bit involved. Once we found that place and got the data properly handed down, it was still quite some effort to get it working for all cases. But once the initial understanding was there, it wasn't that hard anymore, and on top of that there were four commits fixing things up after the fact. The second part of the work is fixing validation issues, which was for the other customer, the PDF/A one. I really wanted to make sure that what we produce there is valid PDF. So after I incremented the PDF/A version number, I thought, well, let's just run it through a validator. There's veraPDF, an open source, European Union funded validator written in Java, which is fairly complete in the things it's
checking. In part it even goes beyond the standard, or interprets it in a certain way, but it's very nice to have, and it's kind of the reference that everybody can agree on as a validator. It did find a number of problems when I ran it over the crash-testing document corpus, that is, when I exported all of those documents to PDF with the PDF/A-2 setting and then ran veraPDF on top. I actually found a bug in veraPDF. It was giving me a 6.1.8-1 validation failure, but when I checked the PDF it looked okay to me. After some head scratching and debugging, I figured out it was just a bug: an unfortunate way the Java code failed to parse UTF-8 strings out of a byte buffer, so we could ignore that failure. I have a kind of hack fix for it on disk, and in theory I could file a bug with veraPDF. But the patch is not in a state I would be happy to give them: it fixes the validation issue, but it's not very generic. veraPDF also found some real issues. The most important one, which caused the most work in the code, was that a number of fonts were used without being embedded. That is obviously a requirement for PDF/A, and almost all of it was triggered by form export. Form export is a bit of a special case, but LibreOffice's default setting for PDF/A has it enabled, and forms are permitted in PDF/A, at least to a limited extent. So it's not something we could ignore: unsuspecting users would just have a document with forms, hit the button, and get a PDF/A file that is not actually valid. So it was a bit of legwork to fix that, even for checkboxes, which had this nice checkmark as a glyph
and used the ZapfDingbats font for that. There was nothing in the export embedding that font, because it is one of the 14 standard fonts, and if you don't do PDF/A you don't need to embed those. It's also quite a bit of overkill to embed a font just for one single glyph. The radio button was the same story: it used a bullet character for the little enabled-button indicator. There I could simply remove it, and no font needed embedding, because standard functionality was available. Just for the checkbox, in the PDF form implementation, I had to go to the effort of using OpenSymbol and embedding that glyph when the case arises. Then there were some problems with the font subsetter, which is really quite interesting code with quite a number of TODOs. I solved one ordering problem there. You were writing something out, but the information you needed at that place only became available later, while other data was being written. So the fix was to go back to that place and put the glyph in, so it was consistent with the rest of the subset font. Then there were a color space issue and an annotation dictionary issue, which were comparatively easy: it was just a matter of changing the order in which things were written. There were also some really detailed ones. I don't really know whether veraPDF is being extra annoying there or whether it's a question of interpreting the standard, namely that you don't put the data inline in the object but just put a reference to another
object. Anyway, that's all fixed. There was also a bug about PDF/A validity, and most of this also applies to PDF/A-1, so yes, we have been writing invalid PDF/A since 2007 or 2008. This work also fixes that bug and gets rid of a bit of code, namely a workaround for some old OpenOffice.org bug. So in the end, the PDF/A-2 support itself was rather trivial. It was pretty much just incrementing a number in the document metadata, with a bit of extra information, but that was just boilerplate. Of course that was the first thing I did, and I thought it was an easy fix. Then I discovered all the validation errors, which were the actual work. And of course there are lots of if statements in the exporter that check whether a given feature can be used or not. In this particular case, I enabled transparency for PDF/A-2. Now the transparent bitmap gets exported instead of triggering the emulation, and the customer was happy. As I said, there's this meta bug with a real load of extra issues if you sort by severity. There's a memory usage problem that QA could perhaps look at. Perhaps it's actually caused by this transparency emulation, in which case it would go away with PDF/A-2. There are some crashes, and quite a number of missing features, things we could do better. The most interesting feature is probably PDF/A-3 support. Ticking the box that says "we can do PDF/A-3" is just incrementing the bloody number, and you don't get anything extra from it. But there's one interesting thing beyond that: there's now a proper, standard-mandated way to do hybrid PDF, that is, to embed the source document in the PDF, which would actually fix another
validation error, because if you use hybrid PDF the file immediately becomes invalid. When you do it the standard way, there's a "Source" relationship you can use. Then Okular, for example, would show that there's an attachment and that it's the actual source of the document. So that's an obvious improvement we can now do relatively easily, and it gets rid of this kind of hack from the early days. For hybrid PDF there was nothing standard like that. Philipp just came up with it and hacked it in, and later people in the PDF community realized it's actually useful and standardized it, but differently. Another thing we could consider: we already run validators on everything LibreOffice exports in other places, so we could possibly do the same for PDF. I suspect the embarrassment of writing invalid PDF is about the same as that of writing invalid ODF. veraPDF is an open source tool, and we already have a Java dependency for the ODF validator, so it wouldn't really add anything extra. It would obviously slow down the unit tests or the crash testing. But it could run on a dedicated host once a week, like the crash testing, just to make sure we're not accidentally introducing regressions. All righty, that's all I had to tell you, so we have some minutes for questions, I guess five or six. I'm sure you have lots of questions, don't hold back. Okay, no questions? Thanks. Anybody? Yes: a question originating from the French forum. I recall someone asking about exporting a presentation to PDF format. He was wondering whether there is any documentation detailing which transitions survive the export. Yes, there is; I'm not sure that still works, but
in any case, there is a subset of slide transitions that exist in PDF and that were used at some stage. If you used the right slide transition and exported to PDF from Impress, you would get it. If you used the wrong slide transition, you got either a fallback or no transition. We should probably test that. I don't know where that would be documented apart from the specification. "It should be part of the office documentation, or some spec somewhere, no?" Yes, so if it still works and it's not documented, it should be. I have no idea where, so that's probably a question to bring up with Olivier; I'm not really up to date with that. "Thank you." But there are also some nice things someone using PDF for presentations could do on top. One is emulating what LaTeX Beamer does, for example making an enumerated list fly in by just outputting more slides with a transition effect. For simple content, with just one effect per slide, that could be done. It would be a bit of a hack, but it's the same thing LaTeX Beamer does, and for simple presentations you would get almost perfect output. "Just a question: you talked about the accessibility of PDF. Did your customer ask for other modifications for accessibility?" Could you rephrase? I didn't quite get that. "Did they ask you to change things other than PDF, for example in the product, to make it more accessible?" For PDF, you mean, whether there are pending bugs for PDF accessibility? "For accessibility in general, not only PDF: did they ask you to do things for accessibility beyond PDF?" No. "Okay, thank you." Okay, if there are no further questions: three, two, one, thanks so much, and keep enjoying the conference.
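The slide-transition answer in the Q&A above can be illustrated with a short sketch. PDF pages can carry a transition dictionary (`/Type /Trans`) whose `/S` entry names one of a fixed set of styles, and anything without an equivalent has to fall back. The Beamer-style trick emits one page per reveal step. The effect-name mapping and the `trans_dict` and `reveal_pages` helpers are hypothetical. The talk does not say which Impress transitions map to which PDF styles.

```python
from __future__ import annotations

# Styles defined for the /S entry of a PDF transition dictionary (PDF 1.5+).
PDF_TRANSITION_STYLES = {"Split", "Blinds", "Box", "Wipe", "Dissolve", "Glitter",
                         "R", "Fly", "Push", "Cover", "Uncover", "Fade"}

# Hypothetical mapping from presentation effect names to PDF styles.
EFFECT_TO_PDF = {"dissolve": "Dissolve", "wipe": "Wipe", "fade": "Fade", "push": "Push"}


def trans_dict(effect: str, duration: float = 1.0) -> str | None:
    """PDF /Trans dictionary for an effect, or None if there is no equivalent."""
    style = EFFECT_TO_PDF.get(effect)
    if style is None:
        return None  # fallback: the page is shown without a transition
    assert style in PDF_TRANSITION_STYLES
    return f"<</Type /Trans /S /{style} /D {duration:g}>>"


def reveal_pages(title: str, items: list[str], effect: str = "dissolve") -> list[dict]:
    """Emulate list items appearing one by one, Beamer-style: one page per step."""
    pages = []
    for step in range(1, len(items) + 1):
        pages.append({
            "text": [title, *items[:step]],
            # the first page uses the slide's own transition (omitted here)
            "trans": trans_dict(effect) if step > 1 else None,
        })
    return pages


for page in reveal_pages("Agenda", ["Intro", "Demo", "Q&A"]):
    print(page["trans"], page["text"])
```

This only works for simple content with one effect per slide, as noted in the answer. Each step duplicates the whole page, so complex animations would multiply the page count quickly.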