Hello, my name is Jaume Pugente. I'm a software engineer, I've been working for the past half year at Collabora, and my talk for this LibreOffice conference is about how LibreOffice handles opening PDF files. During this talk we will see different approaches and situations, including the way PDF files are currently opened in Draw, an alternative way to open PDFs used by LibreOfficeKit using PDFium, and the specific use case of inserting them as images into other LibreOffice documents.
Right now, when a user opens a PDF in LibreOffice, the document is loaded in Draw as a graphical document. The current approach tries to translate the different types of elements that a PDF file might contain into their respective types of shapes, the shape being the building block of a graphical document in Draw. This translation gives us an editable representation of the opened PDF file. But since PDF is a format concerned primarily with layout, it has no structural concepts like paragraphs and such. So a block of text might be imported as a single text frame or as a collection of small text frames with arbitrary divisions in between. This is an inevitable consequence of this approach, since when loading the document, Draw cannot differentiate between a set of closely placed texts that make sense to an end user as separate items, like in a form, and a bigger text unit divided into smaller fragments. If we look at the example on this slide, we see that the title of the section has been imported as a single text frame as expected, but the paragraph that follows it has been divided into almost 20 different text frames, some of which contain only a couple of words, which is not what the user would expect.
Other kinds of PDF elements, like lines, polygons, Bézier curves, images and such, have their equivalent representation in shapes, but as in the text case, no information about the relationship between these elements is imported.
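To make the text fragmentation described above more concrete, here is a minimal sketch in Python. It is only an illustration, not LibreOffice's import code: `TextRun`, `TextFrame` and `import_runs` are hypothetical names, and the sketch assumes the simplest case, where every positioned run of text becomes its own text frame.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """A positioned piece of text, as a layout-oriented format might store it (hypothetical type)."""
    x: float
    y: float
    text: str


@dataclass
class TextFrame:
    """Stand-in for an editable text shape (hypothetical, not a LibreOffice class)."""
    x: float
    y: float
    text: str


def import_runs(runs):
    # Assumption for this sketch: one frame per run. The runs carry positions
    # only, and nothing that says which of them belong to the same paragraph.
    return [TextFrame(r.x, r.y, r.text) for r in runs]


title = [TextRun(50, 700, "1. Introduction")]
paragraph = [
    TextRun(50, 670, "This paragraph was"),
    TextRun(162, 670, "written as one unit,"),
    TextRun(50, 656, "but it is stored"),
    TextRun(138, 656, "in pieces."),
]

frames = import_runs(title + paragraph)
print(len(frames))  # 5: one frame for the title, four for the paragraph
```

The title survives as one frame only because it happened to be stored as one run. The paragraph ends up as four unrelated frames, because the input has no field that could tell the importer to join them.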
If we look at this example, this chart is stored in the PDF as a collection of lines, polygons, text and an image. All these elements are properly imported to their respective shapes, so the chart is shown faithfully, but this subdivision into shape elements might not be what the user looking at this chart expected. For example, the four blue bars are represented using a single polygon shape, and the same goes for the red ones. This is a fine representation if the only intent is for it to be shown to the user, but it's a confusing way to represent an editable graphic, which exemplifies the main problem with this approach: the user has the ability to edit all the parts of the document, but that editing can be highly confusing and frustrating. Some edits can work fine, but others, even very small ones, might require the user to redo whole sections of the document just to make the small edit. As an example, in the first text example shown, since the title was a single text frame, it is possible to edit it easily while keeping the document looking the same, but any change in the paragraph greater than changing one letter for another will produce ugly results, be it overlapping text when adding text, or a noticeable gap when removing or shortening sentences. The only way to truly edit this paragraph is to erase all its component text frames and replace them with a single text frame, as it should have been from the beginning.
Another problem with loading PDF documents like this is that Draw shapes do not always use all the information that their original PDF elements contain. Here we can see a couple of examples. On the left, there is a PDF that contains an embedded font which the shapes cannot use, so, as in any other LibreOffice use case, the font is substituted with whatever LibreOffice considers to be the best approximation.
On the right, we have a highlight annotation where the highlight is imported as a yellow rectangle, which loses all the information about its associated text. So changing the text will not change the annotation, and vice versa: the annotation is just a floating rectangle that can be moved independently of the text, which is not the case in the original document.
An alternative way to load PDF files in LibreOffice is implemented using PDFium. PDFium is an open source PDF rendering engine. It's maintained by Google, and it is the engine used by the Chromium browser to display PDF files natively. The characteristics of this engine that are useful to us for the handling of PDFs are that it can generate a bitmap representation of a given PDF file, and that it also provides an API to query the elements contained inside the PDF file, their values, their relationships to each other, and so on.
So, how is PDFium used in LibreOffice? In Draw, it is marked as an experimental function. It is controlled with an environment variable, LO_IMPORT_USE_PDFIUM, and by default this option is disabled. To use it in Draw, it has to be turned on. On the other hand, in LibreOfficeKit, the option to use PDFium is always enabled. So, services that use LibreOfficeKit, like Collabora Online, will always import PDFs into Draw using PDFium. Finally, when inserting a PDF as an image into another document, PDFium will always be used, independent of the environment variable and of whether it's LibreOfficeKit or not.
So, how does it look to open a PDF in Draw with PDFium? Each page is imported as a Draw page that has a single bitmap image. This removes all ability to edit the contents of the opened PDF file, but also removes all the confusion around importing and translating the PDF elements into their shape counterparts. Also, since this approach uses the engine to directly render the PDF, there is less loss of information.
For example, it correctly uses the fonts embedded or partially embedded in the PDF and their corresponding styles. In this image, we have the example of the text seen before, imported with PDFium, where each Draw page has a single shape that is a bitmap image of a PDF page. And the text is shown as it would be seen in any other PDF reader, since it uses the original font contained in the PDF.
Despite all that, the compatibility of PDFium with LibreOffice is still incomplete. For example, PDF annotations are only partially supported. Some work fine, like the highlight shown before, which can be imported into Draw with PDFium and looks as one would expect, but others still require better support. Here we have a couple of examples. On the left, a couple of annotations have been added to a PDF using Okular: a floating annotation that is marked with this sticky note image, and a line annotation that has an arrowhead decoration. The floating annotation is imported into Draw as the standard LibreOffice document annotation, and so it uses the author's initials as a mark instead of the image included in the PDF. And the line is imported as a line, losing the arrowhead decoration information.
Finally, I'll talk about the use case of inserting PDFs as images into other LibreOffice documents. While opening a PDF file has advantages and disadvantages in each of the approaches we've seen, inserting a PDF as an image is the ideal use case for PDFium. Since the objective of the interaction is to create an image, PDFium offers exactly what we need. In a situation like this, users won't expect the ability to edit the document, since the workflow explicitly states that it is being used as an image. This means that with PDFium we obtain better quality and better fidelity, as we've seen, without suffering any of the drawbacks of not having editability.
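The rules given earlier for when PDFium ends up handling a PDF can be summarized in a minimal sketch. This is a Python illustration of the talk's description, not LibreOffice's actual code: the function `uses_pdfium`, the `action` labels and the `lok_active` flag are hypothetical, and treating the environment variable (written here as LO_IMPORT_USE_PDFIUM) as enabled whenever it is set to any value is an assumption.

```python
import os


def uses_pdfium(action, lok_active, env=None):
    """Return True if, per the rules described in the talk, PDFium handles the PDF.

    action: "open" or "insert_as_image" (hypothetical labels for this sketch).
    lok_active: True when running under LibreOfficeKit.
    env: mapping of environment variables; defaults to the real environment.
    """
    env = os.environ if env is None else env
    if action == "insert_as_image":
        # Inserting a PDF as an image always goes through PDFium.
        return True
    if lok_active:
        # Under LibreOfficeKit the option is always enabled.
        return True
    # In desktop Draw the feature is experimental and off by default.
    # Assumption: the variable counts as "on" when it is set at all.
    return "LO_IMPORT_USE_PDFIUM" in env


print(uses_pdfium("open", lok_active=False, env={}))             # False
print(uses_pdfium("open", lok_active=True, env={}))              # True
print(uses_pdfium("insert_as_image", lok_active=False, env={}))  # True
```

The order of the checks mirrors the talk: the insert-as-image case and LibreOfficeKit do not depend on the environment variable, so it is only consulted for a plain open in desktop Draw.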
Another advantage of PDFium, specific to the use case of inserting PDFs in other documents, shows when exporting that document to PDF. When exporting the parent document to PDF, the simple approach would be to include the image as such in the exported document, which would be the only reasonable approach if we were using the PDF imported as Draw shapes. But since we have the original PDF, and the PDFium API to manipulate the contents of the PDF, we can insert the elements of the original PDF into the newly generated PDF as XObjects and XObject references. This not only lets us preserve the things that PDFium gave us, like proper fonts and styles, but also preserves more utility for future uses of the document. Instead of storing an image, it would store the original PDF elements that were used to generate the image, like vector graphics, or even the same image but at a higher resolution than the one shown to the user.
This has been my talk for this conference. I hope everything was clear, and thank you for listening.