So let's get started. My name is Sean. I go by batch. I work with the 15 and a little bit of PDF here. And PDF rendering is my main interest. The interesting thing about PDF is that it's really a print format as opposed to a document format. That in itself gives us interesting challenges. People are used to PDF being reliable and reproducible; at the same time, they expect to edit PDF documents. There is a conflict of interest between these two worlds, so we'll see how that goes. We're going to go over some of the rendering challenges and how to make rendering reliable. We're going to talk a little bit about the challenges and how we're going to handle the rendering, and we'll see some results as we go.

So here is an interesting, complex PDF, and this is how it looks today. This is how we render it. You can see it's a mess. It's not in a good place. And this is another one. Again, it's complex. It has complex script. It has complex images. And you can see it's multi-layered, with obviously a lot of content: positioning, typography, alignments. I should say, in fairness, these are not representative. Most cases work fine, which is why people don't complain about PDF being completely broken. But these are some of the more complex cases that are broken in ways that are not acceptable. They're essentially non-functional.

So for those users who have complex script, Asian script, complex documents with charts and text that they need to bring over, this is really important. Customers who need this kind of functionality in their office suite cannot migrate to LibreOffice without it. So we need to have this.

This is unreadable. It is unreadable because the script that is used here is not actually rendered from fonts. The glyphs themselves happen to be embedded in the PDF. So the PDF, when it was generated, was generated in a, I would say, foolproof or safe way, so that it would be portable.
But unfortunately, many of the viewers, including the one of choice that we have in LibreOffice, give us this. And this, of course, is unacceptable.

So the challenge that we have is that PDF is not really an editable document format, and we want both to render it accurately and, at the same time, to be able to edit it. This in itself is quite challenging, because the PDF format is designed to make sure that everything you put on the page comes out exactly where you wanted it. That means it doesn't really understand what the text is in the same way that Writer understands text. It only understands that a certain character should be rendered with certain properties at certain XY coordinates on the page. So the character next to it doesn't need to belong with the first character, even though visually we understand that these two characters are in the same word, or maybe aren't, because the space is not necessarily encoded either. I will show you an example of that.

When you try to edit text that is really just graphic symbols in the PDF world, that becomes very challenging, because you have lost pretty much all the context. So one of the challenges is that when you extract the text from the PDF, it comes out as individual characters in many cases, and we need to do some extra processing to figure out that this is one word, that there is a space and then another word, and so on, until the line ends and the paragraph ends.

So here, as you can see, this is how it's supposed to be rendered. This is the text as the PDF intended it to be shown, and this is how you actually get it displayed if you simply take every element in the PDF and try to render it at its supposed position. These are individual elements. They have nothing to do with one another; in fact, they are not stored as a sentence or even as words. These are just floating characters.
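To make that extra processing concrete, here is a minimal sketch in Python of how loose, positioned characters might be regrouped into lines and words. It is only an illustration: the `Glyph` type, the `glyphs_to_lines` function and both thresholds are hypothetical and are not taken from any PDF library.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One positioned character, roughly as a PDF page describes it (hypothetical type)."""
    char: str
    x: float      # left edge on the page
    y: float      # baseline position
    width: float  # advance width of the glyph


def glyphs_to_lines(glyphs, gap_ratio=0.5, line_tolerance=2.0):
    """Rebuild text lines from loose glyphs using simple position heuristics.

    gap_ratio and line_tolerance are illustrative thresholds only.
    """
    # Step 1: cluster glyphs into lines by their baseline (y-axis delta).
    rows = []
    for g in sorted(glyphs, key=lambda g: g.y):
        if rows and abs(g.y - rows[-1][-1].y) <= line_tolerance:
            rows[-1].append(g)
        else:
            rows.append([g])

    # Step 2: within each line, order by x and infer the spaces that
    # were never stored from the horizontal gap between neighbours.
    lines = []
    for row in rows:
        row.sort(key=lambda g: g.x)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                text += " "
            text += cur.char
        lines.append(text)
    return lines


# Characters arrive in arbitrary order, with no spaces and no line breaks.
page = [
    Glyph("l", 25, 10, 5), Glyph("H", 0, 10, 5), Glyph("k", 5, 30, 5),
    Glyph("a", 15, 10, 5), Glyph("i", 5, 10, 5), Glyph("o", 0, 30, 5),
    Glyph("l", 20, 10, 5),
]
print(glyphs_to_lines(page))  # ['Hi all', 'ok']
```

A real extractor has to cope with much more (rotated text, right-to-left scripts, columns, varying font sizes), so this only shows the basic idea of using x-axis gaps for word breaks and y-axis deltas for new lines.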
This is a major, major problem, and PDFium does help us quite a bit there, although again we'll talk about some of the things that are missing, since this is a work in progress, so we haven't completely done that yet.

So why are we not really able to use Poppler to solve these problems? Some of the main problems have to do with core support: Poppler is missing support for many of the complex things that we really need. As I showed, in the case of the Asian script whose glyphs are embedded in the PDF, it is completely missing. There is also the license: as some of you will know, Poppler is included in LibreOffice through an out-of-process wrapper because we cannot integrate it directly, and that has a huge overhead, at least in performance. There is also overhead for those who need to debug some of the problems. The accuracy of the end result is also not very high, and it fails in a lot of cases. It's really not well maintained, so it's something of a dead end considering all these problems together. So what are we looking for?
This is what we're looking for: something that, first of all, gives us view-only results, because in many cases the user really doesn't want to edit at all. I have mentioned that people typically expect to be able to edit any document they own unless it's protected, and PDF is one of those. But in practice that's often not the case; often we want to view the document and probably not do much else. In those cases what we want is very fast and very accurate rendering. You don't want to go through the overhead of getting the PDF imported into the LibreOffice world, so you want something that saves all the overhead of creating the thousands and thousands of individual elements that the PDF represents. You want something that renders fast and accurately and shows on the screen with low memory and CPU consumption. You also want to avoid the inaccuracies that come with the editing workflow: since you're only showing it on the screen, you can render the PDF into an image, just show that on the screen, and not worry about creating editable text boxes, editable shapes, layers and all of that. This is actually something that people can play with by enabling an environment variable, because that functionality is already available, and people can get back to us on that.

PDFium, on the other hand, gives us a solid alternative. First of all, it is significantly faster; if nothing else, it doesn't have to go through a wrapper. It renders directly to an image, and that happens with higher accuracy and speed. It also gives us something that is not necessarily difficult to recreate, but it's very good that it is already used by others and seems to be working fine: the ability to give us a text stream, a UTF stream, out of whatever shape the PDF actually stores the text in. So it has heuristics to try and figure out where the word breaks go depending on the positioning of
the characters. It will be able to figure out the kerning, which is the y-axis delta between the characters, and it will be able to deduce from that that this is actually the first character in a new word and not just the last character of the word before it. It does a similar trick on the other axis. I said y-axis; I mean x-axis, and the y-axis is used to detect the new lines.

PDFium is already quite popular. It is the Chrome viewer, so it has a very, very wide distribution. It is very easy to work with, and it has a compatible license, so we can integrate it with our code. That gives us a very solid path to achieve all our goals, and in the future we will have editing ability, on which we have also made some progress.

So, as I have briefly mentioned, and now in a little more detail, the immediate goals are to be able to render the PDF pages very quickly and very accurately and show them, in Online by default, as images to the user without editing. This satisfies quite a lot of use cases where users have a PDF that they want to browse, view and share; they are not interested in editing just yet. If we can do that with very high accuracy, we don't get random symbols and we get very nice layout and rasterization of the PDF, which satisfies our users. It is also significantly faster when it does this rendering, because it isn't recreating all the objects, layers and editable shapes in memory. So it does end up being faster, if not necessarily with a smaller memory footprint. I will show some numbers: even though there was no goal as such to reduce the memory footprint, you will see that there is an interesting story that I will share towards the end of the talk. And tomorrow, in this hall, I will be talking more about optimization, where we target both loading time and memory savings directly, and this is part of that story as well.

Now, PDFium doesn't come with many of the
shortcomings of Poppler, but we also want to go to phase 2. Phase 2 is editing the PDF, and for that we have chosen a smooth transition for the user. The user will load the PDF document as graphics and will see the images of the pages. When they choose to edit a page, they right-click on the image and, using the Break command, they are able to essentially break the image into its editable components. That gives the user the ability to go from the image to its editable form. In the future, when we have much better editing functionality, we might choose to make it editable by default, or we might make it a configurable option that the user can choose, at least on the desktop, potentially.

For the implementation, we have gone through multiple stages internally to transition from Poppler to PDFium. For the image rendering, first of all, we already had the ability to import a PDF as an image, and that only rendered the first page. We've extended that to make it more flexible in terms of API. We've also done a significant amount of work to extend the PDFium API. The story there is that PDFium doesn't seem to have had major requirements to support editing as such; as long as it is rendering correctly, it seems that people are happy as users. What we need, however, is API to extract all the PDF details and information so that we can use them for editing, and in many cases it didn't export those; those were internal details. For example, for the scaling of the text and the objects, there are matrix transformations that happen, and we needed those. We needed the transformation matrices, the scaling and other properties of each of the individual objects. To achieve that we added new API and, at least by my last count, most of it is already upstream in the PDFium project. So this is a good story, because now we don't have a very large patch, and what we are doing is essentially simply including the upstream latest
PDFium, and most of these APIs are no longer something that only we use. We also created a new importing piece of logic, the PDF filter. To let the user break the image, at the moment we have to store the original PDF, and that is stored as a link with every page image, in a shared way, so there is one copy of the PDF within the ODF file. It is used when you need to break the PDF, either within LibreOffice or externally; it is also accessible to an external tool.

This is the missing bit, so there is more work to be done. As I said, this is a work in progress. It's a large piece of effort, and we are still missing many of the necessary pieces to get the editing fully functioning. Some of these include features such as more complex edges and lines, or other shape rendering details. We are also missing more test cases: now that we have both Poppler and PDFium, we don't have decent coverage, so we really need to do more testing work there. Another concern is that Poppler has been around for many years now and has been tested on a large number of PDF documents. PDFium is new code for us that is not well tested on more than about 200 PDFs within LibreOffice, so we need more baking time to be really sure that we won't have regressions when we transition from Poppler to PDFium.

With that, I want to share with you some of what we found interesting. In the corner we have the old rendering, which is done with Poppler, and in the background is the actual rendering to an image directly within LibreOffice. These screenshots from LibreOffice are very recent. These are, of course, enabled with the environment variable that I mentioned; you won't get them by default, because we don't want to break functionality for everyone. You can see subtle but critical differences. Poppler is unable, for example, in this case to correctly render the background of that box, and that results in unreadable text, so it's really not
very helpful, even though it is something that you'd think is probably important. And this is another one, very, very subtle. Do you notice the difference? Is anyone noticing the difference? There is an image of a house. It's supposed to be behind the text, not in front of it. It's very subtle, but the result is not useful at all for the user. This is another one. It's a complex math document. It happens to be very similar to what you would expect, but you can see it has broken many of the complex parts of the math. Another case: not a very bad one, but you can see that the text is very disorganized, and that causes a lot of problems. Here is another one, quickly. It should have clipping bounds, where the PDF essentially describes the limits of the area within which the graph is rendered, so that even though there are coordinates for lines and everything outside it, you're not supposed to render them. But Poppler seems to be unaware of those limits, so it ends up drawing them. And this one you will remember from the first slide. You can see that once you render it properly, it's okay. It's now recognizable. This is pretty, pretty impressive.

Now for the memory. The story here is different, and we have a couple of minutes. First of all, the first three are complex; the fourth one has a lot of images in it; CSS is a very complex demo of different things that you can do in CSS, which is used on the web, so there is a lot of complex formatting of text and other elements; and the last one is the math document that you've seen, so it has a lot of text but also a lot of math symbols. You can see that with PDFium image rendering, in a couple of cases the memory is significantly higher, but in other cases it actually went down. So what is going on? The answer is that when you render an image, you essentially have a fixed number of pixels that you are rendering. You're essentially saying that, regardless of how many different things the document has, you're going to render
everything into images of x by y resolution, and then you store that in memory as a PNG. What that means is that if the document has something extremely complex and you render it at the fixed resolution that you've chosen, it is going to end up taking about a fixed amount of memory. You cannot consume more memory than the size of the image; that is your worst-case scenario. So you've essentially capped your maximum memory consumption by rendering to images.

However, the other side of the coin is that if you have very sparse elements, elements such as text that would take much less memory when described as LibreOffice native objects, you lose. You can see that happening with the text-heavy documents: the text was more compact when imported, and once you render it you're actually consuming more memory than you should. In the other cases you already have a very complex structure that consumes a lot of memory when you try to recreate it in LibreOffice, so by rendering to an image you are essentially limiting the maximum footprint. And at least if this is representative of a typical collection of different types of documents, you'll see that the average is around 100%; in this case it's 97% of the original memory that you're consuming with the image rendering. That means there is no major loss here, unless you happen to have only a single type of document, or a majority of documents of a single type, that either wins significantly or loses significantly on the memory front. We didn't have any requirement to improve the memory footprint, but we also didn't want to make the situation much worse.

So with that, we have a couple of minutes, and I'm happy to take questions.
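As a rough illustration of the memory cap discussed in the talk, the sketch below compares a fixed-size page image with an object-based import whose cost grows with the number of elements. Every figure in it (page size, bytes per pixel, bytes per object, object counts) is an assumption chosen for the arithmetic, not a measurement from LibreOffice, and the image is counted uncompressed as an upper bound.

```python
def raster_bytes(width_px, height_px, bytes_per_pixel=4):
    """Upper bound for one rendered page: every pixel stored uncompressed."""
    return width_px * height_px * bytes_per_pixel


def native_bytes(num_objects, bytes_per_object):
    """Naive cost of an editable import: grows with the number of objects."""
    return num_objects * bytes_per_object


# All figures below are made up for illustration only.
PAGE = raster_bytes(1000, 1400)     # fixed, whatever the page contains
sparse = native_bytes(2_000, 500)   # a page with a little text
dense = native_bytes(50_000, 500)   # a page full of tiny vector elements

for name, native in (("sparse", sparse), ("dense", dense)):
    winner = "image" if PAGE < native else "native objects"
    print(f"{name}: native={native:,} image={PAGE:,} -> {winner} uses less")
```

With these made-up numbers the sparse page is cheaper as native objects and the dense page is cheaper as an image, which is the same pattern as in the results above: some documents win, some lose, and the image size bounds the worst case.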