 Hi, I'm Tomasz Weinger. I'm from Collabora, and I will present improved document searching with LibreOffice. First, I would like to clarify what is meant by improved searching. We can search documents in two different ways: internally and externally. Searching internally, inside LibreOffice, means we just traverse the internal document model and search for some string. Searching externally means that we feed the documents into a search database, and using that database we can then search multiple documents for phrases. This is what the improved search in the title refers to.
When we search externally with a database, we have to transform the documents into text and feed that text into a search engine, which then searches inside this transformed text. The problem is that we don't get good context when a result is found. We learn that the phrase was found in this document, but we don't really know where exactly it was found or what surrounds it. This is what we want to improve.
For searching phrases in multiple documents there already exist multiple search platforms. The one we used, which is very popular, is Apache Solr, which uses Apache Tika to transform the document into text. Apache Tika is a Java library that can open a document and transform it into HTML or plain text. Another very popular one is Elasticsearch, but I don't know much about it. I mainly explored Apache Solr.
So what is the general idea? The idea is to use LibreOffice and Collabora Online to add this context to the search results.
Our idea was that for each search result we would render an image of the place where it was found inside the document. So this is the solution description, what needs to be done to realize this idea. First, we need to create the search data. Second, we have to import it into a search and indexing platform, the search database. Third, we have to search on the search platform and get some results. Finally, we have to render an image of the location of each search result. The first three steps are already implemented elsewhere, but for the last one I don't know of an existing solution that renders the location inside the document and shows it to the user.
Now the first step. With LibreOffice we can create the search data, and we implemented this as a new export format. This means that you can use Export As or Save As, and it will save the current document in the search indexing data format, which is XML. The good thing about implementing this as an export format is that it provides a lot of things out of the box. We can already use it on the command line with the soffice --convert-to switch, as shown here. We can also use the LibreOfficeKit API's saveAs function to create this search data document. The LibreOfficeKit API is already used by Collabora Online, which provides a convert-to REST service, so we can use that and don't need to implement anything new.
Next I would like to talk about the data format. The search indexing data is just a flat XML file. Flat means that it doesn't have a lot of nested elements. The idea is to keep it very simple so that we can easily transform it into a vendor-specific format, like the one used in Solr.
At the top we can see an example of this search data format. The root element is always indexing, and its child elements are either paragraph or object. A paragraph is just one paragraph inside Writer. And not only Writer paragraphs: the text of any shape that has some text is also exported as a paragraph. The other element is object, which can be a shape, an image, or Fontwork. This one is mainly there so that we can provide additional metadata for the object, so we can also search inside the metadata and not just the paragraphs.
For paragraphs, the most important attributes are index and node type, and in addition to these also object name. With these we can identify which paragraph inside the LibreOffice document model a search result refers to. For objects the important attributes are object type and name. The name is always unique, so we can identify each object inside the document just by its name. We also export other attributes as additional metadata, such as the alt text and the description of the object.
This is implemented with the indexing export and indexing node handler classes. The indexing export class is just the root class for the search data indexing export, and it delegates everything to the indexing node handler. The indexing node handler is a handler for the model traverser. The model traverser is just a visitor that visits all the elements inside the document model and delegates what to do with these elements to the handler. The indexing node handler then just writes them into an XML file with the structure I explained before. The model traverser is derived from the accessibility check functionality, which also needs to traverse the document model. But currently it is just a copy: the accessibility check is not yet using the model traverser.
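To make the flat format concrete, here is a minimal Python sketch that reads such a file into a list of plain dictionaries. The sample XML is hypothetical: the exact attribute spellings (`node_type`, `object_name`, `object_type`, `alt`, `description`) and the values are assumptions based on the description above, not a copy of real LibreOffice output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of the flat indexing XML. Element names follow the
# talk (indexing, paragraph, object); attribute spellings are assumptions.
SAMPLE = """<indexing>
  <paragraph index="0" node_type="writer">About LibreOffice</paragraph>
  <paragraph index="1" node_type="common" object_name="Shape 1">Text inside a shape</paragraph>
  <object name="Image 1" object_type="image" alt="A web logo" description="Logo for the web page"/>
</indexing>"""


def parse_indexing_xml(xml_text):
    """Return one dictionary per paragraph or object element."""
    root = ET.fromstring(xml_text)
    if root.tag != "indexing":
        raise ValueError("expected <indexing> as the root element")
    entries = []
    for child in root:
        if child.tag not in ("paragraph", "object"):
            continue  # ignore anything this sketch does not know about
        entry = dict(child.attrib)      # all metadata attributes, unchanged
        entry["kind"] = child.tag       # "paragraph" or "object"
        if child.tag == "paragraph":
            entry["text"] = child.text or ""
        entries.append(entry)
    return entries


entries = parse_indexing_xml(SAMPLE)
for entry in entries:
    print(entry)
```

Because the file has no nesting, one loop over the children of the root is enough, which is what makes the later conversion to a vendor-specific format simple.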
The idea is that both the indexing export and the accessibility check reuse this class, and maybe other uses can be found for it. For example, the code that searches for the colors a document uses could also use this model traverser. But this is something we will implement later.
The next step is to render an image for a search result. We perform the search and get a search result. The search result has to carry all the additional metadata, the index and node type, which are needed to identify which paragraph or object it refers to inside the document model. With this information we can render the result. This process is divided into two parts. First, we need to get the rectangle, the location in the document where the search result is found. For this we have a search result locator class. The search result locator accepts the search result data as either XML or JSON. It can also take a special structure, which is used inside tests. Second, once we get the rectangle from the search result locator, we can render the image with the paintTile API that is already implemented in LibreOfficeKit.
Next is implementing an image rendering service inside Collabora Online, so that we can use this on the web. For this I created a REST service, render-search-result, which is very similar to the existing convert-to service that I mentioned previously, the one used to create the search data for indexing. For this REST service we need to provide the document and the search result. We send both to the Collabora Online server, and we get back the rendered image of the search result's location.
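As a rough illustration of calling such a service, the following Python sketch builds a multipart request containing the document and the search result and posts it to the server. The endpoint path, the form field names `document` and `result`, and the shape of the search result entries are all assumptions made for this sketch. Only the overall flow (send document plus search result, receive image bytes) comes from the talk.

```python
import json
import urllib.request
import uuid

# Assumed endpoint path; check the Collabora Online version you are using.
RENDER_PATH = "/cool/render-search-result"


def build_multipart(document_name, document_bytes, search_result):
    """Build a multipart/form-data body with the document and the result."""
    boundary = uuid.uuid4().hex
    parts = [
        # (field name, file name, content type, payload); field names are assumed
        ("document", document_name, "application/octet-stream", document_bytes),
        ("result", "result.json", "application/json",
         json.dumps(search_result).encode("utf-8")),
    ]
    body = b""
    for field, filename, content_type, payload in parts:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        body += header.encode("utf-8") + payload + b"\r\n"
    body += f"--{boundary}--\r\n".encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def render_search_result(server_url, document_name, document_bytes, search_result):
    """POST the document and search result; return the raw image bytes."""
    body, content_type = build_multipart(document_name, document_bytes, search_result)
    request = urllib.request.Request(
        server_url + RENDER_PATH,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

A call would look like `render_search_result("http://localhost:9980", "report.odt", data, [{"type": "paragraph", "index": "0", "node_type": "writer"}])`, where the server address and the metadata keys are again placeholders.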
So far I have mostly explained what was done in LibreOffice and Collabora Online. But how does everything fit together, including the database, the Collabora Online server, and all the other pieces, so that the user can search? For this I created a proof-of-concept web application, which looks like this. I will demo it later, so first I will explain what it does.
It's a simple web application that demonstrates how everything should work together. It uses Apache Solr as the search platform. The HTTP server is just Python's simple HTTP server, and Python is also used for the server-side processing, including calling the REST services on Collabora Online and Solr. The client side uses HTML and JavaScript. The framework is AngularJS, which is something I was already familiar with and which is very strong at data binding and REST services, plus Bootstrap for the UI, so it's easier to build the look of the application. The last thing we need, of course, is a Collabora Online server, so that we can render the image for a search result and also open the document itself.
The application performs three major processes. The first one is re-indexing, which fills the Solr database with the search data from the documents. What I forgot to mention is that the web application takes care of one arbitrary folder where all the documents are stored, and all these documents are taken into account and listed as available for opening, indexing, and searching. The trick is that we need to re-index every time a document changes. As currently implemented, we always delete all indexes and re-index everything, but ideally this should happen only when a document changes.
In that case we should only update the indexes, not remove and re-add all of them: we just need to remove the indexes for the changed document and add new ones. And of course, if a document is deleted, we need to remove its indexes from the database.
To re-index, for each document in the document folder we request the XML search data from the Collabora Online server using the convert-to service. Once we get this XML file back, we transform it into the Solr format. Solr has a slightly different format for entering search data into the database. It mainly has a notion of documents and fields. A document here is not a LibreOffice document. It generally corresponds to a paragraph or an object, and the fields are the additional metadata. We also add a special field, file name, to identify which LibreOffice document the entry belongs to, and a special field, content, which is the paragraph text. Then we submit this search data to Solr using an HTTP POST request.
Now the search process. Solr has a very extensive querying API. We don't need all of it here, but we can of course use it if needed. To search with Solr we just send a simple HTTP GET request to the Solr server, and as the response we get a JSON document with the results, or with no results, depending on whether the database found something. Other formats such as XML are supported, but in a web application JSON is the simplest to deal with, because we don't need to parse it the way we would XML. The web app currently searches only the paragraph text, so the content field is the only field we search in. But we could also search in other fields, for example to limit the search to certain types of objects or paragraphs, or, if we want to search just one document, to filter on the file name field.
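The transformation and the query can be sketched in Python as follows. This is a minimal sketch under assumptions: the field names `filename` and `content` follow the talk, while the `id` scheme, the `kind` field, the core name, and the server address are hypothetical. The `/update` and `/select` handlers and the `q`, `fq`, and `wt` parameters are standard Solr.

```python
import json
import urllib.parse
import xml.etree.ElementTree as ET


def to_solr_documents(filename, indexing_xml):
    """Turn each paragraph/object element into one Solr document (a dict)."""
    docs = []
    for position, element in enumerate(ET.fromstring(indexing_xml)):
        doc = dict(element.attrib)                 # metadata becomes fields
        doc["id"] = f"{filename}#{position}"       # hypothetical unique key
        doc["filename"] = filename                 # which file the entry is from
        doc["kind"] = element.tag                  # "paragraph" or "object"
        if element.tag == "paragraph":
            doc["content"] = element.text or ""    # the searchable text
        docs.append(doc)
    return docs


def build_select_url(solr_base, core, phrase, filename=None):
    """Build a GET URL that searches the content field for a phrase."""
    # Note: this sketch does not escape quotes inside the phrase.
    params = {"q": f'content:"{phrase}"', "wt": "json"}
    if filename is not None:
        params["fq"] = f'filename:"{filename}"'    # limit to one document
    return f"{solr_base}/solr/{core}/select?" + urllib.parse.urlencode(params)


xml_text = (
    '<indexing>'
    '<paragraph index="0" node_type="writer">Hello web</paragraph>'
    '<object name="Image 1" object_type="image"/>'
    '</indexing>'
)
docs = to_solr_documents("a.odt", xml_text)
# This JSON array is what would be POSTed to <solr_base>/solr/<core>/update
payload = json.dumps(docs)
url = build_select_url("http://localhost:8983", "documents", "web")
print(url)
```

Keeping one Solr document per paragraph or object is what lets a hit carry the index, node type, or object name back to LibreOffice for locating the result.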
When we get the results back, we need to transform them again into something from which LibreOffice can render the image. LibreOffice currently supports JSON or XML, as I said, and the search result needs to be compatible with that. We can also reuse this JSON inside the web application itself to show the search results. Generally it's just an array of objects with key and value pairs for the metadata. Then of course we have to show the results in the web application.
The last process is rendering the images. After we show the results in the web application, we request rendering of the image for each search result. This is done asynchronously, so we can show the results first and update them when the images get rendered. We use the render-search-result service: we send the search result and the document to the Collabora Online server, and we get back the image as a pure binary PNG. We then transform the image into a Base64 string, because that is easier to deal with in the web application.
So, the demo. This is the web app. Here we have a list of documents, and as the first thing we need to re-index all the documents. This here shows the status of what is going on: currently it says re-indexing, and now it has finished re-indexing. Now we can search. Let's simply search for LibreOffice. In the About LibreOffice document it found a couple of search results, and it is now rendering where they were found, in which paragraphs. These are images of the inside of the document. Now we can search for something more general, for example web, and now there are multiple results for web in multiple documents. Here we see that this result is Fontwork: it found web inside the Fontwork text.
If you go down, you can see that this one is an image, and it found the result inside the caption of the image. Similarly, this one is a shape, and it found the result in the caption of the shape. The following two are shapes, rectangles, and it found the result inside the shape text. And so on. Next we have a couple of paragraphs in other documents. If we are interested in how this document looks, we can just click here and it will open the document inside Collabora Online. And this is the document. We see that we have the image here, and a shape here. There is also a table, which is also shown as a result, and another shape. And this is the Fontwork object that I showed previously.
Now I'll go back and search for Lorem. We have this Lorem document, and Lorem is found in a lot of places, generally just paragraphs. The search has finished and it found nine results. And if we now go in here and change the document, we always have to re-index it, or the changes in the document won't be found by the search.
So this is all from me. That was my demo. Thanks for watching, and bye-bye.