So, welcome to a little talk about a detail of ODF: how to save, transport, serialize, and then read back changes to a document, as in change tracking. This is a GSoC project, sponsored by Google. That's the cast: the student, Rosemarie. For a variety of reasons the project unfortunately did not fully finish, but it is in a state where the export part of the feature works reasonably well, so we can demo that here. The import part is still a bit lacking, but this talk is mostly about the general concept: why it's useful, that it's implementable, how it's implemented, and what's already working. So thanks to Google for sponsoring that. There's a link to the feature branch, to the code that's in the repo. And with that I hand over to Svante for some pep talk.
Thank you. This way of doing change tracking is just one incarnation of a very new feature, and I'm hijacking this session a little to explain the new concept, which is very important. I claim that documents alone are not sufficient for interoperable collaboration. In the following slides I'm going to show you what is missing. This, by the way, is a Titan; Atlas is his name, who was condemned to carry the world, the sky. Documents are no longer able to carry everything either. They come from a different time, and there has to be some change in perspective. So what is missing? If you ask a customer what is missing, the customer usually says... there is a quote, and it's uncertain whether it's really from Henry Ford, the inventor of cars: if you had asked customers what they wanted at a time when there were only carriages, they would have asked for faster horses. You notice that the customer, in general, thinks very linearly.
So, faster horses. In terms of collaboration: in the ancient times, when I did collaboration in high school, we had a floppy disk. Now we have faster floppy disks; it works, check. We have email and can attach files to it, and now we even have cloud space and shared directories, which is even faster. It cannot get any faster, I believe. But we still have a huge problem that is not solved per se: only one person can edit at a time. There can be only one editor. It's like in that old movie: there can be only one. You have to have some kind of pessimistic lock so that only one can be in there. When there's a disk, only one person can have the disk, and please, everybody, save to the same file, because we don't want to end up with multiple files later on. And that is exactly what we have: a merge problem. That is the basic thing that is still lurking around and has not been addressed so far. There are two use cases. The upper one is from the customer, the second one from the software vendor. The upper one is what most of you have felt as a user: I have this presentation, or any document, on a floppy disk and I hand it around or send it by mail, and you start to check it, annotate it, or edit it and give it back to me. Then I have this bunch of floppy disks on my table and ask myself: what is the difference between those and my original? In the end I want to have one file, so I have to merge all the changes back into a single one. And that is exactly the key question: what has been changed? Especially when I'm working, for instance, with a lawyer. Lawyers tend to use a different, market-leading office application, which does not support ODF change tracking at the moment. So I have no way to understand what has been changed.
If I now switch to a different format, I still only have what is called change tracking. You have to enable it explicitly, and what happens if they forgot to enable change tracking at certain times and important things went under the radar? What I usually do is print it out and hold it against the light, against the sun, to see whether anything has been changed. But this should not be the case in general; it just does not scale. So the problem is how we can guarantee how we merge; that's the common user case. And the software case: at this time, where the internet is everything, you ask yourself, how can I collaborate instantly with the others, in an interoperable way? The problem is that the only thing that is standardized is the file format, the complete document. But sending the whole file around in real-time collaboration does not scale: the larger the file gets, the longer it takes to parse. So it's obvious that we have to send only a change, but we don't know what a change is. Change tracking tracks changes, but it doesn't know what a change is. It's funny to track something when you're not aware of what it is. It's simply: oh, there was a change, I handle it in a black box, and if you reject this change, I put it back. This is what we are trying to change. So what are the requirements of a merge? A merge is very similar to software development; they're parallel. You have to know what the difference is; you find the diff, as it's called. When you want to merge, you have to know how you get from document A to document B, or to the version AB. What are the steps? And I would like to decide for each step whether I take it over or not. And the last thing, which of course is lacking the most, is interoperability.
There are a lot of solutions like Google Docs, Office 365, or LibreOffice Online, but they cannot collaborate with each other. There's no interoperability. I claim that every single one of these collaboration solutions is some kind of lock-in. You cannot exchange or combine them, neither with other editors nor with business software, which is a large problem. So I'll give you an example of how what we are doing works. We standardize the changes of a user. With the ODF changes, we look at what the common changes of a user are. We came from the problem of real-time collaboration, where there was a web browser and, at that time, StarOffice: how can we exchange changes such that both of them can map them to their model? The best thing was to go to a semantic, logical level and say what the user is changing, which is what every office application does: inserting paragraphs, inserting text, formatting, selection. The nice thing about this high, logical level is that there is so little noise when you compare. When you compare XML, there can be a lot of differences without anything being different at all in the logical sense. For instance, sometimes the document is on one line, sometimes there's a different indent, sometimes the XML has a different prefix, and so on. A lot of things can change that are just noise and keep you from the real difference. So we have this high-level abstraction, the logical thing. Another benefit is that different applications tend to have the same modelling at this logical level, so it's very easy to integrate. And the last thing, which is very important, is that a change is quite often atomic; it has no influence on anything else. Do we have a pointer, by the way? On the right side, we have a stack of changes.
First we add a paragraph, and in the next three lines we have different steps. You go from the beginning, and it's just like reading a batch script, or like a stack of cards. On the left, you see the ODT document. What we're doing here is simply inserting the heading. And the other things... let me see if I can point. You don't have a mouse. Okay, we don't have a mouse, sorry. But we now have several items that you can simply count through: the "one to nine" is a paragraph, the second paragraph; list item one is the third paragraph, item two the fourth, and the image the fifth paragraph. So you can easily look at this and point to the items by a convention that we are building up at this level. And the nice thing is that it works for other formats as well. On the right side, you see how it looks in JSON. JSON is a serialized text format that is used quite often on the web, and as we used it in the web office, it was the state of the art. In this specification, we want to standardize both the JSON and the XML, the latter being used for storing in the zip. Torsten, would you like to show an example of how it can look in XML?
Yes, I'd love to. So that was the abstract part, and now let's see if I can get a demo going. This is a very simple document, and what you can see here is a simple change: this second paragraph got deleted. This is a development build from the feature branch that contains the GSoC changes. Now let's look at the XML, at how this is implemented. Is that reasonably visible? If not, just come closer. The ODF container is a zip archive, as you know, so there are a number of XML files: content.xml, settings.xml, styles.xml. With the prototype implementation, there's another file, called undo.xml, that contains reverse changes.
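To make the package layout just described concrete, here is a minimal sketch that builds a toy zip archive in memory and lists its members. The file contents are placeholders, and the name `undo.xml` is taken from the prototype as described in the talk; it is an assumption of this sketch, not part of the published ODF standard. A real ODF package contains more than this (for example a manifest).

```python
import io
import zipfile

# Hypothetical, heavily simplified package contents. A real ODF package
# holds complete XML documents plus a manifest; these are placeholders.
FILES = {
    "mimetype": "application/vnd.oasis.opendocument.text",
    "content.xml": "<!-- final state of the document -->",
    "styles.xml": "<!-- styles -->",
    "settings.xml": "<!-- settings -->",
    # Prototype-only file holding the reverse changes (name as heard in the talk).
    "undo.xml": "<!-- reverse changes -->",
}


def build_package(files):
    """Write the given name -> text mapping into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def list_members(package_bytes):
    """Return the member names of a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.namelist()


package = build_package(FILES)
members = list_members(package)
print(members)
```

A consumer that does not know about the extra file can simply ignore it and still read `content.xml`, which is the point made in the demo.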
So the final state of the document is always in content.xml; whatever consumer application you use, it will always show the last state of the document. And then, going backwards from this last state, there's a set of undo actions, conceptually stored in this XML file, to go back in time. That's what this prototype currently serializes for a simple change like deleting one paragraph. Let me highlight that: this is Office Undo, then this is Office Change with... oops. I'm not going to bore you with the details of how I hate NVIDIA Optimus and how it doesn't work with Linux, but anyway, I hope it's still visible. So there's a text insertion, let me clear this up, with a position: a relative positioning not in the XML tree, but in the conceptual document object model. Svante, help me out: the two is the second paragraph from the top, and what's the one? The one is the first character within the second paragraph, and because the change is of type text, it means you're inserting just the text that we read here. Right, so the final state of the document is without the second paragraph. If you look at this, it's still there, but it's crossed out. If I go to track changes and don't display them, you see it's gone. So the final content.xml does not contain that paragraph; to go back in time, I have to insert it again, and that's what this markup does. It says: insert "this is the second paragraph" at paragraph position two, character position one, so you can go back in time.
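The "go back in time" idea from the demo can be sketched in a few lines. This is only an illustration under stated assumptions: the document is modelled as a plain list of paragraph strings, positions are 1-based as in the talk, and the change dictionaries (`type`, `position`, `text`) are hypothetical and do not reproduce the prototype's actual XML markup.

```python
def apply_change(paragraphs, change):
    """Return a new paragraph list with one change applied (1-based positions)."""
    result = list(paragraphs)
    if change["type"] == "paragraph":
        # Position [p]: insert an empty paragraph so that it becomes number p.
        (p,) = change["position"]
        result.insert(p - 1, "")
    elif change["type"] == "text":
        # Position [p, c]: insert text before character c of paragraph p.
        p, c = change["position"]
        old = result[p - 1]
        result[p - 1] = old[: c - 1] + change["text"] + old[c - 1:]
    else:
        raise ValueError(f"unknown change type: {change['type']}")
    return result


# Final state, as a consumer would see it: the second paragraph is gone.
final_state = [
    "This is the first paragraph.",
    "This is the third paragraph.",
]

# Reverse changes, stored separately from the content (hypothetical format).
undo = [
    {"type": "paragraph", "position": [2]},
    {"type": "text", "position": [2, 1], "text": "This is the second paragraph."},
]

previous_state = final_state
for change in undo:
    previous_state = apply_change(previous_state, change)

print(previous_state)
```

Note that `final_state` is never modified: the reverse changes are applied on top of it, which mirrors the separation between the content and the undo information.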
Compare that with the current ODF markup for this, which is inline. This is content.xml, and all the change tracking information is inline, so it's interleaved with the actual document content. That is okay, but it has a number of drawbacks. One is that every consuming application needs to implement it, or at least skip it. Secondly, if you want to sign a document and then edit it or make some amendments, you cannot, because you would break the signature. What you can do with separate change tracking is edit on top and have a second signature signing off just those changes. That makes a number of things very easy. Even if you exclude things like real-time collaboration, just having the changes separate, out of line and out of band, is I think quite nice and helpful. Back to you?
Yes, just leave that up a little, because what you just mentioned is very important. Earlier, because we had no way to identify something or to point to it, we had to put an ID in the middle of the content, and the previous content was also in content.xml. So when you were looking for the changes, you had to parse the complete content.xml, even if it was a very large document. Nowadays you can put the changes aside in another file, because we have the ability to point by convention: we know what to count, these logical blocks. This gives us new abilities, and the question of whether we need an XML ID is one of the things that has been discussed at length. So I continue? Yeah, let me find the slides. Where are we? Just a second, I had that one here. Okay. So we had decision-making problems. When you work in a team, there are a lot of people with different opinions, and sometimes the opinions are just based on personal taste, right?
Like, most of us know the Pink Panther with the blue and the yellow, and it goes around and around; we could watch this movie for a long time. But the question is how you make a decision based on the advantages. The rule of thumb is: the more features and the more use cases we cover, the better. XML IDs have their advantages, especially hashing, where everyone comes up with "oh, we are so fast with it". But as Torsten said, with them we are not able to put something aside; we are still breaking the document. And think of other scenarios that weren't possible in the past: there's a read-only document on the internet, you take a look at it, and then you want to comment on something. Until now you would have had to load it, edit it, and send the full document back, because it was only possible to send everything. Now you can have a small set of changes that point into the document, and you make something that developers know from Git as a pull request. You ask: do you like my changes? Would you like to take them? The recipient does not receive a complete document where he doesn't know what has changed. He just receives the changes, can even read them manually, as a human, and say: oh, he just changed this text, this is totally safe, he doesn't break my template with another application or do something wrong, we can take it. So this is very important. Okay, what might an architecture look like? I worked before for Open-Xchange, which had a web office; we started with this there. The common use case was the following: on the server, the ODT was transformed into a sequence of changes. It's just a different look at the same thing: the sequence of changes is like a batch of how a user would create the document from top to bottom, like a long macro script or a recipe.
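The "document as a recipe" view can be sketched roughly as follows. Everything here is an assumption made for illustration: the document is a list of paragraph strings, and the operation names `addParagraph` and `insertText` with a 1-based `start` position are hypothetical and not taken from the specification or from any product's actual wire format.

```python
import json


def document_to_changes(paragraphs):
    """Describe a document as the sequence of steps that would create it."""
    changes = []
    for index, text in enumerate(paragraphs, start=1):
        changes.append({"name": "addParagraph", "start": [index]})
        if text:
            changes.append({"name": "insertText", "start": [index, 1], "text": text})
    return changes


def replay(changes):
    """Rebuild a document from scratch by applying changes in order."""
    paragraphs = []
    for change in changes:
        if change["name"] == "addParagraph":
            paragraphs.insert(change["start"][0] - 1, "")
        elif change["name"] == "insertText":
            p, c = change["start"]
            old = paragraphs[p - 1]
            paragraphs[p - 1] = old[: c - 1] + change["text"] + old[c - 1:]
        else:
            raise ValueError(f"unknown change: {change['name']}")
    return paragraphs


document = ["Heading", "First paragraph.", "List item one"]

# Sender side: serialize the whole document as a JSON list of changes.
wire = json.dumps(document_to_changes(document))

# Receiver side: it only ever sees changes, never the file itself.
rebuilt = replay(json.loads(wire))

# A later edit from another user is just one more change in the same format.
edit = {"name": "insertText", "start": [2, 1], "text": "The "}
edited = replay(json.loads(wire) + [edit])

print(rebuilt)
print(edited)
```

The receiver uses the same `replay` function for the initial load and for later edits, so it does not need to distinguish between "a file" and "a change from someone else".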
This is sent to the browser, and the browser then creates the document from beginning to end based on the sequence. The nice thing is that the browser doesn't know: was it a file, or was it a different user who just sent me this? It only has to care about these changes. It's very fine-granular. And it's just like getting the ODT out of the equation. Okay, so what's next? What is still to do? It's a little problematic, because LibreOffice developers are currently, I'd say, not cheering for this new change tracking. The downside, which I didn't realize at first, is that although change tracking is required, there's a lot of effort to spend, and for the user it stays quite the same. It only changes the change tracking feature itself, behind the scenes. There's no benefit a user would pay for. So this is hard to sell for people who have to sell their developers' time. So we have to come up with different features. I work in a different area, on the Apache ODF Toolkit. That's what has been used in the backend as what I call the ODF sequencer, because it takes the file, the document, and maps it to a sequence. Later you can take changes, again as a sequence, and apply them back to the file. So it takes all the complexity of the ODT away for you. This is a new feature that's going to be added to the incubating Apache ODF Toolkit project. The next thing, of course, is something that helps LibreOffice. It would be nice to have this for testing, because currently we just load documents into LibreOffice to see if it crashes, and it would be nice to also save and see whether the feature set has changed. Because I did the same thing for my own regression testing, I know that once you have a bug, you still want the tests running, so you need to adapt the references, and you need some tooling for that. This is something to do.
You did pretty much that on a different level of abstraction, but with things like normalizing and so on. I would love to. And the next thing builds on top of this tooling. You know Git, perhaps, or other versioning systems: currently, if you check in an ODF file, it's just a binary file, and it's hard to see the differences. Even if you use some tricks and get at the XML, there's a lot of noise, and you don't get stable versioning. But if you work with these changes at this high level of abstraction, you can see the changes and merge them very easily. This is something I'm going for, because I think it is the first use case where customers might say: oh, that's nice, I would like to pay for this. So we get some momentum, because things are rather stalling. This is something I can mostly do alone. The change-tracking specification is already there; it needs implementations, and LibreOffice, as the largest ODF implementation, of course needs to be on board. Otherwise it's lifeless, fruitless. And what this talk was about is change tracking in this area: it has to be implemented, it should be implemented, otherwise we have a problem. Independent of that, there are some old technical debts around format changes, which concern just the developers and have to be addressed as well. For example, if you have bold letters and, with change tracking on, make something within the text underlined, you are not able to undo it. It works neither with the application core nor with the ODF model. That is broken and has to be fixed, and it is an area, especially in Writer, that has to be touched for that.
But it's kind of an orthogonal problem. It probably needs addressing anyway at some stage, but it's orthogonal to how you serialize the actual change-tracking information.
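The point about XML noise in versioning can be illustrated with a small sketch. It compares two serializations of the same tiny fragment that differ only in namespace prefix and indentation. The fragment is deliberately simplified and not a complete, valid ODF document, and the "logical view" used here (just the paragraph texts) is an assumption of this sketch, far cruder than the change model discussed in the talk.

```python
import xml.etree.ElementTree as ET

# Namespace of the ODF text module.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Same logical content, serialized on one line with the prefix "text" ...
compact = (
    f'<text:section xmlns:text="{TEXT_NS}">'
    "<text:p>Hello</text:p><text:p>World</text:p>"
    "</text:section>"
)

# ... and pretty-printed with a different prefix.
indented = f"""<t:section xmlns:t="{TEXT_NS}">
    <t:p>Hello</t:p>
    <t:p>World</t:p>
</t:section>"""


def logical_paragraphs(xml_text):
    """Reduce the XML to a simplified logical view: the list of paragraph texts."""
    root = ET.fromstring(xml_text)
    return [(p.text or "").strip() for p in root.iter(f"{{{TEXT_NS}}}p")]


# A byte-level or line-level diff sees two different files.
print(compact == indented)

# On the logical level, there is no difference at all.
print(logical_paragraphs(compact) == logical_paragraphs(indented))
```

A line-based diff in a versioning system would report the two strings as changed on every line, while the logical comparison reports no change, which is why comparing at the higher level gives more stable results.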
So again, thanks to Roses for doing wonderful work here. I hope that at some stage we can complete this and at least have it as an experimental feature, an add-on, in master. So, Roses, thank you. And with that, I just want to...
Yes, last slide. This is for Michael Meeks, because we have a little dispute on this. This was not the last one. The last one is that one, I believe. Or the last one of my slides, okay. Oh, wait a minute, we've got a problem here. I'm going to fix that. See, it's fixed. All right. So we have to remember to group up and to play smart. In my opinion it's not sufficient to be open source. We, LibreOffice, should still support ODF, because LibreOffice is the biggest player for ODF. And there are a lot of marketing effects from an ISO standard. We see that in what Microsoft did when ODF went to ISO: they created a fully new format, quite similar to ODF, and they even took over the name. The first alpha version of ODF was called OpenOffice XML, but the name was changed because it was too application-dependent. Others, like AbiWord, should be able to use the format, and it should not clearly belong to one application. So the name OpenOffice XML was dropped, and Microsoft took Office Open XML, just switching the first two words. It confused me totally in the beginning. It's also apparent that they have to make certain moves if we open up the market. And the next step, from my point of view, is interoperable collaboration, because one of the largest blockers for getting into the enterprise market is the business software, the enterprise software, which is highly dependent on the Microsoft API.
And if there were some, let's say, document API, some document changes that are interoperable, so that you can put the ODF directly into Git, get these changes, and put them directly into your viewer or into the business software that you're writing, then you would no longer necessarily have to exchange ODF documents, but just these changes. To allow merges and real collaboration, which is what users and customers desire most: then we would have something that would push the market again. So group up, play smart. Thank you for listening.