Hi, I'm Svante Schubert. Today I want to talk about ODF and its toolkit. I've already uploaded the slides, if you'd like to follow along; they are on GitHub in the ODF Toolkit project. I'll start slowly and say something about what a standard is. A standard is like a blueprint or a cooking recipe: it exists so that things can be used together, for interoperability. The most famous standard for me is the DIN A4 paper format, which is very common in Europe; in the USA you have Letter, I believe. The point is that the same size can be reused everywhere. I think standards really came out of the Second World War, when screws needed to be the same size. By adding reusability you prevent lock-in effects, so that others can be on the market. And of course you can lower costs, especially when you have shared tests and validators.
What is not yet in a standard? I am also working on an invoice standard in Europe. There the idea is simple: the invoice should no longer be paper, it should be digital so that you can consume it. But the standard itself is published as digital stone, so it is very hard for software developers to get the information out of it. I think ODF is one of the few standards where something like this has been done, for example we are extracting the default values, which I might show you later. But there is still a long way to go. Digitalization for me means that you can throw the data over the digital fence and the other side can process it automatically. It is about automation, so that no human is necessary. That is obvious for invoices, but it applies to the standard as well: if there is a standard, people should be able to extract most of the information from it. I'm telling you this because I worked on the ODF Toolkit, and writing an ODF application is like painting by numbers. There is no art you can add. You just have to implement the standard, and that is simply boring, tedious work. All right.
Let's take some time on this slide. You can see there are two standards: an ODF ISO standard and an ODF OASIS standard. So what is the difference? I am not so sure about ISO, but I know the regulation for CEN, the European standards body. On the CEN side, the dark blue things on the slide, every country has one standards organization, like DIN for Germany, which I mentioned before, and these are grouped together as CEN in Europe or as ISO internationally. All right. I just mentioned the members, but you will see them on the slides later. This is a nice but chaotic picture. What I want to show you here is that there are levels, with rows and columns. In the rows we have, first of all, the national level, with national bodies like DIN for Germany; then the European level, the group of all the European national bodies I showed earlier; and then the international level. In the columns we have electrical engineering, telecommunications, and all the rest. Because we are neither electrical engineering nor telecommunications, but a file format, we fall under all the rest, and we are here in this column as an international standard.
Why is that so important? Because there is a regulation, European law that is binding for every country, that a standard used by a country has to be a CEN standard. Or an ISO standard, which is international and might be even better, I believe; I'm not sure if it is better. That is the difference. I call the ISO standard the queen of standards, because it is so powerful that governments have to use it, and so it is very important for big companies to get one. That is the reason why Microsoft first standardized in ECMA and later in ISO. And I have just forgotten one thing.
This side of the slide is outside the scope of those standards bodies: you see there is OASIS, there is W3C, ECMA and so on. That is where the ODF standard is developed. Just to mention it once: I have to pay DIN to participate, in order to take part in CEN. And later on the work is sold by ISO or CEN or DIN. So you pay to put your work in, then it is sold, and it is not an open standard. The nice thing, or the trick I would say, is that there is something called a fast path. You can do the standardization at OASIS and then throw it over the fence (not the digital one this time), and it is released again as an ISO standard, where only editorial changes happen. For instance, Great Britain demanded the use of ODF, and as I said, a government is not free in what it may choose. I'm not aware of the legal situation there or whether European law applies, so I realize I have to come back to you on that. But there is this fast path I explained, and there is no technical difference. You can choose whether you keep the technical work on, let's say, the level below, as at OASIS, or do what Microsoft did: they chose to drop the work at ECMA and continue on the international level, where I think it is even more difficult to participate, because you have to buy in and it is more expensive to go there. So I think OASIS ODF did it the right way. There is a fast path, there is an open standard that you can download and read, and it is still an ISO standard, just by throwing it over the fence.
So what is the status? We have the OASIS standard ODF 1.3, published two years ago. The ISO standard 1.3 is in the queue. There are some editorial problems they want to fix, or I think have to fix; I'm not sure what the criteria are for changing something, so let's say it is simply in the queue, without going into details. And we at the TC level are in the feature-freeze phase of 1.4. So, talking about the TC.
Here at the LibreOffice conference, half of the voting members are present. The order on the slide is just the order of the member list from the OASIS staff, because otherwise Regina here would come first; I think she is the most active, and thank you. I must say you are the pulling, I won't say horse, you are the pulling force. No, no, really. Most of the proposals come from Regina. I am there more because I want to digitalize the standard. I have a different view on this area, so you will not see so much detail here about what ODF is, but I put a few links on the slides. Regina comes from TDF. Michael Stahl is from allotropia, sponsored by Thorsten. Alfred Hellstern is from Microsoft; he gives feedback on how Microsoft does things. Unfortunately Professor Andreas Guelzow left; he was working on Gnumeric, a spreadsheet application. Francis and Patrick both work as editors on the draft, which is paid for by the community through collections. All right.
Yes, of course. He has too much to do, too much on his plate, and he had to focus on his daily work. There was no trouble, there was peace; it is very constructive. Regina, do you want to say anything about this? Just no time, she says. No, no, it is wonderfully harmonious. I have heard that some other people have struggled in their boards, but we are very nice, very civilized technical people here. All right.
So what is the situation on the technical level? We have the feature freeze, and we have an issue tracker where the changes are listed; there is a query, and this is a link to it. If you click on it, you can see in detail what is happening. For editorial things, like something that is broken, we have to write it down as well, but we don't put it into a JIRA issue, because otherwise JIRA would be polluted with a lot of issues. The JIRA issues are just the, I would say, semantic things that we want to change and that matter for the applications.
These issues are later generated into the agenda with a link, so you can jump to them and see in detail what we changed and what the discussion was. Okay? The same happens with the editorial things, on GitHub. We added the GitHub repository when we realized, after about a decade (I'm not sure, maybe 11 years, from 1.2 to 1.3), that all the tooling for the transformation to HTML and for the extraction of the default values should be in one place. That is the reason we started it. All right.
The other thing is the status of ODF in LibreOffice. Thanks to Regina, who provided this list of bugs. Some of these, I wouldn't say most, but a few, are even ODF 1.2 features that are not implemented yet. And the list of what is missing for 1.4 is in the next bug. Also, although it looks a little odd here, there is an issue about the missing automation that we need so that the editors don't have to do so much manual work. They are paid editors, paid by the community, and a lot of this work can be avoided just by automation. One example came up in the last TC call. This is a typical element section here. I think it was once generated, but we realized that some of these links, which should take you to that attribute when you click on them, are broken: either the numbers are wrong or they go to the wrong element. So of course we should go through the specification with the ODF Toolkit and check: is there any string that looks like an element or attribute name, does it have the right style, and is there a link behind it that is correct? This can be done. It is not rocket science. Okay, that was just an example of the state of things.
So what is ODF, quickly? It is zipped XML. There can be other things in the zip as well, like images, and also other embedded documents.
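To make "zipped XML" concrete, here is a minimal sketch in Python using only the standard library. It builds a tiny ODF-like package in memory and reads the paragraphs back out of `content.xml`. The demo package is deliberately incomplete (for example it has no `META-INF/manifest.xml`), so it illustrates the container idea and is not a valid ODF document; the helper names `build_demo_package` and `read_paragraphs` are made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

CONTENT_XML = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>Hello ODF</text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
)


def build_demo_package() -> bytes:
    """Build a tiny, illustrative (not fully valid) ODF-like zip in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # The mimetype entry is written first and stored uncompressed.
        package.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/vnd.oasis.opendocument.text",
        )
        # The document content is ordinary XML, compressed inside the zip.
        package.writestr("content.xml", CONTENT_XML, zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def read_paragraphs(package_bytes: bytes) -> list[str]:
    """Open the zip, parse content.xml and return the text of every text:p."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return [p.text or "" for p in root.iter(f"{{{TEXT_NS}}}p")]


demo = build_demo_package()
with zipfile.ZipFile(io.BytesIO(demo)) as package:
    entry_names = package.namelist()
paragraphs = read_paragraphs(demo)
print(entry_names)
print(paragraphs)
```

Images or embedded documents would simply be further entries in the same zip, next to `content.xml`.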
ODF is a package, and the specification comes in different parts. The XML is the third and biggest part, and there is also the formula part; I just dropped that in there. What I realized is missing is something like the primers at the W3C, where there are a lot of them: technical guidance that you can just read through as an introduction, from a high-level view, explaining what the format is about. The ODF Toolkit misses this too: someone who is new should just get a primer on ODF, and then they know how the toolkit handles it. Yes. I only realized that when I opened this slide.
And this is one of my favourite slides, not only because of the pizza, from my favourite pizza restaurant outside of Brussels. It is about a general problem. I realized very late that there is a difference between syntax and semantics. Syntax, to me, is like going to an international restaurant with a group. We all want to eat something. We have different menus, all in different languages, but they point to the same thing, to the same bread or wine: vino, Wein, wine. We have different words, tokens and languages, but there should be just one semantic, no alternative facts. But we won't start on that. So what I am saying is that it does not really matter so much whether we save as XML or JSON or a binary format. I would prefer that in the future the transformation between these comes for free at the standard level. The syntax is like a container, like a glass that you pour the liquid into. And if the container is not sufficient, say a plain text file, then the formatting gets lost. Markdown would help there a little bit, but there are pros and cons. All I am saying is that we should separate syntax from semantics more. And the semantic, for me, is something that we reuse in tests, specifications and applications, and these things are called features. Yes.
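The menu metaphor can be written down as a small lookup: different syntax tokens pointing to one semantic concept. The following Python sketch is only an illustration. The element names are the real ODF and HTML table elements, but the `TOKENS` table, the concept names and the `to_semantic` helper are assumptions made up for this example and are not part of any specification.

```python
# Illustrative vocabulary table: each syntax has its own token for the
# same semantic concept, like "vino", "Wein" and "wine" on a menu.
TOKENS = {
    "odf": {
        "table:table": "table",
        "table:table-row": "row",
        "table:table-cell": "cell",
    },
    "html": {
        "table": "table",
        "tr": "row",
        "td": "cell",
    },
}


def to_semantic(syntax: str, tokens: list[str]) -> list[str]:
    """Translate syntax-specific element names into shared concept names."""
    return [TOKENS[syntax][token] for token in tokens]


odf_view = to_semantic("odf", ["table:table", "table:table-row", "table:table-cell"])
html_view = to_semantic("html", ["table", "tr", "td"])
print(odf_view == html_view)
```

Both token sequences collapse to the same list of concepts, which is the sense in which the syntax is only the glass and the semantic is the liquid.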
These features are what we test: we have feature tests, and the same wording should appear in the specification as well. That is not very clear at the moment. Why do we need it? We need to get rid of the complexity. There is the saying: how do you eat an elephant? Spoon by spoon. If you have a large piece, you have to cut it down, and the pieces are the features. Take a shared feature: an ODF application might be able to handle tables. That is a feature. It is like buying a toaster: you have a list of features, you can tick the boxes and you can test them. To be able to communicate across ODF applications, it would be helpful to put the feature on that level. Yes. And there is more to it, because there are also sub-features: of course the background colour of a table does not exist without the table.
What I also want to say is that the user has certain ways to change a feature, like inserting a column, and we have absolutely nothing for that. We only have XML, and we only define loading and saving. I always say it is the shock-frozen state of the work you have just done: you dump it, you load it back and you continue working. But there is nothing about what the user does. And it is always the same thing: when inserting a column, you add it to the column definitions, and then in every row a cell is added. This change pattern could and should be defined in the specification, because then we could write tests more easily. Take HTML, for instance: browsers have JavaScript, and JavaScript only works across all browsers because they have the DOM, a very simple, syntax-based API. But it is an API at the standard level. So they have macros that work in all applications.
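The "insert column" change pattern described above (one new column definition, plus one new cell in every row) can be sketched in a few lines of Python. The `Table` model and the `insert_column` function are hypothetical and chosen only for illustration; ODF does not currently define such an operation.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical, syntax-free table model: column names plus rows of cells."""
    columns: list[str] = field(default_factory=list)
    rows: list[list[str]] = field(default_factory=list)


def insert_column(table: Table, index: int, name: str, default: str = "") -> None:
    """Apply the change pattern: add the column, then one cell to every row."""
    if not 0 <= index <= len(table.columns):
        raise IndexError(f"column index {index} out of range")
    table.columns.insert(index, name)
    for row in table.rows:
        row.insert(index, default)


table = Table(columns=["A", "B"], rows=[["1", "2"], ["3", "4"]])
insert_column(table, 1, "X")
print(table.columns)
print(table.rows)
```

Because the operation is defined on the model and not on the XML, the same test could in principle be run against any application that implements it.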
And I love the Acid test: you just click on it, the score runs up from 0 to 97 in Chrome and Firefox, and it tells you out of the box, just by opening the document, what the level of conformance is. I love it. We are unable to do such a test at the specification level, because we do not define any changes. And think about change tracking: it is much easier if you know what the changes are. It is very simple. So if we desire such interoperable tests and macros and an API, and not just any API but a high-level semantic API, how should we proceed? One thing is that we have to find these features. Because there are sub-features, we think there might be something like a feature tree. And we can, of course, reverse engineer them, because LibreOffice has all these features: a feature is what the user usually adds, like a table, or removes. These are the large chunks of XML. And we can define these state changes. We can also look at the grammar, because not all XML is equal. Some of it is just boilerplate: office:body is always there, and there is no semantic in it. And something like table is the start of such a feature, a chunk, a puzzle piece that you add or delete.
The other thing is that we may need a dictionary. When I showed you the slide earlier about the problem with the elements (no, I'm not going back), there was wording saying that the types and values are listed here. But we never define what a type and a value is. So this wording should be mapped to the XML. And guess what: the table exists not only in LibreOffice or in ODF, but also in HTML, in DocBook and in OOXML. Inserting a column is a concept that exists across many formats. Okay.
Having said this, I am coming to the ODF Toolkit. Of course, we have the validator.
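As a rough sketch of how a feature tree could feed an Acid-style conformance score, assume each feature is a node with a "supported" flag and that a sub-feature only counts when all of its ancestors are supported (a background colour cannot exist without its table). The `Feature` class, the feature names and the scoring rule below are assumptions for illustration, not something the ODF specification defines.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical node in a feature tree."""
    name: str
    supported: bool
    children: list["Feature"] = field(default_factory=list)


def count(feature: Feature, parent_supported: bool = True) -> tuple[int, int]:
    """Return (passed, total) for this feature and all of its sub-features."""
    ok = parent_supported and feature.supported
    passed, total = (1 if ok else 0), 1
    for child in feature.children:
        child_passed, child_total = count(child, ok)
        passed += child_passed
        total += child_total
    return passed, total


def conformance_percent(root: Feature) -> int:
    """Share of supported features as a rounded percentage."""
    passed, total = count(root)
    return round(100 * passed / total)


tree = Feature("text document", True, [
    Feature("table", True, [
        Feature("table background colour", True),
        Feature("insert column", False),
    ]),
    Feature("paragraph", True),
])
score = conformance_percent(tree)
print(score)
```

With four of the five features supported, the example tree scores 80.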
The validator was updated just today with the latest version. We also have some XSLT tooling that you can run. I only mention it because it is part of the toolkit; it is just for fun. If you want to transform the XML inside the zip, you can do it, because it uses ODFDOM, which is the cornerstone. ODFDOM is for editing the file without a layout. It has a very simple API, for inserting and extracting content. The last thing is the main reason we originally built it, once at Sun: it was the back end of the office, and we did the transformation there. Open-Xchange did that too, and there they used it for changes. We transform the document, and if you call this here, this is a document and this is the list of changes, as if a user had edited it from start to end. It is simply mapped. It is as if I call you and give you orders about what to write down, like a plotter. This is a link to the latest JAR. If you have Java and pass any ODT to it, you get the list of how the document is plotted. In the documentation I wrote about using it as a translation tool. This JSON is very nice for getting all the translatable text as separate lines. So you can easily take these lines, translate them with DeepL or Google Translate or any software you have, and then inject them back. Because these are edits: you can say, delete this first, and then insert this text here in the third paragraph. You just give a number, counting the position from the start. So this is very easy for tooling and for translation. There are other reasons to do it as well, like change collaboration between offices. Okay.
So the centerpiece is ODFDOM. The validator and the other tools use it. And the main piece here, the one causing the most headaches, is what we were trying to fix before this talk. It's a good thing.
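The translation round trip described above can be sketched as follows: take the change list, pull out the text lines, translate them, inject them back and replay the list like a plotter. This is a minimal Python illustration. The operation names (`addParagraph`, `addText`), the `start` position format and the glossary standing in for a translation service are all assumptions for this example and may not match the JSON the ODF Toolkit actually emits.

```python
import json

# Hypothetical change list: a document described as edits from start to end.
OPERATIONS_JSON = """
[
  {"name": "addParagraph", "start": [0]},
  {"name": "addText", "start": [0, 0], "text": "Hello"},
  {"name": "addParagraph", "start": [1]},
  {"name": "addText", "start": [1, 0], "text": "Good morning"}
]
"""


def extract_text(operations):
    """Return (operation index, text) for every operation carrying text."""
    return [(i, op["text"]) for i, op in enumerate(operations) if "text" in op]


def inject_text(operations, translated):
    """Return a copy of the change list with the given texts replaced."""
    result = [dict(op) for op in operations]
    for index, text in translated:
        result[index]["text"] = text
    return result


def replay(operations):
    """Rebuild the paragraphs by replaying the edits in order."""
    paragraphs = []
    for op in operations:
        if op["name"] == "addParagraph":
            paragraphs.insert(op["start"][0], "")
        elif op["name"] == "addText":
            paragraph, offset = op["start"]
            current = paragraphs[paragraph]
            paragraphs[paragraph] = current[:offset] + op["text"] + current[offset:]
    return paragraphs


# Stand-in for DeepL, Google Translate or any other translation software.
GLOSSARY = {"Hello": "Hallo", "Good morning": "Guten Morgen"}

operations = json.loads(OPERATIONS_JSON)
lines = extract_text(operations)
translated = [(index, GLOSSARY[text]) for index, text in lines]
german = replay(inject_text(operations, translated))
print(replay(operations))
print(german)
```

The translation step never touches XML: it only sees numbered lines of text, which is what makes this representation convenient for tooling.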
What I have done now: there is a generator that creates the source code from the grammar. As I told you earlier, I don't want to just read hundreds of pages and write it all down. I want to generate as much as possible: take the grammar and regenerate the source code from it, because the grammar is the definition. Then there is something called the Multi-Schema Validator, MSV, which I took over. I'm not sure whether we will move it into the ODF Toolkit, but we did a release of it for the validator as well and fixed a few things; Michael Stahl did a few fixes there. It is called multi-schema validator because it supports multiple grammar languages like XSD, DTD and RelaxNG (RNG), which is one of the most powerful and most understandable, compared with XSD, which is used by IBM a lot. And then there is Apache Velocity. It is just a template engine into which you pump text: we have templates for the source code, and then we fill them. And we feed it the grammar, which becomes useful when you see it not as a file but as something you can ask questions, queries like: can a text element be nested? Is this valid? Yes.
And this was ODF 202. I'm not 100% sure of the exact numbers, because, as you know, I'm still fixing the code generator, and a few things popped up here. And this is what the grammar looks like. They define something: there is an XML element, and then something is optional. This is how the table is defined, and you can jump in with this link. What is nice, you see, is the RNG as HTML, made with XSLT. Instead of having one text file that we have to search, we have links now. You can click and jump through it, you have line numbers, and you have links to use as pointers. If you want to show something to someone, that is very, very usable. Okay.
So what we did is very simple. We said, okay, that's easy: we make a Java class for every element and a Java class for every attribute. Yes. Easy.
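The basic idea of "one class per element, one class per attribute, generated from the grammar through templates" can be sketched in Python. The sketch walks a tiny RelaxNG fragment and fills a source-code template. Here `string.Template` stands in for Apache Velocity, the grammar fragment is a simplified excerpt written for this example, and the class-naming rule is an assumption that merely resembles names such as `TableTableElement`; none of this is the toolkit's actual generator.

```python
import xml.etree.ElementTree as ET
from string import Template

RNG_NS = "http://relaxng.org/ns/structure/1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

# Simplified, illustrative grammar fragment (not the real ODF schema).
GRAMMAR = f"""
<grammar xmlns="{RNG_NS}" xmlns:table="{TABLE_NS}">
  <define name="table-table">
    <element name="table:table">
      <optional><attribute name="table:name"/></optional>
      <oneOrMore><ref name="table-table-row"/></oneOrMore>
    </element>
  </define>
  <define name="table-table-row">
    <element name="table:table-row"/>
  </define>
</grammar>
"""

# Stand-in for a Velocity template: a Java class skeleton with placeholders.
CLASS_TEMPLATE = Template("public class $class_name { /* generated for $xml_name */ }")


def class_name(xml_name: str, suffix: str) -> str:
    """Turn 'table:table-row' into 'TableTableRow' plus the given suffix."""
    parts = xml_name.replace(":", "-").split("-")
    return "".join(part.capitalize() for part in parts) + suffix


def generate(grammar_xml: str) -> dict[str, str]:
    """Return generated source text per class, keyed by class name."""
    root = ET.fromstring(grammar_xml)
    sources = {}
    for tag, suffix in (("element", "Element"), ("attribute", "Attribute")):
        for node in root.iter(f"{{{RNG_NS}}}{tag}"):
            xml_name = node.get("name")
            name = class_name(xml_name, suffix)
            sources[name] = CLASS_TEMPLATE.substitute(class_name=name, xml_name=xml_name)
    return sources


generated = generate(GRAMMAR)
for name in sorted(generated):
    print(generated[name])
```

This naive version keys everything by name alone, which is exactly the assumption that stops working once the same name has different content in different contexts.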
Unfortunately, things are not so easy, because we realized: wait a minute, for the same name we actually have different content. For instance, this case is comparatively easy, because the two have different parents. In the manifest, which is just like a table of contents, we have different grammars for the version attribute. At the root level it has to be 1.3, because that is the version of the manifest itself. But if you refer to other files, they might have different versions, older documents or, later on, newer ones, so there it can be an arbitrary string. So if I generate the attribute, what is allowed now? This value or that value? Any guess? It depends on the context. It depends on the parent. So we generate it at the parent level: here we generate it as 1.3 with only a getter and no setter, because setting is forbidden, and there we have a getter and a setter with a string.
And it gets more complex, because even within the manifest there is, let me see. It looks a bit difficult, but there is an attribute, the key derivation name, you might see it, and it can have one fixed value, PGP. If it is PGP, there are no other attributes. But there is a choice: you can either have that, or here the attribute appears again, and then it can have either this other fixed value or any URI or URL, and then suddenly there are one, two, three further attributes as well. So only with one certain value does the attribute stand alone, and with the other values further attributes suddenly appear. And we thought, gosh, how shall we generate that? We have nothing dynamic, where you change the value and suddenly there are getters and setters. So Michael and I realized that the best idea might be to have constructors, so that a software developer creating this element, the parent again, can generate it either with one attribute or with three more, depending on the choice; there are three choices here.
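Both ideas, a getter without a setter for the fixed version and one constructor per grammar choice, can be sketched in Python. Everything named here is hypothetical: the class names, the second fixed value (`"PBKDF2"`), the three extra attribute names (`salt`, `iteration_count`, `key_size`) and the rule that all three are required are assumptions made for illustration and should be checked against the manifest grammar.

```python
from dataclasses import dataclass
from typing import Optional


class ManifestRoot:
    """Root level: the version is fixed, so there is a getter but no setter."""

    @property
    def version(self) -> str:
        return "1.3"


class FileEntry:
    """Referenced file: the version is an arbitrary string, getter and setter."""

    def __init__(self, full_path: str, version: Optional[str] = None):
        self.full_path = full_path
        self.version = version


@dataclass(frozen=True)
class KeyDerivation:
    """One named constructor per choice in the grammar."""
    name: str
    salt: Optional[str] = None
    iteration_count: Optional[int] = None
    key_size: Optional[int] = None

    def __post_init__(self):
        extras = (self.salt, self.iteration_count, self.key_size)
        if self.name == "PGP":
            # Choice 1: the fixed value stands alone.
            if any(value is not None for value in extras):
                raise ValueError("PGP allows no further attributes")
        elif any(value is None for value in extras):
            # Choices 2 and 3: the further attributes must be present.
            raise ValueError("this choice requires the three further attributes")

    @classmethod
    def pgp(cls) -> "KeyDerivation":
        return cls("PGP")

    @classmethod
    def pbkdf2(cls, salt: str, iteration_count: int, key_size: int) -> "KeyDerivation":
        return cls("PBKDF2", salt, iteration_count, key_size)

    @classmethod
    def custom(cls, uri: str, salt: str, iteration_count: int, key_size: int) -> "KeyDerivation":
        return cls(uri, salt, iteration_count, key_size)


alone = KeyDerivation.pgp()
derived = KeyDerivation.pbkdf2("c2FsdA==", 100000, 32)
print(alone)
print(derived)
```

The named constructors make invalid combinations unrepresentable at creation time, so no getter or setter has to appear or disappear dynamically.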
The three choices are: the attribute alone; the attribute with this fixed value and these three attributes; and the attribute with the URI and these three attributes. So now, at the hackfest, we are going to design how we traverse the grammar and analyze it so that we end up with these three constructors. That gives us a bit of a headache, but it is good. Another thing is that default values depend on the parent element, but that is comparatively easy. This is what we are currently fixing.
Okay, from a high-level view: what we generate a lot of is this XML layer. What is handwritten and working is the zip, the package layer, where you can put in arbitrary files and where the table of contents is updated all the time. And what is currently not available, or only handwritten, is a semantic layer. I believe that if we had something like an API, like insert table, defined somewhere, then we could generate this layer as well, and we would have an interoperable API that we share. And the funny thing is, if we say the XML is just an implementation detail, say private, so that you can exchange it, then you can have this API, like open text document, insert table, and you are not even aware that you have ODF XML. You might save it as DocBook, if we are comfortable with that, or even as JSON. So this ODF XML is an implementation detail, and we should be aware that the semantic of ODF is something we should separate out.
And yes, the last five minutes. I thought there would be questions. Okay, these are the links to the sources. We hope to get a release out soon, and maybe a 1.0, and maybe, wait a minute, I'll go to the first page. Yes. Oh, wait a minute. Oh gosh. Okay, any questions before I, I just wanted to show you this once again here, slide show from the current slide. Here we go. Okay.
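A minimal sketch of such a semantic layer, assuming a hypothetical `TextDocument` class with `insert_table` and `save`: the caller works only with the semantic API, and the serialization is a private, exchangeable detail. The XML written here is a simplified stand-in and not real ODF or DocBook markup; all names are made up for this illustration.

```python
import json
import xml.etree.ElementTree as ET


def _to_json(tables) -> str:
    return json.dumps({"tables": tables})


def _to_xml(tables) -> str:
    # Simplified stand-in markup, not ODF XML.
    root = ET.Element("document")
    for table in tables:
        table_node = ET.SubElement(root, "table")
        for row in table:
            row_node = ET.SubElement(table_node, "row")
            for cell in row:
                ET.SubElement(row_node, "cell").text = cell
    return ET.tostring(root, encoding="unicode")


# The syntax is exchangeable: adding a format means adding one entry here.
_SERIALIZERS = {"json": _to_json, "xml": _to_xml}


class TextDocument:
    """Hypothetical semantic API: callers never see the serialization."""

    def __init__(self):
        self._tables = []  # private model, independent of any file syntax

    def insert_table(self, rows: int, columns: int) -> int:
        """Add an empty table and return its index."""
        self._tables.append([["" for _ in range(columns)] for _ in range(rows)])
        return len(self._tables) - 1

    def set_cell(self, table: int, row: int, column: int, text: str) -> None:
        self._tables[table][row][column] = text

    def save(self, fmt: str) -> str:
        """Serialize the same semantic content in the requested syntax."""
        return _SERIALIZERS[fmt](self._tables)


document = TextDocument()
table_index = document.insert_table(rows=1, columns=2)
document.set_cell(table_index, 0, 1, "x")
as_json = document.save("json")
as_xml = document.save("xml")
print(as_json)
print(as_xml)
```

Because `insert_table` is defined on the model, the same call sequence could be shared and tested across applications regardless of which format each one writes.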
So what we are currently doing is the digitalization of the specification. Think about the idea: currently we are extracting data from the specification, but because it is a moving target, that is complex. It would be more fun to generate the specification from a certain set of data. Okay. And we can generate this part already. We know from the grammar what the elements and attributes are, but what I do not know is what the specification means when it says the element defines a name, a type, a value. These are three semantic tokens, not syntax of course, that are being mapped and should be explained more specifically or more distinctly somehow. That is because I want to generate more; I want to define things more clearly. Okay, that's it. Any questions? So, thank you.
Yes, please. Oh gosh, not you. Yes, sure. Oh, that is a good question. First of all, what I would very much like you to do is just take a look at the specification, because it is, oh gosh, it's so late here. Sorry, slowly. Okay. I have it somewhere, here it is. You see, if you look at this, I love the HTML version. It is faster, you can jump around more easily than in the ODT or the PDF, which is huge. So you might look at it, just look at the headlines and see what it is about. And you might also say what is difficult to understand and give feedback on where the problem is. Otherwise, just tell me what you are interested in. So, yes, go to this and we can talk. Okay.
I think what you mentioned earlier, the primer, would help with understanding. Yeah, yeah, yeah. So it is. We should put that information somewhere. We have, yes. Not just there, because if you are there, you already know. Yes, you're totally right. And, you know, finding the spare time for writing a primer is a little bit, yeah, it's difficult.
I'm just happy that I remembered that the primer is missing while doing the presentation again. It just does that, yes. So many ideas that we talked about last year and that are still unfinished came back, and it is good to put them all together and refresh them here. So we have the blueprint level, which is OASIS. And then we have the implementation level, which is either LibreOffice or the ODF Toolkit, which is more generated, because it is more of a greenfield approach. The ODF Toolkit website is linked at the very end. Michael and I are maintaining it, and Michael and I are also co-editors, so we are a duo for peer review and pair programming there. That is always good for the automation. And we are going to do all that work. But no, I find the work in the TC very nice at the moment. I do like it very much. We are making progress there. Unfortunately, Patrick is currently unavailable, but I stepped in.
It's a bit bumpy when I speak English, but anyway. The point is that the work you are doing is critical and crucial, because it is the format that we use in LibreOffice as the main format. Yes, yeah. It is so important to show how much you are doing and how much there is still to do, because maybe other people could be interested. Exactly, yeah, I'll try to, yes. So TDF should also, you know, support you. Cool, yes, that would be great, of course, yes.
That is very helpful. As I said, the more we push this boundary forward a little, coming with out-of-the-box ideas, like separating the semantics from the syntax and generating more of the specification, the blueprint, the better. These are somewhat weird ideas, yes, but when you have programmed something a few times, you say: I've got this, I don't want to program it again, I just want to generate it. It is kind of natural. But it's a good question. Of course, I had not thought about it until you asked me. The best thing is that we discuss it here; explaining it helps me to think about it and turn it over in my head. So if you have a question, I believe Regina is one of the best contacts for the next features. What is missing in the ODF standard? What is missing in LibreOffice? She has a wonderful overview; she sent a spreadsheet with this overview. And I'm more the one who wants to generate things. So we have distinct interests, but it works out really well. And then, of course, there is Michael with his technical knowledge of LibreOffice internals. Yes. So if there are no further questions, thank you very much for coming and listening. If there is anything else, just talk to me. I'm happy to turn my head around this again. Thank you.