So I thought, of course, of sex, drugs, and rock and roll, but I'm not sure I could get away with that. So I can summarize it in one thing: it's money. This might be interesting to you. The P is for the Prototype Fund, and it's about funding open source projects. Let's start with the catch: you have to be a German citizen. No, you have to live here, but it's a nice place to live, and you can be from any nation. I was able to win in the second round of the Prototype Fund, which at that time still paid only 30,000. And now, and that's quite funny, there's a group of six young women, I believe, who are handling it. It was their idea, and they work with the ministry of research: they get the money from there, but they sit in between. It's very easy: you just fill in one form with your idea, that it's open source, how the idea serves everyone, and it goes through very easily. And within the 50,000 there's something like two and a half for consulting for you, for coaching. But yes, even 30,000 was great. I was planning to be here with a friend of mine, but he started to do a fishing app, so I had to do it alone, and that's as far as I wanted to go. So take a look at it, prototypefund.de. It's worth jumping into.
The second thing, and this is good news, not bad news, is ODF: it's about the standard, the file format. To me, open source is not enough. Different software has to be interoperable, has to communicate with each other, otherwise there's still the chance of a lock-in. Standards are very slow, though. The ISO process was made for material things, for pages that never change, and software needs evolution, right? There's a contradiction here, but you have to keep in mind that ODF, the blueprint of ODF applications like LibreOffice, is very, very important. I often hear: oh, standards, I don't like them, I like open source, that's sufficient. No, I don't think so.
And I'm pretty sure that the user needs the freedom of choice to switch between ODF applications.
So, without further ado, some history of the incubating project. I believe it was 2005; I must admit that's just a good guess. At Sun Microsystems we came together. We had StarOffice at that time, maybe even OpenOffice, and we thought about bringing together all the software solutions that we had made for the server, these tiny things: oh, I just want to unzip the package with the XML and add something. Most of us had something in Java, so we put it all together in one place, and because it was opportune at the time, we made it open source as well. IBM had the same kind of code fragments, so we came together. Later it was used and pushed by being used at the back end of a web office, right? You have a browser and an HTML document that you're editing, and basically it's ODF on the server: it's being sent back, unzipped and transformed to HTML, back and forth.
So, do you know anything about the toolkit? You might have used it before without knowing. Recently LibreOffice sponsored the website for the validator. We have a front end as a JSP that runs out of the box: you just get a WAR file from the project, and you can use it as a standalone version as well.
And these are basically the main modules of the project. At the top is the generator, which is for me the most interesting part, because it's generating the source code from the schema, right? One of the principles I learned from software development is: the more you generate, the better. Otherwise you have to do the work over and over again or you make mistakes, and it's a horrible lot of work. Then ODFDOM. The DOM in the name is a hint: it's like the HTML DOM in a browser, every element has its own object.
The advantage is that, from the start, you have no information loss. You load the full document into the DOM, you can edit it, adjust it, and save it back. The idea behind generating it is that the schema is quite complicated (I'll give more details later), so the more you generate, the less the developer has to know about the schema, and the more they can be guided by, let's say, typed classes. There's even an element class for the paragraph called TextPElement. The disadvantage is that the memory footprint is larger than with, let's say, binary representations and bit-level optimization. But I think in the toolkit the first thing to do is to improve the generation, and then later on you can generate source code not only in Java, maybe later in Rust, and of course maybe a binary representation, moving away from the DOM.
Aside from this, and using the DOM, there is the XSLT Runner, which is obviously XSLT, and which enables you to run an XSLT script directly on the ODF document without the need to unzip the content and the styles. So those who love XSLT will happily be using their scripts directly on the document. The other thing, which we just saw before, is the ODF Validator. These are the two important pieces built on ODFDOM. And the last thing was donated by IBM and is no longer supported by IBM: as soon as they chose to move on to something else, they abandoned it. I made it red because I think, well, yeah. It could be done better, yes. It was very fast work that was done once.
So let's take a closer look at ODFDOM. There's a package layer, which takes care of the zipping, the unzipping and the manifest. It is totally independent, so it can be used by any other software as well. EPUB, by the way, uses the same ODF 1.1 packaging format. Unfortunately, they forked it for no reason I am aware of. Maybe they didn't, yeah, we don't talk to each other.
And they invented their own signing and encryption and have their own package format fork, which is nonsense, of course. But yeah, we didn't have dinner together.
The next thing on top of that is, as I said, the generated layer. And this can again be split into two different areas. One is totally generated: the implementation detail of the XML. And the one above, we call it the document API here, is the way the user knows the document. Like my mom would say: there's a paragraph, there's a table. She doesn't know anything about how it's implemented in XML. And the funny thing, from the user perspective, is that many office documents look the same. DOCX, ODT: if they're loaded into the same office, most documents look pretty much the same, but the XML is totally different. So at the abstraction level of the user, they're very much the same.
These layer concepts can also be found in the specification, which in ODF 1.2 consists of three parts. You've seen the lowest layer here: Part 3 is the package format, which can be used by others as well, as I said. The first part specifies the XML, and the second one is just the formulas for Calc. They might be used in Writer as well, but usually only in Calc. So this separation of concerns, or modularization, is also given by these three specification parts.
And what we talked about earlier for the first part: behind the XML there is the schema, the grammar, that tells you what is allowed. And this is quite complex. I'm not sure if you can read it here, but I would say that from a usability perspective it is quite a lot, because we have about 600 XML elements and 1,300 XML attributes. Some would say that even writing an office with only the paragraph is already quite complicated, and then you add styles and so on. But embracing everything is pretty much impossible without generating and without making this easier to understand.
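
To make the idea of generating typed classes from the schema concrete, here is a minimal Python sketch. It is not the toolkit's generator (which produces Java): the two-entry `SCHEMA` dictionary, the template, and the function names are illustrative assumptions. Only the naming pattern, `text:p` becoming `TextPElement`, follows the class name mentioned in the talk.

```python
from string import Template

# Hypothetical, tiny excerpt of a schema model:
# qualified element name -> attributes allowed on it.
SCHEMA = {
    "text:p": ["text:style-name", "text:class-names"],
    "table:table": ["table:name", "table:style-name"],
}

# Text template that is filled once per element (assumed layout).
CLASS_TEMPLATE = Template('''class $class_name:
    """Typed wrapper for <$qname> (generated)."""
    QNAME = "$qname"
    ATTRIBUTES = $attributes
''')


def class_name_for(qname: str) -> str:
    """Map 'text:p' to 'TextPElement' and 'table:table-row' to 'TableTableRowElement'."""
    prefix, local = qname.split(":")
    parts = [prefix] + local.split("-")
    return "".join(part.capitalize() for part in parts) + "Element"


def generate(schema: dict) -> str:
    """Return source code with one typed class per schema element."""
    return "\n".join(
        CLASS_TEMPLATE.substitute(
            class_name=class_name_for(qname),
            qname=qname,
            attributes=repr(sorted(attributes)),
        )
        for qname, attributes in sorted(schema.items())
    )


if __name__ == "__main__":
    print(generate(SCHEMA))
```

With roughly 600 elements and 1,300 attributes, the point of such a generator is that the per-element code is written once, in the template, instead of once per element by hand.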
I've chosen the table here, and you see there are a lot of references and so on. I won't go into detail, but we started this generator in 2005, as I said, and on the first try we simply used XSLT. Christian Lipke, a former colleague, wrote some XSLT transformations that read this XML file directly and turned it into Java, which was quite some work, and he covered quite a few things. But only a subset, of course. And we couldn't use JAXB, the Java XML binding, because the Sun standard for mapping XML to Java classes only works for W3C XML Schema and not for RELAX NG schemas. Well, that's the nice thing about standards: there are so many to choose from. So no, there's no interoperability, as I said.
So instead, and that's what we are currently using, we use two different open source technologies. One is the Multi-Schema Validator from Sun, which takes care of the parsing. You can just use it and don't have to invent or write the parser yourself. It gives you an internal model. You take this model and fill it into templates, text files, from which you can create anything. We create HTML documentation, there's some Python, I believe, and yes, mainly Java, and it's been tested.
All the information that we wanted to use was sucked out of this model into lists and maps. And I realized that this was quite difficult: when I tried to improve it, I couldn't find things in the lists, and it was very hard to extend. So I thought it would be much better if we could take the RELAX NG directly as a graph, right? Because every XML document is basically a tree, yes, but as soon as you have references, like a style ID pointing to a style, you have cross references, and then you're dealing with a graph.
And graphs, as you might know, are a main focus since the success of social networks like Facebook, where graph theory is part of the daily work. The research in this area has expanded enormously, and the algorithms to use graphs and alter them have become much, much better. So when I said, okay, I want to load the RELAX NG into a graph database, the question was: which graph database do I use? And the nice thing is that the TinkerPop API, Apache TinkerPop, again hides the implementation detail of the graph database, so you can use any graph database. They have a language called Gremlin, a script language to traverse the graph, which is then translated for whichever graph database is being used. So I feel pretty safe that I'm on an interoperable level again here, right?
What I did, and I put this in the notes there: I've stolen this from a Chaos Computer Club presentation where they did source code analysis with graph databases. And I thought, if they can do it with source code, which is much more complex, I can do it with RELAX NG as well. Because with RELAX NG, if I ask anyone here: please tell me what is the minimal document that is possible in ODF? Simply go to the root, take all the mandatory elements and put them together. You basically won't know, but this is an easy query for a graph database: start here, and now give me the minimal document. So I thought I need to reverse engineer the RELAX NG, or have better tooling to understand it and to control it. And that was the reason why I came up with this.
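
The "minimal document" question can be sketched as a traversal that starts at the root and keeps only what is mandatory. The following is a plain Python sketch over a dictionary, not a Gremlin query against a graph database, and the `GRAMMAR` below is a made-up toy that only borrows a few ODF element names. It is not the real ODF schema, and it assumes the grammar has no cycles.

```python
# Toy grammar graph (an assumption for illustration, NOT the ODF schema):
# node id -> (kind, element name or None, child node ids)
GRAMMAR = {
    "doc": ("element", "office:document", ["doc-seq"]),
    "doc-seq": ("sequence", None, ["meta-opt", "body"]),
    "meta-opt": ("optional", None, ["meta"]),
    "meta": ("element", "office:meta", []),
    "body": ("element", "office:body", ["body-choice"]),
    "body-choice": ("choice", None, ["text", "spreadsheet"]),
    "text": ("element", "office:text", ["paras"]),
    "paras": ("zeroOrMore", None, ["p"]),
    "p": ("element", "text:p", []),
    "spreadsheet": ("element", "office:spreadsheet", ["tables"]),
    "tables": ("oneOrMore", None, ["table"]),
    "table": ("element", "table:table", []),
}


def count(trees):
    """Number of elements in a list of (name, children) trees."""
    return sum(1 + count(children) for _, children in trees)


def minimal(node_id):
    """Return the smallest list of element trees that satisfies this node."""
    kind, name, children = GRAMMAR[node_id]
    if kind in ("optional", "zeroOrMore"):
        return []  # may be left out entirely
    if kind == "choice":
        # pick the alternative that needs the fewest elements
        return min((minimal(child) for child in children), key=count)
    content = [tree for child in children for tree in minimal(child)]
    if kind == "element":
        return [(name, content)]
    return content  # sequence and oneOrMore: keep one of each mandatory child


def to_xml(trees, indent=0):
    """Render the element trees as indented lines of XML-like text."""
    lines = []
    pad = "  " * indent
    for name, children in trees:
        if children:
            lines.append(f"{pad}<{name}>")
            lines.extend(to_xml(children, indent + 1))
            lines.append(f"{pad}</{name}>")
        else:
            lines.append(f"{pad}<{name}/>")
    return lines


if __name__ == "__main__":
    print("\n".join(to_xml(minimal("doc"))))
```

On the real schema the same question needs cycle handling and attribute nodes, which is where a graph database and a traversal language are meant to help.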
So I started with the Multi-Schema Validator. Instead of reading the RELAX NG myself, I again built on top of it: I simply dumped its memory model into a text file, line by line, and then wrote, just for fun, an ANTLR grammar to generate a parser that reads the dump and maps it to GraphML, which is simply a graph format that is quite interoperable. And with this I could visualize a graph for the first time.
Are there any questions at this point? I'm speeding up a little bit here, but this is an essential idea. Why am I doing this? Because the RELAX NG is so big, it's one huge text file, and we want to improve it and work on it. Like Stefan, who is using Clang compiler plugins to traverse the C++ source code, which is very huge, I want to use a graph database to traverse this RELAX NG tree and have it answer questions in an automated way, right? And to be able to do refactoring later, because otherwise it's too huge for manual editing. We simply need better tooling to embrace this complexity.
All right, so what I did is: please, graph database, give me, for table:table, all the child elements and everything in between, all the nodes in between, and there are nodes in between like choice, sequence and so on. You won't see the details here, you just see a picture like a star, right? This is table:table and all the elements around it. I just have a viewer there, and the red things are the attributes, right? So do you see some structure there? Okay, I will zoom in a little bit. Yes, the attributes, and then we've got this here, and I will explain it a little bit. There's a sequence, okay? A sequence, one, two: that means there's an order. First you have to use this one, and if you use it, at the top there's an element called text:soft-page-break, and after this you can use the table:table-row, okay? This here is boilerplate at the moment, right? And this here, epsilon, means nothing.
So you have the choice to have nothing of this. In other words, it just means it's optional, okay? So the next step, and that's what I'm currently working on, is that I'm refactoring and improving the graph by exchanging this for an optional, and whenever this name is similar to that one, I remove it as well, just to simplify it, okay?
I've got five minutes left, so going on. What I'm trying to do now: there are a few things like choice and sequence that I need to generate and that are not yet in the code. Also, when there's a parent, like the styles, that has many style children which each have an ID, I want to have a map in there. I just want to generate it out of the box. I want to generate as much of this DOM layer as possible, because I don't want to write it over and over again. Another thing is references. The XML says: there's a reference, there's a start of a reference and there's an end of a reference. But it doesn't say that the style ID and the style name are connected, and they are always connected. That's missing information. So the next thing is that I want to annotate and enhance the schema with additional information, so I can generate more.
And the last thing, and that's the most important reason why I'm doing all of this: there are user changes, and they're not specified in the schema. The schema says, oh, you can put in anything, as long as, well, it's fine. But we users are doing the same things in all offices. We are adding tables, adding paragraphs, adding characters. This is the high-level API you saw earlier, the user API, and I need to implement it for collaboration. Because if we collaborate, passing around the single document, which is the only way we have, no longer works. It's broken. We cannot merge: if I give you my document and you give it back, I cannot merge it. We need changes. Like with a Git commit in software, I want to ask you: what have you changed? Give me your changes, right?
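
The idea of exchanging changes instead of whole documents can be sketched as follows. This is a minimal Python sketch under loud assumptions: the `Change` record, the operation names `insert`, `delete` and `modify`, and the flat list-of-components document model are all hypothetical and are not the ODF Toolkit's API. The sketch also ignores the hard part of real merging, which is adjusting positions when two people change the same document concurrently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One user-level change (hypothetical shape, for illustration only)."""
    op: str            # "insert", "delete" or "modify"
    position: int      # index of the top-level component
    kind: str = ""     # e.g. "paragraph", "image", "table"
    content: str = ""  # text or a reference, depending on the kind


def to_changes(document):
    """Describe a whole document as the list of inserts that rebuilds it."""
    return [
        Change("insert", index, kind, content)
        for index, (kind, content) in enumerate(document)
    ]


def apply_changes(document, changes):
    """Return a new document with the changes applied in order."""
    result = list(document)
    for change in changes:
        if change.op == "insert":
            result.insert(change.position, (change.kind, change.content))
        elif change.op == "delete":
            del result[change.position]
        elif change.op == "modify":
            kind, _ = result[change.position]
            result[change.position] = (kind, change.content)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return result


if __name__ == "__main__":
    doc = [("paragraph", "Hello world"), ("image", "logo.png"), ("table", "2x2")]
    recipe = to_changes(doc)
    rebuilt = apply_changes([], recipe)  # replaying the recipe rebuilds the document
    edited = apply_changes(rebuilt, [Change("modify", 0, content="Hello again")])
    print(edited)
```

Replaying the recipe on an empty document gives back the same components, which is the sense in which a document and its change sequence are equivalent in this toy model.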
So I want to have user changes on the high level, and I want to be able to answer this question. So my work for the Prototype Fund was, oh, wait a minute, I forgot the slide. This just says that the user changes are an implicit standard, but they're not documented anywhere. They're in our minds, but they're not written down, and we have to start writing them down and define these insert, delete and modify changes for all the user components, which we have to annotate in the schema.
So for my work on the Prototype Fund, I promised that you can put an ODT into the ODF Toolkit, use it as a black box, and it's transformed into a sequence of changes, like a cooking recipe, where you can say: insert the first paragraph, insert hello world, in the second add an image, third, add a table, right? These are the high-level changes, and they are totally equivalent to the document. The other thing is that it should be able to accept new changes and merge them in, right? To have a proof of concept of this and to see how it works.
And the new thing is that I want to generate as much as possible to avoid redundancy. The user changes are de facto not a standard yet, right? So we need to enhance the RELAX NG in order to generate them. Otherwise, why should I write it by hand if it's the same thing for all applications? It's much better to have a way to annotate it. And how we do it, that's easy, but unfortunately I'm running out of time. Okay, any questions? Thank you first. Okay. Yes, please.
I clearly hear you saying that sequence is important, so why not stay in the XML tree model, with XQuery? Doesn't that work to...
Yes, good question. So the sequence, by the way: if you and I are working on the same document, we again have branches, and we are again in a graph, right? Like in the Git model. But the graph is because it's the natural recipe.