LibreOffice conference talk. Take number 12. Hi, this is real: take number 12. I really hate pre-recording talks. When you talk live, you talk once, you correct yourself, you ignore the way you are talking, and that's all. But with a pre-recorded talk, every time you listen to yourself afterwards it sounds a little terrible, and you feel you should make another attempt. Another huge problem with a pre-recorded talk is that I'm not able to see Milano in person, and not able to see you in person. My apologies for that.
This is another talk about structured document tags and how things are happening in Writer around them. Because this talk is pre-recorded, I may duplicate some information that was already presented before. I cannot dynamically react to the results of previous talks and tweak the presentation, or remove some errors. So it is done as is. Sorry about that.
Time to introduce myself. My name is Vasily Melenchuk. Nice to see you. Sorry, not in person. Sorry about too many sorrys and apologies up to this moment; I hope that is all of them. I mostly work with the CIB software company and the allotropia company. LibreOffice is among our activities. I have been working on LibreOffice for around seven years. I just looked in the Git log for my first commit, and it is dated 2015, so seven years is the correct number at this moment. I'm a certified LibreOffice developer and, of course, through these years, a member of The Document Foundation, and so on.
I am reachable by the email used in my Git commits. It is still valid and has not changed. You can also reach me by this complex nickname on IRC or Telegram. I'm on that channel. I read it in a very lazy way, but I will react to a ping. So if you have any questions, in general or about this presentation, I'm ready to answer. And I'm glad to see any feedback, because I'm not able to see this feedback in person.
And let's move on to the main topic of our talk. At first glance, SDT looks like something related to forms. And yes, if we look back in history, to the very beginning (the dark ages for me personally, although probably somebody else saw how these features appeared in the Microsoft formats; for me, these oldest things were there forever), initially there were some features for filling in forms in a document. They were made with fields. They were for making simple inputs, check boxes, combo boxes or drop-downs. And they were quite primitive and limited in features. Through the years, these features were extended with third-party tools. I'm aware of some of them. Even at CIB we implemented a specific extension to these form features, and it is in production even nowadays, as far as I know.
But the guys at Microsoft decided that this way of filling in forms was a bit outdated and not flexible enough. And I agree with them. It is not enough. So they invented ActiveX forms. Well, what else could the guys at Microsoft do in those days? ActiveX and all that. It brought much more flexibility to the creation of forms: much more variety, many more tweakable features. You could even use complex custom forms. And of course, it brought just as many problems with security and, of course, many problems in LibreOffice.
After a while, the guys at Microsoft again decided to do something new. This is what we know as structured document tags. At first glance, as I said, it looks like another attempt to invent forms, the third attempt. But looking forward, it is not only about forms. If we look inside Word, on the Developer tab in design mode, we have the classical buttons to insert some elements into your document. And right now, those buttons insert SDT elements, SDT forms.
All the oldish features are hidden under a special drop-down list. There we see the classical form fields, ActiveX controls, and so on. So all forms created nowadays with Microsoft Word are SDTs, except for special cases when somebody explicitly decides to put in some legacy stuff.
How do things look inside the document on the XML level? Well, initially pretty simple. Inside a paragraph, for example, we have a new token, SDT. It has its own properties (here is a list of some of them) and the SDT content. Inside the content we have a classical run with some text, "some content". It looks pretty similar, pretty easy. But even on this level, Writer has some problems. In most cases, Writer doesn't recognize the SDT at all. And if Writer does recognize the SDT, it does not keep all the properties correctly. These properties look like nonsense, but they are quite important for Microsoft Word to understand some features of the SDT. And of course, we have the classical situation: we open some document in Writer, save it back as DOCX, and we have no SDT any more, or the SDT is in some invalid state, or has lost some information, and so on.
To solve this kind of problem inside the Writer filter, we have a classical solution, the grab-bag. It is a special storage for keeping all the DOCX nonsense we do not need inside the Writer core itself. But we need to keep this data for saving back to DOCX later, because this nonsense is important for DOCX and for Word. The grab-bag support for SDT was really bad, so I did some refactoring, and I think it is a bit better now. Not perfect. I still don't like the current situation. But it became better, I think. And yes, this does help to resolve some problems related to SDT. You see some five-digit bug numbers here: quite old bugs were resolved during this process.
But this is not all there is to SDT. If we look further, there are several levels at which an SDT can be used in a document in practice.
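As a rough illustration of the XML structure described above, and of the levels discussed next, here is a minimal Python sketch. The sample markup is hypothetical and heavily reduced, `describe_sdts` is a made-up helper name, and deciding the level from the first child of `w:sdtContent` is an assumption based on my reading of the schema, not something taken from the Writer code.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical, heavily reduced document body: one inline SDT inside a
# paragraph and one block-level SDT that wraps a whole paragraph.
DOCUMENT_XML = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:sdt>
      <w:sdtPr><w:alias w:val="Name"/><w:text/></w:sdtPr>
      <w:sdtContent><w:r><w:t>Some content</w:t></w:r></w:sdtContent>
    </w:sdt>
  </w:p>
  <w:sdt>
    <w:sdtPr><w:alias w:val="Notes"/></w:sdtPr>
    <w:sdtContent><w:p><w:r><w:t>A whole paragraph</w:t></w:r></w:p></w:sdtContent>
  </w:sdt>
</w:body>
"""

# Assumption: the first child of w:sdtContent tells us the SDT level.
LEVEL_BY_FIRST_CHILD = {
    f"{{{W}}}r": "inline",
    f"{{{W}}}p": "block",
    f"{{{W}}}tbl": "block",
    f"{{{W}}}tc": "cell",
    f"{{{W}}}tr": "row",
}


def describe_sdts(xml_text):
    """List alias, level and cached text for every w:sdt in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for sdt in root.iter(f"{{{W}}}sdt"):
        alias = sdt.find("w:sdtPr/w:alias", NS)
        content = sdt.find("w:sdtContent", NS)
        has_children = content is not None and len(content) > 0
        first_tag = content[0].tag if has_children else None
        text = ""
        if content is not None:
            # Concatenate all w:t text nodes below the content element.
            text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
        result.append({
            "alias": alias.get(f"{{{W}}}val") if alias is not None else None,
            "level": LEVEL_BY_FIRST_CHILD.get(first_tag, "unknown"),
            "text": text,
        })
    return result


if __name__ == "__main__":
    for entry in describe_sdts(DOCUMENT_XML):
        print(entry)
```

Everything under `w:sdtPr` that such a reader does not understand is exactly the kind of data that has to survive a round trip, which is the job of the grab-bag.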
These levels are defined in the Office Open XML specification. First, we have the inline level. It is exactly what we had in the example before. No problems up to this moment. Well, some nuances are possible, but no complex effort is required to fix anything.
The next one is block-level SDTs. And here we have multiple problems. Or rather, we had multiple problems until the work done by Nicholas. Many thanks to him for this, because when I looked into the situation, I ran away scared. But he did this job, and it seems things became much, much better.
The next thing the specification defines is the cell level. It is the content of some table cell. According to my understanding, this looks similar to the previous one, the block level. In the block-level case we have some paragraphs inside, or even tables. Inside a cell we have about the same situation. So, something similar.
But what about the final type of SDT? It is the row-level SDT, where an entire table row can be defined as an SDT element. Well, I have no idea how to support this correctly. And most important, I have no idea what we can use this SDT for. There are some ideas, some potentially useful scenarios, but I failed to make them work in practice. It is possible to create this type of SDT in practice, but I have no idea what the sense of it is. But who knows. I am not an expert in this area, in spite of the time spent in it.
So, classically, we see SDTs used as forms. And in Writer we still have some problems with these forms. Many of the problems come from the fact that not every type of SDT is recognized as a form element. The classical use case, like the one in the Bugzilla task I mentioned: we have a document which is write-protected, and the user is only allowed to edit form content. But because there are no form elements, the user cannot edit anything. The solution was quite simple: let's use the classical Writer text input fields.
Well, it is not the best solution in practice, but on the semantic level it looks okay. We have some input field in the source document, and we have some input field inside Writer. They are a bit different, but actually do about the same job. And yes, this did help to resolve the initial problem: we have an edit field, and now the form and the document became editable. But this brings some new problems, problems with the formatting of this field. And yes, this one is not yet fixed. Sorry about that again. But I have some ideas how this can be improved, and I will try to find the time to resolve this topic.
That was the easy case. The next case is much, much more interesting. In spite of feeling almost like an expert in SDT (what else can you do with forms?), I found this situation. On the document level, things look pretty familiar and very much like what I have already shown you. A paragraph, an SDT inside, some properties nonsense which we keep in the grab-bag and write back correctly to DOCX. So no problems here. Then the content, with the text "some content" inside. What can go wrong? We open it in Writer and we see the same "some content" element. So everything is fine.
But let's open it in Word. What? "Valid answer" instead of "some content". What is this? Where did it come from? It is not written anywhere inside our document. But Word displays it: practically, absolutely different content. To understand what is actually happening here, we need to look deeper inside this SDT properties nonsense. And there we can find an interesting property, data binding, with the definition of some interesting attributes which are responsible for fetching the data, for where the content of this SDT comes from.
So the situation with SDT became very similar to the situation we have with forms, with fields, with shapes. The things defined inside the SDT content are actually just a pre-recorded result, some calculated result, which can be used by any reader that doesn't understand SDT.
But if you want to do something more complex, receive more relevant data, or update the data (this is also a topic), you should understand what is happening inside the SDT properties, especially the data binding. As I already said, the situation is very similar to what we have with a field and its field result, or a shape and its shape result. If a reader doesn't understand how to process the shape, it can just take the shape result, which is a bitmap, draw it and not bother. In most cases it will be something close to what is described in the shape. Not always, but since you don't understand shapes, that is your problem.
If we look into this data binding more deeply, we have several attributes. First, XPath. No surprises: it is just an XPath expression which is evaluated against some external XML file. In this case we have a customXml folder with custom XML files, which can contain some specific customer data. But this type of data binding can also refer to internal properties of the document, so it is not always about custom XML. Again, XPath is used.
Next, store item ID. This is what I was missing for a long time: how do we choose, among many XML files, the file to evaluate the XPath against? My initial implementation, almost a year ago, was quite primitive and stupid. We evaluated the XPath against every XML file we knew about. XPath succeeded? Okay, we got a result. XPath failed? Well, we have some more XML files to check. I understand this is really stupid, but I did not know how store item ID really works in practice. This property contains a global identifier of the storage. And this global identifier is stored in another special XML file. You can see it has the name itemProps plus a number, .xml, and it defines the global ID for an XML file such as item1.xml. Instead of the asterisk here, numbers are usually used: itemProps1 with the corresponding item1, itemProps2 with item2, and so on.
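The lookup described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the Writer implementation: the package parts, the GUIDs and the helper names `item_props`, `find_store` and `resolve_binding` are all made up, namespace prefixes in the XPath are left out, and the pairing of `itemPropsN.xml` with `itemN.xml` is done by file name only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DS = "http://schemas.openxmlformats.org/officeDocument/2006/customXml"

GUID_1 = "{AAAAAAAA-0000-0000-0000-000000000001}"  # made-up identifiers
GUID_2 = "{AAAAAAAA-0000-0000-0000-000000000002}"


def item_props(guid):
    """Build a minimal itemProps part that declares the global ID."""
    return f'<ds:datastoreItem xmlns:ds="{DS}" ds:itemID="{guid}"/>'


# Hypothetical package parts: part name -> XML text.
PARTS = {
    "customXml/itemProps1.xml": item_props(GUID_1),
    "customXml/item1.xml": "<form><answer>Valid answer</answer></form>",
    "customXml/itemProps2.xml": item_props(GUID_2),
    "customXml/item2.xml": "<other><answer>Wrong store</answer></other>",
}

SDT_XML = f"""
<w:sdt xmlns:w="{W}">
  <w:sdtPr>
    <w:dataBinding w:xpath="/form[1]/answer[1]" w:storeItemID="{GUID_1}"/>
  </w:sdtPr>
  <w:sdtContent><w:r><w:t>Some content</w:t></w:r></w:sdtContent>
</w:sdt>
"""


def find_store(parts, store_item_id):
    """Return the item XML whose itemProps part declares store_item_id."""
    for name, text in parts.items():
        if "itemProps" not in name:
            continue
        if ET.fromstring(text).get(f"{{{DS}}}itemID") == store_item_id:
            # Assumption: itemPropsN.xml belongs to itemN.xml by name.
            return parts.get(name.replace("itemProps", "item"))
    return None


def resolve_binding(sdt_xml, parts):
    """Return the bound value, or the cached SDT content as a fallback."""
    sdt = ET.fromstring(sdt_xml)
    cached = "".join(t.text or "" for t in sdt.iter(f"{{{W}}}t"))
    binding = sdt.find(f"{{{W}}}sdtPr/{{{W}}}dataBinding")
    if binding is None:
        return cached
    store = find_store(parts, binding.get(f"{{{W}}}storeItemID"))
    if store is None:
        return cached
    # ElementTree cannot evaluate absolute paths on an element, so the
    # store root is hung under a holder and the path is made relative.
    holder = ET.Element("holder")
    holder.append(ET.fromstring(store))
    node = holder.find("." + binding.get(f"{{{W}}}xpath"))
    return node.text if node is not None else cached


if __name__ == "__main__":
    print(resolve_binding(SDT_XML, PARTS))
```

A reader that ignores the binding would show the cached "Some content"; a reader that follows it ends up with the value from the matching store, which is the difference seen between Writer and Word in the example from the talk.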
We also have some special predefined global identifiers. One is for core properties: the classical document properties like title, author of the document and so on. Another identifier is for custom properties: user-defined properties which are stored in the DOCX package in a separate XML file. And yes, we can query all these types of properties, both for modification and for displaying inside an SDT.
The final element we have for data binding is prefix mappings. It is just a list of the namespaces defined in this XML, needed for the XPath to work correctly.
This is how things are designed. But in practice I have questions about this design. Starting from the end, prefix mappings: why do we need this property given explicitly? Inside this special itemProps XML we already have information about the namespaces used. Why do we provide it a second time inside the data binding? I have no answer.
A more important topic is these global identifiers used instead of resource identifiers. In most cases in a DOCX package we use local resource IDs. We have special .rels files which define a list of resources, and inside the document we just link to these resources. For example, an image is in a separate file, and we just place our ID, number seven for example, and this will be a reference to this image. Hyperlinks are also defined with a resource ID, and inside the document we just provide a link to this resource. But in this case, we ignore these relationship files. Yes, all these XML files are mentioned there, because we include them in the package. But in data binding we don't use these IDs. Instead we use these global identifiers.
I have only one idea why it is made so: a potential way to support extensions to Microsoft Word. For example, we have some extension which registers some global, unique identifier.
And when the document tries to ask for some data with a specific storage ID and the storage ID matches, the job of finding the data is handed to the extension. But in this case, why are we limited to XPath only? My extension could be something more specific. For example, I could fetch the data the user is asking for from an SQL database or some other source. I don't know why XPath. Again, no answer. But this is how things are designed. This is the reality we live in.
If you think these complex features are just some geeky features created only by editing custom XML by hand, you're wrong. It can easily be done inside Word itself. On the same Developer tab, we have a huge button called XML Mapping Pane. Clicking this button shows the corresponding panel on the right side of Word. There you can select from a drop-down list any of the existing attached XML files, or even attach a new file to be included in the package, and walk through this XML tree. And for any node, you can just insert a content control and choose the type of control you want to use.
What's important about a content control? Up to this moment I was talking about reading the real value from this external XML file. But on modification, we should also write it back into this XML. And yes, nowadays this is also done by Writer, as you can see.
So, as you probably noticed, SDT is something much wider than just another attempt by Microsoft to fill in forms. As the standard says, this is the final form of customer-defined semantics, represented by SDT. Well, what I love about specifications is that I understand all the letters, I understand all the words, but all together I have no idea what it means. What is this in practice? Again, based on my understanding of the Office Open XML specification, we have structured document tags, also referred to as content controls. And it is another attempt to separate the document presentation from the data we put in the document. This is one possibility provided by SDT.
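The separation of data and presentation, together with the write-back on modification mentioned a moment ago, can be sketched in a few lines of Python. This is only an illustration under assumptions: `write_back` is a hypothetical helper, the custom XML and the XPath are made up, namespaces are ignored, and nothing here reflects how Writer actually implements it.

```python
import xml.etree.ElementTree as ET


def write_back(store_xml, xpath, new_value):
    """Return the custom XML with the node selected by xpath updated."""
    # ElementTree cannot evaluate absolute paths on an element, so the
    # store root is hung under a holder and the path is made relative.
    holder = ET.Element("holder")
    holder.append(ET.fromstring(store_xml))
    node = holder.find("." + xpath)
    if node is None:
        raise LookupError(f"nothing matches {xpath!r} in this store")
    node.text = new_value
    # Serialize only the real root, not the temporary holder.
    return ET.tostring(holder[0], encoding="unicode")


if __name__ == "__main__":
    store = "<form><answer>Valid answer</answer></form>"
    print(write_back(store, "/form[1]/answer[1]", "New answer"))
```

The document body only carries a cached copy of the value; the custom XML part stays the single place where the data lives, so any other control bound to the same node would pick up the change.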
As shown in the previous example, we have some data kept somewhere in a custom XML file embedded in our document. In practice, SDT is just a combination of custom XML markup, which is the feature I have already talked about, and the smart tag feature of Microsoft Word. That one is just the ability to mark some areas of the document with a special, let's say, bookmark, to automate fetching of the document data later. And yes, it can be used to fill in forms.
So what happened in practice? The work to support all the features I have already mentioned took a bunch of commits done at the end of last year. I hope these efforts did improve LibreOffice and we didn't break it too heavily. And according to my impression, SDT became much more stable than it was before. At least in the cases I'm aware of.
But this is not the final situation. We are missing tons of different things which are used in SDT. For example, there is the interesting feature that controls multi-line input. For any SDT form element, we can decide whether Enter presses are allowed in this text field or not, that is, whether you can have more than one line in this element. In Writer we are currently missing this.
Color of highlighting: a nice and fancy feature, making your forms much nicer. Instead of just the boring gray background highlight we have in LibreOffice, Microsoft Word now allows other colors for highlighting. It would be nice to have this.
Placeholders and embedded text from a doc part. Well, this is another complex feature I don't want to dig into, because it is related to the glossary part, which contains an entire new document structure with document parts that can be referred to from our document, in this placeholder, to be embedded inside the main document. It is complicated stuff, and LibreOffice does absolutely nothing about it. I'm not sure how often it is used, or whether it is used at all.
Probably we are happier right now without this part of the Office Open XML specification. I don't know.
Building blocks: another cool feature of SDTs, which allows constructing complex documents. Right now there is nothing here in our case.
Repeating sections: another type of SDT, and this is probably the case where the table row SDT makes sense.
Image galleries: I can also insert one in my document. What can I use it for? No ready answer. Well, probably some cases are possible.
And this is not the complete list of things which can happen with SDTs. In Writer we are still missing many of them. Bad news: there are tons of them. Good news: I think they are mostly business-oriented features and not actually useful for many end users, let's say. Because for me as an end user, I can use Writer to fill in any form containing SDTs, and I'm happy. But if I want to create a form with SDTs, well, this is absolutely another topic, and we can discuss it.
And that's all. Thank you for listening to me on this topic. Sorry, no questions can be answered during this session. You can reach me somewhere else: by email, on IRC or on Telegram. Many thanks. See you!