My name is Brecht Machiels. A little bit more about me: I'm from Belgium, and I'm about 30 years old. I've been enjoying computers for a very long time, as you can see in the picture; that's a Commodore 64, by the way. I obtained a PhD in microelectronics, but I haven't really practiced electronics since I graduated. I've been programming in C, C++, Python and some other languages, both professionally and for fun, although in the case of C++ the fun part perhaps doesn't really apply.

So what is rinohtype? It's a document typesetter in the style of LaTeX. Who here doesn't know LaTeX or hasn't used it before? OK, a couple of people. It's not necessary for understanding this talk, so you should be fine. Basically, it takes an input document in a structured text format; for now, this is limited to reStructuredText and Sphinx documents. One difference from LaTeX is that style and content are more strictly separated. I'll come back to that in a couple of slides. Taking this reStructuredText input file, you can style the document elements using a style sheet, which is similar to CSS. You can also choose to typeset the document in a number of formats, such as a book or an article, depending on the template you choose or provide yourself. The output format is PDF. It's also possible to add other backends so that other output formats can be supported in the future. I'm thinking of SVG (Scalable Vector Graphics), which might be interesting for display in a browser.

Then on to my motivation for starting rinohtype. Like many of you, I have used LaTeX before, and generally it has been a pretty good experience: you provide it with content, and it takes care of formatting it rather nicely. However, there are also quite a number of problems with LaTeX. This is mostly due to its age, I think, because it doesn't rely on modern technologies.
Typical complaints about LaTeX are the cryptic warning and error messages, like the ones you can see over there: "Undefined control sequence", followed by the name of some weird TeX macro, and you have no hope of finding out what the problem is. Fixing these errors takes a lot of time. The TeX macro language is not very accessible either. It's very different from modern programming languages, so if you want to extend or customize a style in LaTeX, you need to be quite the TeX expert. Lastly, it's a very large and complex system, and it's not very transparent. If you want to install LaTeX, you typically use the TeX Live distribution, which is the default distribution nowadays. It's a 2-gigabyte download, and once installed it unpacks to quite a bit more. There are also some smaller distributions, but with those you often run into problems with missing packages, so that's not really a good solution. Maybe I should ask: do any other people here kind of hate LaTeX, even though they have to use it now and then? OK, that's good. That gives me some confidence.

My goal for rinohtype is for it to be as capable as LaTeX, or eventually even more capable, but it's very important that it's easy to use. I want to try to fix the problems I mentioned on the last slide. It should be easy to style documents, and I hope to achieve this by providing CSS-like style sheets; I think they're a good match for styling documents in general. Document templates should also be easy to modify, or at least to configure, and it should be easy to write new ones from scratch. rinohtype is written in pure Python. Currently it's written in Python 3. I might do a backport to Python 2, but I'm not sure about that yet. I also try to minimize dependencies: currently it depends only on docutils for parsing reStructuredText, and if you're using Sphinx, you have that dependency installed anyway.
It also depends on a pure-Python PNG library if you want to include PNG images in your document.

As for the current status, I already showed this slide during the lightning talk. There was a first release a couple of weeks ago; you could call it an early beta. It contains most of the features that LaTeX offers. One major exception is equations, but these are definitely on the to-do list. Also important is that there's very little documentation. However, there is a README, so if you want to try playing with rinohtype, I recommend you start with the README. That, along with this presentation of course, should get you started. Furthermore, I need to do lots of testing, such as writing many unit tests and fixing the bugs that pop up. I will release a new version soon, probably early next week. It features nearly complete Sphinx support; for example, it renders Sphinx's own documentation, which is a large document that uses most of Sphinx's features. It also comes with a prettier style sheet.

In the first couple of slides, I will present how you can use rinohtype as an end user. Afterwards, we'll dive a little deeper into the details and look at the style sheet mechanism, for example. If you picture rinohtype as an engine in the center, we feed it an input file, which is a structured text document. For now, that's reStructuredText or a Sphinx documentation project. Other frontends could be added; it just takes a lot of work, I suppose. In the case of Markdown, that work is limited. DocBook is a very broad specification, I think, but I believe it's used fairly often by publishers, so I think that's a good frontend to add. The structured text can refer to images, which can be in PDF, PNG or JPEG format. The bitmaps are more or less included as-is in the PDF, so they don't require much processing time. All of this gets fed to rinohtype.
rinohtype then looks for style sheets and document templates to determine the style of the document. Both of these are basically Python source files. I might provide a text-based layer for style sheets in the future for security reasons, but that doesn't exist yet. Of course, we also need some fonts to render text. All currently widely used font technologies are supported, including OpenType and TrueType. And out comes a nice PDF, if everything goes well.

Who is not familiar with reStructuredText? OK, that makes things easier. Here's an example on the right. It's basically plain ASCII text, and it's structured, as the name says. You can see two sections, and then there's an enumerated list with a bullet list inside it. A very important feature of reStructuredText is that it is extensible: you can add new roles and directives. Contrast this with LaTeX, where the input file is a TeX source file in which you can basically program anything. In rinohtype you're kind of, I shouldn't say stuck, but limited to the reStructuredText format. You cannot program in it, but you can still extend it, and on the rinohtype side you can implement a corresponding part. So in some sense, it's still programmable. Content and style are also more separated; you're kind of forced to make this separation, and I believe this will lead to better and cleaner documents. Even if that sounds like more work than in LaTeX, I think it will be much easier to do this in rinohtype than to figure out how to do something in LaTeX using the TeX language.

As I said, there is also a Sphinx frontend. Is anyone not familiar with Sphinx? OK, looking good, so you all know this. It's used for larger documentation projects, such as API documentation, but you can also use it for books and manuals that are not related to source code. You can render to HTML, to LaTeX and then to PDF, or to EPUB.
And now with rinohtype, you can render directly to PDF using nothing but Python and Python packages.

I'll show a little demonstration. First, I'll render a small reStructuredText file using the command-line tool that is included with rinohtype. Then I'll show how you can build a Sphinx documentation project using rinohtype. So this is just a small reStructuredText file, and there are some images in the images directory. Let me open the reStructuredText file. This should look familiar to most of you. Here I have defined a custom role for typesetting acronyms. It will be matched in the rinohtype style sheet, so acronyms will be displayed in small capitals. We have a paragraph with some inline styling, and an inline image too. And here we have (maybe I should make it a little smaller) a paragraph with a custom class assigned using the class directive. We will also use this to apply special formatting in rinohtype. Of course, this is something you would not normally do in a regular document; it's just to show off rinohtype's styling features. A right-aligned paragraph, and some more of these here. Then we have some typical body elements: lists, field lists, option lists, a code block, an indented paragraph, a table and some images. Let's see how that gets rendered.

So the rinoh tool is available. At the moment it only accepts a single argument, but I will add more so you can choose the style sheet to apply, perhaps choose the page orientation, and maybe configure the page margins. You can ignore this line for now; I won't explain it. But this line shows how rinohtype reports errors and warnings. It shows the input file that caused the warning and the line number, so it points you to the exact location of the cause of the problem. In this case, there's a very long line in a code block. Because code blocks are not line-wrapped, it flows into the margin of the page.
It also shows that this occurs on page 3 of the rendered document. I believe this is a nice improvement over LaTeX. The second line indicates that the first rendering pass has finished but that the page references, or cross-references, have not yet converged. In the first pass, you don't know how many pages the document will consist of, so we need another pass if such a reference occurs somewhere. After the second pass, the output is written.

Let's have a look at the result. We have a title with a subtitle, and the author is there. Then we have the first section, which is automatically numbered by rinohtype. You can choose other numbering styles, such as Roman numerals or letters (A, B, C, D). Then we have some inline styles, such as italics for emphasized text and bold for strongly emphasized text. We have some monospace text for literals. We can have subscripts, superscripts and inline images, as here. Hyphenation is also supported automatically. You can even, on a sub-paragraph basis (so for parts of paragraphs), set a language and the minimum number of characters to keep together when hyphenating words. It uses OpenOffice or LibreOffice hyphenation dictionaries to perform the hyphenation. Cross-references are transformed into links in the PDF document. It's interesting to note that reStructuredText supports Unicode, so you can write in many languages, and since rinohtype is written in Python 3, it's all Unicode internally. I have to say I haven't tested this thoroughly yet, but yesterday I tried this Vietnamese text, and it seems to match the examples I've seen. Here we have a left-aligned paragraph, so it's ragged on the right side. We have a right-aligned paragraph, and these are center-aligned. There is support for kerning and ligatures, which can be turned on and off, as shown here. Don't worry if you don't know what that means; it basically makes text more readable.
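To make kerning and ligatures a bit more concrete, here is a toy Python sketch. It is not rinohtype's implementation: real OpenType fonts store ligature substitutions in their GSUB table and kerning pairs in GPOS (or a legacy kern table), whereas here the glyph widths, the kerning pairs and the ligature list are made-up values hard-coded in dictionaries, and the function names are invented for the example.

```python
# Toy sketch of ligature substitution and pair kerning. All metrics are
# hypothetical values in font units (1000 units per em), not from a real font.

LIGATURES = {  # checked in this order, so "ffi" wins over "ff" and "fi"
    "ffi": "\ufb03",  # U+FB03 LATIN SMALL LIGATURE FFI
    "ffl": "\ufb04",  # U+FB04 LATIN SMALL LIGATURE FFL
    "ff": "\ufb00",
    "fi": "\ufb01",
    "fl": "\ufb02",
}
WIDTHS = {"A": 667, "V": 667, "T": 611, "o": 500, "f": 333, "i": 278,
          "\ufb03": 833}
KERNING = {("A", "V"): -80, ("V", "A"): -80, ("T", "o"): -70}
DEFAULT_WIDTH = 500  # fallback for glyphs missing from WIDTHS


def apply_ligatures(text, enabled=True):
    """Replace letter sequences by ligature glyphs, scanning left to right."""
    if not enabled:
        return text
    glyphs, i = [], 0
    while i < len(text):
        for sequence, ligature in LIGATURES.items():
            if text.startswith(sequence, i):
                glyphs.append(ligature)
                i += len(sequence)
                break
        else:  # no ligature starts here: keep the character as is
            glyphs.append(text[i])
            i += 1
    return "".join(glyphs)


def advance_width(glyphs, font_size, kerning=True):
    """Width in points of a glyph string, optionally applying pair kerning."""
    units = 0
    for index, glyph in enumerate(glyphs):
        units += WIDTHS.get(glyph, DEFAULT_WIDTH)
        if kerning and index + 1 < len(glyphs):
            units += KERNING.get((glyph, glyphs[index + 1]), 0)
    return units * font_size / 1000


print(apply_ligatures("office"))               # oﬃce
print(advance_width("AV", 12, kerning=False))  # 16.008
print(advance_width("AV", 12))                 # 15.048
```

With these made-up metrics, kerning pulls the "AV" pair together by 80 font units, which is 0.96 pt at a 12 pt font size. Turning either feature off simply skips the corresponding step.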
In the case of "AV" here, for example, there's a rather large space between the letters by default, and the kerning information in the font places them closer together, so it's easier to read. In a title, it could otherwise look like two words while it's in fact one word, so this helps readability. Similarly, we have ligatures: "ffi", for example, is contracted into a single glyph that is supposedly easier to read. Oh yes, in the top paragraph we also have a footnote reference, and the footnote can be seen at the bottom of the page. Then we have a local table of contents with page references on the right, which are also hyperlinks. And we have the typical body elements, like lists, field lists, option lists and some block-level elements. Here is the paragraph that generated the warning; you can see the text flowing into the margin. We have an indented paragraph, and then tables. We can have row-spanning and column-spanning cells, as in HTML, and the width of the table and of the individual columns is determined automatically based on their contents. This is something that's missing from the Sphinx LaTeX builder, I believe. I suppose there's a LaTeX package that fixes this, but I'm not sure whether it can be used with Sphinx. Then we have a section with some images. This image is simply an inline image, and after this paragraph there is a floating image, or figure, which gets floated to the top of the page along with its caption, as you can see here. And finally, we have some admonitions that are styled differently based on their type.

I'll go to the Sphinx demo. This is basically a checkout of the Sphinx Git repository. We need to move into the doc directory; we're already there. I've made some changes to conf.py to configure rinohtype. First, we need to add the Sphinx builder that is included with rinohtype to the extension list, and we need to specify this variable to tell rinohtype what exactly to render.
This is very similar to the configuration variables for other Sphinx backends. We will render contents.rst, which contains basically all of the documentation, and we will output to Sphinx.pdf by specifying "Sphinx" here, followed by the document title and author. The following part configures custom headers and footers for the document. The headers and footers each consist of three tab stops: one on the left, one in the center and one on the right. If we add a tab element, this moves the cursor to the middle tab stop, so this text will be centered in the header. For the footer, we insert the page number on the left and then move forward two tabs, so we end up on the right side of the footer. That's where we insert the chapter description, which is simply the section number followed by the section title of the level-one, or top-level, section. We can also configure the style sheet, the page size and orientation, and the page margins for this Sphinx project, as well as the number of columns used to typeset the document.

I will not render this in real time because it takes a little too long, at least for this presentation. This uses the book template, so it starts with a title page followed by a blank page, and then the table of contents. This looks similar to the one in the previous document, and this one goes two section levels deep. It's a longer document. You can see that the page numbers at the bottom are in lowercase Roman numerals. For the body of the text, you can see the custom header and footer we have defined: the page number on the left and the section description on the right. The style was more or less copied from the LaTeX style file used by Sphinx.

Continuing with the presentation: I quickly mentioned citeproc-py. This is a sister project of rinohtype. It's basically a CSL processor; CSL is a standardized XML format for describing the formatting of citations and bibliographies.
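Going back to the header and footer configuration for a moment: the layout with three tab stops (left, center and right, where each tab character moves to the next stop) and the lowercase Roman page numbers of the front matter can be sketched in a few lines of Python. This is a hypothetical illustration on a fixed-width character grid, not rinohtype's configuration API. The functions `to_roman` and `tab_stop_line` are invented for this example, and a real typesetter would position text using font metrics rather than character counts.

```python
def to_roman(number, lowercase=True):
    """Convert a positive integer to Roman numerals, e.g. 7 -> 'vii'."""
    if number <= 0:
        raise ValueError("Roman numerals need a positive integer")
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    parts = []
    for value, symbol in numerals:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    roman = "".join(parts)
    return roman.lower() if lowercase else roman


def tab_stop_line(text, width):
    """Place tab-separated fields at a left, a center and a right tab stop.

    The text before the first tab is left-aligned, the text after the first
    tab is centered and the text after the second tab is right-aligned.
    Fields that are too long simply overwrite each other in this toy.
    """
    fields = text.split("\t")
    if len(fields) > 3:
        raise ValueError("only three tab stops are available")
    left, center, right = (fields + ["", ""])[:3]
    line = [" "] * width
    line[:len(left)] = left
    start = (width - len(center)) // 2
    line[start:start + len(center)] = center
    line[width - len(right):] = right
    return "".join(line)


# Header: one tab moves the title to the center stop.
header = tab_stop_line("\tSphinx Documentation", 40)
# Footer: page number on the left, two tabs, then the section description.
footer = tab_stop_line(to_roman(7) + "\t\t1 Introduction", 40)
print(repr(header))
print(repr(footer))
```

The section title "1 Introduction" is only sample text. The point is that the footer description needs two tabs to skip the center stop and land on the right-hand one.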
A lot of CSL styles are available, so you probably don't need to write your own style if you are writing a document with citations. It parses BibTeX databases, so you can use those, and it can output the formatted citations and bibliography as HTML, as reStructuredText, or in rinohtype's internal representation. It's not yet usable from within reStructuredText or Sphinx, but that should be a very small step. This is an example of how citations and reference lists could be formatted by citeproc-py. It extends the diagram: we can add references, and citeproc-py handles their formatting.

Now, diving a little deeper into the internals, we'll look at the style sheets. To understand them, you need to know how document elements are represented. They are basically Python instances of Flowable classes. We have flowables, such as a paragraph or an image, and then we have inline elements, which make up a paragraph. After that, I will discuss style sheets and how flowables are linked to a style definition. As I said, a flowable can be, for example, a paragraph. A paragraph adapts to the available width provided by the document template, and paragraphs are "flowed" onto the page; that's a term often used in this context. Images, on the other hand, generally don't adapt to the available width, but they can align themselves horizontally within the available space. Flowables together form a document tree, as is the case in HTML, so that should be intuitive. Here we have a title paragraph with a special title style, so this paragraph functions as the title of the document. We have two top-level sections, and the first top-level section has two subsections with some additional flowables inside them. Inline elements, as in this short paragraph with nested styling, are also represented as a tree. I'll skip over that quickly.
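To make the idea of a flowable tree concrete, here is a minimal, hypothetical Python sketch. The class names Flowable, Paragraph, Image, Section and Document echo the talk. The constructor arguments and the `place()` method, which returns a horizontal offset and a width for a given available width, are assumptions made for this example and are not rinohtype's actual API.

```python
class Flowable:
    """Base class: a block-level element that can be flowed onto a page."""

    def __init__(self, style=None):
        self.style = style  # plays the role of an HTML id/class
        self.parent = None

    def children(self):
        return []

    def walk(self, depth=0):
        """Yield (depth, flowable) pairs in document order."""
        yield depth, self
        for child in self.children():
            yield from child.walk(depth + 1)


class Paragraph(Flowable):
    def __init__(self, text, style=None):
        super().__init__(style)
        self.text = text

    def place(self, available_width):
        """A paragraph fills the width it is offered: (x offset, width)."""
        return 0, available_width


class Image(Flowable):
    def __init__(self, filename, width, align="center", style=None):
        super().__init__(style)
        self.filename, self.width, self.align = filename, width, align

    def place(self, available_width):
        """An image keeps its own width and only aligns itself."""
        width = min(self.width, available_width)
        free = available_width - width
        offset = {"left": 0, "center": free / 2, "right": free}[self.align]
        return offset, width


class Section(Flowable):
    def __init__(self, *flowables, style=None):
        super().__init__(style)
        self.flowables = list(flowables)
        for flowable in self.flowables:
            flowable.parent = self

    def children(self):
        return self.flowables


class Document(Section):
    """Root of the flowable tree."""


document = Document(
    Paragraph("rinohtype", style="title"),
    Section(Section(Paragraph("First subsection")),
            Section(Paragraph("Second subsection"), Image("logo.png", 100))),
    Section(Paragraph("Second top-level section")),
)

for depth, flowable in document.walk():
    print("  " * depth + type(flowable).__name__)

print(Image("logo.png", 100).place(400))  # (150.0, 100)
```

The loop prints an indented outline that mirrors the tree described above: a title paragraph and two top-level sections, the first of which holds two subsections. The `place()` calls show the difference between a paragraph, which takes the full available width, and an image, which keeps its own width and only chooses where to sit.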
Style sheets, as I said, are very similar to CSS. You first select the document elements you want to style. You can select them based on their place in the document tree, on their style attribute (which is similar to the id and class attributes in CSS), on any other attribute, or on any combination of these three. Style sheets are plain Python source files, as I mentioned before. Let's have a look at an example. Suppose we want to select the paragraphs that are part of a list item here, but not this one. We use a context selection: we only select paragraphs that are a direct child of a list item. We can also use this selector with Python's ellipsis (...), which represents any number of levels of flowables; in this case, it simply matches the list item. We can match on the style attribute, as for the title paragraph, using the like class method. And we can match on arbitrary attributes. For example, we can select level-2 section headings like this, limiting the selection of sections by passing the level argument to the like method. I'll skip the next one because I'm running out of time.

CSS has some limitations, I think, so I added another level of indirection: I split style sheets into a style matcher and a style sheet. A style matcher collects all the selectors and maps them to style names. In a style sheet, these style names are then mapped to style definitions. Because the selectors are often the same for multiple documents, this means a single style matcher can be reused by multiple style sheets. This might make less sense for HTML, but I think it's a good fit for a document processor. rinohtype also supports variables in style sheets to avoid duplication.
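The selector and style matcher ideas above can be sketched in plain Python. This is a hypothetical toy, not rinohtype's implementation: `Element`, `Step`, `Selector`, `StyleMatcher` and its `add()` method are names invented here. The use of `/` for "direct child", Python's `...` for "any number of levels" and a `like()` method for style and attribute constraints mirrors the selector syntax described in the talk. When several selectors match, this sketch picks the most specific one (one point per step, plus one per style or attribute constraint) and lets later rules win ties, which is similar in spirit to CSS but is an assumption of this example.

```python
class Element:
    """A document tree node: a type name, a style and other attributes."""

    def __init__(self, kind, style=None, parent=None, **attributes):
        self.kind, self.style, self.parent = kind, style, parent
        self.attributes = attributes

    def ancestry(self):
        """Elements from the root down to (and including) this one."""
        path, node = [], self
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]


class Step:
    """Matches one element by type, optional style and attribute values."""

    def __init__(self, kind, style=None, **attributes):
        self.kind, self.style, self.attributes = kind, style, attributes

    def like(self, style=None, **attributes):
        return Step(self.kind, style, **attributes)

    def matches(self, element):
        return (element.kind == self.kind
                and (self.style is None or element.style == self.style)
                and all(element.attributes.get(key) == value
                        for key, value in self.attributes.items()))

    def __truediv__(self, other):  # Step / x starts a selector chain
        return Selector([self]) / other


class Selector:
    """A chain of steps; `...` stands for zero or more intermediate levels."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __truediv__(self, other):
        extra = other.steps if isinstance(other, Selector) else [other]
        return Selector(self.steps + extra)

    def specificity(self):
        # One point per step, plus one per style or attribute constraint.
        return sum(1 + (step.style is not None) + len(step.attributes)
                   for step in self.steps if step is not Ellipsis)

    def matches(self, element):
        return _match(self.steps, element.ancestry())


def _match(steps, path):
    """Match steps right to left against the end of an ancestry path."""
    if not steps:
        return True
    *rest, last = steps
    if last is Ellipsis:  # try skipping 0, 1, 2, ... levels
        return any(_match(rest, path[:end]) for end in range(len(path), -1, -1))
    return bool(path) and last.matches(path[-1]) and _match(rest, path[:-1])


class StyleMatcher:
    """Maps selectors to style names; the most specific match wins."""

    def __init__(self):
        self.rules = []  # (selector, style name) pairs in definition order

    def add(self, style_name, selector):
        if isinstance(selector, Step):
            selector = Selector([selector])
        self.rules.append((selector, style_name))

    def match(self, element):
        candidates = [(selector.specificity(), index, name)
                      for index, (selector, name) in enumerate(self.rules)
                      if selector.matches(element)]
        return max(candidates)[2] if candidates else None  # later rules win ties


Paragraph, ListItem = Step("Paragraph"), Step("ListItem")
Section, Heading = Step("Section"), Step("Heading")

matcher = StyleMatcher()
matcher.add("body", Paragraph)
matcher.add("nested list paragraph", ListItem / ... / Paragraph)
matcher.add("list item paragraph", ListItem / Paragraph)
matcher.add("title", Paragraph.like("title"))
matcher.add("level 2 heading", Section.like(level=2) / Heading)

root = Element("Document")
item = Element("ListItem", parent=Element("List", parent=root))
note = Element("Note", parent=item)
section = Element("Section", parent=root, level=2)
print(matcher.match(Element("Paragraph", parent=item)))    # list item paragraph
print(matcher.match(Element("Paragraph", parent=note)))    # nested list paragraph
print(matcher.match(Element("Paragraph", "title", root)))  # title
print(matcher.match(Element("Heading", parent=section)))   # level 2 heading
```

A paragraph directly inside a list item matches both list selectors with equal specificity, so in this sketch the rule added last wins. A paragraph nested one level deeper, inside a note, only matches the selector that uses `...`.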
Variables are a feature missing from CSS that people often complain about. We can also inherit from another style, again to avoid duplication. Let's have a look at the style matcher and style sheets. We create a new style matcher. For example, we select styled text with the emphasis style and assign it to the "emphasis" style name. The same goes for nested line blocks, which should be more or less self-explanatory. We then refer to this matcher in a style sheet we create, and there we map style attributes to each of the style names: emphasized text is set in italics, and nested line blocks are indented on the left. Variables can be used when a single value is reused by multiple styles; fonts are a good example of that. Here we define a variable, "IEEE family", which collects a number of fonts, and we can use it in a style definition as follows. Note that you can also refer to attributes of the value that the variable references. Then there's inheritance. Suppose we have defined a style for numbered headings. We often also need an unnumbered heading style, for the table of contents title, for example. In this case we want to base it on the default heading style, so we simply override one or more of the style attributes, as is done here. All other attributes are the same as for the standard heading style. You can also extend a style sheet. For example, if you use a standard style sheet shipped with rinohtype and want to make some changes, you can simply inherit from it. Additionally, you can add new matchers. For example, for the acronym role that appeared in the demonstration, you can define a new matcher. Then you create a style sheet that is based on the previous style sheet and that also refers to the new matcher. Now we can define a style for the acronym reStructuredText role; in this case we set small capitals to true.
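The style sheet features just described (variables, style inheritance and extending a base style sheet) can be modelled with a small Python class. This is a hypothetical sketch under its own assumptions, not rinohtype's API. Styles are plain dicts of attribute values. A "base" key names the style to inherit from. `Var(name, key)` refers to a style sheet variable, or to one entry of it, which loosely mirrors referring to an attribute of a variable's value. The style names, attribute names, variable name and font names are all made up. The style matcher step is left out, so style names are used directly.

```python
class Var:
    """Reference to a style sheet variable, resolved when a style is used."""

    def __init__(self, name, key=None):
        self.name, self.key = name, key  # key selects one entry of the value


class StyleSheet:
    def __init__(self, name, base=None, variables=None):
        self.name, self.base = name, base
        self.variables = dict(variables or {})
        self.styles = {}

    def __setitem__(self, style_name, attributes):
        self.styles[style_name] = dict(attributes)

    def variable(self, name):
        """Look up a variable here first, then in the base style sheets."""
        sheet = self
        while sheet is not None:
            if name in sheet.variables:
                return sheet.variables[name]
            sheet = sheet.base
        raise KeyError(name)

    def definitions(self, style_name):
        """All definitions of a style, from this sheet down to the root."""
        sheet, found = self, []
        while sheet is not None:
            if style_name in sheet.styles:
                found.append(sheet.styles[style_name])
            sheet = sheet.base
        return found

    def get(self, style_name, attribute):
        """Resolve an attribute via overrides, style inheritance and Var."""
        definitions = self.definitions(style_name)
        if not definitions:
            raise KeyError(style_name)
        for definition in definitions:
            if attribute in definition:
                value = definition[attribute]
                break
        else:  # not set anywhere: fall back on the style this one is based on
            bases = [d["base"] for d in definitions if "base" in d]
            if not bases:
                raise KeyError(f"{style_name!r} has no {attribute!r}")
            return self.get(bases[0], attribute)
        if isinstance(value, Var):
            resolved = self.variable(value.name)
            return resolved if value.key is None else resolved[value.key]
        return value


base = StyleSheet("base", variables={
    "ieee_family": {"serif": "Times", "mono": "Courier"}})
base["emphasis"] = {"font_slant": "italic"}
base["nested line block"] = {"margin_left": 18}
base["heading"] = {"typeface": Var("ieee_family", "serif"),
                   "font_size": 14, "numbered": True}
base["unnumbered heading"] = {"base": "heading", "numbered": False}

# An extended style sheet: a new style, an override and a replaced variable.
custom = StyleSheet("custom", base=base, variables={
    "ieee_family": {"serif": "Palatino", "mono": "Inconsolata"}})
custom["acronym"] = {"small_caps": True}
custom["heading"] = {"font_weight": "bold"}  # adds bold, keeps the rest

print(base.get("unnumbered heading", "typeface"))       # Times
print(custom.get("unnumbered heading", "typeface"))     # Palatino
print(custom.get("unnumbered heading", "font_weight"))  # bold
print(custom.get("heading", "font_size"))               # 14
```

Because every lookup starts from the style sheet you query, replacing the variable in the derived sheet also changes styles that were defined only in the base sheet. Overriding "heading" there likewise affects the inherited "unnumbered heading" style.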
We can also override styles that were present in the base style sheet, like this; here we additionally change the font weight to bold. And we can change variables, which also affects all styles in the base style sheet that use those variables. Here we replace all the fonts with some others.

Then a small note about performance. With the new style sheet, the Sphinx documentation is about 230 pages (I forgot to update this slide). It takes about 70 seconds to render from scratch, which requires two passes. If the cache file is in place, meaning the document has been rendered before, it takes only about 40 seconds. I think this is not too bad for a Python document processor. Still, it would be nice to speed it up a little, so I looked into that. I tried Cython, but unfortunately I didn't get a big speedup with it. I think this is because I'm mostly doing lots of things with lists and dictionaries rather than number crunching in tight loops. I also tried PyPy. Unfortunately, it was twice as slow as CPython. The PyPy developers aren't sure what causes this, but I talked to fijal about it yesterday, and we'll have a look at it during the sprint.

I should mention the license. My intention is to keep rinohtype free for non-commercial use, so open source projects can use it to generate their documentation. I'm thinking about offering a separate license for commercial use, but I still have to work out the details. It's a very complicated matter, I understand. For more information, please note the spelling of rinohtype: it may help to know that it contains "Python" spelled backwards. And please look at these URLs for more information and downloads.

Maybe we still have some time for questions. Any questions?

First, hello. It's just an amazing project. I love LaTeX and so on. I actually have two questions, but since CSS is kind of boring, I will only ask about math, about formulas.
So your lightning talk mentioned, if I'm not mistaken, that there is no support for formulas, right? Mathematical equations?

Yes, they're not supported at the moment.

OK. There is a project in the JavaScript world called KaTeX.

Which one is it?

KaTeX. K-a-T-e-X.

Oh, no, I don't know that one.

It's a big JavaScript library.

Sorry, I know about MathJax or something.

That's another one. KaTeX is explicitly dedicated to mathematical equations and everything related to them. But it's in JavaScript, so it's just a hint.

Thanks.

OK, thanks. Well, I guess you did compare with regular LaTeX for the same document, I mean the same content, because the compilation seemed horribly slow compared to proper LaTeX.

It is quite slow compared to LaTeX, I think, yes. That's why I looked at PyPy to speed it up. For small documents, however, this is not a problem. But I'll definitely look into speeding up the rendering a bit more.

And did you compare with Sphinx? Is it the same speed as Sphinx's LaTeX output? I mean, did you compare building the PDF through Sphinx's LaTeX builder with building it with rinohtype? I don't know which one is faster.

No, I have not compared that.

I would like to know how far along you are with the idea of rendering to SVG.

I haven't started on that yet; it's just an idea.

Any more questions?

Which text processing library are you actually using internally? To do all the paragraph formatting and page styling, or effectively typesetting, what library are you using? Did you reimplement everything that is in TeX?

Yes, everything is pure Python. I don't rely on external libraries except for docutils and the PNG library.

Any more questions?

Do you support, or will you support, absolute placement constraints? For example, when a box needs to be placed at an exact position on the page, or when you have a table whose cells may not be broken if it spans multiple pages?
Yes, well, there's nothing in place for absolute placement as of yet. It could be added, but I'm not sure how that would interact with the rest of the elements. As for tables, you can set a minimum number of rows to keep together. So if you have specified, say, four, and only two rows fit on the current page, it will skip to the next page.

Any more questions?

I have a question about testing. Since it's all about visuals, how do you automate testing?

I've done very little testing up to now, because I've also been constantly refactoring. I'm not sure that's a good excuse, but I'm also a bit worried about how to do this testing. I guess I will have to mock the backend, or maybe it will just be limited to regression testing: once I know which output is correct, make sure it doesn't change.

Thanks for the talk. I tried rinohtype with Japanese-language text.

Oh dear.

And rinohtype raised an exception. How do you plan to support multibyte languages?

Yes, Unicode support should be a good part of that, but I'm sure there's more to do there. Of course, you also need to make sure that you have fonts that contain the necessary glyphs. This is something I could probably use some help with, since I do not read or write Japanese or many other languages.

Any more questions?

Hi. How do you handle images? Is there some sort of scaling mechanism in rinohtype?

Yes, you can specify that an image should be scaled to the total available width, or you can specify an absolute width in points or in centimeters.

OK.

Any more questions? Any more questions? Thank you very much for your talk; it has been really interesting.