Yes, thanks. So, as the introduction said, this talk is about scicomp.aalto.fi: how it's built with Sphinx, some of the history and background that you might need in order to modify it, and where we will go in the future. As before, I share a vertical screen here, so half your screen is for me and half is for you to follow along or do what you'd like. My talk is actually self-hosting: it is contained within scicomp.aalto.fi itself, and I will present it using this thing called minipres, which converts a Sphinx site to something that's somewhat like a presentation.

So let's get started with the basics. As before, questions are here; please write your questions any time and I'll answer either live or at the end, depending on what's most appropriate.

So, scicomp.aalto.fi is the home of Aalto Scientific Computing's documentation. Before 2017 it was Triton's documentation, using Aalto's Confluence wiki. Now it has information on many different topics about scientific computing. So it's not just Triton, but also an introduction to Aalto resources, training, data management, research software engineers, and other things. It's ranked somewhat highly in search engines, as in it's not too hard to find, and occasionally as we're searching for problems we end up back at our own documentation, which always shows that we're in sort of deep. If anyone's interested, the scripts to convert the site from Confluence to reStructuredText can be found here.

Maybe before we go too far, we can talk about the properties of good documentation. Well, obviously it's organized and easy to use. It's versioned, so you can tell what the former versions are. It would be nice if anyone can contribute and it's not locked behind a few people. You'd want it to be shareable, reusable, and licensed so that anything you do can be used by others. And, well, we care about open science.
So that means it's not just the science that comes out, but our processes too. No lock-in is quite important: you don't want some proprietary system which will take a huge amount of work to transfer anywhere else. Plain text is always good, so that 50 years of Unix text processing tools can all work and do things. There are countless times I've grepped through the documentation to find one thing that needs to be updated everywhere, like some Slurm option or something. Ideally it shouldn't be standalone, so we can directly pull in other materials: you have a script, you can include it with literalinclude, and the script is also usable otherwise. And, well, to combine all of the above together, something like Git is natural, and that's what we've got.

So our basic documentation stack is: we have a Git repository for the docs, hosted on GitHub. The documentation is written in reStructuredText or Markdown, though really it's mostly reStructuredText. Then it's built with Sphinx with various extensions, and the website is hosted on a free service called Read the Docs.

So maybe, to go through it, I can demonstrate making a change to the docs and get right to it. There's this checklist here that I want to add. This page here is the research software engineer page on scicomp.aalto.fi, and I want to add another link right below here. So let's see how this would work.

OK, here's my terminal. Let me know if this is not readable. I've already cloned the Git repository, so I won't go through anything new. The URL I want to add it to is RSE, so let's see what files we have here. Well, we have a directory called rse, so let's use Emacs to open rse. What's in here? Well, there's a bunch of stuff, but index looks like the right thing. So, index. OK, let's scroll down, and under this thing we see there is checklist. So now we see the first main point about the docs.
So this is not just a bullet list like you see here. Within Sphinx, there's this semantic structure of the table of contents tree, the toctree, which is used to build the tree structure that you see in the documentation sidebar. So how can we add an external link to this? I'll copy it. Let's come back here. It looks like a checklist here, and if I zoom out, the sidebar appears and you see how these things in the sidebar map to the different sections you see here. Like the toctree under "About research software engineers": Community, Become an RSE; and here, Community, Become an RSE.

So, under checklist, I will use a little trick and add it like this. Within these table of contents trees you can also have external links like this. This is a little-known feature, but let's do it.

I will save the file and go back to the shell, and since this is about Sphinx and not about Git, I will quickly use my git-pr utility to make this: git pr new, JOSS checklist. I add the file, I commit with "add link to the JOSS checklist", save, and now I use... there's some option; I actually have a shell alias that I use for this. OK, so there it goes to a pull request.

Let's see what happens if we go to GitHub. Under pull requests there are several open, but I see it's open here. If I click, you can see, well, there's no description; the title is good enough. Under files changed I see what's added, and here there is a check that's in progress. This is checking for any warnings that Sphinx may have. I have this configured so that some things are complete failures and stop the build, and some are just warnings. If I click on details, I see, OK, it's building. This is GitHub Actions here, which is quite a nice service for testing things. While this is running, I can show how you would test it locally, if you do make check.
Oh, actually, I usually do make clean check. Then it does the checks and, surprisingly, it doesn't fail; I thought I would have to activate the virtual environment. But anyway, make check uses the same settings that GitHub Actions uses to test, though GitHub Actions is set to ignore different warnings. Like here, we saw a bunch of warnings about things that are not in the... so this is because the virtual environment isn't activated. Some of these other ones are documents that are not included in the table of contents tree, which we just don't have there, because some are included. But at the end we see, OK, no error. That means there are no fatal errors. If we come back here, we see it finished: all checks successful. So I can now merge the pull request.

If you would like to preview this locally first, you can do make... actually, make clean html is done as part of the make clean check process anyway. It builds, and, well, now I have to wait for it to build again, which will take a little while. But the outputs get put in the _build/html directory, where they can be viewed with a browser. And there you go: you can open this and check it out. So that's the basic summary of the process.

Let's see, did I merge it already? I'll click merge, confirm, delete the branch, and I'm all done. OK, so I think this is basically everything. It'll take a few more minutes for Read the Docs to build it, so we can see it at the end. I see no questions so far; please comment there.

OK, so now we get to the details, and this is basically what I've already shown. Let's see the extra notes here.
There's a requirements.txt file that includes all the Python dependencies to build it. But it's also buildable with stock Debian and Ubuntu packages, so you don't have to do this extra installation in order to run things. The conf.py file contains all of the Sphinx configuration. The index.rst is the root of all docs, so it has the root table of contents tree directive. And the Makefile is used to build it; it is basically a wrapper around the sphinx-build command: make clean, make check, whatever. sphinx-autobuild is a nice utility: it starts a web server and rebuilds when any file changes, and you can browse the results.

And yeah, there's this question coming up about whether search works on the local preview. This is a static search. If we look in the preview, _build/html, let's see, maybe under static... Anyway, it compiles all of the search terms into some sort of JavaScript file, which is then usable to do the search. Well, someone can find this and put it in the HackMD; I don't need to answer this myself necessarily. But yeah, search is client-side in the browser, by loading a JSON file that contains all of the terms in the whole site. So it doesn't require any server-side support.

One could also edit the docs on the web. If you come to any particular page... let's see. Hmm, let's come back to the code. Let's say I want to edit that same page. I can navigate to rse and then index, I can click edit, I can add something new, and if I scroll down, I can give the message and either commit directly to the main branch or make a pull request from a new branch. So this doesn't require using the command line to edit, and it actually is somewhat accessible to everyone. Could we recommend using this more, so that more people can contribute to our docs?

OK, one of the key points here is the Sphinx table of contents tree, which is how all of the material is organized.
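
As a rough illustration of the conf.py mentioned above, a minimal Sphinx configuration might look something like the following sketch. The project name, the choice of extension, the excluded directories, and the theme are all assumptions made for illustration; this is not the actual scicomp.aalto.fi configuration.

```python
# conf.py: minimal Sphinx configuration sketch. All values are illustrative
# assumptions, not the real scicomp.aalto.fi settings.

project = "Example Scientific Computing Docs"  # hypothetical project name

# Extensions are listed as Python module names; intersphinx ships with Sphinx.
extensions = [
    "sphinx.ext.intersphinx",
]

# Lets cross-references resolve against the upstream Python documentation.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
}

# Do not treat build output or a local virtual environment as source files.
exclude_patterns = ["_build", "venv"]

# "alabaster" is Sphinx's default theme; any installed theme could be named here.
html_theme = "alabaster"
```

Because conf.py is ordinary Python, anything that can be computed in Python can be used to set these values, which is part of what makes Sphinx easy to extend.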
Well, we saw the example here: there's a table of contents tree, and you can give these directives different options. Like here, it will go only to a certain depth in showing the subsections. Then you include the different pages in it; you can use wildcards and so on. Then each of these pages is scanned for its own table of contents tree to have the sub-pages inserted, and so on and so on. Sphinx internally goes and builds this structure and then uses it to write things out. I won't do this myself, but as an example you can follow the toctree directives from index, to the Aalto index, to Aalto JupyterHub, to the Aalto JupyterHub instructors index, and so on. So it makes sense, and you can basically add things. For complicated cases, like the difference between having toctrees in sections or toctrees not in sections on pages, well, just give it a try and find out, and ask someone if you need help. I don't think there's much more to be said there.

So, the arrangement of the site. scicomp.aalto.fi started from the Triton wiki. It then grew these other top-level sections for other things, and basically more and more got added. Really, it's about time to rethink how it's organized. Is there a better way to do things? Is there too much here? Should something be hidden? Should the main page be more or less verbose? This is something we could discuss after the talk. But most of the sections are sort of historical things that I added because I thought they were useful and made sense, and it hasn't really been master-planned beyond what you see here right now.

Other details about the site. Sphinx is a full-fledged, extensible documentation generator, so it has many extensions. Like there's sphinx-gitstamp, which provides modification times for pages, and, visually, the Read the Docs theme and these things.
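
To make the toctree scanning described above a bit more concrete, here is a toy sketch of the idea: each page is scanned for its own toctree entries, and the sub-pages are resolved recursively, with external links kept as leaves. The page names, the example.org URL, and the helper functions are hypothetical, and the line-based parsing is a simplification; Sphinx itself parses the files with docutils and supports options such as maxdepth and glob that are ignored here.

```python
import re

# Hypothetical in-memory "site": document name -> reStructuredText source.
PAGES = {
    "index": (
        "Welcome\n=======\n\n"
        ".. toctree::\n   :maxdepth: 2\n\n   rse/index\n   triton/index\n"
    ),
    "rse/index": (
        "Research Software Engineers\n===========================\n\n"
        ".. toctree::\n\n   rse/checklist\n"
        "   JOSS checklist <https://example.org/joss-checklist>\n"
    ),
    "rse/checklist": "Checklist\n=========\n",
    "triton/index": "Triton\n======\n",
}

# Matches entries of the form "Title <target>".
ENTRY_WITH_TITLE = re.compile(r"^(?P<title>.+?)\s*<(?P<target>[^>]+)>$")


def toctree_entries(source):
    """Return the entries of every top-level toctree directive in one page."""
    entries = []
    in_toctree = False
    for line in source.splitlines():
        if line.startswith(".. toctree::"):
            in_toctree = True
            continue
        if not in_toctree or not line.strip():
            continue
        if not line.startswith(" "):
            in_toctree = False  # an unindented line ends the directive body
            continue
        text = line.strip()
        if not text.startswith(":"):  # skip options such as :maxdepth:
            entries.append(text)
    return entries


def build_tree(docname, pages, depth=0):
    """Recursively list a page and everything reachable through its toctrees."""
    lines = ["  " * depth + docname]
    for entry in toctree_entries(pages[docname]):
        match = ENTRY_WITH_TITLE.match(entry)
        target = match.group("target") if match else entry
        if "://" in target:
            # External link: keep it as a leaf, there is no page to scan.
            lines.append("  " * (depth + 1) + entry)
        else:
            lines.extend(build_tree(target, pages, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(build_tree("index", PAGES)))
```

The nested listing this produces corresponds to what ends up in the sidebar: sections at the top level, their sub-pages indented below, and the external link sitting next to ordinary pages.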
There are basically so many different extensions that we can use for different things, and I've invested a lot of time in digging into this quite deeply. And I quite like it. It's written in Python, and it's quite a good tool.

So, reStructuredText. Why reStructuredText instead of Markdown? Markdown was basically made as a thin wrapping over HTML, but to make high-quality docs like this, some more structure is needed, like the table of contents tree and other directives. Now there is a parser called MyST, Markedly Structured Text, which is a reasonable alternative to reStructuredText. But really it's more like a different reStructuredText syntax than Markdown, one that's sort of forced into the Markdown limitations.

Also, I like to say Markdown is dead. Of course it's being used everywhere, but the original creator of Markdown says: I don't want anything else being called Markdown; this specification should never evolve. So there's something called CommonMark, where people try to standardize things. But all the different syntaxes are so different that there's no real portability between them beyond simple things like sections, bold, and links, which is a bit of a problem. You can read more about reStructuredText here if you would like.

Let's see. Oh, there's a good question: Sphinx versus LaTeX, can we contrast them? So yeah, I'd like to say that Markdown is like HTML and reStructuredText is like LaTeX. Really, reStructuredText is separate from Sphinx. It's maintained in a different project, called docutils, which Sphinx uses. Every so often when docutils changes, Sphinx will have some sort of errors or problems, but those are usually pretty quickly discovered and fixed.
So I would say Sphinx and reStructuredText is like LaTeX, but a more readable LaTeX: something that's reasonable to read in the plain text format.

So, some of the most surprising reStructuredText points. Literals have double backquotes, like this. Why is that? Well, it's because single backquotes are used for links and references. There are usually more internal references within a document than external references, and internal references are what single backquotes are used for. But this is configurable: you can make single backquotes take on any role you want.

Links are scoped. Here we see examples of single backquotes being used to refer to another document or to a reference. And there are lots of other things you can use in Sphinx here. For example, you can use this to refer directly to a Python object, which can link to the upstream Python docs. This is called intersphinx.

External links have two underscores after them, like this. One underscore actually works, but then the link text gets taken as a reference target and can be used in this kind of reference, I think. And if there are ever two links with one underscore and the same text, then weird things happen.

OK. The GitHub Actions check provides warnings on errors, so we can in fact see that here. Let's look at an example. I hope it appears. Oh, here we go.
So we see a warning here. There's a matcher which matches the problems to the locations. This was actually not caused by this pull request; it's a long-running problem which is still in here. Here we see the errors and why things fail, and here we see a link that doesn't have underscores after it.

OK, the Actions view I already showed in a previous demo. I have these checks set relatively strict, and I've disabled some options that make reStructuredText more flexible, with the idea that explicit is better than implicit. If we want our docs to be reusable in other projects, we should be the explicit and strict ones.

OK, before we get to little-known features, there was a comment on the licensing model. Yes, Creative Commons BY was used.

OK, so little-known features. We could use Markdown or Jupyter notebooks for pages if we wanted, and it all basically works together. But reStructuredText really is nicer for this kind of documentation than trying to shove directives into CommonMark.

It's compatible with many other projects. Sphinx is used for the documentation of many Python and other projects. It's used in recent CodeRefinery lessons, for example, CodeRefinery manuals, many things. So stuff can be copied around easily.

A comment on the previous thing here: why is it CC BY and not ShareAlike? Well, we sort of discussed it among ourselves and did things. And, "yet the authors are not visible": yeah, the authors are not visible other than in the Git history. I guess we could do something about that if someone's interested. Yeah, perhaps we should look into that.

OK, minipres. This thing I built uses some client-side JavaScript to turn any page into a presentation format, and, well, I've been giving you a demo of it right now. I don't really know JavaScript or these things, so if someone could help do it more properly, that would be really interesting.

Redirect to HTTPS. Read the Docs doesn't natively do this for external domains, so it's done with JavaScript.
Maybe someone could improve this.

Other output formats. Sphinx can output to other formats such as PDF, single-page HTML, EPUB, manual pages, and so on. Perhaps there would be a use for this somehow, somewhere.

There's this extension, substitution, which is more powerful than a simple replacement substitution, and which could be used for making the site more general. Maybe as a demonstration... I have some images in here now. Yeah, so here we have a demo. There's a replacement, and there's a substitution, which has the ID, the substitution, and then the original text. When it's built, you can see the substitution inserted like this. You can build it showing the originals, or you can build it showing both the original with its ID and the substitution. The point of this is that it's easy to keep all of these substitutions in sync. If it's just a plain substitution, you basically have to look through all the documentation to see what all the substitutions are, and any time something changes, it has to be updated. It can also produce a list of all the IDs, all the original text, and all of the replacement text. This could perhaps be useful for other sites to make their own custom version of this.

OK, this thing, sphinx-gitstamp, puts a timestamp on the bottom of every page. You can scroll down and see it yourself.

So there are some open questions here. Should we use pull requests or not? When do we push directly? I think in practice both are fine. It's up to you to decide, and if you're the main maintainer of something and you don't expect anyone to give useful contributions, then, well, just push directly. Why not?
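
The substitution idea described earlier (an ID, the original text, and a site-specific replacement, with different build modes) can be sketched roughly as below. The `{{id|original}}` marker syntax, the function names, and the example values are invented for this illustration and are not the syntax or API of the actual Sphinx extension.

```python
import re

# Hypothetical marker syntax for this sketch only: {{id|original text}}
MARKER = re.compile(r"\{\{(?P<id>[\w-]+)\|(?P<original>[^}]*)\}\}")


def render(text, substitutions, mode="replace"):
    """Render marked text in one of three modes.

    "replace"  -> show the site-specific replacement (or the original if none)
    "original" -> show the original text only
    "both"     -> show the original together with its ID and replacement
    """
    def swap(match):
        sub_id = match.group("id")
        original = match.group("original")
        replacement = substitutions.get(sub_id, original)
        if mode == "original":
            return original
        if mode == "both":
            return f"{original} [{sub_id}: {replacement}]"
        return replacement

    return MARKER.sub(swap, text)


def list_substitutions(text, substitutions):
    """List (id, original, replacement) for every marker, to help keep them in sync."""
    return [
        (m.group("id"), m.group("original"),
         substitutions.get(m.group("id"), m.group("original")))
        for m in MARKER.finditer(text)
    ]


if __name__ == "__main__":
    source = "Log in to {{cluster-name|Triton}} before submitting jobs."
    site_values = {"cluster-name": "ExampleCluster"}  # hypothetical other site
    for mode in ("replace", "original", "both"):
        print(render(source, site_values, mode))
    print(list_substitutions(source, site_values))
```

Keeping the original text next to the ID in the source is what makes the listing possible: another site can see every place where a value is expected and what it stood for originally.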
Sharing with other sites. We had a long-term plan to share scicomp.aalto.fi with other sites so they could customize it for their own tutorials. This hasn't really been done yet, and by now the docs are so complex I'm not even sure if it's a reasonable thing to do. Or maybe, in order to do this, we need to split it up into several different subprojects, and then we could share it a little more easily. Because everything is plain text, you can import one Sphinx project into another using Git submodules, and then use the toctree to refer to the other things inside of it.

Others could use scicomp.aalto.fi as the place for their docs. For example, Aalto Linux people could be invited to use it. Maybe there's some other department IT that's tired of using the other options, and we could combine things together. There are a lot of possibilities here.

Can we make the docs more testable? One dream we've had is to make our examples testable, where you can basically clone it onto Triton and then run pytest or something on it, and it would tell you if all the docs are still up to date and still work. For example, what we see here includes everything that's needed. There's a reStructuredText file that includes the Python OpenMP file, and it also includes the Slurm script. And if we go back, we see, well, the raw OpenMP file and the raw Slurm script. Could we somehow link these together so everything can be automatically tested? In practice we're not too far from that, but how do we do that without making another layer of writing to connect things together and define this? I think it might not be too difficult, by making some Sphinx directives or tags which can say: this directive indicates what the Slurm script is, this indicates what the other scripts are, or something like that.

Integrated HPC examples. We have an examples tree here, which has a bunch of examples, and a separate repository, hpc-examples. Can these be unified and the second included as a submodule?
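
As one small step toward the testable docs discussed above, a check could at least confirm that every file pulled in with literalinclude still exists. The sketch below is an assumption about how such a check might look, not something the site currently does; the function names are hypothetical, the regular expression only handles the simple one-argument form of the directive, and paths are resolved relative to the including file (Sphinx also allows paths relative to the source root, which this ignores).

```python
import re
from pathlib import Path

# Simplified pattern: a literalinclude directive followed by one file name.
LITERALINCLUDE = re.compile(
    r"^[ \t]*\.\. literalinclude::[ \t]+(\S+)", re.MULTILINE
)


def included_files(rst_path):
    """Return the paths that one .rst file pulls in with literalinclude."""
    rst_path = Path(rst_path)
    text = rst_path.read_text(encoding="utf-8")
    return [rst_path.parent / name for name in LITERALINCLUDE.findall(text)]


def missing_includes(docs_root):
    """Return (rst file, missing target) pairs for every broken literalinclude."""
    missing = []
    for rst in sorted(Path(docs_root).rglob("*.rst")):
        for target in included_files(rst):
            if not target.exists():
                missing.append((rst, target))
    return missing
```

A test runner such as pytest could then assert that `missing_includes` returns an empty list for the docs directory. Actually running the included scripts on a cluster would be a further step that this sketch does not attempt.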
Could we stop using Read the Docs now? By now we have GitHub Pages, which could host this just as well. Yeah, that's a good question.

How can we keep things up to date? It requires continuous work, just like any documentation. What should the threshold be for removing old material? There's this extension that you can use to put a date into pages, and it will warn you when a page hasn't been looked at, or, as the metaphor goes, dusted off recently. This is something we clearly need to think more about.

Visitor stats. Read the Docs provides limited visitor stats based on web server logs. I'm sort of opposed to detailed individual web tracking, but is there some way to get more detailed stats without tracking people individually? I guess if we hosted this on our own web server, we could have the full web server logs and do anything we wanted.

And finally, building a community. Is there any way to get more people to contribute to this?

So, where's it hosted right now? It's hosted on readthedocs.org, which I can demonstrate here. Logged in, we're at Projects, Aalto scicomp. There's an admin area, and the main things which someone might use are... actually, we can look here at traffic analytics. We see the page view statistics over the past 30 days, so we can see where people end up, but this is basically the extent of it. Oh, and if I turn on JavaScript we can probably see something more. Yes, there we go: daily page view totals. Oh, it also shows some search statistics, what people search for.

We pay five euros or dollars per month for Read the Docs Gold, which removes the advertisements from open source projects. Let's see.
I mean, if someone needs to, they can come and search through this. If you look at versions, you can see the latest version. You can edit it, and sometimes you would wipe and rebuild it if things go wrong.

So, the next question: is there a way to make part of the documentation private? Since everything is in one Git repository, that's sort of difficult. But, well, this is text files in a Git repository, so we can easily have two Git repositories, one that's public and one that's private. You host them separately, and the web server provides access control however you might want. Or you could have them as submodules of each other, or as subprojects in the web server tree, and then limit access however you would like with normal web authentication.

So, if there are no more questions, I will... well, I'll give another few minutes for questions which should be on video, and then I will stop the recording and we can see what discussion there is via Zoom.

OK, well, there's nothing more. So thanks for listening, and I hope this has been a useful tour of what we do. I guess, as a summary: there is really great value in having documentation version-controlled in plain text format. There are many different tools for this, and of them Sphinx is a pretty good tool to build on. So, I guess, thanks a lot.