The next speaker is going to be Lars Wirzenius, who is going to be talking to us about swimming. Nothing to see on a river but swimming upstream.
Thank you. I was a Debian developer for several years, more than I remember. I have been told during this DebConf that I actually joined in '95, not in '96 as I thought. But about a year ago, a bit over a year ago, I decided that I had been doing distribution development for long enough and I wanted to do something else. And this is the story of what happened next. I decided that I wanted to become an upstream developer instead of a Debian developer. I retired from Debian and set out to conquer new worlds. In the intervening almost one year I have mostly worked on projects of my own. But this talk is actually about something that I did for a company in New Zealand called Catalyst IT, and a free software project called Koha.
I don't need to tell you about Debian. You know what Debian is. The slide is here just in case I ever need to give this talk again to another audience who doesn't know about Debian.
Koha is an integrated library management system. Library in this context means the kind that has books and CDs and movies, not the kind that has shared code. This is an important distinction for some audiences. It's free software. It's about 10 years old by now, so it started in '99 or 2000, originally in New Zealand. It's used through a browser over the net, and that's a sample screenshot. The resolution turns out to be worse than I thought. I apologize for that. How many of you have ever used the application? I'm just checking that the microphone works.
So the Koha project and Catalyst IT wanted to have Debian packages made out of Koha. The reason for this is that most of the people involved in Koha really like Debian as a platform on which to run Koha, for various social and technical reasons. And I think that's a good choice.
However, they did not have any Debian packages, and that meant that even though maintaining everything else about the server was nice and easy and comfortable and efficient, installing Koha, upgrading Koha and making sure everything works was more work than it should have been. So all the people installing Koha would install it from sources. And it wasn't just Koha, it was also its dependencies. Koha is written in Perl. It uses dozens of different CPAN modules. Might be hundreds, I haven't actually counted. Quite a large number of those modules are not in lenny, and some of those which are in lenny are too old, so they want something newer. Therefore one of the first things I needed to do was package all the missing dependencies. I took a list of the missing dependencies and decided that I didn't want to package about 50 packages. I would instead target squeeze, because the squeeze release is imminent. This was in March and I haven't regretted the decision. Squeeze had five missing CPAN modules. I have packages of those and I'll say a bit more about them.
From a more general point of view, I don't want to say that Koha is the only upstream that Debian needs to care about. I'm using Koha as an example upstream to bring up some problems I want to note. One of these problems is that upstreams want to use lots of new tools that might not exist in Debian. The reason upstream wants to do this is not to make life difficult for Debian, but to make life easier for themselves. If Koha can go and pick any CPAN module that exists, it might save them days or weeks or even months of work. However, if the CPAN module doesn't exist in Debian, life becomes complicated again in a different way. So upstream has this urge to use new stuff, whereas Debian would like upstream to stick to things that are in the current stable release of Debian. This conflict is something that it would be good to deal with in some way.
Most upstream projects don't have a retired Debian developer or an active Debian developer who can do the packaging for them. So perhaps something can be done about that.
So, I decided to make packages of the missing CPAN modules. And it turns out that CPAN modules are very easy to package. You run dh-make-perl and you edit all the files it creates. Which, if you know packaging, is quite easy. The Koha project had tried to do this for an earlier version of Debian and hadn't quite succeeded. They didn't have someone who was good at packaging, and there are a lot of details you have to get right. I didn't get them right either. I did, however, know that there's a packaging team for Perl modules in Debian. So I talked to them and they were extremely helpful. Especially Gregor Herrmann was really helpful. I don't know if he's here. Give him a round of applause. And not just him, the entire team is helpful. It was very important for the quality of the packages I made that I was able to get them reviewed and fix things that would have been a problem otherwise. The Perl module team also has people who sponsor uploads of packages for other people, and that makes things smoother as well. So, kudos to Debian.
Okay, so I got the CPAN modules packaged, and then I was able to run the Koha test suite. Koha is one of those upstreams that actually has a test suite. Not all upstreams do. One of the things that Debian possibly should encourage upstreams to do is to provide test suites. And they don't have to be 100% complete in order to be really useful. Koha has a test suite that has about 13% coverage. But that's enough to make sure that most of the basic stuff at least works before you try to do an installation. The next step for me when packaging Koha was to try to do an actual installation of Koha from the package and then figure out if something was missing at runtime.
Given that Koha is a web application, it basically needs to talk to a web server and a database server. I had last dealt with these kinds of things 10 years ago. No, sorry, 14 years ago. And I was having nightmares beforehand. Web servers in 1996 weren't the kinds of web servers that exist now. The thing that really saved my day was the fact that Debian has a policy, and especially the sub-policies for web applications and database applications. These made things massively simpler. I had expected to spend a month getting this fixed. It took an afternoon. And it wasn't just the policies, it was also the tools: dbconfig-common and dh especially, and some of the web server stuff helped a lot. I don't know if Stephen Frost is here, but Joey is here. Give him a round of applause for dh 7.
This stuff, the policy stuff in Debian, is something that we, as Debian developers, are familiar with. But you may not have seen what happens to people outside of Debian when they try to do something related to packaging and hear about this for the first time: it totally blows their mind. It's not something that most big projects have. They have an oral tradition, usually maintained by people who left the project five years ago. Some projects have excellent written documentation in the form of mailing lists and IRC logs that are archived in a public place. That's massively better than nothing, but it's not as efficient as having a policy document. So kudos to Debian for that. Cats happen somewhere.
Not everything that happened during this packaging process went quite so smoothly. One of the things that Koha upstream had decided was that the Koha system has two interfaces. There is a public interface that library customers and the public can use. It can even be used without logging in, and you can create a login and do some personalised stuff there if you wish. And then there is a staff interface.
The staff interface is used for things like adding books to the library, or marking people so that this person doesn't need to pay the $5,000 fine after not returning the book for 15 years, or whatever. However, this duality made it harder to make a package that works out of the box. It's easy to make something that works on localhost, but there needed to be two different sites, because Koha upstream had decided that there need to be two different URLs for these things. So I had to use a different port on localhost in order to make it work out of the box when installed. This is an upstream decision that makes sense to upstream. But since upstream has never participated in distribution development, they had no idea that Debian would have a hard time with this. Now that I've informed upstream that this is a problem, they're thinking about fixing it and hopefully will fairly soon.
Although my slide says stupid decision, that's a highly opinionated word. It's not that the decision as such is stupid. Upstream had reasonably good reasons why making this decision was good for them, and it works quite fine for them. If you're installing from source, then the person installing it is going to have to be editing config files all the time anyway, so it's just one more thing to do. However, one of the goals of making a Debian package is to make sure that everything works out of the box, preferably without any configuration done by hand. And if you have to do some configuration, it should be minimal. So these kinds of things are easy for an upstream to decide wrong from a Debian point of view. Therefore Debian should talk to upstreams and educate them so that all of these things eventually go away.
The next thing that was a problem was the fact that upstream provides sample config files, or templates for config files. These never work out of the box.
Upstream doesn't want to make decisions that you will be using this version of Apache or that database engine and so on. So when you're installing from source, you have to go in and review every line of the config files, and there were, I think, half a dozen config files, up to hundreds of lines long. So that's a bit of work. What I wanted to do was to make sure that you can install multiple instances of Koha, that is, have Koha running for multiple different libraries on the same host. The company that was funding this work wanted to do exactly this. They wanted to provide Koha as a service, so that they have one server running this for as many users as necessary, and therefore I needed to do some extra mucking about with the config files. So after, I think, four days of tweaking the config files, I had something that I could write a wrapper around, so that I run one command to create a new library instance, and it then copies all the files to the right places and fills in some values that are missing, and then it works.
The problem with this turns out to be that I had to take all of these things that upstream provides and throw them away. This works as long as upstream doesn't change any of their things, and as soon as upstream does, the change needs to be reflected in the packaging-specific files that I made. Fair enough, that's what needs to happen, but it would be ideal if one didn't have to do this. So again, pretending to be a conscientious Debian developer, I talked to upstream, and they agreed that, hey, this system I've developed is actually much better. It also had massively less duplication of config files, and they're going to be adopting it upstream as well, so that's a good thing.
The more generic point here is that configuring things on a Linux or a Unix system is a bit of a mess. Koha has its own config files.
It needs to generate some config files for Apache and MySQL and a search engine called Zebra, and there are one or two other things. Each of these has a different config file syntax. Some of them have two different config file syntaxes. Life would be massively easier if there was a very small number of syntaxes for config files. If anyone disagrees with this point, then please raise your hand and I will happily discuss it. Most of the time the config file syntax differences are pointless, and I suspect the reason is that in the larval stage every hacker needs to write a text editor, an IRC client, a config file parser and a couple of other things, and that's good. They should just stop using them. I should stop using mine as well.
One of the things that at least most of the Debian project has learned is that having an application that reads a single config file is going to lead to problems. For example, Apache used to have one single config file that it read, which meant that if you wanted another package or an automated tool to add to that configuration, for example adding a new virtual host for Apache, it would have to go and edit that config file automatically. Anyone who has tried to make sure that this always works, especially with config file syntaxes that are slightly weird and keep changing (I'm not referring to Apache here, but there are other examples), has learned that this is an extremely error-prone thing. A massively better way of doing this is to make sure that the application can be pointed at a directory, usually named something.d, and can read all the files in there in some systematic order and pretend that they're all one file. A .d directory allows another program or another package to just drop one file in a directory, and then it works, and this is very easy to do.
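The .d directory idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `read_conf_dir`, the `*.conf` suffix and the file names are all hypothetical, and a temporary directory stands in for something like an `/etc/myapp.d` directory.

```python
import tempfile
from pathlib import Path


def read_conf_dir(conf_dir):
    """Return the contents of every *.conf file in conf_dir as one string.

    Files are read in sorted filename order, so a numeric prefix such as
    10- or 20- decides which fragment comes first.
    """
    fragments = sorted(Path(conf_dir).glob("*.conf"), key=lambda p: p.name)
    return "".join(path.read_text() for path in fragments)


# Demonstration: two packages each drop one file into the directory.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "20-library.conf").write_text("site = example\n")
    Path(demo_dir, "10-base.conf").write_text("port = 80\n")
    combined = read_conf_dir(demo_dir)

print(combined, end="")
```

The point of the design is that no tool ever has to edit an existing file: adding or removing one fragment is enough, and the fixed ordering keeps the result predictable.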
Another thing that fairly few applications support is what I like to call stacked config files, so that you can have one in /usr/share, one in /etc/whatever, and one in the user's home directory. The good thing about this is that if you have the config file in /usr/share, it's something that nobody ever needs to change, and it can have any amount of embedded documentation that you wish. If you put the file and the documentation in /etc, it's going to be a conffile, and every time it changes the sysadmin is going to see a nice little dpkg prompt saying, oh, something changed, would you like to see a diff you can't understand? ucf helps a lot with this, but it's not infallible. So talking upstreams into supporting the same config file syntaxes, .d directories and stacked config files would make things simpler. Even if you can only pick any two of these, life is still simpler.
Okay, so I made some packages. Apart from the butchering I had to do to support multiple libraries per host, the packaging I did was really quite simple. When I made the release, the entire Koha community showed extreme appreciation of this, because it saved them huge amounts of work. If I had known what I was doing from the beginning, I could basically have made all this packaging in just a few days, and even if I never touched the packaging again, it would still be an excellent thing for Koha users. Even just a small, simple package helps people. My packaging is in fact so simple that Koha doesn't work out of the box. You have to run a couple more commands in order to do the Apache setup and something else. The reason for this is that I didn't want to make a package that goes and modifies Apache's Listen directive. If someone did that to me I would be upset, so I didn't want to do that to them. However, the point about making simple packages is important. There had been an earlier attempt at making a Debian package out of Koha.
They had decided that Koha has a few hundred different configuration options and that all of these need to be supported via debconf. Ultimately that's a good thing, I think, for some of them. But they had tried to solve the entire problem, and it's a big enough problem that the attempt failed. I went the other route and made something that actually works even though it's not perfect, and people were happy.
One of the things that happens when you have an old, crufty project like Koha, something that started many, many years ago, is that copyright statements and licenses are not always quite up to date. This also happened with Koha. Koha has files written in 2009 which have a copyright statement claiming that they were written in 2002 by people who are no longer active in the project. That's because upstream people who wanted to start a new file just copied an existing file, removed all the code and left the copyright statement at the top. That's fair enough, because that's what most people do. They start a new thing by basing it on something else. It's also one of the wonders of free software that you can do that. However, if or when one wants to have Koha included in Debian, these kinds of things should be fixed. Luckily Koha has been very careful about keeping version control system information correct. So all the information is in the version control system, currently Git. It just needs someone to spend a large amount of time digging out the actual authorship information and so on. It would be nice to have reliable tools to do this. Is there anyone here who has ever done a copyright review of significant amounts of code? Would it be nice to have a tool that deals with this? Good. We agree. I have no idea if such a tool will ever exist. This picture is from a clothing shop in a small town somewhere in New Zealand, called Copyright Clearance. I have no idea why they are called that.
Don't read this slide. It's too long.
So the thing I would like to suggest that Debian does is make some kind of checklist or other document that helps an upstream project do things in such a way that their project becomes easy to package for Debian, but also for other distributions. There are 19 points there that I came up with off the top of my head and by asking a few people. It's entirely incomplete. A JPEG in a slide set somewhere isn't really the best format either, so a wiki page on the Debian wiki would probably be a good starting point. I'll be happy to copy and paste this in there if there's interest.
The other thing that I think it would be good for Debian to have is something that might be called an upstream front desk. I'm good. What can I say? One of the reasons there were no Koha Debian packages so far is that the people in the Koha community didn't know how to get the packages done. So they tried to do them themselves, and that's an excellent thing. It's always nice if people do things themselves. But making a good Debian package takes quite a lot of expertise, unfortunately, even with things like dh and dbconfig-common and so on. So it would be nice if there was an easy way to contact a distribution like Debian and ask them: we would like to have a package of our software for your system, what should we do, can someone help us? There is a way in Debian for doing this. It's called the RFP bug. This is slightly inadequate. Very few RFPs, requests for packages, will get a good response, or any response at all. I suspect quite a number of people who read debian-devel killfile all those emails. They could go and talk to debian-mentors, but that's for people actually making packages and trying to learn these things. Most of the time it would be better if there was, for example, a collaboration between someone from Debian who knows how to make a package and an upstream who knows the software and can change things upstream easily.
So yeah, I would now like to open the floor for questions, if Zack has any more.
Do we have anyone running a microphone? We have someone running a microphone. So first Zack and then Francois.
So I had three questions. One was, how about an upstream front desk? The second was, how about an upstream guide coming from Debian? So I think you agree with this idea. For the third, I'm curious about your experience with the policy we have in Debian for web applications. You said about the dbconfig stuff and the web application policy stuff that you were quite surprised by them. Early on in the release cycle of squeeze we had quite some problems with the web application policy, with something like, I think, 50 RC bugs filed against web applications in general in Debian, and they were all legitimate RC bugs. We had a brief discussion back then, and my impression at the end of the discussion was that we still lack a proper way of automatically enabling web applications, in a way that when you install a package you get the application running by default, unless there are security problems or this kind of stuff. So I just wanted to ask if you can comment a little on that.
The web application policy, or sub-policy, that we have in Debian (and you'll notice that I've started saying "we" about Debian) is an excellent thing. It's also inadequate. There are a lot of things there that a maintainer still has to do and decide for themselves. So it's a good start, and Debian needs to be applauded for making this much progress, but there are a lot of things that still need to be done to make this an easy thing. One of the problems here is that, for example, Koha depends on a few things that really require Apache, and it would be nice if you didn't have to specifically require Apache, but there were generic ways of doing things like rewrites or virtual hosts and so on. So instead of providing an Apache virtual host specification, you would provide something more high-level and generate the configuration automatically for whatever web server you happen to have.
So I actually really like that list, and one of the things I was just thinking about is that it would be nice to have this sort of checklist or upstream questionnaire and put a score on each upstream. Koha is a 19 out of 20 upstream, and gets a little star or whatever on the PTS, or something like that. There's the idea of having a basic manual, which is what you said, guidelines for being a good upstream generally but specifically for Debian. But I also think that doing it in a fun way, like having a score, would encourage upstreams that are currently three out of 20 to do a couple of those things that would make life easier for Debian and for everyone else, really.
I like that idea. Now we just need someone to implement it.
Hi, so one of the things you talked about was the difficulty of configuring Koha as an installation in a manner that allowed for virtual hosting of the web application. Now that you've gone through that experience, are there design patterns for creating the upstream source that would facilitate this, that you could help document so that we can encourage other upstreams to follow suit? I don't think that's on your checklist up here. But it is a very difficult problem to get right, and the voices of experience in that area are going to be very helpful. I don't know if the web application policy addresses this at all, because I'm afraid I haven't looked at it lately. But if it doesn't, maybe that's also somewhere we should include that kind of information.
Unfortunately I am still entirely ignorant of good ways of making web applications. I think one of the key things here would be that the application should be written in such a way that it can be given a base URL, and everything it needs to reference is relative to that. And the application shouldn't assume that the base URL is at the top of a domain either.
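A small Python sketch of the base-URL idea, using the standard library's `urllib.parse.urljoin`. The function name `make_url` and the example URLs are hypothetical, chosen only for illustration; this is not how Koha itself builds its links.

```python
from urllib.parse import urljoin


def make_url(base_url, relative_path):
    """Build a link below base_url, which need not be the top of a domain."""
    # urljoin drops the last path segment of a base that lacks a trailing
    # slash, so add one; strip a leading slash from the path so that the
    # result stays underneath the base instead of jumping to the site root.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, relative_path.lstrip("/"))


# The same application code works whether it owns a whole domain...
print(make_url("http://library.example.org/", "opac/search"))
# ...or only lives in a subdirectory of a shared host.
print(make_url("http://localhost/koha", "/opac/search"))
```

If every link the application generates goes through one helper like this, moving an installation from its own domain to a subdirectory is a change to a single configuration value.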
Joey, I'm sure, has a lot of experience of the URL mangling you have to do to get this working, and a number of other people can also help with this discussion. But yes, totally, that's something that upstreams need design patterns for. Web application authors, like every other software developer, tend to think that their application is the most important in the world. I'm no exception. And I'm constantly amazed that not everyone thinks I'm the most important person in the world. Yes, I am. However, one of the ways in which this manifests is that they assume that they can get an entire domain. Having an entire domain makes things simpler to implement, at least a little bit simpler. But it also makes life worse for people who just want to try the program or who don't have the ability to add more domains for whatever purpose.
I've had extensive experience in virtualized applications. I used to work for Alfresco, and before that worked for Interwoven. I read a lot of stuff dealing with that. If anyone wants to talk to me about that, I'm happy to have a discussion. Another interesting concern is if you wish to bookmark the state that you're in in a virtualized environment. There's a very cute trick, and I'm happy to discuss that as well. I'm John Cox, JLNCOX.
Another question?
Just another comment here regarding this particular upstream. It came as a surprise to me to learn that Koha was not yet packaged, given that I have been reading for a number of years quite extensive commentary on Koha on Planet Debian from a Debian developer. I wonder if you have any insights regarding that that you are interested in sharing.
I hadn't heard of Koha before January, and I didn't touch it before March, so I don't know most of the history, but my impression is that the earlier Debian packaging attempt tried to be too ambitious, and people ran out of time for making that happen.
Since I'm very lazy, I did it the simple way, and then it got far enough that people got a lot of use out of it. I don't maintain the Koha packaging anymore. I turned it over to someone who works for Catalyst, and they're now taking care of it. Before I left Catalyst, I gave them an internal half-day tutorial session on making Debian packages. That was also quite popular, and at least two people, independently of each other, have told me since that they now view the entire world as something to package. Every time they have even a simple little script that they might ever want to use again, they want to make a package out of it. And I want to praise Joey again. One of the reasons why this is so is that dh 7 makes it so easy to make packages. I have given such packaging tutorials before. Sometimes people have been asking me for weeks to give one, and then I never hear of them making a single package again, because back then I was doing it entirely without debhelper. And for some reason, people find writing 50-line makefiles more work than copying and pasting a three-line one.
Then again, that brings up another issue that upstream often gets wrong from a Debian point of view. The build system, and the configuration system for build-time configuration, tends to be either archaic, highly manual, or to require excessive attention from the packager. Again, it would be nice if there was a very small number of build systems out there. I'm not going to say how many, but smaller is better. Perl is one of the universes where it's fairly nice, because there are three popular ones, or two popular ones: Makefile.PL and something else. Python has a couple of standard ones, and C programs have GNU autotools. As much as I don't like GNU autotools, they do make packaging life easier most of the time.
One of the things I would like to see before I quit computing is someone who turns out to be the world's best build system engineer and solves this problem once and for all, so that nobody else ever has to worry about it, and solves it in such a way that upstream also doesn't need to worry about it. Autotools tend to require upstream to write too much code. Likewise some of the other alternatives I know about. It would be nice if there was a group of people who concentrate on making sure that nobody else has to write more than one command in order to configure, build and install any kind of application. I know that's somewhat unlikely, so I'll be willing to settle for slightly more different kinds of build systems.
Any other questions?
So when are you officially going to rejoin Debian?
That's off topic for this talk. If my hotel internet works, I might do it tonight. Otherwise, I will do it from the country without internet, called New Zealand, sometime next week. Any on-topic questions? In that case, I thank you.
Boring slide, don't read.