So welcome to the Debian Science BoF. I admit it's better if you've seen my talk before, because there I laid out some problems and I don't want to repeat them here. I just want to show you something about the Debian Science team statistics, and then I would like to discuss some problems which, to my perception, exist.

So who is the Debian Science team? I make these graphs every year, I'm just updating them, and we see some people who are somehow leaving the team, because there are some gaps here: Adam C. Powell is somehow leaving the team, and Christophe Prud'homme is also leaving, but others are coming. So the Debian Science team has some quite active people, and some people who are not showing up anymore, and the team statistics just help us detect the people who are leaving us.

This graph is about the discussion on the mailing list. It's a little bit different, but you see common names. Obviously I'm chatting too much, right? I'm just chatting on the Debian Science list; I don't do that much there because I'm working more in the Debian Med team, but if there is something to discuss, there is the debian-science mailing list, and there is also a developers' mailing list which is more related to the packages. The first is for general discussion, the second is about the packages. People like Lucas Nussbaum and Matthias Klose are not members of the Science team; they are just reminding us about bugs. It's nice that you are here, Matthias; I guess you will say something.

We have some bug hunters; this information is drawn from the UDD. This is about the people who are fixing bugs in Debian Science packages. It would be nice if we had some more of them. You've seen this: Adam C. Powell is leaving the team somehow, and he has filed a lot of bugs, maybe against his own packages, and has closed them.

And we have the committers to the version control system, which also shows that we have quite a large team. If even the person in tenth place has more than 1,000 commits, you can assume we have 50 or more people working in the Debian Science team, because we have a lot of packages, about 1,000.

My definition of a team is that you wake up in the morning and realize that somebody else has solved your problem from yesterday. This is my experience from the Debian Med team: I'm really happy to be in that team; I have thrown problems onto the mailing list, and really, after waking up, they were solved this way. I wish it were like this in the Debian Science team as well, but my theory why it doesn't work as well is that Debian Science covers quite a diversity of topics. In Debian Med we are closer together topic-wise, and so my proposal in my talk was to care for closely related topics inside the Blends and split off some offsprings from Debian Science into separate things like Debian Astro.

For me this graph is the most important one for evaluating the team, because you see that this number of packages is touched by only one person: the majority of packages is done by one person. That means we have a common team mailing list and a common repository, but we have a single-maintainer relationship to the packages, which is exactly what we want to avoid. Only a few packages are maintained by two people. This is quite a bad ratio. In my other talk I showed you that the Debian Perl team has very few packages maintained by a single person; there it goes up, and they have three to four committers per package.
This is a state I really want to reach for Debian Science, but to repeat what I said before: I think we can only reach it if we focus on smaller topics inside Science and create more Blends out of it. For some history, we had a competing packaging team that is now fully merged: pkg-scicomp doesn't exist anymore, and its maintainers are contributing to Debian Science. And I have the usual Gobby document with a link to the wiki page. Copy-and-paste doesn't work now... ah, it works with this string. So we can take notes there; it would be great if somebody took notes if we have ideas how we could work together even better. I have set up this wiki page, it's always the same for every DebConf: what tasks should we work on. It is the result of some previous Gobby discussions, some Gobby protocols, which I carried over into the wiki.

Yeah. What do you want to do now? Any suggestions? Does anybody want to make a suggestion? What's the most urgent problem? What could be done even better? If not, I'll throw in more questions. But is everybody in this room familiar with Debian Science? I see new faces here, I don't know. Okay. Anybody else? Do we have general questions? Can we get you the mic?

I didn't know Debian Science before. Is it a Linux distribution?

No. It's funny: in my talk before, which you haven't seen, I have one slide which says Debian Science is integrated into Debian, it's no fork. Usually right after my talks somebody asks why we are doing a different distribution, and now it comes a little bit delayed, three hours later. So funny. It's just a team inside Debian, a team of scientists, or of packagers of scientific software, organized around a mailing list, using a common repository, and trying to integrate scientific software as well as possible into Debian. This is Debian Science in two sentences. And you can contribute if you're a scientist. Good. That's good. Any other questions? Who's a scientist in here? Hands up.

Well, what I actually consider an urgent problem: scientists rely, in their publications, on certain versions of programs, which we can't guarantee with our package system, because we update packages, and then it's a new version, or a version which was not used for the publication. What is your take on this, in your experience?

Well, we have snapshots.

Yes, we have snapshots, but one user is using the current version from stable, and another one needs snapshots.

Docker images. That's an obvious solution. Actually, there's a science-oriented container thing; what's the name again? Singularity, right? Please, always speak into the mic: this is being video recorded, and people in the room will understand you, but outside they won't.

About most clusters: I don't think there are any clusters I know of that use Docker, because it requires running a root daemon and giving users access to that.

So if Docker or some other container technique is the answer, the question is: what can we do to help users easily create these kinds of containers? Do you think it would make sense to craft some scripts which pull the correct versions from snapshot.debian.org and create a Docker image on request? Or should users just do it manually?

With Singularity, when you make a Debian container, you put in the mirror URL, and instead of just putting httpredir.debian.org you put snapshot.debian.org, and it will build your container with whatever was in Debian at that date; that will be the mirror from which you retrieve all your packages.
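For instance, a minimal Singularity definition file along those lines might look like this. This is a sketch only: the suite, timestamp, and package are placeholder examples, and apt has to be told to accept the expired Release files of old snapshots:

```
BootStrap: debootstrap
OSVersion: stretch
MirrorURL: http://snapshot.debian.org/archive/debian/20170801T000000Z/

%post
    # Release files of old snapshots have expired, so disable the validity check
    apt-get -o Acquire::Check-Valid-Until=false update
    apt-get install -y bwa
```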
Just explain it to me a little bit slowly; I'm slow in taking this in. So a user comes and says to me: I need BWA version 2.1, create a Docker image for me. What do I need to do to provide the user with this version? Yeah, but I would like to support this user.

What the user needs to do, or what... Well, first, to make clear, we're not really talking about Docker, just because Docker is not used on HPC systems. But for Singularity, I think it's easier if you come at it from the point of view of: I'm starting a project today, and I want to keep whatever I'm using today for the entire duration of the project. For that situation snapshots are easy to use. But if you're looking for a specific version, you would have to find out when that version was present in the repositories.

Yeah, this is what I mean. The user can do it himself, but the user needs to know about snapshots, he needs to know how to create a Docker image, he needs to know whatever, and I would like to make it, well, maybe not brain-dead easy, but at least easy enough that you call a program whose arguments are the name of the package and the version of the package, you press Enter, and the Docker image is created for you. I'm not using Docker or any other container technique myself, but is this something you think is feasible and sensible to do?
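The helper the speaker has in mind could look roughly like the following sketch. It is not an existing tool: the name and the handling of the snapshot date are assumptions, and finding the right date for a given version is left to the caller, which is exactly the hard part mentioned above:

```python
#!/usr/bin/python3
"""Hypothetical helper: emit a Singularity definition file that pins a
package to a given snapshot.debian.org state."""
import sys


def make_def(package: str, timestamp: str, suite: str = "stretch") -> str:
    # snapshot.debian.org archives the Debian mirror at points in time;
    # the caller must supply a timestamp at which the wanted version existed.
    mirror = f"http://snapshot.debian.org/archive/debian/{timestamp}/"
    return f"""\
BootStrap: debootstrap
OSVersion: {suite}
MirrorURL: {mirror}

%post
    apt-get -o Acquire::Check-Valid-Until=false update
    apt-get install -y {package}
"""


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: make-snapshot-def <package> <timestamp, e.g. 20170801T000000Z>")
    print(make_def(sys.argv[1], sys.argv[2]))
```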
Any other opinions about these container techniques? Are you involved with Singularity packaging, or something? I am involved. Okay, so, hear me out.

There are also Flatpak and snap packages, which are kind of useful if you want to drop a specific version of something onto a machine. I haven't looked into either myself yet, but I've been meaning to, because they're very lightweight, not as big as a container, and they have their own mechanisms for updating the package, so it's almost like a Debian package in some ways.

Yeah, but the point is that that's good for the desktop; it's really not useful for HPC clusters.

Yeah. Well, Flatpak is just another way to create a kind of virtual-machine-ish, Docker-ish, container-ish thing. I don't care so much about the technique underneath; the question is whether it is possible to provide the user with an easy way to create this stuff.

I work with Flatpak, and it is actually relatively easy, after you've worked around a few, well, differences, to create one of these bundles in order to run your application.

For you.

For me. At the moment I don't think it's super simple yet; especially for a scientist it's hard, because it involves knowing how to compile applications, it requires you to know how the system works and how to create this bundle, which is not something a scientist actually wants to deal with or learn. But I think it's a good idea to look in this direction. Specifically, Flatpak right now is only for desktop applications; it doesn't really handle the case of console applications well, and most applications I deal with in the scientific field are console apps. I'm thinking about making some changes to Flatpak and submitting a few pull requests to make console applications work a bit better. That could help.

As for snap, creating snaps is actually really easy, because you can base them on existing deb packages. But snap has the other problem that it is very tied to Canonical and Canonical's store, so I'm not sure how well this would work for Debian at the current time. I do think that long-term, any one of those bundling solutions would be great for this specific problem; in the short term we might need to work on them to get them to the point where they are as useful for scientific software as they currently are for desktop applications. That's my opinion on this.

Well, as I tried to explain, I don't mind whether it's called Flatpak, with its advantages and disadvantages, or some other container technique. For me it's important to say: we know what scientists do if they need a specific version. They download it from upstream, compile it themselves, put it in their home directory, and they do it wrong. I want to prevent this "they do it wrong" thing, because I've seen it, it just doesn't work that way, and I want to make sure we can provide users with something less error-prone than this process.

Wouldn't a simple chroot do as a root environment?

Chroot, Flatpak, Docker, it's all the same: they need some kind of virtualization, but how can we make it easy? Well, you know how to create a chroot, I know it; but how can we say: create me this environment with package X in version Y, do it for me, and run?

The question is, what is "easy", actually? What level of easy are we aiming for? Is it something like Python's virtualenv, or is it as easy as...

My definition of easy is: my mother can do it. Okay, there are different definitions of easy, but something like this: I can explain to my mother on the phone how to do it. That is easy.

I think we could craft a script to make creating chroots easy for scientific stuff, or Docker images, or Flatpaks, or anything. But we would need to investigate which system works best and what the advantages and disadvantages are, and then test whether scientists would actually use this stuff. So lots of exploration to be done, I guess.

What I also see a lot these days is that upstream there are packages that use Conda packages. Have you heard about that? Yeah, Conda is also such a technique; I'm not so deep into it, maybe you can explain it a little. It's some kind of alternative packaging system you maintain in parallel in your home directory, is that correct? Maybe you could talk into the mic.

I'm not an expert on Conda, but I know you can install into a specific prefix, which can be in your home directory; you can also do it system-wide, or you can set up a new environment and have a set of packages installed there. It's a new package manager and a new packaging format, and it's supposed to be operating-system independent.

So, regarding Conda: there is some effort to package Conda in the Debian Med team, but it's quite some work because of its pre-dependencies; I think Carsten did some research, about 50 Python modules would have to be packaged as well. But it's doable in principle. And if Conda were the answer for biologists, I would try to implement it.

Well, no, the point is that the upstreams have their Conda recipes. It's kind of like it sidesteps Debian, in a sense...

It sidesteps Debian, yes. That's right.
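For illustration, pinning a tool version with Conda in one's home directory goes roughly like this; the channel and version below are examples, and older Conda releases use `source activate` where newer ones use `conda activate`:

```
# create an isolated environment under the home directory with a pinned version;
# bioinformatics tools such as bwa typically come from the bioconda channel
conda create --prefix ~/envs/bwa -c bioconda bwa=0.7.15
source activate ~/envs/bwa
bwa mem reference.fa reads.fq > aligned.sam
```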
Singularity would be a solution for somebody to mail a container to a research partner, who can then run the thing the same way on their HPC cluster and get reproducible results. I mean, Singularity is in Debian. And I don't think Singularity sidesteps Debian; actually it extends it, because if someone has a non-Debian cluster, you can create a Debian container and take advantage of the Debian packages in there. If somebody is running, as we do at our site, a Red Hat cluster, people are making Ubuntu containers and can take advantage of the Debian packages. Otherwise the packages we make for Debian only get installed on Debian systems where people have root access; this way you can use them elsewhere. And the nice thing with Singularity is that you make your definition file, you install just the packages you want, and you have an environment set up that you preserve just for your project.

One interesting thing about Singularity: I listened to a talk, I don't remember the name, I think at a German user group meeting, by a long-time X11 developer who is now working in the HPC division at SUSE. He packaged Singularity, and he said he had a lot of trouble with the SUSE security team because it's setuid, so they don't allow it. That's an interesting point: in Debian it just got through the FTP masters and everybody's fine with it, but SUSE said no, that's a huge attack surface, we don't allow it, just like that.

So there is nothing comparable to the FTP master process at SUSE, you can just upload... or rather, there is no FTP-master step at SUSE, so you can upload, and then the SUSE security team blacklists it or refuses to let it through because it's setuid. Ah, okay, okay.

That's connected to the same thing you said: you need a root daemon for Docker; you don't need that for Singularity, because Singularity itself is setuid. Okay. Well, it's the same thing with chroot, right? You need something to actually get in there; at least on Linux you need to be root, or have some similar capability. Yes, chroot can help with that, but still.

Yes, Singularity is setuid, but that's only for mounting and creating the image file. To actually bootstrap the image you need to be root, so the idea is that you have a workstation where you are root, you build your container there, then you move the container over, and running it doesn't need the escalation. So basically it comes down to the author being careful: it is still setuid, but I think only setuid to be able to mount the file system, and the developer is careful about minimizing what gets done with escalated privileges. But still, yeah. Well, maybe SUSE is working on minimizing it, using capabilities or something; that would be useful.

I don't think we can get much further on this topic here. What about test suites? I just reported on the Debian Med effort to add test suites. What about it: would anybody volunteer to mentor an Outreachy or Google Summer of Code student to create test suites for the Debian Science packages? That would also be interesting; for me it's fun to mentor somebody doing this. I mean, it's a bit generic for Debian Science, so maybe you should talk a little about it: what do you mean by test suites?
Well, Natya is writing autopkgtests for all the Debian Med packages, or as many as she will manage, sorted by popcon, so the most used packages get equipped with autopkgtests first. I think this is really valuable, and we should have it for the other sciences as well.

What is the scope? Is the scope that the program just runs correctly, or are you actually...

At least that; that is the minimal requirement. Yes, sure.

And if there is an upstream test suite, you integrate it? Yes. But if there is not?

Well, the thing is: if there is an upstream test suite, we use it as the autopkgtest. If there is no upstream test suite, Natya writes one.

I started doing this for other packages that I maintain, and it is actually one of the to-dos I had forgotten for Debichem: adding autopkgtests. We run all the test suites during the package build, but it is certainly very useful for researchers to know that the packages, as they are installed, are working correctly. I'm a believer, converted by now; I just haven't had the time to do it for Debichem.

You don't have the time? It is the same for us, and so we try to involve interns, because for an intern this is an optimal task: you work down a list of packages, and when the internship ends, you just stop, and some amount of work is done. This is quite a good task, very fitting for such a project.

And what did I want to say? In Debian Med we write the autopkgtests in such a way that we also provide a script, /usr/share/doc/<package>/run-tests or something like this, plus a README.test, so a user who installs the package can run the suite as well, as an example of how to run the program. (A sketch of such a test follows after this discussion.)

I just got an idea about how this autopkgtest work could play together with the problem of pinning versions. Just a couple of days ago I had a problem where a test suite broke because BWA got updated to a newer version: the upstream author of the package whose tests broke had included reference data from his own program's output in the tests. Couldn't that be an approach to think about, writing autopkgtests that do exactly this: check for changes in the output and give upstream some kind of indication when something is going to break, so they can adjust their own code?

So you mean something like: if it fails, send a mail?

It doesn't need to be automated. You need to know when an upgrade of a dependency would break or change the results of a scientific package. Yeah, sure, that's the point of tests, to notice when they... So I mean, would that be a viable option, to just let upstream know and work with them; maybe they even provide some kind of... Usually, if a test suite breaks, you get a bug report and you forward the bug to upstream.

I'm not sure I understand the problem correctly. My idea would be to make sure that autopkgtests check whether the results change. Because the autopkgtests also get run when a dependency is updated, you can then find out which dependency broke or changed the results.

Can you please take the mic? We have several mics, so please speak into one. Yes, here's another one.

So it's not just about making sure that the program runs; it also needs to do the same thing that the previous version did, or that the same version did with a different dependency. Isn't that just making sure that the test is run again when a dependency changes? Well, I was under the impression that that would happen anyway.
No, I mean if the package itself is not changed in Debian and just its dependencies change, then in general it's not run. Then disregard anything I just said. No, but this is a good point: if the dependencies change, the Debian autobuilders will not automatically rebuild reverse dependencies after an upload, unless of course it's a library and the library name changes. Sure, but that's the buildds; I was talking about autopkgtests and the CI. And I'm not sure whether the CI does that. That's a crazy idea.

Hi, I just have a comment about these autopkgtests. When they fail, does that mean that the package fails to build? Well, at least the tested part fails. Maybe the package works for 95 percent, but whether this failing 5 percent of the tests matters you can't say; it's a bug anyway.

Sometimes these bugs happen in the tests themselves. For example, in the GNU Scientific Library, during the transition from GCC 6 to 7, tests just failed because the optimizer changed, and some of the tests ended up failing on some architectures. So you end up with a case of flaky tests instead. It's quite a common thing that not the program but the test is broken. But the test has to be fixed, which means the package is buggy, because the test is wrong. Or GCC is buggy. Or GCC is buggy, whatever.

I think there are several layers. For example, in Debichem we tend to run the test suite during the build but, depending on the package, not fail the build on test failures, because there are some packages with known failures. So it depends on how the packaging is done, whether you fail the build on a failing test; and the autopkgtests, as far as I understand, are independent of that. But you would run the same test suite during the package build and then run it again after the package build.

Just a small comment on GSL: in that case all the tests had been passing for a number of months and everything was perfect, and then GCC got updated and GSL was failing and not building anymore. That is the point of the tests, to know that things are failing. How did you notice? Did you upload GSL, or did somebody upload GSL and it failed to build, or did the CI tell you GCC got updated and it's not working anymore and automatically rebuilt GSL? It was OBS that built it. That's why I'm so keen on having test suites for all the packages, to know when something fails. If you don't run a test suite because you have no tests, you assume everything is okay, and it's not; no matter which package is responsible, we need to keep the whole set in sync.

We have the problem that it's sometimes really difficult to get all the tests running okay on all architectures, and I guess it would be a lot of work to get everything green, and only from that point on can you say: okay, now we check whenever it fails, we get a bug report or we see it's failing. I had packages whose test suite failed because I was running it on one thread, one core only; upstream hadn't even thought about somebody running on only one core, because they only run on a thousand cores. Well, what I'm saying is that maybe we disable such a test or something, but it's always a lot of manual work to get a whole test suite green; if it's thousands of tests, some flakiness is almost guaranteed. But it's certainly the way it should be; maybe it's easier for some packages than for others.

Yeah, as I said, in the end somebody has to write the tests in the first place.
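To make the test-suite discussion concrete: a minimal autopkgtest consists of a debian/tests/control file plus an executable test script. The sketch below is for a hypothetical tool called foo and compares its output against reference data assumed to be shipped with the package, so it also catches a dependency update that silently changes the results; all file names and paths are illustrative assumptions:

```
# debian/tests/control
Tests: run-unit-test
Depends: @
```

```python
#!/usr/bin/python3
# debian/tests/run-unit-test: run the installed tool on tiny reference
# input and compare against the expected output recorded at packaging
# time; if an updated dependency changes the results, this test fails.
import pathlib
import subprocess
import sys

DATA = pathlib.Path("/usr/share/doc/foo/examples")  # assumed location

result = subprocess.run(
    ["foo", str(DATA / "tiny-input.txt")],
    stdout=subprocess.PIPE, text=True, check=True)

expected = (DATA / "expected-output.txt").read_text()
if result.stdout != expected:
    sys.exit("foo output differs from the packaged reference output")
print("foo output matches the reference")
```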
How much time do we have? Ten minutes? Maybe I can continue a bit with this duplicate-names issue. Upstreams have a tendency to use quite generic program names which conflict with other packages. I implemented something for Debian Med; well, it's implemented for all Blends but not yet used by the others: we put a copy with the original name under /usr/lib/<blend>/bin, in this case /usr/lib/debian-med/bin, and if the user puts this into his PATH, he can use the generic name, because some scripts rely on it. Maybe it's a bit techy, but it works like this, because in /usr/bin we have a non-conflicting, less generic name, but users really want the original one. So, what do you think about this solution?

No, it's implemented, you can use it. It's done for the plink package, for instance: plink is connected to putty, where the name already existed, and plink is also a biology program. In Debian Med we have it under the original name plink, and in /usr/bin it has some other name. If the user sets a variable in his environment, this directory is prepended to the system PATH. He then needs to know that the original plink from putty will not work for him, but this is the user's decision.

But that wouldn't protect you against name duplications within the same Blend, would it? Well, yes, if you have a name clash inside one Blend, it doesn't help; we have this problem anyway. It's just to make sure that all packages are co-installable. We had the situation that putty and plink couldn't be installed together: putty was there first, so it wins, and per policy it's not okay to declare a conflict between these packages, because they do not really conflict. So we found a solution that enables users to keep the old, known name and run their old, known scripts.

I don't get it. What does the user have to do? Wait, I think I have an example. The user puts a file .blends into his home directory and says: I'm a member of the Debian Med team. That's what the user has to do, and has to do himself, because we cannot write into home directories. The effect is that in /etc/profile.d there is a debian-med.sh, installed by the med-common package, and if there is a .blends in the user's home, it is parsed for the blend string, and the blend PATH and the blend MANPATH are prepended to the normal PATH. That particular user then gets Debian Med's plink instead of putty's plink; no other user is affected, only the one who actively opts in. (A sketch of this mechanism closes this transcript.) Maybe this is just the solution I came up with; maybe you know better ones.

It sounds okay, but it's not very discoverable; I didn't know about it. I think nobody really knows about it, but I put it in. There should be some user documentation about exactly how to do this, and I would love it if somebody documented it.

Any other things? The debtags effort is also still running; we should find a better design for it. I started with this for Debian Med and stopped at some point, and didn't continue. These are basically the problems I have noted in the wiki. Any other problems? If not, we might stop here. Sorry... let's stop. Thanks for attending.
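As a closing illustration of the PATH mechanism described above, its effect is roughly the following. This is a sketch of the idea, not the actual med-common implementation, and the format of ~/.blends is assumed here to be a plain list of blend names:

```sh
# hypothetical rendering of /etc/profile.d/debian-med.sh: if the user
# opted in via ~/.blends, prepend the blend's directories, so that e.g.
# /usr/lib/debian-med/bin/plink shadows putty's /usr/bin/plink
if [ -r "$HOME/.blends" ] && grep -qw med "$HOME/.blends"; then
    PATH="/usr/lib/debian-med/bin:$PATH"
    MANPATH="/usr/lib/debian-med/man:${MANPATH:-}"
    export PATH MANPATH
fi
```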