Good? All right. Wow. I'm very sorry for using a sexy title for my talk. I couldn't pick a title, so you can pick one yourself. So let's get started.

Just a little bit of context first. With "package managers" in the title, I mean people. I mean a lot of people in this room: not the tools, but the people behind the tools. So it's in a very broad sense. Basically anyone who installs software, I consider a package manager. Let's just keep it at that. There's going to be a bit of focus on scientific software here, mainly because I'm the lead developer of EasyBuild, which is a tool for installing scientific software. But I tried not to focus on that too much. And most of what I'm mentioning is not my ideas. I'm just venting other people's ideas, and you'll see why in a bit.

So the goals of this talk: I want to present techniques to you that you can use to make your software difficult to install. I'll also show you some excuses that you can use to get away with it, because you may need them. And I'll give some examples of taking things to the extreme so you can score some bonus points. And also show some examples of projects that are currently doing a very good job at this. I'm sorry, this shouldn't be there.

So common aspects of the things I'm going to mention are creating confusion, surprising people in a bad way, annoying people, frustrating people, and overall just wasting people's time. That's kind of the goal here.

So why would you want to do this? Well, if fewer people use your software, then they're not going to find bugs and you won't have to fix the bugs. People that use your software may ask questions or make feature requests, and then you may have to get back to them or actually do some more coding, so you don't want that. It's a way to avoid getting contributions, so you don't have to review them, you don't have to test them, you don't have to maintain them in the long run. That's good.
If they can't install your software to begin with, hopefully they will give up quickly and you'll not get any of them. And it's also a bit of a way to motivate more people to use tools like EasyBuild that make it easy to install software. We just wrap around all that crap.

So the first technique: creative versioning and release management of software. Some ideas here. Don't use semantic versioning, so don't go see semver.org. It makes sense, and that's why you don't want to do it. One cool trick that I've seen some projects do: you have a release out there, let's say version 1.0, but you notice a small bug in it the minute you published it. So, just fix that small bug and push it again. Same version. People will not pick it up that quickly anyway, and if people are not using your software they're not going to notice, so that's good.

Don't do any bug fix releases. You just have a release and tell people there's a GitHub repository that has all the bug fixes. Just pull them from there and apply the patch, whatever you want to do, do it there. Or even better, create a web page with instructions, not a patch file, but instructions on how to fix the bugs you know about. Or just don't do any releases at all. Have a GitHub repo, point people to the master branch and tell them it's always stable, you can always use it. And let them come up with their own versioning scheme. If they want to use 1.0 or a date or whatever they want, they can.

Also, if you publish a new version, remove the old ones. Nobody wants to use them anymore. The new stuff is better, it doesn't have the bugs, and you want to prevent people from using the old stuff, so don't keep an archive.

Right, so some of the excuses for this, because you will need them. It was just a tiny change so it doesn't need a new version, right? Just re-release in place, that's fine.
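For whoever is on the receiving end of this trick, an in-place re-release can usually be caught by recording a checksum the first time an archive is downloaded and comparing against it later. The following is only a minimal Python sketch of that idea, not how any particular tool does it: the function names and the byte strings standing in for two downloads of the "same" tarball are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a downloaded source archive."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, recorded_checksum: str) -> bool:
    """True if the archive still matches the checksum recorded earlier."""
    return sha256_of(data) == recorded_checksum


# Hypothetical stand-ins for two downloads of the "same" 1.0 release.
first_download = b"contents of example-1.0.tar.gz"
second_download = b"contents of example-1.0.tar.gz, with a quiet fix"

# Record the checksum once, at the time of the first download.
recorded = sha256_of(first_download)

print(verify_download(first_download, recorded))   # archive is unchanged
print(verify_download(second_download, recorded))  # archive was re-released in place
```

The check only says that something changed under the same version number. It cannot say what changed or why, which is the point of the complaint above.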
Versions are not as important as they used to be, with the whole Git thing; every commit is basically a version, so why do you need versions? And people should just always use the latest available version, so you can use the term version without using versions. That's good. Old versions have bugs, you shouldn't use them anymore.

To do really well at this: don't version your software, and then have very strict version requirements on your dependencies, right? And also motivate the utter lack of versioning, so tell people why you're doing it.

So, examples. OpenFOAM, some people here know it, right? We even have OpenFOAM developers in the room. There are three variants; you're one of the other ones, yes. So, one of the OpenFOAM variants, the .org version, has this. They don't do proper bug fix releases anymore. They have a 5.0 version, and they explain on the website why there will never be a 5.1. So, they just tell you there's a GitHub repository that gets all the updates. Go there; every now and then we have this link that points to a commit, we will update this, and this is your bug fix release.

Another example: WRF. This is weather modeling, simulation, climate research software. It is used on the biggest supercomputers in the world. It's one of the top five applications on the Blue Waters system, if you've ever heard about supercomputers. They have a website, and this is pretty old. I had to find a good example, but they still do this. So, they have versions, and they have bug fix versions as well that they release. And then they have known bugs, and they have this web page that says: go to this file, it has this in the file, it should be this, so please fix it, save the file, and then recompile the code. The URLs are there, you can check it.

Bioconductor, who knows Bioconductor? Some people do, not a lot. So, Bioconductor is a bunch of R packages for doing bioinformatics. Let's keep it at that.
They have this concept of releases: a Bioconductor release gets a particular version number. Inside that are a bunch of R packages. What they do, even though they have a version here at the top level, is change the versions of the R packages inside of it without changing the 3.6 release. So, it stays at 3.6, and it's like a moving target. It gets better: if they update one of these R packages in a bundle, so in a release, they throw away the old version, because they tell you it had bugs, so you shouldn't use it anymore. So if you tell somebody, I was using this version of Bioconductor on that day, at that second, somebody else will not be able to reproduce it anymore, unless you give them all the sources that you downloaded, because they just throw them away. Nobody should use it anymore. It had bugs. That was an excuse I mentioned.

All right. Don't give people release notes or changelogs. Just leave them guessing. If you do have release notes, or somebody is forcing them on you, just say "minor enhancements and bug fixes". And as an excuse, you can tell people to just use the commit history, right? It has all the changes. You have clear commit messages, like "fixed typo" and all that stuff. They can look there. Bonus points: mention on the website that they are coming soon, and never put them in.

This is going to be a bit more controversial, maybe. So, vendoring of dependencies is something that a lot of projects do. They include dependencies in their software; they literally ship copies of tarballs or unpacked tarballs in their software. It makes installation a lot easier, which is, yeah, it's hard to argue with that one, actually. And they know how the dependencies should be installed and how they should interact with the software that they are for. So, bonus points: include the dependencies, but don't update them. Even if they get bug fixes upstream, you're not seeing the bug fixes, so you're happy.
Patch the dependencies, right, and don't send the patches upstream. Just keep the patches. You have the patches, you're fine. And only do it for some dependencies, not for everything. So, say, half of them you just include, and the other ones, well, people have to install those themselves, right?

A small variant of this is to install the dependencies as the software is being installed. So, pull stuff in from the web, with curl or wget or whatever, and install it. Let's say configure, make install all the dependencies while the software is being installed. So, that's sort of like vendoring. Makes installation easier, well, maybe. And as an excuse, you can just tell people the internet is there all the time. So, why wouldn't I be able to download? Because you're doing this, you don't have to document dependencies because, yeah, you don't need to, they are there already. Or it pulls them in.

You should make it difficult to provide the dependencies in any other way, like this is the only right way and not some other way. Again, only for some dependencies. And then change your mind at some point, right? We were doing this before, and it didn't work as well as we expected, so we stopped doing it and now you have to provide them yourself. And then six months later, flip the bit again and go back to it. Setuptools has done this. Python setuptools has done this.

More dependencies is better. Build the stack as high as you can. So, people need to install more stuff before actually getting to installing your stuff. Try to pick dependencies that are hard to install themselves, to make people lose time. You don't want to reinvent the wheel, right? So, you use dependencies rather than doing stuff yourself. That makes sense. You get a Jenga tower like this. Use dependencies written in different programming languages. Again, another factor of difficulty. People hopefully don't know about, let's say, OCaml. So, try to use OCaml.
And then to reverse it a bit, make your software a common dependency and try to rule the world. Now, this is a very good example. Many people will know this. So, the npm world had a bit of an issue there. There was a maintainer that had very good reasons to just remove all his packages from the npm repository. Over 250 modules. There was some legal stuff involved and he just pulled everything from npm. And some people were using it. The left-pad thing is something to indent strings. So, it's like 20 lines of JavaScript. It's very trivial, but many people were using it. He pulled it and everything else that was deploying with npm just broke. You couldn't deploy it anymore. They couldn't find left-pad anymore. The whole internet toppled over. If you want to read more about this, check The Register article there.

QIIME is an example. Again, from bioinformatics, these people are good. So, this is the dependency graph. This is QIIME. This is what you want. Everything else is underneath. This is the compiler and some stuff in there. And everything else is like a whole bunch of crap that you need. You need Python, Perl, R, Haskell, OCaml to use QIIME, at least to its full extent. Some of these are optional, but if you want to use everything, you'll need to do this. They released QIIME as a VM and as containers because they want to have people use their software. Don't do this. Let them install it, right? This is the recap.

Hard coding. There's a lot of this as well. You should hard code stuff as much as possible. Names of compiler commands: there's only one compiler out there. There's GCC. There's nothing else. Compiler options: try to pick flags that only work with GCC and nothing else. So, stuff will break. By the way, for people that don't know this, if you don't specify an optimization level in GCC, you get -O0. So, stuff is going to be bloody slow if you don't give it a -O2 at least in the Makefile.
That's not a joke. Locations of libraries and header files: hard code them to /usr/include, maybe even your own home directory. And hard code versions of dependencies wherever possible. Excuses, right? We expect stuff to be in a standard environment, a standard location. That's what everybody does. And we can't support everything out there. That's impossible. So, we do it this way and you should do it that way too.

Choose your tools wisely. So, try to use tools, like build tools and configuration tools, that people don't know. So, they have to learn them first before they can install your software. That's always good. If it becomes too mainstream, switch to something else. Or use tools that are very popular but that nobody likes. Force it on them. Try to use stuff with special behavior. Some of you know where I'm going with this. Try to use tools with special behavior, stuff that controls the environment, that puts you in a box and makes it hard to put stuff in the box, or makes it very hard when something goes wrong to figure out what goes wrong, let alone fix it.

Even better, don't use any tools at all. Just use your own script. You know how the thing should be configured. Write a Python script that calls a Perl script that spits out a Makefile, I don't know what. You know what to do. Or at least create wrappers around stuff that people know, so they think they don't know it but they actually do; you just don't tell them.

Again, excuses you can use. Modern tools are a lot better. So, throw away the old stuff and use whatever is the hype right now. We can't keep living in the past. I prefer using my own scripts, or at least wrappers around tools that I use.

If you do use tools, try to use them in the wrong way. So, again, to surprise people, try to require a very old or very new version of the tools you use so people can't just use whatever version they have already. They have to use something else.
And then if you write your own scripts, give them a name that looks familiar but that's actually something entirely different.

Examples. So, SCons. This is from the website: a next generation build tool. Improved, cross-platform. All of that is true. It's going to replace make. This resets the environment in which it executes commands. So, $PATH is hard set to this, and everything outside of these locations it just doesn't find, right? This is the main annoyance with SCons. So, it's a good tool to use. There's a way around it. They have this $ENV thing where you can inject stuff in the environment. Don't tell people that.

Bazel. So, I couldn't pick... So, these people have a booth. You should check their booth and get some stickers. These are the new stickers. These are the old ones. I couldn't pick a logo. I don't know which one is better. This uses hard-coded locations for compilers. Like, the compiler is at /usr/bin/gcc. The include files are in /usr/include, libraries are in /usr/lib as well. That's what it does. It does the same thing as SCons does. It takes control of the environment because it puts you in a box. It has full control. Things are reproducible. That's great.

Command line options. So, it has --copt, --config=opt and -c opt. That's three entirely different things. I didn't notice this one, by the way. I got this from Twitter, but... It has a bit of a weird syntax. It has, like, bazel build --config=opt // and whatever you want to build. People get confused about this. This looks like a typo. And it's actually on GitHub. This guy was saying, this double slash doesn't look right. There's something wrong there. No, this is how Bazel works. So, confused people. Good.

CMake. Very popular. I want to make the claim that nobody really likes it. I've heard other people, but... Okay, maybe I'm just wrong. It's pretty okay, actually, if all goes well. I mean, it does the job.
But when it doesn't, then you're in trouble. You need to figure out what is really wrong. Like, where's the log file of CMake? Somewhere deep in the directory. And it has, like, everything that possibly went wrong during the configuration, including stuff that is actually okay that went wrong. It's just testing things. And then convincing CMake to behave if you have to patch CMake scripts, I think that's very difficult. Excuses to use CMake: you don't need one. Everybody's using CMake already, see? You can just use it yourself.

Partial installation procedure. So, no configuration. Just do hard coding in the Makefile. Don't have a test suite. Don't support installing whatever is built to somewhere else. Just run make. Binaries appear somewhere. Done, right? You don't need to copy them somewhere else. Why would you? So, all of these, you don't really need this. This software is pretty trivial. So, you don't really need a full installation procedure. If you do provide a test suite, include tests that are broken. But you just tell people, okay, the tests that are broken, just ignore them. And then if you have build artifacts like binaries and libraries, hide them somewhere deep, and then don't have an installation mechanism to copy them out somewhere else. So, make people hunt for stuff. And preferably in different locations, right?

Interactive scripts. This is a good way to annoy people. So, have a configure script that is asking questions. You have to give it a specific answer. So, either words like yes or no, or you give it a list with numbers that list the options. So, make people pick a number and then ask another question. So, try to make it as hard as possible to automate this. Now, you probably do want to have a fallback mechanism, a silent way that you can actually automate this, because this will drive you nuts if you have to do it over and over again. Just don't tell people. Don't put it in the documentation because then they can do it too.
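The "silent fallback" being described can be as small as checking an environment variable before prompting. The following is a minimal Python sketch of that pattern, not the actual configure script of any project mentioned in this talk: the function `ask`, the variable name `EXAMPLE_NEED_GPU` and the question text are all hypothetical.

```python
import os


def ask(question, env_var, default, environ=os.environ, prompt=input):
    """Answer a configure question from the environment, else ask interactively.

    If env_var is set, its value is used and nobody is prompted.
    Otherwise the user is asked, and an empty reply means the default.
    """
    if env_var in environ:
        return environ[env_var]
    reply = prompt(f"{question} [{default}]: ").strip()
    return reply or default


# Non-interactive use: the answer comes from a (here simulated) environment,
# so the prompt function is never called.
fake_env = {"EXAMPLE_NEED_GPU": "no"}
answer = ask("Build with GPU support?", "EXAMPLE_NEED_GPU", "yes", environ=fake_env)
print(answer)
```

Whether this helps anyone depends entirely on the variable names being written down somewhere other than the script itself.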
Interactive scripts are more intuitive. You're having a conversation with your user. That's good. Bonus points: if you give a list of possible answers, a numbered list, change the numbers between releases. Back to WRF. WRF does exactly this. The weather modeling software, they have this.

So, TensorFlow is actually what triggered me to give this talk, because they combine a lot of these techniques together. TensorFlow is a deep learning application. It's a very big thing in science right now. It's a hype, because deep learning is actually very cool. It can do very cool stuff. It's developed by Google. It was the most forked GitHub project last year and number five in terms of contributors. That's very impressive. It runs very well on the GPU as well, so NVIDIA is very happy with that.

Now, installing TensorFlow is not that hard. They give you binary wheels. You pip install the wheel. Done, right? Well, it works, but there's actually a very good reason that you still want to install it from source. This is a small test I did on, let's say, a four-year-old Intel Haswell server. If you use the binary wheel, you'll get not even one image per second on this particular benchmark. If you build it from source, you get a 7x speedup. So you probably want to build it from source. Now, this is CPU only, which is not really the main target for TensorFlow. Okay, but still, not everybody has a very expensive GPU that they can use. So if you want to run this on your laptop, you really want to build it from source.

So building TensorFlow from source is nice. It has a configure script. You know how this works? Well, no, you don't. It's an interactive script. It's not Autotools. It's actually a Python script that wraps around yet something else, I think. It has a way to do a silent install, but this is nowhere in the documentation.
You can set these TF_NEED_* environment variables to answer all the questions, but you have to dive into the configure script to know the names of the environment variables, let alone the values you have to give them. And really, I tried finding this in the documentation. It's not there. They use Bazel as a build tool, so they get all the niceness that Bazel gives you. It resets the environment, and hard-codes the compiler, include files and libraries to /usr, and so on. It auto-installs some dependencies. So it does a wget, basically, for a lot of stuff like SWIG, and it needs a lot of things to build. It does that for most of it, but not for Python, not for CUDA, not for cuDNN. You have to do that yourself. Well, for these two there is a reason, a very good reason. And then in the end, when you've done the whole configure and build procedure, it spits out a Python wheel, just in the location of the build, that you then have to pip install. This is the only tool that I've seen that does this. So I was like, oh, you can do this, actually. So...

Conclusions, wrapping up. So there's tons of things you can do to make your software hard to install, right? You try to confuse people, surprise people, annoy people, frustrate them, waste their time, and hopefully they will give up before ever getting to use your software. And then they can't complain. If they can't get it to run, they can't complain, and they can't ask for more features because they don't know what it already supports, and so on. Lots of projects out there have good ideas, lots of examples; try to leverage them. Good excuses are not that hard to come up with. I've given you plenty, and you can come up with others as well. And be creative, right? Don't just stick to what I showed you. Build on that. Thanks a lot. That was my talk.

Any questions? Yes? If you get TensorFlow to build, and it involves deep learning, could you teach TensorFlow how to build other software? I haven't tried it, but it sounds promising. Somebody should.
Teach TensorFlow how to build itself. No, but that's not going to work. Sorry? There's not enough samples. You need samples to train on. Any other questions? Alright, thanks everybody for being here.