OK, no EuroPython or any Python conference is complete without at least one talk on packaging, and this must be at least the second. So let's see what Jyrki has to tell us. Thank you. If you don't mind, I'm going to wait a tiny bit for the clock to hit the starting time. Actually, before you all can run away, I'm going to take a photo of you — that way there's zero chance it happens once I start speaking. OK. Guten Tag, EuroPython, und herzlich willkommen. That's the outcome of two years of German studies in elementary school; unfortunately, I can't do better. But welcome, everyone, to a talk about a tool called dh-virtualenv, or as I label it, packaging inside packaging. Before we look into that more deeply, let me introduce myself. My name is Jyrki Pulliainen. I am from Finland, but I'm living in Stockholm, Sweden nowadays. I work for a music streaming company called Spotify. There I do two kinds of things: I am a content engineer, so I build the pipeline that gets new music into the service, and on the other hand I also fiddle around a lot with our internal Python stack and answer people's questions about Python. If you want to reach out to me, there's my email address and there's my Twitter handle, so please do if you have any questions. Now, about this talk. It has three sections. First, we're going to look into some of the existing deployment strategies you can use on Debian-based machines. Then we're going to look into what dh-virtualenv actually is and how it differs from those existing strategies. At the end, we're going to go through an example of how you package software — we're going to package Sentry, mostly because it's not a simple piece of software, so I can show you by example how to use dh-virtualenv to package something like Sentry for production. Now let's start with a tiny quiz. Who here runs a Debian or Ubuntu-based system?
Good, you found the correct answer. Who here deploys stuff using the so-called native packages on a Debian system, relying on the native libraries on those systems? Who here is kind of frustrated with that? And who here uses virtual environments — you set up a virtualenv and then install everything in there on your production host? Now, both of these have their good sides. Take, for example, the native Debian packages. They're stable. The stuff that gets into Debian, especially stuff that's in main, is stable. It's well tested. People usually don't change it in ways that break backwards compatibility. So if your Debian system gets an update for, let's say, Python requests, you know that that requests isn't backwards incompatible with the previous version. Now, Debian has another nice bonus: you can declare non-Python dependencies. Say your software requires SQLite to be installed, or requires MySQL to be installed on the same machine. When you create the Debian package, you can say that your package depends on MySQL and you get that installed on the machine — everything nicely contained in one package. It also has pretty neat existing infrastructure. Not only do you have dedicated build tools and separated build environments like sbuild and other chroot solutions, but you also have the possibility of running your own apt repository, which means you have your own network for deploying your stuff to production; plus, all kinds of CI tools like Jenkins or TeamCity have at least some rudimentary support for dealing with Debian packages. And the last good thing, I think, in Debian packaging is that you have quite nice scripting support. What that means is that if you need to remove a cache when you upgrade your package version, you can write a script that clears out the cache before the package is upgraded.
Or if you want to restart your service after installation, you can write a script that restarts your service after the package is installed. You can do crazy, borderline stupid stuff like database migrations in postinst scripts. I don't necessarily recommend them, but you can do all kinds of stuff there. It gives you quite a lot of power. Now, that was the good parts. If we start talking about the bad parts, you have probably run into this case: let's say, back in the day, Kenneth Reitz works hard and pushes out requests 1.3, and that has just the feature you need, but unfortunately your ancient Wheezy box is running requests 1.2. What do you think — are you going to wait for Debian to package that? What happens? You're just going to rot in front of your computer waiting for the newer requests to arrive on the system you're running, especially if it's backwards incompatible. So a lot of stuff in Debian, even at the release of that particular Debian or Ubuntu version, is already outdated. The packaging itself is kind of complex. What we're talking about here is that someone created a packaging system built for building a whole operating system. It covers all the possible corner cases, all the possible scenarios — if you want to deploy Perl, Haskell, Python, you name it, it has everything covered. That means the whole system is really, really complex, and the documentation is far from simple. In addition, all the documentation you can usually find is geared towards Debian package maintenance: OK, you want to ship this thing within the operating system, so this is how you should do it. That does not necessarily resonate with how you yourself would deploy your tiny service on a host. And what I think is the worst part of Debian packages is that you get a global state.
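The scripting hooks described above are plain Debian maintainer scripts. A minimal sketch of a debian/postinst that restarts a service after installation might look like the following — the service name my-service is a hypothetical placeholder, and this is an illustration of the mechanism, not a copy of any real package:

```shell
#!/bin/sh
# debian/postinst -- sketch only. dpkg runs this after the package is
# unpacked; "$1" is "configure" on a fresh install or upgrade.
# "my-service" is a hypothetical service name.
set -e

case "$1" in
    configure)
        # Restart the service so the newly installed code is picked up.
        # "|| true" keeps a missing/stopped service from failing the install.
        invoke-rc.d my-service restart || true
        ;;
esac

# debhelper splices its own generated snippets in at this marker.
#DEBHELPER#

exit 0
```

The same pattern works for preinst, prerm, and postrm, which is what gives Debian packages their pre/post-installation scripting power.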
So if you deploy all your libraries as Debian packages and then use them, eventually you'll end up in a case where you would like to upgrade one library, but it's also used by some other software on the same box, and you don't know if you can do it without breaking anything. We've had this at Spotify a plethora of times: we have rolled out a new version of a common Python utility library on the hosts and been too afraid to deploy it, because something down the line might break even though we have been testing it for months. It slows you down, and it's really, really annoying. Now, if you think about virtual environments, what you get is somewhat the opposite. You get the newest stuff. You just do pip install and you get whatever is available on PyPI — the latest release — and you can even go to the extent of pulling stuff straight from your Git or Mercurial repository into your virtualenv and running it there. So you can always get the newest stuff. It has become kind of the de facto method in the Python world; every guide usually contains a word or two about how you run stuff inside a virtual environment. You can do the same virtualenv setup on your laptop as you do on your servers, and it just works. It's also battle tested — nowadays so many people run it in production that it's fairly safe to use. But the best part, I think, is that it's contained. If you take a Python package and install it inside a virtual environment, you can be sure it won't affect anything outside that virtual environment. So updating a single package doesn't mean that your whole system crumbles because something was relying on the old version, or that you interfere with the underlying operating system.
Nope, you're only poking the actual virtual environment, and you get this nice, contained, fuzzy feeling when dealing with it. Now, on the downside, I bet some of you might have seen this line. It means that you can't have native dependencies handled for you when you're using virtual environments and pip install. You need to know which MySQL libraries have to be available — in this case, you need to know that you have to install the MySQL client development package so that mysql_config is found on your host. So it requires you to do some manual digging through things. Even more, you end up doing source installs. OK, you can have wheels, or even eggs, to avoid source installs. But if you don't have wheels for your platform, or you haven't set up your own wheel repository, you end up doing source installations. And when your virtual environment lives on your production server, you end up installing all the build dependencies for that source installation on that production environment — which is not necessarily bad in the sense that you'd break something, but you'll clutter your production environment with development headers and other unused files. The worst part of pip installs, I think, is that you're basically executing a bunch of random scripts. Sure, wheels get around this, but if you run setup.py install, you probably haven't looked into what all those files do, or what the files those packages depend on do. So you're just relying on the good faith of people and running random stuff in your production environment. It doesn't even need to be malicious to hurt you. Someone might just accidentally release a package that wipes your whole /etc or your whole home directory, and you end up crippling your system by accident. Now, this brings us to a question: what is dh-virtualenv? dh-virtualenv was, about two years ago, my attempt to combine the best of the two worlds.
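The mysql_config failure mentioned above is the classic example of this manual digging: pip can't compile the MySQL bindings until the native headers are present. A sketch of the fix, assuming a Debian/Ubuntu host and the MySQL-python bindings of the era (package names are the stock Debian ones):

```shell
# "pip install MySQL-python" fails with "mysql_config not found" until the
# native client headers are installed; pip cannot pull these in for you.
sudo apt-get install libmysqlclient-dev python-dev

# Now the source build can locate mysql_config and compile the C extension.
pip install MySQL-python
```

This is exactly the kind of non-Python dependency knowledge that stays with you no matter which deployment strategy you pick.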
So it is a virtual environment placed inside a Debian package. It supports both Python 2 and 3. It is kind of version agnostic — I won't claim you can install Python 1 stuff with it, but it doesn't execute or import any of your Python code, so it doesn't really care whether your tool is written in Python 2 or Python 3. You can use it either way. It even supports the new venv module from Python 3.3, so you don't even need virtualenv installed to run it. It's also open source — GPL, like all the Debian build tools. It has good documentation. Now, I'm the guy who wrote it, so I might be a bit biased here, but I think the documentation is good; it's at least better than the average open source documentation. But the best part, which actually seems to work very well, is that it has a simple tutorial. If you go to dh-virtualenv.readthedocs.org, you'll find a four-step tutorial you can run through and, boom, your package is inside a virtual environment inside a Debian package. Under the surface, it is a debhelper extension. debhelper is — how would I put it nicely — a pile of Perl scripts that Debian executes when you're building packages. It's a fixed sequence of Perl scripts, and there are different extensions that teach Debian how to build Python packages, or Perl packages, or bash completion stuff. What dh-virtualenv does is inject itself into that flow. There are 12 lines of Perl included that do the magic of injecting dh-virtualenv in there, and then it just runs as part of the sequence. Now, these are kind of implementation details.
But for those of you who already have existing Debian build environments — whether you're using sbuild or something else, or just plain debuild or dpkg-buildpackage on the command line — this means that dh-virtualenv is just going to work in your existing workflow. What I basically did back in the day was find a great blog post by Hynek Schlawack. I adapted the idea a bit to fit our build environment, and that's why I ended up writing a debhelper extension. Hynek's system works well too, but it uses a tool called FPM, which at the time didn't fit our build system. In practice, dh-virtualenv is a package builder that creates a virtual environment. You can define which Python you want to use with it, so if you have multiple Pythons installed on your machine, you just pick the one you want. It installs everything you have listed in requirements.txt — the exact same format you get from pip freeze — inside the virtual environment. Then it takes your project and runs setup.py install on it. So it doesn't just dump your sources in there; it actually installs your project inside the virtual environment. And then it does a bunch of magic — like sed scripts, for which big thanks goes to Hynek — rewriting activate and friends so that instead of containing your build system's paths they contain your production system's paths, and you can use activate or activate_this in production and end up with the same virtual environment. OK? So that's nice and cool. But let's take a project and package something with dh-virtualenv. Let's package Sentry. Who here knows what Sentry is? Cool. It's a really good exception tracking tool. We use it in our production systems, and it works like a charm.
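The requirements.txt mentioned above is just the pinned-versions format that pip itself produces. A minimal sketch, assuming you run it inside the virtualenv you developed and tested against (the version numbers in the comment are illustrative, not real Sentry pins):

```shell
# Freeze the exact versions you tested with; at build time dh-virtualenv
# installs precisely these pins into the packaged virtual environment.
pip freeze > requirements.txt

# A typical result looks something like (illustrative versions):
#   Django==1.5.8
#   raven==5.0.0
#   requests==2.3.0
```

Pinning exact versions here is what makes the build reproducible: the virtualenv inside the .deb contains the same dependency set you tested on your laptop.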
The best part, for the purposes of the example, is that when you install Sentry — if you have ever done pip install sentry — it pulls down about half the Python Package Index. It depends on a lot of stuff. That's not because it's bad software; it's because it's complex software, in a good way. That sounded a bit bad — oh well. Anyway, let's see how we do this with dh-virtualenv. The first step is to install dh-virtualenv. If you're running a modern system, like Ubuntu Trusty or Debian testing, dh-virtualenv is actually available in those repos; you can just say apt-get install dh-virtualenv, and all of a sudden you have dh-virtualenv available on your system. As previously discussed, the version in Debian and Ubuntu is a bit old, but it still works. Then you need to create a debian directory inside your Sentry source tree. The debian directory is the directory Debian uses to figure out what packages it should build and what it should run at build time. In that directory you have to create a few files — there's a minimum set of four — and don't be afraid, all of these are covered in the tutorial as well. So when you start packaging Sentry, you add a control file. This is the place Debian uses to figure out what it needs for building. You can see that Sentry requires the Python development headers for building, but for running it doesn't require anything special. Here you just Build-Depend on dh-virtualenv and then write the required runtime dependencies, like Python and so on, in there. This is basically how I did it today when I built Sentry with dh-virtualenv: I just copied over the tutorial files and changed the fields I felt needed changing, like the package names. Then you need a changelog.
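A control file along the lines described above might look like this — the field values are illustrative, adapted from the dh-virtualenv tutorial pattern rather than copied from a real Sentry package:

```
Source: sentry
Section: python
Priority: extra
Maintainer: Example Maintainer <maintainer@example.com>
Build-Depends: debhelper (>= 9), python, python-dev, dh-virtualenv (>= 0.6)
Standards-Version: 3.9.5

Package: sentry
Architecture: any
Depends: ${misc:Depends}, ${shlibs:Depends}, python
Description: Sentry exception tracker packaged inside a virtualenv
 Sentry and all of its Python dependencies installed into a
 self-contained virtual environment, shipped as a Debian package.
```

Note the split the talk describes: python-dev sits in Build-Depends because the C extensions compile at build time, while the runtime Depends line stays minimal.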
The changelog is, well, a changelog, but it's also the file the Debian package build uses to figure out the version. It just tells Debian that we are packaging Sentry 6.4.4. Cool. And the third one — this is why the packaging is the complex black-magic thing — is that you need to define what is called the compatibility level, so Debian knows how it should build the package. What's relevant to you is probably just to echo 9 into that compat file and be done with it. If you don't, it'll pick up some ancient compatibility level and won't build your package. So that's pretty much it. And the last part is the glorified makefile, aka the rules file, which just tells Debian how to build things. If you have built Debian packages for Python before, you'll probably recognize this file, and you can see that we changed python2 to python-virtualenv. This basically tells Debian to build the package using dh-virtualenv instead of the default way of building Python stuff. And that's it. I run dpkg-buildpackage, it rolls through, you get the nice Matrix-like output of stuff building, and all of a sudden you see — hey, look at that — it has actually created a virtual environment, put pip and whatnot in there, and started pulling half the internet down into your package. Once that's done, all that's left is to take the Debian package, copy it to your production host, and install it there. If you have defined runtime dependencies in the control file, they get installed at the same time. And the best part is that you haven't executed any random scripts on your production system, because all of that was done on your build system. You end up deploying the whole thing without cluttering the production system with development headers or anything like that, and you have a nice, contained virtual environment on your production host.
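Put together, the minimal debian/ skeleton from the steps above can be sketched like this; the maintainer name and date are illustrative placeholders, and the rules file follows the tutorial's `--with python-virtualenv` pattern:

```shell
# Create the minimal debian/ directory the talk walks through.
mkdir -p debian

# 1. changelog: gives Debian the package name and version (Sentry 6.4.4).
cat > debian/changelog <<'EOF'
sentry (6.4.4) unstable; urgency=low

  * Package Sentry inside a virtualenv with dh-virtualenv.

 -- Example Maintainer <maintainer@example.com>  Fri, 25 Jul 2014 14:35:00 +0200
EOF

# 2. compat: pin the debhelper compatibility level so an ancient
#    default isn't picked up.
echo 9 > debian/compat

# 3. rules: the glorified makefile; the only change from a stock Python
#    package is building --with python-virtualenv.
printf '#!/usr/bin/make -f\n\n%%:\n\tdh $@ --with python-virtualenv\n' > debian/rules
chmod +x debian/rules

# 4. control is the fourth file (see the control sketch); with all four
#    in place, the package builds with:
#      dpkg-buildpackage -us -uc
```

With those four files in place, the build produces a .deb whose payload is the fully populated virtualenv, ready to copy to a production host.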
Okay, so once you've done that, let's look at the nice parts of dh-virtualenv. What it gives you is non-Python dependencies — the possibility to declare them just the way you would with normal Debian packages. It also leverages the existing Debian build infrastructure: you can use your existing build agents, you can use your existing CI systems if you're already using them for Debian, and you can keep your own apt repo in use. It has the new hotness — you're not limited to what you can find in Debian; you can just install the newest stuff available. And as I said before, it's contained: you end up with a virtual environment in a known place on your production system, and that's it. Of course, like any solution, there are also negative sides. The build times can be slow. Especially if you're not running your own PyPI mirror and you're not using wheels, you're basically downloading all the requirements from the internet and then building them, which means your builds will take longer. But that can be substantially mitigated by running your own mirror, which cuts down the network latency, and by using wheels, which cuts down the build times. It still requires you to dig out some requirements: you need to know which native system requirements you need on those systems — it doesn't let you off that hook. If you're, say, parsing XML using lxml, you need to make sure your control file Depends on the libraries lxml needs on your production system, and has the development headers as Build-Depends for the build part. And the build system needs to have exactly the same Python. Because it's a virtual environment, the virtualenv actually links to things outside itself — well, the linker links against stuff on the production system.
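For the lxml example just mentioned, the split between build-time headers and runtime libraries would go into the control file roughly like this — the package names are the stock Debian ones, and the rest of the file is elided:

```
# Build-time: headers needed to compile lxml's C extension
# inside the virtualenv during the package build.
Build-Depends: debhelper (>= 9), python-dev, dh-virtualenv,
               libxml2-dev, libxslt1-dev

# Run-time: the shared libraries the compiled extension links
# against on the production host.
Depends: ${misc:Depends}, ${shlibs:Depends}, python,
         libxml2, libxslt1.1
```

The build host compiles against the -dev packages; the production host only needs the shared libraries, which apt installs alongside your .deb.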
So you need to have the same Python available on your build system as on your production system. But that is rarely a problem if you already have an existing build infrastructure churning this stuff out. As for the future of dh-virtualenv, I'm looking into a few things, like cookiecutter templates. Unfortunately, I didn't have time to prepare them before this talk, but it would be sweet if you could just run cookiecutter and, boom, your dh-virtualenv packaging is done, without you needing to go and echo 9 into random files. I'm planning to add trigger support, so that if you get a minor update of the Python on your production host, it refreshes the virtual environment to make sure it still runs. And — this was actually a tip from, damn, I'm so bad with names, Adam from Disqus, when he was rehearsing his talk and we talked about this — he gave me a tip: what if I could break the dependency on the system Python entirely, using something like pyenv to incorporate the whole Python into the virtual environment? But that's pretty much it. If you want to find out more, the source is available, open source, under the Spotify umbrella on GitHub; there is the good — emphasis on the word good — documentation on Read the Docs; and there's a blog post we published when we released this. So with that, I thank you for your time. Thank you, Jyrki. You've just saved me a lot of time. Any questions — to the microphones, please? Hi. Great, thanks very much for that. A while ago I was looking at a presentation — not yours, but from your colleague at Spotify — about using Docker for deployment and managing the infrastructure. I myself am right now torn between Docker for dev and production, and right now in my company we are building Debian packages, actually. So this would be really, really cool to use.
Do you actually install some software from these Debian packages and do other deployments with Docker, or do you mix them? The current plan with the Python stuff is to mix them. While Docker is great, it still doesn't solve the problem of declaring dependencies. It basically boils down to two options. Either you write a Dockerfile that says pip install this, pip install that, apt-get install this, which kind of does the trick — but then you have the same problems, and you would probably need two different Docker images: one for building your package somehow, then extracting that and putting it into the other Docker image. The benefit of Docker is that you don't necessarily need the virtual environment, because Docker already provides the isolation. But we are leaning more and more towards building Python software with this and then using the Debian package to drop it into the Docker images. Okay, and my other question is: what do you use for your local apt repositories? I have no idea, but it's something fairly off the shelf. I've looked at it, but it's an ancient installation. Hi. Yeah, I've got a question. Does that support any migration mechanism as well? Can you repeat the question? Yeah, does your system have some kind of migration mechanism if you want to — let's take the example of Sentry, and let's say your database needs some migration, and you want your package to be able to provide some kind of migration; your package needs some changes from one version to another. Yeah, so because it's a Debian package, it has all the post-installation scripts and things like that that you can run. If you want to run scripts before removal, after removal, before installation and so forth, you can use the existing Debian infrastructure for that.
So it's not too complicated to do those, but of course it requires some knowledge of Debian packaging at that point. Okay. One question related to that: do you use these post-installation, pre-installation files for database migrations? You said earlier you use something different — what do you use there? We actually do have some projects that use the postinst files for database migration, but I feel it's kind of scary — if something accidentally triggers a package update, we get an unwanted migration, even if it would be tested safe. So usually we do database migrations with some sort of manual step, depending on the project. But yes, you can do it in postinst — that project has been working; fingers crossed for the future. Can you use system Python packages, or is everything installed in the virtual environment? For example, lxml, or the Pillow imaging library. Yeah, currently, by design, everything is installed in the virtual environment. But it shouldn't be too hard to add a feature — one of those where you can shoot yourself in both legs at the same time — to allow the system packages to be used. Yeah, I see your point, because lxml and Pillow are kind of annoying. Well, lxml is fairly easy, but Pillow is really annoying to install inside virtual environments. So yeah, it's probably something for the future to add a flag for using the system site packages, if you want the existing ones — why not? But currently, no. Maybe you can remove it from requirements.txt and just add it to Depends? But then, if you add it to Depends, it gets installed at the system level, and the virtual environment is built without access to the system site packages, so it won't see it. One intermediate step you could take is to depend on Pillow, let that install at the system level, and then set PYTHONPATH when you start your software to point at the system packages in addition to the virtual environment's paths.
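The intermediate step suggested above — a system-level Pillow made visible to the packaged virtualenv via PYTHONPATH — could be sketched like this. The install path, service name, and module name are all hypothetical; also note that PYTHONPATH entries come early on sys.path, so they can shadow the virtualenv's own packages:

```shell
# Hypothetical layout: the dh-virtualenv package installed its virtualenv
# under /usr/share/python/my-service, while Pillow came from the system-level
# Debian package declared in Depends.
export PYTHONPATH=/usr/lib/python2.7/dist-packages

# The virtualenv's interpreter now sees the system dist-packages too
# (PYTHONPATH entries precede site-packages on sys.path, so use sparingly).
/usr/share/python/my-service/bin/python -m my_service
```

As the speaker says next, it's unclear whether this is a viable long-term solution; a proper system-site-packages flag in the tool would be cleaner.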
But I wouldn't say for sure whether that's a viable solution or not. Nevertheless, if you want that feature, please open a ticket on GitHub. It should be fairly simple to implement, so I can just build it. Okay, thank you. Or if you want to make a pull request, that's even better. Sorry. Did you hear about FPM, the so-called Effing package management tool? Yeah — that's Hynek's blog post, which does this exact same thing, but using FPM instead of injecting into the debhelper sequence. I wanted to use that originally. It works, it gets the job done, but it didn't fit our build systems, so I ended up building a debhelper extension instead. Okay, but if someone starts from scratch and doesn't have any legacy they want to keep, why would they use your product instead of FPM? What are the advantages? I don't know if there's any specific reason. In that case, I would say: go read Hynek's blog post, check out the excellent documentation of my project, and decide which one seems simpler. I've aimed to cater to people who don't really know Debian packaging at all, so it should be easy, but it's the same with Hynek's blog post — you just follow the steps and do stuff. So it's rather a matter of personal preference than of features. Yes? Okay, thanks. Just a quick one: is there any plan on your side to port it to Wheezy, for example using backports? Yeah, that could be done. I'm already planning — because Trusty has 0.6 — to set up a PPA for Trusty, so that you get the newer releases there. This stuff builds fine on Wheezy; I've built it on Wheezy, but Wheezy was already stable when I released this. It shouldn't be too hard to add it to the backports for Wheezy. Okay, so there are no technical reasons not to do that? No, this is really simple, and we build stuff with the edge version on top of Squeeze.
So if it doesn't work on Wheezy, then we've done something really bad. Okay, thanks. Thank you, I think that's all the questions. Thank you for a very clear, very useful talk, Jyrki. Thank you.