So, just to start, a bit of context: we're going to define what we're going to use for this talk. Just so you know, the talk is available at my GitHub repository, and everything is there: demo examples, this very talk, and its sources.

In Python, a package is usually a tree of files containing a special `__init__.py` file. The name of the root folder is the name of the package, and you can import it under the right conditions. You can tamper with the import paths, and there are a lot of super interesting talks about import hooks which modify the way the import machinery works in Python. In some systems, in some contexts, we actually do use those for legitimate reasons; for example, in Nix we use `.pth` files and the site mechanism to make it easy to import any package in an isolated way, like virtualenvs allow you to do.

So everything we install from any PyPI-like server is a package, and every package manager you've used is also something that provides packages and uses these indexes.

Now, thinking about people like us, users: we install packages manually, typing in our terminal or using graphical interfaces. That could be NumPy, that could be SciPy, that could be whatever you need. We install those on a computer running a certain operating system, and on this computer there are things: source code, SSH keys, personal files, professional files, secrets. To us they might be something we don't care about, something we forgot we even have, but to other people they might become valuable targets. It's really important to think about this, because the computer we carry around contains very important data, and some people try to enforce the fact that you have to encrypt your computer to prevent data loss and catastrophic failures. We can also extend these ideas to servers and deployments, because they're kind of the same, except you
replace the human interaction with automation.

So, some valuable targets reachable at install time, when you install a package: SSH keys, SSH servers, browser profiles, and whatever else you can name, because all of those are available to your user. Sometimes you can try to do things to isolate where you're installing your packages, so you use Docker or something like that, but not everyone does that, and not every package manager provides package isolation.

Just a quick reminder of how to publish a package to PyPI: we write down instructions to use our package, that is, some metadata, and we perform some operations. I think it's really interesting that the easiest way to describe how to publish a package is that at install time we perform some operations, because historically, in order to install a package, you have to write a `setup.py`, which is a fully complete Python program exposing an `install` command. You can run anything in this install command, and this code gets shipped to PyPI and is run arbitrarily when you install the package.
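To make the danger concrete, here is a hedged sketch, not taken from the talk, of why `setup.py` execution is a problem: anything at module level runs the moment the installer imports the file. The fake home directory and the file names are purely illustrative, and the `setuptools` call is omitted so the sketch stays standard-library-only:

```python
# Sketch: code in setup.py runs with the installing user's privileges,
# before setup() is even called. We simulate a home directory to show
# what an install script could enumerate.
import os
import tempfile

fake_home = tempfile.mkdtemp()  # stand-in for the user's real $HOME
os.makedirs(os.path.join(fake_home, ".ssh"))
open(os.path.join(fake_home, ".netrc"), "w").close()

# module-level code: this would execute during `pip install`
targets = sorted(
    name for name in (".ssh", ".netrc", ".aws")
    if os.path.exists(os.path.join(fake_home, name))
)
print("reachable at install time:", targets)  # ['.netrc', '.ssh']
```

Nothing in the packaging format distinguishes this enumeration from legitimate build logic, which is the core of the problem the talk describes.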
So maybe we don't really care about the fact that we can run arbitrary code at install time. But unfortunately we have some examples of annoying things, and really bad things, done with these features.

For example, `dateutil` is a very well-known package to deal with dates, and some of you might remember the Python 2 to Python 3 transition, which was quite painful. Some packages which were not compatible with Python 3 started to create new, compatible packages under the name `python3-<package>`. Of course, someone published a `python3-dateutil` package, but that was not an innocent package. In 2019, someone published a malicious package behind this name, which depended on the `jellyfish` package, playing with a homoglyph attack on the I/l confusion. The payload tried to attack a very specific person: if you unpack everything and read the code, which is in the GitHub issue by the way, you can see that it was trying to test whether someone had a very specific file, which had a quite specific name and was not on every computer. So we can infer that it was a quite sophisticated attack toward one developer. It's quite frightening, and it makes you wonder who is behind this attack, in fact.

Some other talks cover that: there are a lot of malicious packages which use typosquatting, or even what I'd call honest typosquatting; some packages were definitely downloaded too much, for example `dpp-client`. And recently PyPI rolled out mandatory two-factor authentication for critical packages. You might have noticed there was some drama about it last week: PyPI did a great job, I think, giving out free two-factor authentication to critical package maintainers, but some maintainers didn't want to enable two-factor authentication, and to a certain extent I agree with that.
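The I/l confusion behind the `jeIlyfish` attack is easy to reproduce and, to a first approximation, easy to screen for. This is a minimal sketch of my own, not PyPI's actual defense, which folds a few visually confusable characters before comparing names; real defenses use much larger confusable tables:

```python
# Fold a handful of Latin characters that render near-identically in many
# fonts, then compare package names. Only illustrative: Unicode TR39
# defines far more complete confusability data.
CONFUSABLES = str.maketrans({"I": "l", "1": "l", "0": "o", "O": "o"})

def skeleton(name: str) -> str:
    # fold confusables first, then lowercase, so "I" is seen before lower()
    return name.translate(CONFUSABLES).lower()

def is_lookalike(candidate: str, known: str) -> bool:
    # same skeleton but different spelling => likely homoglyph squat
    return skeleton(candidate) == skeleton(known) and candidate != known

print(is_lookalike("jeIlyfish", "jellyfish"))  # True: capital I for l
print(is_lookalike("jellyfish", "jellyfish"))  # False: the real package
```

The attack works precisely because `jeIlyfish` and `jellyfish` are indistinguishable at a glance in most terminal fonts.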
It's like, if you are a maintainer of some package, as free software, for free, are you really responsible for the security of your users? I think that's a philosophical question, so let's put it aside.

The question we ask is: why is arbitrary code execution needed? Most of the packages we install are mostly declarative, declaring their dependencies: we say that NumPy depends on some numerical library, and so on and so forth, and people, or any package manager, use this to perform some topological sort and version resolution. But our formats, the way we describe our packages, do not take into account the problems that come with system-level dependencies.

When you install something like Beautiful Soup, which comes with an HTML parser, you need some XML parser, and you want a fast XML parser, so `lxml` is one good candidate, and it requires installing C libraries. So you have to coordinate with the system package manager. You also have the problem that when you're using numerical libraries such as SciPy or NumPy, you can change the underlying acceleration library to choose one which works better on your hardware: with Intel MKL you will get better acceleration on Intel-based processors, OpenBLAS is the generic implementation, written by researchers, which works quite fine on every processor, and you have the same for AMD and so on and so forth. Same for TensorFlow and CUDA, because you have some specifics about GPUs. And you also have packages which are not purely Python, written in another language: say Rust code that must be compiled and exposed to your Python package. When people install such a package and there is no wheel for it, you need to build it, and Python is not going to install a Rust compiler for you.
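The "topological sort and version resolution" step mentioned above can be sketched in a few lines. This is a generic illustration with a toy dependency graph, not any real resolver:

```python
# Kahn-style topological sort over a toy dependency graph:
# the mapping goes from a package to the packages it depends on.
from collections import deque

deps = {
    "beautifulsoup4": ["lxml"],
    "lxml": ["libxml2", "libxslt"],  # system-level deps, in reality
    "libxml2": [],
    "libxslt": [],
}

def install_order(deps):
    # number of not-yet-installed dependencies per package
    remaining = {p: len(ds) for p, ds in deps.items()}
    # reverse edges: who depends on me
    dependents = {p: [] for p in deps}
    for p, ds in deps.items():
        for d in ds:
            dependents[d].append(p)
    ready = deque(sorted(p for p, n in remaining.items() if n == 0))
    order = []
    while ready:
        p = ready.popleft()
        order.append(p)
        for q in sorted(dependents[p]):
            remaining[q] -= 1
            if remaining[q] == 0:
                ready.append(q)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order

print(install_order(deps))
# ['libxml2', 'libxslt', 'lxml', 'beautifulsoup4']
```

Note that `libxml2` and `libxslt` appear in the order like any other node, which is exactly the point of the talk: the Python-level format has no way to express that these two are the system package manager's job.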
So the user is going to have to install a Rust compiler themselves. And you have many other examples of why this is needed. We can argue that maybe that's the wrong model and we should do something else entirely, but that's the reality, so you have to work with this.

This is possible because there is no real solution to this architecture problem. There were some interesting talks about The Update Framework and many other attempts to try to solve this problem, but not all of them can account for everything I just mentioned, and this is still very much a work in progress. One of the problems I think is hard here is that not only do you have to support Linux distributions, which all have their own way of doing things, some source-based, like Gentoo and so on, some binary-based, like Debian, Ubuntu, etc., but they don't necessarily agree on who is responsible for bringing you the final mile of dependencies. And that's not to mention Windows, macOS, and so on and so forth.

And finally, software just has to make assumptions. There is no easy way to discover things; there are standards, sometimes. For example, when you build a C library, or a native library, you can use something called pkg-config to discover library paths and pass parameters to your compiler, and that requires you to provide a `.pc` file. Sometimes upstream developers forget about them, and downstream maintainers have to add them manually, but they don't necessarily get merged upstream. So it creates a lot of fragmentation and difficulty in handling everything. Downstream maintainers have to do workarounds, Python packagers have to do something in their `setup.py` so that the experience is streamlined and very user-friendly. As a result, you have the great success of using something like Docker, for instance, because it provides a streamlined experience in a clean state, which would be, here, a container.
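To show what pkg-config actually consumes, here is a hedged sketch of a tiny `.pc` parser. The `libxml-2.0.pc` content is a simplified, hypothetical example, and real pkg-config also handles `Requires:` recursion, version comparisons, and more:

```python
# A .pc file is a flat key/value format plus ${variable} substitutions.
# This toy parser extracts Cflags and Libs from a simplified example.
import re

PC_FILE = """\
prefix=/usr
includedir=${prefix}/include

Name: libxml-2.0
Version: 2.9.14
Cflags: -I${includedir}/libxml2
Libs: -lxml2
"""

def parse_pc(text):
    variables, fields = {}, {}
    for line in text.splitlines():
        if "=" in line and ":" not in line.split("=")[0]:
            k, v = line.split("=", 1)       # variable definition
            variables[k.strip()] = v.strip()
        elif ":" in line:
            k, v = line.split(":", 1)       # exported field
            fields[k.strip()] = v.strip()

    def expand(value):
        # recursively substitute ${var} references
        return re.sub(r"\$\{(\w+)\}",
                      lambda m: expand(variables[m.group(1)]), value)

    return {k: expand(v) for k, v in fields.items()}

info = parse_pc(PC_FILE)
print(info["Cflags"])  # -I/usr/include/libxml2
print(info["Libs"])    # -lxml2
```

When an upstream project ships no `.pc` file, this is exactly the metadata a downstream maintainer has to reconstruct by hand, which is the fragmentation the talk complains about.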
So we could decide that, yeah, that's the state of things, and we might need to restart packaging someday, do it right, and get some ten-year milestone plan to get us there. But this talk is about Nix.

So let's do a Nix crash course. What is Nix? The idea behind Nix is a functional language, the Nix language, and a general package manager which is able to work with other package managers. The provocative idea of Nix is that we break the Filesystem Hierarchy Standard: you don't get any `/usr` and such, you only get your stateful files, so `/home` and so on; everything else is a symlink to something in the Nix store. That makes it really easy to reason about what you have on your system, and everything is recorded and has a cryptographic hash. We'll come back to the cryptographic hash later.

Packages are not really packages in Nix; we don't have any concept of package. We have the concept of a derivation, which is a more general concept: we express something in the Nix language and produce outputs. That can be a package, that can be a systemd unit file, that can be a Bash script, that can be a Python script; everything goes.

So here's an example of the Filesystem Hierarchy Standard: things such as `/srv`, `/usr`, or `/var` that you often find on your system. On Nix, those either don't exist or are symlinks to a special Nix store path.

So, just to get some example of what Nix is, here is the Nix language now. This is an expression to build a Python package. You have some imports, the list of packages we have (this is a function signature), and we have a function call, a call to the `buildPythonPackage` function. We give a name, we give a version, we give the source, and notice that we give the hash.
That's something that prevents tampering. In Nix, network access is disabled except for derivations which have a hash, so that we already know the result in advance. If PyPI tried to change the contents, the result would be rejected, because the hash would change. This protects us a bit, but not a lot. And we have some metadata about the package.

So, the Nix path: we say that we have a Nix store path, and here is the cryptographic hash. A nice property of Nix is that this cryptographic hash uniquely identifies your package. Either things are input-addressed, which means that your package is a function of its inputs: if you change them, say you depend on NumPy and NumPy changes, your package is going to change. Or they are content-addressed, which means that even if NumPy changes, if your result doesn't change, because it was a minor version bump or something, the hash won't change. That's really interesting, because you never build software twice except for very good reasons, and it makes it trivial to cache derivations and provide them to everyone. And then you get a nice name, and you get something that you're all used to: the `bin` folder, the `lib` folder, etc.

So here's some basic glossary of Nix terms; I'm going to go fast on them. The Nix store is the file system where everything is recorded, including the derivations. A store path is either an output produced by a derivation, or a derivation file itself, so you can send a derivation to someone else, they can build it for you, and you can get the outputs. A derivation is the recipe.
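Input addressing can be sketched as hashing everything that goes into a build. This is an illustrative model of my own, not Nix's actual store-path algorithm, which hashes serialized derivations and uses a truncated base32 digest:

```python
# Toy model of an input-addressed store path: the digest is a function of
# the name, version, source hash, and the store paths of all dependencies,
# so changing any input changes the resulting path.
import hashlib

def store_path(name, version, src_hash, dep_paths):
    h = hashlib.sha256()
    for part in [name, version, src_hash, *sorted(dep_paths)]:
        h.update(part.encode())
        h.update(b"\0")  # unambiguous field separator
    return f"/nix/store/{h.hexdigest()[:32]}-{name}-{version}"

numpy_a = store_path("numpy", "1.23.0", "sha256:aaaa", [])
pkg_1 = store_path("mypkg", "1.0", "sha256:bbbb", [numpy_a])

# bump numpy: its path changes, and so does every dependent's path,
# even though mypkg's own source is untouched
numpy_b = store_path("numpy", "1.23.1", "sha256:cccc", [])
pkg_2 = store_path("mypkg", "1.0", "sha256:bbbb", [numpy_b])

print(pkg_1 != pkg_2)  # True: same mypkg source, different input closure
```

Content addressing would instead hash the build *output*, which is why a rebuild that produces bit-identical results can keep the same path under that scheme.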
So when you say "I want to build this package", like we've seen earlier, you say that this is a Python package, and all Python packages have a way to be built, which is running the `setup.py`, and then we just give the parameters that we need: the source, the name, the version, and some metadata. What is interesting about it is that even a full NixOS Linux system is just one big derivation, which is your kernel, your initrd, your boot scripts, etc., and you can have a tree, which I will showcase later, of your whole system.

And then there is the mathematical concept of closure. When you have a binary relation such as "software X depends on software Y", in the packaging ecosystem you often want to get the closure of this: you look at the graph and you say, OK, I'm going to get all the packages I need, including all of their own dependencies, so that I can have the full software packaged and working on my computer. So in Nix we often talk about the closure of a software, and that means the transitive closure under the binary relation of dependencies.

So, now maybe I lost some people in the audience, and a question I would ask is: how many people are using Nix here?
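The closure just described is an ordinary graph traversal; a minimal sketch over a toy dependency graph (the package names are made up):

```python
# Transitive closure of the "depends on" relation: starting from one
# package, collect everything reachable through dependency edges.
deps = {
    "myapp": ["python3", "openssl"],
    "python3": ["openssl", "zlib"],
    "openssl": ["zlib"],
    "zlib": [],
}

def closure(root, deps):
    seen = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(deps[pkg])  # follow dependency edges
    return seen

print(sorted(closure("myapp", deps)))
# ['myapp', 'openssl', 'python3', 'zlib']
```

Shipping a closure, rather than a single package, is what makes "copy it to a server and just run it" work: nothing reachable in the graph is left behind.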
Yeah, well, that's cool for all the newcomers here. So maybe it's a bit obscure so far, and that's very normal, because that's how Nix is perceived by a lot of people. One thing it tries to do is to use a theoretical framework to tame the complexity of software.

As a goal, Nix has always had build reproducibility, which is what I explained about input addressing and content addressing: once you build a software, you can try to rebuild it and you will get the very same package. Something later on will show why that is funny.

Build operations are sandboxed by default: no network access, no arbitrary access to the user's file system, or even to installed dependencies. I cannot try to see if you have, I don't know, NumPy installed, if I don't depend on NumPy. That's pretty powerful, because it helps us see what the software is actually doing in the sandbox. You still have escape hatches: take some software like Steam. Well, Steam is a very badly behaved piece of software, because it brings its own libraries; it's trying to solve the problem of libraries itself. Some games are not statically linked, they come with a lot of libraries, and Nix doesn't pick up others' libraries. So we have to make it pretend it lives in a Filesystem Hierarchy Standard layout using namespaces, and, yeah, everyone is able to use Steam on NixOS, even if it sounds quite hard.

Composition with the local package manager is interesting and important: Nix is not about reinventing the wheel. It's not about taking over pip or Poetry; it's trying to work with pip and Poetry, so that it can handle the things that the Linux distribution has to handle, while pip and Poetry focus on what they have to solve.

And we get trivial caching: as we have reproducibility, Nix provides a very big binary cache which contains pretty much everything, so even though NixOS is a source-based distribution, you don't recompile that much. I think the only time I recompile is when I'm changing something in the
C compiler and I'm recompiling the whole world.

I think this chart from repology.org shows something very interesting: Nix has a lot of packaged projects and non-unique packaged projects. Of course, that's because we are automating a lot of packaging, ingesting packages from PyPI and so on and so forth, but we're doing that with a lot fewer maintainers than something like the Arch Linux User Repository. And I think that's very interesting to see, because we have a lot of data on what packaging is, what a package is, how people use a package, and what a package does.

We have a minimal image, and we have come very far on reproducibility there: we have only two paths which are not reproducible. This is the website tracking NixOS reproducibility. It takes every path that is a derivation in the minimal image and tries to see if it can rebuild it twice, on a different kernel, different hardware, different machine, different time, and see if there are any differences. As you can see, a Python thing and a Rust thing are not reproducible; sometimes we're at 100% and sometimes we're not, it's continuously tested. And we can look very deep into the problem and see that some `.pyc` file is changing when you rebuild twice: maybe a non-deterministic optimization, or a timestamp-related issue. It's pretty cool to be able to do that.

So the question now is how we can leverage Nix as a tool. As Nix is very strict and the world outside is quite forgiving, we have to not reinvent the wheel.
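The rebuild-twice check that the reproducibility tracker performs can be modeled in a few lines. This sketch is my own illustration of why an embedded timestamp, like the one in a `.pyc` header, breaks bit-for-bit reproducibility, and how pinning the time restores it:

```python
# Simulate building an artifact twice and comparing hashes, the way
# reproducibility checkers do. A build that embeds the current time
# differs between runs; a build with a pinned epoch does not.
import hashlib
import itertools

counter = itertools.count()  # stand-in for a wall clock that keeps moving

def build(source: bytes, pinned_epoch=None) -> str:
    timestamp = pinned_epoch if pinned_epoch is not None else next(counter)
    artifact = source + f"|built-at:{timestamp}".encode()
    return hashlib.sha256(artifact).hexdigest()

src = b"print('hello')"
print(build(src) == build(src))  # False: the timestamp leaked in
print(build(src, pinned_epoch=0) == build(src, pinned_epoch=0))  # True
```

Pinning build timestamps to a fixed epoch (the idea behind the real-world `SOURCE_DATE_EPOCH` convention) is one standard fix for exactly this class of non-determinism.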
We have to reuse existing tools like pip and Poetry, but we have to let Nix handle the downloading and the native dependency sourcing, and capture everything in an expression. We put pip and Poetry, and every tool we use like this, in an offline mode: that means we instruct pip to do an offline install and to rely on no network access.

For now, I'm going to do a very simple demo of that. Here's a Nix command where I'm going to create a shell. nix-shell is very interesting: you're all used to virtualenv; well, nix-shell is virtualenv generalized to everything. You can use any package, any binary, temporarily, in some sort of virtualenv, and once you exit the shell those packages are not in your environment anymore. What I'm going to do is get a Python 3.9 environment with some packages known to be annoying to install on some systems, and show that it's quite easy, and what happens when you do that.

Of course, you can see that I made a rookie mistake, which is assuming that Bash was available at that path; it is not, I have to use `env`, which is the portable way to do that. And you can see that Nix is going to compute the closure of the packages I asked for, so I have some nice information about the download size and the unpacked size, and it gets everything that I need. It's doing the work, downloading, and now it's creating the environment. So I'm in the nix-shell right now: I do have NumPy, I do have SciPy, I do have TensorFlow and PyTorch. Once I exit this shell, you can see that I don't have those packages in my environment anymore. So it's indeed working, and I can do some advanced stuff with this thing, because I can change the underlying numerical library.
So I can do an Intel MKL-based NumPy or SciPy, and it's quite easy. This script can be cached, so that I can share it with others and they can have the shell instantly, without downloading or building a lot of things. So it's quite cool, and continuous integration becomes a lot easier to test with this kind of fixture.

The second demo is about a self-contained script. Sometimes we write one-file scripts, and we don't use external dependencies, because it's annoying to package them, or to package a one-file script together with its dependencies. So here's an example with requests: I just want to do a `requests.get` on europython.eu. I don't have requests in my environment, I don't want to install requests, and I would like to share this file with friends that might use this script. You can see that my editor is not finding requests, but I can still run the script and get an answer, because nix-shell is being used as the interpreter in the shebang, which is a feature of shebangs, and this way you can do self-contained Python scripts trivially.
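The shebang trick from the demo looks roughly like this; the exact package set in the talk's script may differ. `-i` tells nix-shell which interpreter to hand the file to, and `-p` lists the packages to provision. Run with a plain `python3`, the two shebang lines are ordinary comments, which is what makes the file shareable; the probe below only checks whether requests is importable, so the file also degrades gracefully outside Nix:

```python
#!/usr/bin/env nix-shell
#!nix-shell -i python3 -p "python3.withPackages (ps: [ ps.requests ])"
# Executed as ./script.py, nix-shell builds an environment containing
# python3 plus requests, then runs this file with that interpreter.
# Executed as `python3 script.py`, the two lines above are plain comments.
import importlib.util

HAVE_REQUESTS = importlib.util.find_spec("requests") is not None
print("requests available:", HAVE_REQUESTS)

if HAVE_REQUESTS:
    import requests
    try:
        print(requests.get("https://europython.eu", timeout=10).status_code)
    except requests.RequestException as exc:
        print("network unavailable:", type(exc).__name__)
```

The dependency declaration lives in the file itself, so the recipient needs nothing installed besides Nix.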
So, we've seen how Nix can be leveraged as a tool for doing Python development. We didn't showcase how to do Python deployment, but the idea is the same: you build some closure, you send it to the remote server, and the remote server just uses it. You can do a lot of advanced stuff meanwhile: you can check that the closure is coherent with something, you can introspect the closure, you can generate a software bill of materials, and a lot more.

But the problem with Nix is that its learning curve is quite insane, so not a lot of people are that much of a graybeard, but some try, and Nix is very frustrating for new users used to downloading binaries from the internet and running them. We have solutions for that, but it's still not easy.

And Nix cannot solve everything. Nix cannot solve things that were not built for Nix. We have a system to determine which packages are hit by a CVE, but have you ever seen a CVE informing you about the very function which is affected in the vulnerable library? You cannot say "this is the particular function which is affected, this is the hash of this function, this is the hash of the code". So we cannot easily automate whether we are affected or not by a CVE; that's something which is a work in progress. CVEs are sometimes not very granular: they tell you that in your dependencies you have a package which is vulnerable, but in fact you never use the affected part, so you're not affected, and Nix cannot solve that.

Tooling that takes lock files and produces Nix expressions is not enough.
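The function-level vulnerability matching the talk wishes for can at least be prototyped: fingerprint each function and compare against a hash an advisory could publish. This is a thought experiment of my own, not an existing CVE mechanism, and it hashes compiled bytecode rather than source text:

```python
# If advisories shipped a hash of the vulnerable function, a scanner could
# check whether your tree actually contains that exact code.
import hashlib

def fingerprint(fn) -> str:
    # hash the compiled bytecode of the function body; hashing normalized
    # source text would be another option
    return hashlib.sha256(fn.__code__.co_code).hexdigest()

def vulnerable_parse(data):
    return data.split(";")  # imagine this version had a known flaw

def patched_parse(data):
    return [part.strip() for part in data.split(";")]

advisory_hash = fingerprint(vulnerable_parse)  # what a CVE could publish
print(fingerprint(vulnerable_parse) == advisory_hash)  # True: affected
print(fingerprint(patched_parse) == advisory_hash)     # False: patched
```

Even this toy version shows the appeal: a match is evidence you ship the exact flawed code, while today's package-level CVE matching can only say "something in this package, somewhere".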
You have a lot of edge cases, including in the Python registry: sometimes some URLs are not stable, and you have to try to predict things and find out about hashes and so on and so forth. So it's a bit difficult. Today there is a bug in Poetry which breaks a lot of things, and I just got hit by it; I wanted to do a third demo, but it doesn't work.

Limits of the standards are also complicated to work around. Take a package like cryptography, which is a Python package for doing crypto. We have to do this kind of workaround to support it, because it brings a Rust compiler into play, and unfortunately the packaging files don't carry any hashes for the dependencies of the Rust project. So we, downstream, have to support that and work around it: if you're under a certain version we use the wheel, if not we compile it, and here are the dependencies, hashed, so we can use them, etc. It's very, very complicated. We have been working with the cryptography folks on that, but the format doesn't support it, so it's hard.

And advanced attackers will just move to backdoors, and then it gets very complicated; it's not a technical problem anymore. So we need, in the future, more foundational work to be compatible with Python. We disable bytecode optimization because of non-determinism errors. We're not using wheels a lot, because it's super hard to get the manylinux ABI problems right.

But at the same time we have things like Trustix, which solves the distributed trust issue: when you have multiple binary caches, how do you trust these binary caches? What is your policy? Do you want to do a majority vote? You ask every cache what the cryptographic hash is, and if the majority votes in favor of a hash and some cache disagrees, then you reject this cache. So it's quite interesting.
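The majority-vote policy just described, ask every cache for the hash of a store path and reject dissenters, can be sketched directly. The cache names and hashes here are made up, and real Trustix uses append-only logs rather than a naive poll:

```python
# Majority vote over hash reports from several binary caches: accept the
# hash most caches agree on, and flag any cache that reported otherwise.
from collections import Counter

reports = {
    "cache.example.org": "sha256:aaaa",
    "mirror.example.net": "sha256:aaaa",
    "evil.example.com": "sha256:ffff",  # tampered artifact
}

def majority_vote(reports):
    counts = Counter(reports.values())
    winner, votes = counts.most_common(1)[0]
    if votes <= len(reports) // 2:
        # a tie or fragmented vote means nobody earns our trust
        raise ValueError("no majority: refusing to trust any cache")
    rejected = sorted(c for c, h in reports.items() if h != winner)
    return winner, rejected

winner, rejected = majority_vote(reports)
print(winner)    # sha256:aaaa
print(rejected)  # ['evil.example.com']
```

This only works because reproducible builds make honest caches agree bit-for-bit in the first place; without reproducibility, disagreement would not be evidence of tampering.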
We have the SLSA framework, which is a framework pushed by some big tech companies to get high levels of assurance in your supply chain, and an integration with the ecosystem is being published at the moment.

So, the conclusion: installing packages in Python is dangerous by construction, and fixing this is complicated. Attacks are running in the wild, and we can only measure the public surface. Raising the cost for attackers is quite easy by constraining the attack surface, with Nix at least. And the real wins are, in fact, that we have a lot of data produced by Nix, we have the integration with the external ecosystem, and doing multi-language projects becomes easier with Nix.

That's pretty much it. I will leave you with some references and do some Q&A if I have time for it.