I'm going to talk about Hex, which is a package manager for Elixir and other Erlang-based languages. So before today, how many of you have heard of Hex? Awesome. And how many of you have used a Hex package? And how many of you have published a Hex package? Cool, we have a few here. This is a bit more of a technical talk: I'm going to talk about the implementation of the Hex client and the server, and we're going to see some of the design decisions that we had to make. I will also go into detail about the dependency resolution algorithm that we use. So let's start out with why I wanted to build Hex. About half a year ago, I started working on it, and the aim was to solve many of the issues with dependencies in Mix. The ecosystem was really starting to grow and we were getting more and more Elixir libraries, but the only way to use an external dependency in Mix was with a Git dependency, where you can also specify an optional branch, tag, or Git reference. And there were a few problems with this. Defining your dependencies with Git naturally leads to people depending on Git master, because that was the default, and that's obviously not a good thing. The lock file helps with this issue: it locks your dependency to a specific Git reference or tag, so it ensures repeatable builds. But people were still depending on Git master. And even if you were responsible and used a Git tag to get a proper release instead of depending on master, that had its share of issues too. If you have two dependencies in your dependency tree that are really the same application, depended on by two different people, and they target the same Git URL but with slightly different Git tags, then even if they are compatible, you get a conflict that has to be resolved by the user by overriding.
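The situation described above is easiest to see in a `mix.exs` deps list. This is an illustrative sketch of a pre-Hex project (package names and URLs are examples, not real definitions from the talk): without a `:tag` or `:ref` option, a Git dependency tracks master by default.

```elixir
# Illustrative mix.exs fragment from the pre-Hex era.
defp deps do
  [
    # No tag or ref given, so this implicitly follows master:
    {:cowboy, git: "https://github.com/example/cowboy.git"},
    # Pinned to a release tag. A second library pinning v0.8.1 of the
    # same URL would conflict even if the code were compatible:
    {:postgrex, git: "https://github.com/example/postgrex.git", tag: "v0.9.0"}
  ]
end
```

The lock file then records the exact Git revision that was fetched, which gives repeatable builds but does not solve the tag-conflict problem.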
So those are some of the problems that we were trying to solve. And Hex has proven to be used a lot, actually. Today we have over 100 packages and almost 400 releases. Here you see the download statistics: it starts a few months ago, and you see the week numbers and the number of downloads each week. We have had quite a bit of growth the last few weeks, which is nice. In total, I think we have had almost 25,000 package downloads. So I'm going to start by talking about the client, which is the command line interface that you use. Hex integrates with Mix, but it's a standalone application. So unlike Mix, it is not bundled with Elixir core, and you need to install it separately. All the functionality is exposed through Mix tasks. A Mix task is a module matching a specific interface; it has a module name, but it needs to be namespaced under Mix.Tasks, as you see. When Mix starts, it looks through the load path for modules matching this interface. And because it looks through the load path, it can find tasks that are defined in your project or in the dependencies you're using. So library authors can define tasks that will be available to you when you use those libraries as dependencies. Hex is installed as an archive. An archive is essentially a zip file containing compiled modules that the Erlang code server will treat as a normal directory: instead of a directory with compiled modules, you have a zip file with compiled modules that you add to your code path, and Erlang will handle that. When Mix starts, all installed archives on your system are loaded, and their tasks become available to you. The one big issue with archives, though, is that they are installed as one big compiled blob. So unlike your project and its dependencies, which are recompiled when your Elixir version changes, archives have to be handled differently.
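A minimal sketch of that task interface (not Hex's actual code, and the task name is made up): the module lives under the `Mix.Tasks` namespace and implements `run/1`, which is what Mix discovers on the load path.

```elixir
# A minimal custom Mix task; invoked on the command line as `mix hello`.
defmodule Mix.Tasks.Hello do
  use Mix.Task

  @shortdoc "Prints a greeting"
  def run(_args) do
    IO.puts("Hello from a custom task!")
  end
end
```

Because discovery happens via the load path, a task like this works the same whether it ships in your project, in a dependency, or in an installed archive like Hex's.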
And that's an issue because Elixir does not ensure binary compatibility between versions. When 1.0 is released, we will of course not break source compatibility for quite a long while, but we cannot ensure that code compiled against a specific version will work with a later Elixir version. So to mitigate this issue, we need a nice, streamlined way to update Hex. Firstly, Mix ships with a task to install or update Hex. Mix will also ask you to install Hex if it's missing and you're using a Hex dependency. Mix provides tasks for installing archives, and that's what we use to install Hex. An archive is installed by giving a URL to Mix, and at the end of that URL, on some server somewhere, there is an archive that will be fetched. When we install Hex, we need to ensure that the installed Hex version is compatible with that specific Elixir version. We do this by giving Mix a URL to the Hex API server; the API server inspects the HTTP User-Agent header, which includes the Elixir version, and redirects to the correct, compatible Hex archive. Finally, the integration with Mix that allows Hex packages to be used as dependencies is done through the remote converger. I will talk more about what the remote converger does in just a moment. To explain it, we will run through what happens when you call the command to fetch dependencies. First off, we run the Mix converger. The Mix converger traverses your dependency tree; while it does this, it does some sanity checks to make sure that all dependencies are correct, and it flattens the tree into a set of unique dependencies. The converger starts by collecting the top-level dependencies from your Mix file. As it is doing this, the Git dependencies are being fetched. We skip Hex packages for the moment because they will be handled in a later stage.
Then we recurse down the dependency tree and do the same thing for the children of the dependencies we just fetched. As we recurse down this way, we need to converge dependencies, essentially merging dependencies that have the same application name. That is because the Erlang runtime requires applications and modules to be unique: you can't have two versions of the same application loaded at the same time. So we converge dependencies if their definitions match. If they do not match, we mark them as diverged; they do not match if they point to different URLs for Git dependencies, or if they use different tags, and so on. We need to make sure that it's really the same dependency, so they are marked as diverged if this happens, and we need to present an error to the user. Finally, after we have flattened this dependency tree and merged it into a single list of unique dependencies, we need to sort them based on the interdependencies in the tree. We need to sort them because Elixir has metaprogramming, so one dependency might call code in another dependency at compile time, and it's important that we compile all dependencies in the correct order. But before this sorting happens, we call into the remote converger, which is a module registered by Hex. Here's where we leave Mix land and enter Hex land. This is where the dependency resolution happens that makes sure we have a compatible set of Hex packages. The reason why we run the converger that handles Git dependencies before we run the remote converger is that we need to find any Git dependencies that themselves have Hex packages as dependencies, so we can include them in the dependency resolver. But before we run the dependency resolver, we need to make sure that we have an up-to-date registry on the local computer.
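The converge-or-diverge rule can be sketched like this. This is a simplification with made-up field names; the real Mix converger compares the SCM, URL, tag or ref, and other options:

```elixir
# Simplified sketch of dependency convergence; field names are illustrative.
defmodule Converge do
  # Same app, fetched the same way with the same options: merge into one dep.
  def merge(%{scm: scm, opts: opts} = dep, %{scm: scm, opts: opts}), do: {:ok, dep}

  # Anything else (different URL, different tag, and so on) is diverged,
  # and the user gets an error unless they override the dependency.
  def merge(_dep1, _dep2), do: :diverged
end
```

So two deps named `:cowboy` pinned to different tags diverge even when the code would have been compatible, which is exactly the Git-tag problem Hex's version requirements avoid.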
The registry is essentially an ETS table that we have serialized to a file. An ETS table, for those of you who don't know, is basically an in-memory key-value database. This is what the contents of the registry look like. You see Ecto and Postgrex, which are package names. We list all the versions for them, and for every package-version pair, we list all the dependencies and the version requirements on those dependencies. We need to fetch the registry every time we run the dependency resolution, to ensure that we have an up-to-date copy. Today we have slightly over 100 packages and almost 400 releases, and this file weighs in at about 27 kilobytes, five kilobytes compressed. So I'm not too worried about the size right now. But the problem is that we need to fetch this file every time we run the dependency resolution. We of course use conditional HTTP requests, so if the registry has not changed, we will not download it again. Still, there's definitely room for improvement here. I know that the RubyGems people are using, or are going to start using, an append-only file for the registry. They use HTTP Range headers to fetch only what is absolutely required: if only one package has been added, they don't need to re-download the whole registry; they can fetch just the line with that information. So I'm going to talk a bit about how the dependency resolution algorithm works. We feed this algorithm all the Hex dependencies that we found during the first converger run. To explain this, I need to talk a bit about the terminology we're using. We use "pending requests" for Hex packages that we have not yet processed, and the packages that we have processed are called "activated" packages.
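In miniature, and with an illustrative layout rather than the real registry's exact tuple format, the idea is an ETS set mapping each package-version pair to its requirements:

```elixir
# Sketch of the registry idea: {package, version} => deps with requirements.
# Package names and versions here are illustrative data.
registry = :ets.new(:registry, [:set, :public])
:ets.insert(registry, {{"postgrex", "0.5.0"}, []})
:ets.insert(registry, {{"ecto", "0.2.0"}, [{"postgrex", "~> 0.5"}]})

# Resolution can now look up any release's requirements locally:
[{_key, deps}] = :ets.lookup(registry, {"ecto", "0.2.0"})
```

Erlang's `:ets.tab2file/2` and `:ets.file2tab/1` are what make it trivial to serialize such a table to a file and ship it to clients.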
So we start at number one here and add all the dependencies that we found during the converging as pending requests. Then we take the next pending request and find the latest matching release, i.e. the latest version of the package, and we compare it against all the packages that we have activated, to ensure that we can find a version where all the version requirements match. If we don't find such a match, we need to backtrack, which I will explain in just a moment. If we found a matching version, we activate that package: we add it to the list of activated packages, and we add its dependencies, i.e. the children of that package, to the list of pending requests. And now we save the state for backtracking. So if we at some point later fail to find a matching version of a package, we can backtrack to this place, choose a different version for this package, and hopefully be able to continue with the dependency resolution. As you can imagine, this algorithm has very high time complexity, so that might be a problem in the future. So far I have not seen any combination of packages that causes this algorithm to run away in time. The upside of this algorithm is that if there is a solution, we will always find it, which is good, of course. Another problem, though, is that once the resolution starts to fail, we do this kind of backtracking all the time, so it's hard to find the place where the resolution went wrong. If the resolution fails, it's very hard to tell the user where the problem is, because the problem is a combination of all the packages you're using. So let's continue with the last stage of the deps.get command, which is to fetch the Hex packages. The packages are packaged into tarballs, which I will go into in more detail in just a moment.
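The loop just described (take a pending request, pick the newest matching version, activate it, push its children, backtrack on conflict) can be sketched as a small recursive search. This is a toy, not Hex's implementation; in this sketch the recursion itself plays the role of the saved backtracking state.

```elixir
defmodule ToyResolver do
  # registry: %{package => %{version => [{dep_package, requirement}]}}
  def resolve(registry, pending), do: do_resolve(registry, pending, %{})

  # No pending requests left: every package has a compatible version.
  defp do_resolve(_registry, [], activated), do: {:ok, activated}

  defp do_resolve(registry, [{name, req} | rest], activated) do
    case activated do
      %{^name => version} ->
        # Already activated: the chosen version must satisfy this new
        # requirement too, otherwise this branch fails and we backtrack.
        if Version.match?(version, req),
          do: do_resolve(registry, rest, activated),
          else: :error

      _ ->
        # Try candidate versions newest-first; on :error from the inner
        # call we fall through to the next, older candidate.
        registry
        |> Map.fetch!(name)
        |> Map.keys()
        |> Enum.sort({:desc, Version})
        |> Enum.filter(&Version.match?(&1, req))
        |> Enum.find_value(:error, fn version ->
          deps = registry[name][version]

          case do_resolve(registry, deps ++ rest, Map.put(activated, name, version)) do
            {:ok, solution} -> {:ok, solution}
            :error -> nil
          end
        end)
    end
  end
end
```

This also shows why the time complexity is high: in the worst case the search tries every combination of versions, and why a failure is hard to report, since by the time `:error` surfaces, the conflicting choice may have been made many activations earlier.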
Just like the registry file, the tarballs are cached locally, and we only download them if they actually changed. We've also recently started doing parallel downloads of the tarballs to increase performance. Even before we started doing that, fetching tarballs was much, much faster than cloning Git projects. So this is what a tarball looks like. It's named after the package name and the version, and the dark green items here are the files in the tarball. First we have the version file, which is just the version of the tarball format; if something changes in how the tarball is represented, we increment this number. We also have a metadata file, which contains the package name, the version, the description, and so on. We have the checksum, which is of course a checksum of all the files. And we have the contents file, which holds all the files the package author decided to include with the package. This file is, as you can see, compressed. It basically contains all the source code: the Mix file of the project, possibly config files, and so on, everything required to compile and run the package. The reason why we bundle source code instead of compiled modules is, as I said previously, that we do not ensure binary backwards compatibility for Elixir, so we need to compile the dependency every time we fetch it. Additionally, you may have config files, or a specific combination of dependencies, that changes the compilation process. So we basically need to recompile every time.

Next, I'm going to talk about the server side of things. The server is at the URL hex.pm. First, I'm going to talk about how publishing a package works. We start in the top left corner, where you see the client. When the client calls the Hex publish task, we build the tarball and upload it to the HTTP API. The API server validates the tarball to see if everything is correct.
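As a rough illustration of the checksum idea only: hash the tarball's inner files in a fixed order and compare against the stored checksum. The file names, their contents, and the choice of SHA-256 here are all assumptions for the sake of the sketch, not the real tarball layout.

```elixir
# Hedged sketch: checksum over the inner files of a package tarball.
files = [
  {"VERSION", "2"},
  {"metadata.exs", ~s([app: "example", version: "0.1.0"])},
  {"contents.tar.gz", <<31, 139, 8, 0>>}
]

checksum =
  files
  |> Enum.map(fn {_name, blob} -> blob end)
  |> IO.iodata_to_binary()
  |> then(&:crypto.hash(:sha256, &1))
  |> Base.encode16()
```

A client that recomputes this after download can detect a corrupted or tampered tarball before compiling it.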
We check the metadata, we check the package version, and so on. Then we upload the tarball to a content delivery network, rebuild the ETS registry, save it to a file, and upload that to the same content delivery network as well. So we have the HTTP API. It is built with Plug, which is basically a library for interfacing between web applications and web servers, and with Ecto, which is a persistence layer, backed by a Postgres database. This is really a case of dogfooding: José built Plug and I built Ecto, and I think the Hex API was one of the first, if not the first, production systems that used Plug. Even though Plug is a very powerful library, it is not a complete web framework or a complete library for building an API. It provides some convenience functionality, such as a router and plugs for parsing and uploading files, but there are a few things you need to implement yourself: HTTP caching, authentication, and your own parsers for any serialization formats you want to use. What is good is that the Plug architecture makes it very easy to add such functionality yourself, because Plug wants you to build units of code, called plugs, that you can compose to build a full HTTP API. We're starting to see a few web frameworks being developed today, and in the future we may want to move Hex to one of these frameworks. But what I would really like to see is smaller libraries emerge that build on top of Plug to provide specific functionality. For example, we could have a small library for handling authentication or for handling HTTP caching. And because it's so easy to compose plugs, they could be used together to build a full web application.
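The "compose small plugs" idea can be shown without the library itself. In real Plug a plug is a module with `init/1` and `call/2` (or a function) that transforms a connection struct; this toy version uses plain maps and function plugs, with made-up fields and a made-up token check, purely to show the composition.

```elixir
# Toy version of the Plug idea: each plug takes a conn plus options and
# returns a (possibly halted) conn; a pipeline is just a reduction.
defmodule MiniPipeline do
  def put_header(conn, {key, value}),
    do: %{conn | headers: [{key, value} | conn.headers]}

  def authenticate(conn, _opts) do
    # Illustrative check only; a real plug would verify an API key.
    if conn.params["token"] == "secret" do
      conn
    else
      %{conn | status: 401, halted: true}
    end
  end

  def run(conn, plugs) do
    Enum.reduce_while(plugs, conn, fn {plug, opts}, conn ->
      conn = plug.(conn, opts)
      if conn.halted, do: {:halt, conn}, else: {:cont, conn}
    end)
  end
end
```

This is why small Plug-based libraries compose so well: an authentication or HTTP-caching library only has to ship one more function of this shape, and it slots into any pipeline.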
So I'm going to talk about the Elixir term format that we are using, which is a serialization format, just like JSON, except it serializes to Elixir terms: basically Elixir code that has no logic, just literal values. We use it when the command line client communicates with the HTTP API. The reason why we are not using JSON is that when Hex was first being developed, we thought it was going to be merged into Mix at some point, and then we couldn't add a dependency on an external library. Additionally, it's important to keep the binary size of the Hex archive small if we expect most Elixir users to install it. So we use this format to communicate. Serialization is pretty easy: it is basically a call to the Inspect protocol, which takes a data structure and converts it to its string representation. Deserialization is the evaluation of the string. But we can't deserialize arbitrary Elixir code, because of course we don't want code evaluation on the server. So first we need to make sure that the string is safe to evaluate, and we need to reject all code that is not literal terms. What we do is parse the string and traverse the AST, to make sure it doesn't contain anything that is not a literal value. But there are issues with using an actively developed language as your serialization format, because when the language changes, so does your format, and you never want your serialization format to change. This happened with the introduction of maps: with maps, we deprecated list dicts. So now, to support older Hex clients that use list dicts, we need to traverse the data on the server and convert all the list dicts to maps.
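A sketch of that "safe eval" step: parse the string to an AST, accept only literal terms (no calls, no variables), and only then evaluate. This mirrors the approach described above, but the allowed-node set here is simplified compared to what a real implementation would need.

```elixir
# Hedged sketch of literal-only deserialization of Elixir terms.
defmodule SafeTerms do
  def decode!(string) do
    {:ok, ast} = Code.string_to_quoted(string)

    if safe?(ast) do
      {term, _binding} = Code.eval_quoted(ast)
      term
    else
      raise ArgumentError, "unsafe term"
    end
  end

  # Atoms, numbers, and binaries are literals in the AST.
  defp safe?(term) when is_atom(term) or is_number(term) or is_binary(term), do: true
  defp safe?(list) when is_list(list), do: Enum.all?(list, &safe?/1)
  # Two-element tuples are represented literally in the AST.
  defp safe?({left, right}), do: safe?(left) and safe?(right)
  # Larger tuples appear as {:{}, meta, args}; maps as {:%{}, meta, pairs}.
  defp safe?({:{}, _meta, args}), do: Enum.all?(args, &safe?/1)
  defp safe?({:%{}, _meta, pairs}), do: Enum.all?(pairs, &safe?/1)
  # Anything else (calls, variables, operators) is rejected.
  defp safe?(_other), do: false
end
```

Anything with logic in it, such as a function call, fails the `safe?/1` check before evaluation ever happens.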
And this is not really a problem with this specific format: if we had, for example, used term_to_binary, the Erlang-provided binary serialization format for all terms, it would have been an issue as well. But if I were able to redo this, I would probably have chosen some other solution for the serialization. So here's the Hex website. It is meant to be a place to find and browse packages. Being able to easily find a package that solves your specific problem is very important for a package manager to be used: if you can't find the package you're looking for, you're not going to use the package manager. For that reason, there's support for full-text search of the package descriptions. So if there's a JSON package, for example, without JSON in the name, you can still search for JSON and find any packages that match that search term. I would also like the ability to sort by number of downloads, so that it's easier to find the more popular packages. On the website we also have some guides: a usage guide, a guide for publishing packages, and documentation for all the Mix tasks that Hex provides. The website part of the server is, like the API, also built with Plug. The design is a simple Twitter Bootstrap design, which I hope we can change in the future. Since we display user-provided content on the website, for example the package descriptions, we need a way to automatically HTML-escape strings in the templating system we're using. We had to add this to the EEx templating system, and José and Chris have recently worked on improving the performance of these templates, so they're really fast now, which is nice. In the future, I hope we can extract this into a library with other HTML helpers and convenience functions. So next, on to the download statistics. This is a feature I really like.
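The escaping requirement is small but easy to get wrong; in particular, `&` has to be replaced first so already-produced entities are not escaped twice. A minimal standalone version of such an escape function, not the actual EEx implementation:

```elixir
# Minimal HTML escaping; "&" is replaced first to avoid double-escaping.
defmodule Escape do
  @replacements [{"&", "&amp;"}, {"<", "&lt;"}, {">", "&gt;"}, {~s("), "&quot;"}]

  def html(string) do
    Enum.reduce(@replacements, string, fn {from, to}, acc ->
      String.replace(acc, from, to)
    end)
  end
end
```

With automatic escaping in the templates, a malicious package description like `<script>...</script>` renders as inert text instead of executing in visitors' browsers.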
If you've been to the Hex website, you have seen that we provide daily, weekly, and total download statistics for all packages and releases. I think this is an important feature, because it shows package authors that their package is used, and developers love to have their stuff used; they love numbers and metrics. So I think that's a really nice feature. As you've seen, the tarballs are hosted on a content delivery network, and we're using Amazon S3 for the content delivery, so calculating the download statistics is a daily job that fetches the S3 logs from the previous day, parses them, and inserts them into the database. The Amazon logs are really useful in other ways too, because they include the user agent, and the user agent includes the Elixir version and the Hex version. So we can see which versions of Elixir and Hex people are using, which is a really nice insight to get. So finally, I want to talk a bit about the future of Hex and things I want to add or improve. First off, we are going to support installable executables. Today you can only use Hex packages as dependencies. We want to support installable executables through escripts. An escript is basically like a code archive: it's a zip file that contains compiled code, but it's prefixed with a shebang line so that you can execute the file and run it from the command line. Initially I wanted to support this just in Hex, but we found that we can support installing escripts through Mix directly, so we use the existing functionality in Mix to fetch dependencies, and we have had tooling in Mix for some time to build escripts. So as you can see here, at the top we have the usual dependency definitions. At the very top we have a Git dependency.
And on the second line we have a Hex dependency. We want to use the same style to install escripts. If you look at the line below, we have the proposal for how this task would look: you call the escript install task and give it the SCM you're using to fetch the dependency, passing the URL just like you would when defining a dependency. You can do the same thing for Hex packages, of course. Once we have fetched the project, we can just run the task to build the escript, and then copy it to your path so that the executable is available to you. If you want more information about this, there is a proposal on the core mailing list. So, Hex is meant to be a package manager for all BEAM languages: not just Elixir, we want to support Erlang as well. Today you can publish packages for any language, as long as you have Elixir installed, but that's where the support for other languages ends. In the future we want to provide proper tools for publishing and fetching packages for Erlang users. We also want to provide community-owned packages. This is an idea we had very recently. We have the problem that Git dependencies of Hex packages don't work, because, if you remember from earlier, the Mix converger that fetches the Git dependencies runs before the dependency resolution. The dependency resolver can only work with known packages: we need to know all the versions, the dependencies, and the version requirements on each dependency, and because of that we cannot really work with Git dependencies. We realize this might be an issue for projects that are not yet on Hex, because the author may not have published them yet, or may not want to publish them. So if, for example, we have a bunch of projects that depend on a particular web server that has not been published, it can become a community-owned package instead.
Creating a community-owned package would just be sending a pull request to a specific repository; the pull request would contain a metadata file that describes the package. When a new release of the package is made, you make another pull request that increments the version, and we could have automatic tooling that publishes it as a Hex package after the pull request is merged. All right, that's it for me. Do you have any questions?

It seems pretty straightforward to run a private Hex server. Could you resolve dependencies with multiple sources? Say you have a private server and then you also use the public server.

There's no way today to have multiple sources of packages. You can use your own private Hex server, which could be a superset of the main repository or the main registry. I want to support multiple sources in the future. Yeah, definitely.

Hi, I've got two questions. One, what made you decide not to add Hex to Mix directly and keep it separate? And two, do you know if your dependency resolution algorithm differs from the one that Ruby's Bundler uses? And if so, how? Thank you.

The idea at the beginning was to merge Hex into Mix eventually, but we're approaching 1.0 very soon for Elixir, and Mix is bundled with Elixir, of course. Mix is, at this point, more mature than Hex, and we still want to release new versions of Hex and possibly change things. So at this point, we can't really merge it into Mix. And the second question was how the dependency resolution algorithm differs from Bundler's, right? It's very close to that algorithm; it's based on it, essentially. I don't know of any major changes. We do support some additional things that I'm not sure Bundler supports, because I developed them independently. For example, overriding dependencies; we had to add support for that to the algorithm.
So it's still very close, but it differs in a few places. And I think we also support optional dependencies, which you don't really have there.

Something that I miss, coming from build-engineering work with RPM and DEB, is aliases or pseudo-dependencies: you can declare a dependency that any other package can say it fulfills, without actually having to be that package. You can depend on "Java" and OpenJDK can declare that it supplies it. So multiple libraries could implement the same module name, or supply the same registered service (an endpoint or a PID you can talk to, in Erlang terms), and you could pick whichever implementation you like best. You wouldn't have to worry that whoever got to a name first owns it forever and nobody else can ever implement it, because the names would collide.

All right. So we don't have support for those kinds of semantics in Hex, and I'm not really sure I want to add them, because Elixir is a dynamic language, so imposing those kinds of restrictions may not be suitable for Elixir. But we could talk a bit more about it later; it sounds like an interesting idea.

You talked a little about the problems you had with serialization and deserialization, and you said that if you could go back you would change things. I was wondering if you had any ideas about what you would have used?

No, not really. Using the Erlang-provided serialization format to serialize to a binary format would probably have been easier, because you wouldn't have to implement checks for whether something is safe code.
But it would not have solved the main issue I have, which is that we have to support older clients that use different data structures. The best way would probably have been to include some kind of JSON, or a simpler serialization format, implemented directly in Hex instead.

Just a simple question: would including Hex in Mix make it more difficult to use Hex for non-Elixir languages? Would you need it all anyway in that sense?

That's a good question. Actually, using Hex for other languages is a very good reason for not including it in Mix, because the idea today for supporting Erlang tooling is basically to build an escript of Hex that implements the functionality Mix has for running tasks and so on. So we use the existing code in Hex and basically build a wrapper around it. And there's no reason for that wrapper code to be in Elixir core, because it has nothing to do with Elixir. So if we were to merge Hex into Mix, we would still have to have code outside of core.

First, I'd like to really thank you for making this, because it's thankless and miserable to make a dependency resolver. I wrote one for a different community and it's miserable. So thank you very much. I have two questions. One of them: you began to talk briefly about community-run packages, and that sounds amazing, because the greatest problem right now in a large application is when you have Hex dependencies that have Git dependencies, and you have to document that, and that's a tough deal. I was wondering if you could elaborate a little more on that. Will it look like a Homebrew Git repository or something like that?

Yeah, it will be very similar to Homebrew. Each package would basically have a metadata file describing the same metadata you provide in your Mix file today when you publish a package: the description, the version, the package name, and so on.
If you want to automate this, it would also need to provide information on how to fetch the project: a Git URL and probably the tag, so that we can automate the release. But yes, very similar to Homebrew.

That sounds great. Second question: optional dependencies. What do they look like when you run mix deps.get? How do you put the onus on the user to know what they're fetching? It would change things quite a bit, right?

Right, so Plug is a good example of this. Plug is an interface between web applications and web servers, so we can have a common interface for a bunch of different web servers. When developing Plug, as the developer of Plug you want to use all the web servers, to test all the different adapters. But people who use Plug as a dependency only want one specific web server. So we mark all the web servers that Plug supports as optional: unless you add a specific web server as a dependency alongside Plug, it will not be fetched. Plug may support ten web servers, and you don't want to add all of those dependencies to your project; only the one you actually want to use is added. What's good about optional dependencies is that we can still use the version requirement that Plug has on that web server: in the dependency resolution algorithm, where we ensure that we have compatible packages, we can use that version requirement as well.

All right. Hey, Eric has written Hex, he's a core contributor, and he traveled all the way from Europe to share his time with us. Let's give him a big hand.