Okay, should we start? Thank you very much for coming. This talk is going to be about Jigsaw in Debian. My name is Tom Marble. With me to present is Guillaume Mazoyer, and also with us in the audience is Sylvestre Ledru. We have all been collaborating on Jigsaw for Debian, which has been an interest of mine for some time. Let me tell you what we're going to cover in this talk. We're going to give you a brief overview of what Jigsaw is and why it might be interesting. Then we're going to discuss Google Summer of Code and the project that Guillaume has been working on; Sylvestre and I are his mentors for Summer of Code. I'm going to talk a little bit about some of the new features in JDK 7 and JDK 8, including some notes from the JVM Language Summit, which took place at Oracle last week. Finally, we'll cover the next steps in our progress towards packaging Jigsaw for Debian. So first of all, what is Jigsaw? Jigsaw is the modularization of the JDK. Upstream uses Mercurial as its VCS and maintains a number of different forests, as they call them, for different parts of the project. One of them is the Mercurial forest for Jigsaw, which is the name of the modular JDK. The idea is basically to break the JDK apart into multiple interdependent parts. There are a number of reasons for this, but the main one is that the JDK has grown over time and has become quite massive. Some of you may know, from the inner workings of the JDK, that all of the classes, the libraries you expect when you run a Java program, are in one big jar file called rt.jar. What Jigsaw does is break it up into many, many smaller jar files. As a consequence, the class path will be replaced by the module path, which is similar in purpose.
But hopefully, and this is one of the other main benefits, this will get us out of the jar dependency hell we've been living in, especially when we find upstreams that include library jars in their distributions and things like that. So here's a very, very simplified dependency graph of what the JDK looks like in terms of functionality. All of the functionality rests on the base module, and a number of things depend directly on it, but some parts of the JDK are a little further away in the graph. For example, you'll find that CORBA and Kerberos sit at peripheral nodes of the graph, and if your program doesn't need them, there's no point in loading them and parsing those jar entries when your program starts up. Similarly, there is a lot of XML-related or web-services functionality that certain programs may not require. And of course, if you're running a server-oriented program, there's no point in pulling in Swing and all of the GUI-related classes. So this should make the JDK easier to maintain and easier to debug, as well as faster. As I mentioned, there are a number of reasons why the modularized JDK is interesting. One is size: the overall size, but most importantly the size of what you actually need in terms of dependencies for your packages. As a consequence, runtime memory needs should also be much more modest and startup should be much quicker. As a matter of fact, upstream has even considered that with a modularized JDK like this, we could imagine Java SE, the Standard Edition, running even on memory-constrained mobile devices. This represents a huge shift from the past where, as you may recall, there used to be Java ME (Micro Edition), Standard Edition, and Enterprise Edition. Now the modularized JDK has a chance to reach down into what used to be purely mobile targets.
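The "don't load what you don't need" argument can be made concrete with a small sketch. The module names and edges below are hypothetical stand-ins for the slide, not real Jigsaw module names, and the code is not Jigsaw's resolver. A depth-first, post-order walk from a program's root module finds exactly the modules it needs, each listed after its dependencies; a headless server program never reaches the GUI or CORBA nodes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified module graph; names are illustrative only.
final class ModuleClosure {
    static final Map<String, List<String>> GRAPH = Map.of(
            "base", List.of(),
            "xml", List.of("base"),
            "kerberos", List.of("base"),
            "corba", List.of("base"),
            "ws", List.of("xml", "base"),        // web services
            "desktop", List.of("base"),          // Swing and other GUI classes
            "app.server", List.of("ws"),         // a headless server program
            "app.gui", List.of("desktop", "xml"));

    /** Returns every module reachable from root, each listed after its dependencies. */
    static List<String> resolve(String root) {
        List<String> order = new ArrayList<>();
        visit(root, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    private static void visit(String module, Set<String> done, Set<String> onPath, List<String> order) {
        if (done.contains(module)) return;
        if (!onPath.add(module)) throw new IllegalStateException("dependency cycle at " + module);
        List<String> deps = GRAPH.get(module);
        if (deps == null) throw new IllegalArgumentException("unknown module: " + module);
        for (String dep : deps) visit(dep, done, onPath, order);
        onPath.remove(module);
        done.add(module);
        order.add(module); // post-order: all dependencies were added first
    }

    public static void main(String[] args) {
        System.out.println(resolve("app.server")); // [base, xml, ws, app.server]
    }
}
```

Because the list comes out dependencies-first, the same traversal also gives an order in which the modules could be built or loaded.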
So just as Debian handles version dependencies between packages, Jigsaw handles version dependencies between modules. Our challenge and our opportunity is really to coordinate these two dependency graphs: the one within the JDK itself, and the one between user-land programs and their corresponding Debian packages. Another reason Jigsaw is interesting has to do with the new features that have come along since, for example, OpenJDK 6, which is currently in the archive; I'll talk about the new features in JDK 7 and JDK 8 a little later in the presentation. Another really interesting thing we hope to get out of modularizing the JDK is a simpler porting and bootstrapping process, so that it should be easier to bring up the JDK on new architectures. Maybe we can even simplify some of the build dependencies for Java itself, potentially eliminating some of the need for either ECJ or GCJ. So why is Debian such an interesting fit? It turns out that Debian, I think, is really unique among the major operating system distributions in its strong adherence to modular decomposition, and that is exactly the problem the modularized JDK is trying to solve. It's not an obvious and quick fit, though, because there are a couple of issues. One of them is version syntax: the version syntax currently permitted by Jigsaw is very liberal and doesn't adhere to Debian policy, and we've notified upstream of that. What's really interesting is that sometimes in Debian we tackle problems with solutions that also benefit other distributions and even other operating systems. This is such a case: because Java is a cross-platform language, it needs to work not just for Debian, not just for GNU/Linux, but on all platforms.
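Debian, by contrast, pins its version ordering down precisely (Debian Policy §5.6.12, as implemented by dpkg), and that precision is what a dependency resolver needs. As an illustration only, here is a minimal Java sketch of that ordering, assuming the standard `[epoch:]upstream_version[-debian_revision]` layout; the class and method names are our own. Note how `~` sorts before everything, even the end of the string, so `1.0~rc1` comes before `1.0`.

```java
// Sketch of dpkg-style version ordering: [epoch:]upstream_version[-debian_revision].
final class DebVersion {
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean isAlpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    /** Sort weight of one character: '~' < end of string < letters < other symbols. */
    private static int order(char c) {
        if (isDigit(c)) return 0;
        if (isAlpha(c)) return c;
        if (c == '~') return -1;
        if (c != 0) return c + 256;
        return 0; // end of string
    }

    /** Character at i, or 0 past the end (standing in for C's string terminator). */
    private static char at(String s, int i) { return i < s.length() ? s.charAt(i) : 0; }

    /** Compares upstream versions or revisions by alternating non-digit and digit runs. */
    static int verrevcmp(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() || j < b.length()) {
            // Non-digit run: compare character by character using order().
            while ((i < a.length() && !isDigit(a.charAt(i))) || (j < b.length() && !isDigit(b.charAt(j)))) {
                int ac = order(at(a, i)), bc = order(at(b, j));
                if (ac != bc) return Integer.signum(ac - bc);
                i++;
                j++;
            }
            // Digit run: compare numerically, ignoring leading zeros.
            while (at(a, i) == '0') i++;
            while (at(b, j) == '0') j++;
            int firstDiff = 0;
            while (isDigit(at(a, i)) && isDigit(at(b, j))) {
                if (firstDiff == 0) firstDiff = at(a, i) - at(b, j);
                i++;
                j++;
            }
            if (isDigit(at(a, i))) return 1;  // a's number has more digits
            if (isDigit(at(b, j))) return -1;
            if (firstDiff != 0) return Integer.signum(firstDiff);
        }
        return 0;
    }

    /** Splits "[epoch:]upstream[-revision]" into {epoch, upstream, revision}. */
    private static String[] split(String v) {
        int colon = v.indexOf(':');
        String epoch = colon >= 0 ? v.substring(0, colon) : "0";
        String rest = v.substring(colon + 1);
        int dash = rest.lastIndexOf('-');
        return new String[] {
            epoch,
            dash >= 0 ? rest.substring(0, dash) : rest,
            dash >= 0 ? rest.substring(dash + 1) : ""
        };
    }

    /** Negative, zero or positive as v1 sorts before, equal to, or after v2. */
    static int compare(String v1, String v2) {
        String[] a = split(v1), b = split(v2);
        int byEpoch = Integer.compare(Integer.parseInt(a[0]), Integer.parseInt(b[0]));
        if (byEpoch != 0) return byEpoch;
        int byUpstream = verrevcmp(a[1], b[1]);
        return byUpstream != 0 ? byUpstream : verrevcmp(a[2], b[2]);
    }

    public static void main(String[] args) {
        System.out.println(compare("1.0~rc1", "1.0")); // -1
        System.out.println(compare("1.10", "1.9"));    // 1
    }
}
```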
To get to where Jigsaw is today, there have been some compromises and harmonization among different approaches to modularity, including OSGi and Maven and, for example, other distribution techniques like the Red Hat Package Manager and so on. The plan of record, just so you know, is that there will be compatibility between OSGi and Jigsaw; however, it's not exactly clear how that will be implemented. The original idea of Jigsaw is that you wouldn't have a lot of annotations in your classes, as OSGi would advocate. What was really interesting in digging into this is that we found that RPM, unlike DEB, doesn't have a tight specification for the version syntax of packages. I've confirmed this with a number of people. It's really kind of hard to believe, but I have not found, for example, a BNF grammar that expresses what an RPM package version can be. I'd be really glad to be proved wrong, so if anyone can point me to one, I'd really appreciate it. The reason this is so important is that when we start talking about dependency resolvers, an accurate version syntax is an essential prerequisite. So we might end up advocating some version syntax changes to upstream, and/or some adaptations or mappings of Jigsaw versions to Debian package versions. Upstream is aware of our project and is looking to us for some really good feedback. For some time now, upstream has attempted to package Java modules as Debian packages. Last year at DebConf10 I spoke about this and about why the current upstream packaging is inadequate. Largely, they are using their own tools to create .debs, but they don't adhere to Debian policy and, most specifically, they don't use the debhelper set of tools. That's what we're trying to work on right now, and the timing is really important.
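One possible shape for such a mapping is sketched below. It is entirely hypothetical: the rules (collapse disallowed characters to `.`, prefix `0~` when the string does not start with a digit) are our own choices, and the sketch makes no attempt to preserve whatever ordering Jigsaw itself defines, which is exactly the open question raised with upstream. It only guarantees that the output satisfies the Debian Policy character rules for an upstream version.

```java
import java.util.regex.Pattern;

// Hypothetical mapping from a liberal module version to a Debian upstream_version.
final class VersionMapping {
    // Debian Policy allows alphanumerics and . + - ~ in upstream_version and says it
    // should start with a digit. This sketch also excludes '-', so the result can never
    // be mistaken for the separator before the Debian revision.
    private static final Pattern DEBIAN_UPSTREAM = Pattern.compile("[0-9][A-Za-z0-9.+~]*");

    static boolean isDebianUpstream(String v) {
        return DEBIAN_UPSTREAM.matcher(v).matches();
    }

    static String toDebianUpstream(String moduleVersion) {
        String s = moduleVersion.trim()
                .replaceAll("[^A-Za-z0-9.+~]+", ".") // each run of disallowed characters becomes one '.'
                .replaceAll("^\\.+|\\.+$", "");      // drop leading and trailing dots
        if (s.isEmpty()) {
            throw new IllegalArgumentException("no usable characters in: " + moduleVersion);
        }
        char first = s.charAt(0);
        if (first < '0' || first > '9') {
            s = "0~" + s; // in dpkg ordering "0~x" sorts before "0"
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(toDebianUpstream("1.2_03")); // 1.2.03
        System.out.println(toDebianUpstream("beta"));   // 0~beta
    }
}
```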
Why Jigsaw in Debian now? As you may have heard, JDK 7 was officially released yesterday, and we're now right in the middle of the development cycle for JDK 8, so our experiences here could very much influence upstream for all platforms. With that, let me turn it over to Guillaume, who will tell you a little bit about his Google Summer of Code project. So, as you now know, Jigsaw was selected as one of the Google Summer of Code projects in Debian. I've been working on it and I'm still working on it now. We divided the project into five parts, and we're currently in the third part. The first part of my work was learning how to build Jigsaw. Then we packaged some missing dependencies and uploaded them to the archive. We also wrote some examples of how to use Jigsaw, and next we are going to package Jigsaw and test it in Debian. So far, we have built Jigsaw on amd64. Jigsaw is a big piece of code, so it takes time to build and it takes time to run the tests. We also packaged two packages that allow us to run the tests: JT Harness and jtreg. JT Harness is a library used by jtreg, and jtreg is what actually runs the tests inside Jigsaw. After running the full test suite, we got this result: of the 3,484 tests in Jigsaw, only 29 failed. I think that's a pretty good result. Thank you. Most of the failures are due to hard-coded network hosts, machines inside Oracle that we cannot reach ourselves. We also identified some failures due to GUNET, and the IcedTea hackers helped us a lot to identify these tests; they are now aware of them and may work on them. We now have a Jigsaw repository on Alioth, a Git repository where you can find Jigsaw and all the scripts needed to build it. There is also a debian directory, which is not used yet because it's only a copy of the OpenJDK 7 debian directory.
We applied Alan Bateman's patch, which adds support for exploded modules in Jigsaw (he is working on another version of it), so that we can have modules anywhere on the system and still use them. Then we ran the tests again to check that the patch didn't break anything. Just a word about that. The exploded modules patch is fairly important because without it you could only build and run against the system module library, not against a module library on disk. That's why we felt that applying this patch was really important for getting some experience with Jigsaw. After that, we were able to write some example modules, and we're going to look at one of the examples now. Jigsaw introduces a new file that describes what a module is and what a module uses inside a Java project. This file is called module-info.java, and it generally contains this kind of thing. As you can see here, we declare a module with the module keyword, then its name, and then its version number. The next two lines are the modules required to make this module work, and then we have the main class of the module, which is run when we launch the module. Let me just interject. How many of you in the audience have used or packaged Java libraries or applications? You're probably familiar with javahelper, the tools that are there to help you package your Java libraries and applications. You can imagine that what we'll do to javahelper with Jigsaw is extend it to read the module-info files, extract the dependencies and versioning from them, and put that into the control file. Because we can now create modules, Jigsaw also introduces a new source tree layout. This source tree now has a directory for each module, named after it, as you can see under the modules directory.
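To picture what the javahelper extension might do, here is a hedged sketch that reads a module-info.java written in the syntax we assume from the quick start example (`requires <name> @ <version>;`) and turns it into a Debian `Depends` value. The `lib<name>-java` naming rule, reading `@` as `>=`, and skipping `jdk.*` modules are all assumptions for illustration; none of this is existing javahelper code, and the module names in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical javahelper-style extraction of Depends from a module-info.java.
final class ModuleInfoDepends {
    // Assumed prototype syntax: "requires <module> [@ <version>];"
    private static final Pattern REQUIRES =
            Pattern.compile("requires\\s+([\\w.]+)\\s*(?:@\\s*([\\w.+~]+))?\\s*;");

    /** Assumed naming rule: org.astro -> liborg-astro-java. */
    static String packageName(String module) {
        return "lib" + module.toLowerCase(Locale.ROOT).replaceAll("[._]", "-") + "-java";
    }

    static String depends(String moduleInfo) {
        String source = moduleInfo.replaceAll("//[^\\n]*", ""); // strip line comments
        List<String> deps = new ArrayList<>();
        Matcher m = REQUIRES.matcher(source);
        while (m.find()) {
            String module = m.group(1);
            String version = m.group(2);
            if (module.startsWith("jdk.")) continue; // assume the JDK package provides these
            String pkg = packageName(module);
            deps.add(version == null ? pkg : pkg + " (>= " + version + ")");
        }
        return String.join(", ", deps);
    }

    public static void main(String[] args) {
        String moduleInfo = String.join("\n",
                "module com.greetings @ 0.1 {",
                "    requires jdk.base; // a JDK module",
                "    requires org.astro @ 1.2;",
                "    class com.greetings.Hello;",
                "}");
        System.out.println("Depends: " + depends(moduleInfo)); // Depends: liborg-astro-java (>= 1.2)
    }
}
```

A real tool would of course need a proper parser and an agreed mapping from module versions to package versions.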
Every module must have a module-info.java, which, as we saw before, declares which modules are needed to run that module. Then we can compile the modules; in fact, we can compile both modules in one command line. We don't need the Java class path anymore: the class path is replaced by the module path. javac is still the compiler; -d is the directory the modules will be compiled into; -modulepath is the module path, where the JVM's modules are; then -sourcepath is where the sources are; and then come the Java files. With this, we get class files in the modules directory, and now the modules have to be installed on the system, of course. So we have to create a module library. A module library can be a user module library, which is not the system module library, and you create it with jmod: here jmod creates a module library called mlib, and mlib will contain the modules we just compiled. A module library is linked to a parent module library, which we can specify with the -p option. Here we don't specify -p, so the module library is linked to the system module library, which contains the modules of the JVM. Then we install the modules with jmod install; here we install the two modules, org.astro and com.greetings, into the mlib module library. After that, we can just run the modules and get our nice Hello World. With this, we can define a module library for the system and a module library for the user, so in future Jigsaw packaging, I guess, we will be able to separate the JVM's modules from the modules installed by Java applications. Now Tom is going to tell you about the next steps we're going to take during the rest of the project. Thanks.
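The parent relationship between module libraries can be modelled as a simple chain lookup. This is a toy model with hypothetical names and versions, not how jmod actually stores libraries: a user library such as `mlib` is searched first, and anything it lacks is looked up in its parent, which here is the system library holding the JVM's modules.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model of module libraries linked to a parent; not how jmod actually stores them.
final class ModuleLibrary {
    private final String name;
    private final ModuleLibrary parent; // null for the system module library
    private final Map<String, String> versions = new HashMap<>(); // module -> version

    ModuleLibrary(String name, ModuleLibrary parent) {
        this.name = name;
        this.parent = parent;
    }

    void install(String module, String version) {
        versions.put(module, version);
    }

    /** Returns the name of the first library, walking up the parent chain, that has the module. */
    Optional<String> locate(String module) {
        for (ModuleLibrary lib = this; lib != null; lib = lib.parent) {
            if (lib.versions.containsKey(module)) return Optional.of(lib.name);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ModuleLibrary system = new ModuleLibrary("system", null);
        system.install("jdk.base", "7");
        ModuleLibrary mlib = new ModuleLibrary("mlib", system); // like creating mlib without -p
        mlib.install("org.astro", "1.2");
        mlib.install("com.greetings", "0.1");
        System.out.println(mlib.locate("com.greetings").orElse("missing")); // mlib
        System.out.println(mlib.locate("jdk.base").orElse("missing"));      // system
        System.out.println(system.locate("org.astro").orElse("missing"));   // missing
    }
}
```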
So, as Guillaume pointed out, there is a system module library and there are user module libraries as well. As for where we will locate these, our current thinking is this: as you probably know, the JVMs currently live under /usr/lib/jvm/<name>, so perhaps the Jigsaw system module library will simply be at that path under modules, and, just as we put jars for Java libraries in /usr/share/java, we will put shared module libraries in /usr/share/java/modules. That is subject to change, but it's our current thinking at the moment. One of the other things I mentioned earlier is that we are going to review bootstrapping and how we actually build Jigsaw. Another very important part of the packaging work for Debian is integrating the fine work and patches of the IcedTea project; I'll say a little more about that on the next slide. I also want to benefit from the current OpenJDK packaging, which already adds a lot of Debian-specific features, and I think we want to consider adding some new features as well. The one I'm thinking of in particular comes from what is called the MLVM project, the Multi-Language Virtual Machine project, which is geared towards supporting different languages on top of the JVM, like JRuby, Scala, Clojure and so on. The patch I'm most interested in is the tail call optimization patch. A lot of these dynamic languages on top of the JVM use recursion as an idiom, and with the tail call patch we should be able to do that without growing the stack as much, which should result in much better performance and less memory use. Then, of course, we are going to take the upstream test suite based on jtreg, run it again, make sure that we pass completely, and show that we haven't broken anything. This is one of the reasons we invested the time up front in making sure the test suite was ready to go. We are also going to performance test it and see how well it does against the classic JDK, and then we'll do our best to push our changes upstream, both to IcedTea and ultimately to Oracle. I think it's worth saying just a word about IcedTea. Are any of you familiar with the IcedTea project? Okay, good, a few of you are, but for those who aren't: IcedTea is a purely community-based collaborative effort, and I think the best way to describe it is as a build harness for OpenJDK. Originally, when Sun liberated Java (and I can say "we" because I was at Sun at the time), we shipped it requiring some binary plugs at build time; basically, you needed some non-free build dependencies in order to build OpenJDK. Of course, the free software community wouldn't accept that, so the solution was to develop IcedTea so that OpenJDK could be built from purely free software components. Fortunately, the need to replace those non-free plugs is no longer there; they have all been resolved upstream. But IcedTea continues to provide immense value to the community because it adds things like PulseAudio support, as well as the Java plug-in and Web Start. Much to our disappointment, Oracle has not yet decided to liberate the Java plug-in and Web Start, so the community is turning to open source implementations to meet those needs. Certainly for us in Debian, another thing we benefit from with IcedTea is support for additional platforms, through what's called the Zero interpreter and the Shark just-in-time compiler. The Zero interpreter is called that because it builds on Linux without any assembly code; it's kind of slow, but it's a great way of getting onto new platforms. Shark builds on Zero and depends on LLVM. We have had multiple discussions with IcedTea about collaborating and helping the community at large, not just
Debian, package Jigsaw. And as Guillaume mentioned, he had some really good collaboration with IcedTea on understanding why some of the test failures occurred and closing those bugs. Now let me say a couple of words about what's coming in JDK 7 and JDK 8, because I think it'll be of interest to this group and, as you'll see in a second, it's very apropos to Jigsaw. JDK 7 adds a lot of new features, but some are worth calling out, most notably the extensions to the concurrency model with the fork/join framework. As Java is used for progressively larger enterprise-class applications on very large machines, this set of tools will help developers write programs that can easily be decomposed across multiprocessor systems and then combine the results, so it's really an extension of the concurrency utilities. Then there are a lot of other things: a new I/O library, class loader enhancements, improvements to exceptions, and one in particular I want to say a little more about, JSR 292, or invokedynamic. Last week at the JVM Language Summit there was a lot of discussion about invokedynamic. I won't try to recap all of it for you, but I want to give you a couple of highlights. The basic idea is that supporting languages built on top of the JVM requires something that is effectively a function pointer. You want to connect classic Java code in the JDK with bytecode created by, for example, your JRuby compiler, or have your JRuby programs call into the Java libraries without a lot of extra overhead. It turns out that with this new bytecode instruction, which was added to the Java Virtual Machine specification, we can now call functions across the boundary between plain Java and a derived language very quickly, and that reduces the amount of boilerplate bytecode required for that function interface. But I think the most important and
most compelling feature of invokedynamic is that it allows the HotSpot compiler to do its magic optimizations across that language boundary. For example, you could write a really complicated and interesting program in JRuby or in Scala, and if it's wired up using invokedynamic, HotSpot could see through from the Java code into your Scala code and do all of the cool things it does: escape analysis, inlining, loop unrolling, method optimization and substitution, and so on. That, I think, is very interesting and is something to look for in JDK 7. So what about getting this cool stuff into Debian? It turns out that there is already some packaging of OpenJDK 7 in the experimental repository; that work was done by Matthias in Debian. We have some issues right now building on mips and mipsel, but just before this talk we were talking in the hallway and we think we might have a potential fix, so stay tuned; we're hoping to close that bug. There's ongoing work on porting it to kFreeBSD. Part of our thinking is that once it builds on all the architectures, we'd like to rebuild the archive, or at least the Java parts of the archive, with OpenJDK 7, in the hope that we can convince the release team to make this a release goal for Wheezy. So, now JDK 8. It turns out that Jigsaw, the modularization of the JDK, is one of the primary drivers for JDK 8. There are a lot of features and enhancements on the table, but Jigsaw is absolutely critical for JDK 8, so we're really ahead of the next generation of the upstream release here.
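As a reminder of what the JDK 7 fork/join framework mentioned above looks like in practice, here is a minimal divide-and-conquer sum using `ForkJoinPool` and `RecursiveTask`. The threshold value is an arbitrary choice for this sketch; the point is the split, fork, compute, join pattern that lets the work spread across processors and then be combined.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer sum with the JDK 7 fork/join framework.
final class ParallelSum extends RecursiveTask<Long> {
    private static final long serialVersionUID = 1L;
    private static final ForkJoinPool POOL = new ForkJoinPool(); // one worker per available processor
    private static final int THRESHOLD = 10_000; // arbitrary cut-off for sequential work

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    private ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long sum(long[] data) {
        return POOL.invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data)); // 500000500000
    }
}
```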
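invokedynamic itself is a bytecode instruction emitted by language compilers such as JRuby's, so it cannot be written directly in Java source. What its call sites are linked to, though, are method handles from the JSR 292 `java.lang.invoke` API, which behave like the typed function pointers described above. This small sketch (class and method names are our own) looks up two methods, composes them, and calls the result.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Method handles from JSR 292 (java.lang.invoke): typed, composable references to methods.
final class MethodHandleDemo {
    static String greet(String name) {
        return "Hello, " + name;
    }

    static String shout(String name) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // (String)String handle to the static method greet.
            MethodHandle greet = lookup.findStatic(MethodHandleDemo.class, "greet",
                    MethodType.methodType(String.class, String.class));
            // (String)String handle to String.toUpperCase(); the receiver becomes the first parameter.
            MethodHandle upper = lookup.findVirtual(String.class, "toUpperCase",
                    MethodType.methodType(String.class));
            // Compose them: upper(greet(name)).
            MethodHandle pipeline = MethodHandles.filterReturnValue(greet, upper);
            return (String) pipeline.invokeExact(name);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t); // lookup failures and other checked exceptions
        }
    }

    public static void main(String[] args) {
        System.out.println(shout("world")); // HELLO, WORLD
    }
}
```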
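Without something like the MLVM tail call patch discussed earlier, HotSpot gives every call its own stack frame, so deeply tail-recursive code can overflow the stack. A common manual workaround in language runtimes is a trampoline, sketched below with hypothetical names. It is not the patch itself, only the kind of workaround the patch would make unnecessary: each tail call is returned as a thunk and a loop drives the steps, so stack depth stays constant.

```java
import java.util.function.Supplier;

// Trampoline: a manual tail-call workaround for a JVM that does not eliminate tail calls.
final class Trampoline {
    /** Either a finished result (next == null) or a thunk producing the next step. */
    static final class Step<T> {
        final T result;
        final Supplier<Step<T>> next;

        private Step(T result, Supplier<Step<T>> next) {
            this.result = result;
            this.next = next;
        }

        static <T> Step<T> done(T result) { return new Step<>(result, null); }

        static <T> Step<T> more(Supplier<Step<T>> next) { return new Step<>(null, next); }
    }

    /** Drives the steps in a loop, so the Java stack depth stays constant. */
    static <T> T run(Step<T> step) {
        while (step.next != null) step = step.next.get();
        return step.result;
    }

    /** Direct tail recursion: one stack frame per call, so a very large n overflows the stack. */
    static long sumRecursive(long n, long acc) {
        return n == 0 ? acc : sumRecursive(n - 1, acc + n);
    }

    /** Same computation, returning the tail call as a thunk instead of making it. */
    static Step<Long> sumTrampolined(long n, long acc) {
        if (n == 0) return Step.done(acc);
        return Step.more(() -> sumTrampolined(n - 1, acc + n));
    }

    public static void main(String[] args) {
        System.out.println(sumRecursive(1_000, 0));              // 500500
        System.out.println(run(sumTrampolined(10_000_000, 0))); // 50000005000000
    }
}
```

The price is one small heap allocation per step, which is part of why doing tail calls inside the VM is attractive.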
One of the really interesting things I learned last week is that the build system for JDK 8 is going to be completely redone. Right now, if you build Jigsaw, you go through a classic multi-stage build of classic Java, and then there's a make modules target that does all the work of creating the modules as a second, post-processing step. That step is going to be eliminated: the build will start with the base module and then build additional modules in the order of the dependency graph you saw earlier. That should make the build a lot quicker and more efficient. The build is also going to be streamlined, so hopefully it will be a lot cleaner as well. I mentioned the Multi-Language Virtual Machine patches; I hope we can add the tail call patch. I think it would be really interesting to do that and also to do some performance analysis with and without the patch, to see whether it buys as much as we think it does. In fact, on that point, last week there was an awful lot of interest in performance measurement for Java generally, and specifically for invokedynamic and some of these other kinds of patches. For example, when I discussed this with some of the Oracle people last week, they said they would love to make the case for adding the existing tail call patch upstream to the JDK 8 builds, but people need to be convinced that it will actually lead to a big performance improvement. So I think that if we can measure it and demonstrate that, it could very well get integrated by the time JDK 8 is released. Also, just measuring the performance of invokedynamic is quite tricky, which I think is why there's so much interest in performance testing in the broader community. One of the issues that came up is that Java is not the easiest language to do job control with when it comes to, say, designing a performance test harness, so there's a lot of discussion about revising or extending ProcessBuilder to do better job control, especially on platforms like Windows, which do a particularly poor job of that. So with this, let me call your attention to a couple of pointers. Guillaume's examples are largely taken from the Jigsaw quick start guide, which is on the OpenJDK website. Also, most of the presentations from last week's JVM Language Summit are online at this page. If you're interested in Jigsaw, or in Java generally, in Debian, I'd encourage you to send mail to the debian-java mailing list. And with that, that concludes our presentation. Any questions? Has anyone tried to package a Java upstream that has all kinds of build dependencies baked into their source? Okay. You can see the veteran Java guys have all struggled with that. Wouter? So one of my upstreams has decided to reimplement their graphical application in Java, and I'm not entirely familiar with the current state of things. They're using Maven and everything, which I know wants to download stuff. Is there actually a recommended way of doing that in Debian that you can point me, or point people, to? For doing...? I didn't quite hear. The thing is that they are reimplementing a graphical application in Java, and they want to use Maven as the build system. As I understand it, Maven wants to download stuff during the build. This is a very common problem, or challenge, in dealing with Java upstreams: they typically like to use Maven, and Maven does two things for you. It effectively acts as your makefile, your build instructions, but it also downloads build dependencies at build time, and that's the part that's really problematic for Debian. As many of you may know, Maven 2 has been packaged for Debian, and my understanding is that the download feature has been disabled, so that we obviously don't violate policy on our build daemons.
So this is an ongoing challenge. I think one approach is to basically tease out the build dependencies from that build file by hand, so that you don't need them. Another approach would be to use Maven as it's packaged in the archive now, and I'm wondering if I could ask Torsten to comment on Maven support. Torsten, do you have any additional thoughts on how we should handle that? No, I cannot add anything useful here. It would certainly be interesting to get other upstream projects to use it as a way to modularize their projects, or to use it for build dependencies and runtime dependencies. But I don't know whether any other project is already thinking about it. Do you know anything about this? I think it's really too early to expect upstreams to do that, but I think we need a better approach for handling the case where upstreams use Maven, because it is a very common problem. Other questions? Okay, well, thank you very much for coming. Thank you.