Okay, welcome back to the Java track. We now have Torsten Werner, who is going to talk a little bit about his dreams with Java. Torsten is one of our key contributors. He does a lot of sponsoring and replying to emails and stuff, so it's really great to have him give a talk today.
Yeah, thank you. Yes, I want to talk a little bit about the Java packaging nightmare and what is so special about packaging Java, or software written in Java. My topics will be, first, a bit about the motivation for the talk, the history of Java, and the current state of Java in Debian. We have already heard something about that, but I will give you a short overview. Then the biggest part will be the problems in upstream packaging and some proposed solutions for how they can be fixed, and a short slide about packaging with Apache Maven in addition to javahelper. Then questions and answers. But I won't talk about the programming language Java or any details of the programming language. There's nothing wrong with the language itself. It's more or less the open source community in the Java world that's a bit different from what we know from other languages.
The motivation for my talk was a blog post by Vincent, who is a contributor to Debian, with the title "The Java Packaging Nightmare". I don't expect you to read all the text now. I've highlighted some of the most important pieces. He writes about the problem with upstream code, that he has to disable parts of it to get it compiled at all. It's a lot of work to find and build the dependencies of some code, and the dependencies of the dependencies, and their dependencies, so it's quite a recursive approach to get it fixed. And yes, the major problem behind that is that Java software is often distributed in a binary, platform-independent format, which makes it very easy to copy the JAR files and use them, but that's not acceptable for Debian main. We cannot just copy binary JAR files into our main distribution.
We must build everything from source. We must check every source package for correct licensing and so on.
Some of the history of free Java. The Sun JDK was non-free software for most of the time since it was released in 1995. But a lot of free software had already been written in the past, something like Tomcat, which was released in 1999, JBoss, Eclipse, which was released in 2001, and Hibernate, just to name some of the free Java projects that have existed for some time. Several alternative Java development kits, runtime environments and virtual machines have been written. Some of them are Kaffe and GCJ, which was released in 1999 too, so quite old, Classpath, and the Eclipse compiler. All of them got better and better with time, but never as good as the original Sun Java stuff. So we always had the problem that some older code could be built easily with the free alternatives, but newer code always started using features from Sun's implementation, and that's why we always had problems building newer code for Debian. Everything changed with the relicensing of the Sun Java development kit in 2006 under the GPL, under the name OpenJDK. It wasn't directly usable for us. There's a question.
I just wanted to point out that at the end of 2006 Sun was only announcing the choice of the GPL as the license for OpenJDK, but the code was not available until 2007. And I'll add that Gentoo packaged OpenJDK within three hours of the code drop, and the first build of IcedTea was within 48 hours of the first code drop.
Yeah.
The other thing I would like to mention is the article "The Java Trap" by Richard Stallman, which kind of put a hold on free software and Java at the very beginning. That's fairly relevant to free software and Java. I mean, the release of OpenJDK kind of resolved that issue.
Yeah. As you already told us, the IcedTea project started in 2007 and contributed additional patches and the build system which we are using today.
Even if our packages are called OpenJDK, they are using OpenJDK with the IcedTea patches. And we got some freely licensed core libraries for the Java Enterprise world. That was the GlassFish relicensing, which happened a bit earlier, I think, and some other projects like Apache Geronimo that provide similar core libraries. Most of them are important build dependencies for other stuff. So even if you don't know what Java Enterprise is, such libraries might still be important for getting other tools and libraries packaged.
So the current state in Debian is that the Java packaging team is maintaining more than 500 source packages today. There are other people maintaining Java packages, but I haven't counted them. It's growing quite fast. I think it was about 400 source packages at the beginning of this year, so we now have more than 100 more. Our toolchain is in good shape, at least on our major architectures. We have some popular packages in Debian: Tomcat, Eclipse, the Jajuk jukebox as a desktop application, and some libraries like Hibernate, and some more stuff, but I think those are the most important or the most popular ones. We have no Java Enterprise application server, only some core libraries from GlassFish, JBoss and Geronimo. We have no major web application as far as I know. Something like Liferay is not packaged. It's quite horrible to get all those libraries and plugins built. We have good cooperation with Ubuntu, but not with other Linux distributions as far as I know, and we have almost no interest from upstream developers, except in some cases like Tomcat, maybe Eclipse, I don't know. But in most cases, across all those more than 500 packages, almost none of the upstream maintainers is really interested in Debian.
Those 500 packages, or are you just talking about that?
Across all those 500 packages, I think, if you have contact with 10 upstreams, that's quite a lot.
Why do you think that is?
Why is that so consistent across so many different Java packages?
In the case of mine, they just don't consider Linux a very interesting target for their work, and so it just doesn't seem important to them. One of the upstreams I've got has been wonderfully responsive when I had questions for them, but only as long as they're not specific to things that are platform oriented.
One of the things that I found when we talked to people is that Java upstreams are basically of the opinion: well, I'm writing in a platform-independent language, there are the jars on my website, people can download them. Why do I need to do all of this, as they perceive it, extra work? They aren't thinking forward in terms of how this actually helps them in the long run. If people want it on Linux, then they can go and download it, and they just need to pull the jar off the website. And I mean, there are a lot of upstreams as well who don't think about all the licensing issues and what have you. They just bundle stuff in a jar and then leave it there.
Yeah. Some upstreams don't even want to have the issue of Debian stable, which stays so stable. They say no, we want people to download it from our website, blah, blah, blah. So they are hostile to the idea of packaging stuff. And the response to bug reports is quite low. If you ever get an answer, it takes long, and so we often have to fix Java code ourselves, even in cases where it's probably much easier for the upstream developer to do that, because we do not get any answer.
To try to understand the environment that makes it so that there's so little interest in doing this, think about a company like Liferay, which has quite a few people working in the company and a pretty active community, and the install is actually pretty easy if you just install Liferay.
What is the explanation within Debian as to why something like Liferay, such a vibrant project, wouldn't have a package and wouldn't have a lot of upstream support? I don't mean that as an invective. I'm trying to be constructive.
In the case of Liferay, I know that they are not even interested in getting patches or anything. So that's not really an open source community, but that might be a different problem compared to others. Liferay is a project for which open source is marketing. It's a marketing argument. I have seen that they don't even accept patches and developments from outside.
Yeah, I actually know those guys, and it's not a marketing gimmick. They really are committed.
Yeah, I've seen it otherwise. The patches from our company haven't been accepted at all, not even with questions about how we could improve them. They just don't integrate external stuff.
It sounds like there are some common threads here, apart from sometimes a lack of interest or motivation for free software in general, or for Linux as a platform, or for Debian as a platform. It sounds like, in particular because it's so easy to install already, as you say, the question is what value the Debian packaging adds. From Debian's perspective, a lot. So that almost seems like a marketing or education task for the Debian community, because I can assure you that the people at Liferay are not cynically calling themselves open source. Maybe they have operational problems. So it just seems like a little bit of introspection here would go a long way.
That's possible, but I don't think anyone is actually attributing malice here at all. I mean, that's not necessary to explain the disconnect. My observation is, I think you're hitting a lot of this on the head.
A lot of it really has to do with this notion of, gee, what value is this really adding? If you are someone who lives in Debian space, you understand how wonderful life is when you have a strong dependency system and things that are carefully integrated under a consistent policy, so every time you get anywhere near Java things you just immediately have this revulsive reaction of, oh my god, this is all so broken all the time. And I think if you're coming from the Java universe, you're accustomed to, well, I just grab all these things and put them here and they all just work together, so why do I need all of this other stuff that I don't completely understand? It's just a worldview disconnect. I don't have to believe there's any intentional malice or anything like that there. And in a case like this upstream, if you have a personal connection, it can go a long way just to say: hey, people are very interested in Liferay, and the way that they do this is to have a very clean, strong dependency system. It actually might make the product healthier, and if you really want to go beyond the marketing of open source and really want to integrate, let's have a discussion about it.
Yeah, I think if there's some material that the Debian community has put together, really building the strong case, in a way that can be replicated across these various upstreams, and I don't mean to single Liferay out in particular, maybe that would be something that a lot of people could use. Just take that material to say, here are the advantages, boom, boom, boom, maybe even with some case studies. If something like that existed, a lot of companies could be presented with it and we might get more participation.
I think it's useful to have that kind of information written down in an organized way, but as was already said, I think a personal connection counts for a lot more than a document.
I agree, but having a personal connection together with a document is obviously the best, because then it's something that can be replicated. Anytime anyone has a good connection, you can say, here's this thing, and you don't have to have an amateurish case built for why it's such a great idea. You can have a collaborative effort put together with sound business reasons and sound practical reasons, and then a person who happens to have a connection, but may not be an expert at selling why it's such a good idea, is able to hand off something that really is very carefully thought through from a number of perspectives.
Yeah, so to take advantage of the opportunity that we have to work on that, and I apologize for continuing to derail your presentation, but since we do have an opportunity here, maybe with this project in particular, I'd ask two questions. The first would be: if we asked people in the Liferay project why Debian packaging their software would be valuable to them, what would they say? And the second question would be: could you ask them that? Because it can be hard to develop materials like you're talking about from the inside out, when we're swimming in this Debian sea and we don't know that we're in it, whereas getting that outside perspective can make it easier.
The other part is, there have been lots of presentations given over time in lots of different forums about why a strongly dependency-managed distribution of software is a good thing. The problem is none of that is specific to the Java experience, and it's clearly not reaching folks in that community. So the disconnect here is that same sort of, we're sitting on opposite sides of the river, and how do we get across it to work with each other?
I sit here going, well, why is it my job to convince them that doing things better is better? On some level, figuring out what the mutual motivation is, to actually come up with a more targeted piece of communication that would resonate with that audience, is where the challenge is.
I'm sure I've read a document in the Debian wiki on what the benefits to upstreams are of working with Debian. I'm 99% sure that document has been written. I'm sure I've read it. I think it's just a question of who's going to connect the people at Liferay, or wherever, to the documents that Debian already has.
There's a pattern in the Java community which tells us that there's something different, a different need that needs to be addressed here.
But this isn't hugely different from Perl CPAN versus Debian packaging. I mean, it's not the first time we've run into this.
Let's continue, let's let the speaker continue. Torsten, you'd actually like to talk some more.
Let's continue. I just used Liferay as an example that we don't have major web apps in Debian. But it would be nice to have it.
Sure.
Okay, some details about our current toolchain and runtime. OpenJDK is integrated with the IcedTea patches, and it has been our default JDK since the last DebConf on most Linux architectures. It provides several virtual machines: the Sun HotSpot virtual machine, for the ports the Zero interpreter and the experimental Shark virtual machine, and the Cacao virtual machine from IcedTea. We are using gcj-jdk on kFreeBSD, Hurd and the Linux hppa platform. We have the IcedTea browser plugin in squeeze, but not in stable. And we have the non-free Sun JDK, but it's no longer used for packaging as far as I know. We have used it in the past for packages in contrib, but I think all of them have been moved to main, using OpenJDK or default-jdk as the JDK. We have several packaging tools.
javahelper and Maven and some homemade makefiles or scripts are the tools that compile Java code into class files. And we have some more specific stuff that modifies the JAR files or installs them or whatever, something like CDBS, several Maven helper packages, and the Eclipse helper, which is part of javahelper but a bit specific to the Debian packaging of Eclipse.
Okay, the Debian Free Software Guidelines. We have, I think, two problems with Java code. The first is free redistribution. We often find undocumented licenses, or licenses that don't allow redistribution, in Java code if you look closer. Such code must be removed for Debian main, or it must be rewritten or replaced by something new. Or we find binary jars without source code. The second problem is that sometimes we have discrimination against fields of endeavor. Sun has in the past released source code that couldn't be used in nuclear facilities. But they have fixed most of their licenses, so today you don't find a lot of code that uses this clause anymore. And in the JSON library there is the famous "should be used for good but not for evil" clause, which is just stupid. Those are most of the violations of the DFSG we see in the Java world.
Yeah, licenses of source code are incorrectly documented, as I have already said. Sometimes you find binary-only JAR files, and no reference to the source can be found: where it was downloaded from in the first place, or some source control system, or some address for a source control system. There is no license or copyright information in the binary jars. And they are sometimes obviously non-free if you look a bit closer. Missing modularization is another problem. I will come to it on a later slide. The official Maven repository has no notion of a source package, so you can find a lot of binaries, but you cannot easily find out whether several binaries come from the same source. You just cannot find it out.
Sometimes you can find source jars with source code but no build system. How useful. You often cannot build them in a simple way, like with javac, because dependencies are missing or something like that. Or several source jars come from the same source tree, but they are packaged separately and scattered around the Maven repository, so quite unusable. The dependency handling in Maven is not fully usable for Debian, and we often find code duplication.
Some license failures I have found in the past. JBoss Web is more or less a fork of Tomcat. There's a top-level license file that mentions the LGPL version 3, but most of the code is actually under the Apache license, or under the CDDL or GPL for some code from Sun. It is just declared under a different license in the top-level directory. JRuby release 1.5.1 shipped with a binary file, constantine.jar, for which no source could be found anywhere on the web at the time of the release. This has been fixed now. It's now available on github.com, but at the time of the release it was not available. And Hibernate ships the Java Persistence 2.0 API, and this API cannot be redistributed in Debian main, because no license is granted to us for distributing the specification to third parties. Fortunately we can use the Geronimo implementation of the Java Persistence API from the Apache Software Foundation.
Missing modularization. Building modules, or depending on modules at build time, is quite easy in the good old C and C++ world. We have tools like automake and pkg-config, and you can run configure with --enable-feature-a or --disable-feature-b. You can use pkg-config on your system to find out whether some package is available or not, and if it's available, which arguments you need to build a program that depends on this feature. The same can be done with CMake. It's a slightly different syntax, but more or less you can switch features on or off. But such a concept is almost non-existent in the Java world.
You have no way to switch some feature in your software on or off, even if you're packaging a multi-module system. Hibernate, for example, has 10 submodules. There are some technologies that could be used. In Maven, profiles could be used, but they are not very popular. They can configure which submodules to build, or some plugins. You can switch plugins on or off or configure them, and profiles can be activated by a property. So you could just set a property to switch a profile on or off, but almost nobody uses this functionality. In Debian, source patching is the usual practice. We are just removing parts of the source or changing the build system at build time. It's not really a best practice, but it's what we are using.
Here is an example, an XML snippet from Hibernate's pom.xml file, where such module activation is really done. What it does: if the property for disabling the JDK 6 modules is not set to true, it builds the entitymanager and the JDBC4 testing modules. That's quite good. But it's an example you find very seldom in code. So we can just set the property that disables the JDK 6 modules if we don't want to build those submodules, because they are maybe too problematic. I wish we had more such options in upstream build systems, but they almost do not exist.
Another option would be to exclude individual Java files from building because they depend on some library that is not packaged. That can be done in Maven quite easily by configuring the Maven compiler plugin with excludes. Yeah, it could be put into a profile and activated by a property as well. What we do not have in the Java world is a preprocessor like cpp. What we could do is source filtering before compiling the source, but that's a bit awkward. I think I haven't seen a package where we use source filtering.
But what you will often see is some Java file being excluded from the build in Maven because it just does not build with our build dependencies.
Yeah, I've already told you that in the Maven repository we often have the problem of finding the source code for some JAR or POM file. This is strange, because the pom.xml file already has elements that could be used for more information, for some metadata about the package. Or they could just use XML comments to put the information inside. My proposal for upstream would be to put into every pom.xml file either the source information or a reference to where I can find the source information. What we need in Debian is the name of the package, a short description, a homepage link, access to the source control system and to the issue tracker. The copyright holder would be interesting. What you see here are the names of the elements that already exist in the pom.xml schema. They could be used for that. There isn't any element for the copyright holder, but you could use a comment. And for the license, there's even a licenses section. Of course, it wouldn't make sense to duplicate this information in every binary module. That's why I would suggest that they either use the parent element to reference the source POM or mention the source POM in an XML comment. The advantage would be that whenever we find such a binary jar, since the POM file is embedded in the binary jar by Maven, we could find either the reference to the source POM or all the information we need directly in the XML file. That would be quite nice. And that would be something you could directly ask upstream developers: whether they would change their pom.xml files in a way that would become useful to us. And of course, upstream developers should not only check their own source code, they should check all their direct dependencies too, to see if they are correct.
Okay, code duplication is a problem which we find quite often in the Java world.
There's a nice page in the Fedora wiki about it. It even references the Debian policy, so it's worth reading. I found four ways of embedding code in Java libraries.
The first one is just a plain source copy of some foreign source code. This is easy to handle. We can just remove it, package it separately, and reference the separately packaged file in the build process.
Another way is including a modified copy of the foreign source code. This is awkward, because it opens up all the maintenance problems like security updates and such things. There is really only one way: just ask upstream if they can merge their changes with the upstream of the foreign code. Or we could build source debs, debs that just contain source code, build-depend on those source debs, and patch them at build time. I'm aware of one package where such source exporting is done. The advantage would be that if there's a security problem or any important update, we could just rebuild the Debian package and we would get the newest version embedded into the binary package.
Embedding the content of other JAR files can be found quite often, because JAR files are just zip files. So they are unpacked and added to the other JAR files. That happens very often, and it's simply a stupid thing to do. It would be much better to put the original JAR files into the class path and use the classes from there. Either we change the packages to not embed the foreign JAR files, or we just rebuild them if necessary. It's not a big security problem, because just rebuilding the package will re-embed the newest version of all the dependencies, so it's easier to fix.
Sometimes we see build processes that embed modified content of other JAR files. That's really the worst thing we can see: using tools like jarjar to rewrite the class hierarchy and the Java packages. Now it becomes a bit difficult. Java package here means the directory where the classes are located in Java.
It's not the Debian package, it's the Java package. And we see cases where the org.objectweb.asm classes are just moved into the jruby.objectweb.asm tree. This is harder to detect, and we get embedded code under another name, which is hard to fix. I think we should try to find such cases by just comparing the names of the classes themselves, without the package prefix. It will take some time to do such an analysis, but I think we should do it in Debian to avoid such code duplication. Okay, we could just remove tools like jarjar, but I think that wouldn't be the correct way to handle it. Okay. A question?
I was wondering. You were saying that for security updates it's easy to just rebuild the package and embed everything again. How easy is it to rebuild a package that's got a bug in its own code and use all the versions that you used the first time you built it, so that you're not changing everything else in the package?
No, I was talking about the case where you have a foreign package A and another package B, and B embeds everything from A.
Exactly. So B has a security flaw in its own code, and it was built for stable, and you want to do a stable release and find the old version of A that happened to be there on the day that you built it last time. Where do you find it?
It would be horrible for stable, yeah.
That sort of thing is the reason why we've got this disconnect.
We would need a lot of binNMUs for fixing such things. It's easy in unstable, but it would be more difficult for stable. Okay.
Now I have a slide about using Apache Maven for packaging in Debian. Maven is a build tool and a repository. We must keep in mind that I'm now talking primarily about the build tool Maven.
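The idea mentioned above, finding relocated copies by comparing class names without the Java package prefix, could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tool that exists in Debian: the class `RelocatedClassFinder` and its methods are hypothetical names, and a match on the simple class name is only a hint, since unrelated classes often share common names.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper: flags classes that appear in two JARs under
// different Java packages, which may indicate a jarjar-style relocation.
public class RelocatedClassFinder {

    // Lists all entry names of a JAR file, e.g. "org/objectweb/asm/ClassWriter.class".
    static List<String> entries(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream().map(JarEntry::getName).collect(Collectors.toList());
        }
    }

    // Maps simple class name -> set of Java packages it occurs in.
    static Map<String, Set<String>> index(List<String> entryNames) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String entry : entryNames) {
            if (!entry.endsWith(".class")) {
                continue; // skip manifests, resources, directories
            }
            int slash = entry.lastIndexOf('/');
            String pkg = slash < 0 ? "" : entry.substring(0, slash).replace('/', '.');
            String simple = entry.substring(slash + 1, entry.length() - ".class".length());
            result.computeIfAbsent(simple, k -> new TreeSet<>()).add(pkg);
        }
        return result;
    }

    // Returns the simple class names found in both entry lists where the
    // combined set of Java packages has more than one element.
    static Map<String, Set<String>> findRelocated(List<String> jarA, List<String> jarB) {
        Map<String, Set<String>> inA = index(jarA);
        Map<String, Set<String>> inB = index(jarB);
        Map<String, Set<String>> suspects = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : inA.entrySet()) {
            Set<String> other = inB.get(e.getKey());
            if (other == null) {
                continue;
            }
            Set<String> all = new TreeSet<>(e.getValue());
            all.addAll(other);
            if (all.size() > 1) {
                suspects.put(e.getKey(), all);
            }
        }
        return suspects;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RelocatedClassFinder <library.jar> <suspect.jar>");
            return;
        }
        findRelocated(entries(args[0]), entries(args[1]))
            .forEach((name, pkgs) -> System.out.println(name + " found in " + pkgs));
    }
}
```

A real analysis would probably need more than the name, for example comparing method signatures or class sizes, to keep the number of false positives manageable across the whole archive.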
Maven is still quite new in Debian, but we now have many artifacts, which is what packages are called in Maven language, that ship their POM files in the Debian package, which is required for them to be used by Maven. But Ant is still the most used build tool in the Java world, often in combination with CDBS. And javahelper is becoming more important, together with dh. We are using the Maven offline mode at build time. Maven itself just downloads everything from the Maven central repository at build time, but we switch off this online mode for building Debian packages. We are setting up our own Maven repository in the /usr/share/maven-repo path. So whenever we install a Debian package that ships a POM file, we get another artifact in the maven-repo repository, and we can use this offline Maven repository for building other Maven-based packages. That's the way we are doing it in Debian. The builds don't access any external repositories. We must preprocess the pom.xml files, because the dependency handling is not directly usable. Or sometimes we need to patch out complete artifacts or plugins that are used by the upstream developers but aren't needed for the Debian build. Everything is explained in the Maven repository specification. I think we should put it into the Java policy at some time, at least as an addendum.
And we have three helper packages for Maven. The first one is maven-repo-helper. This helper will just install the POM files and the JAR files into the correct location, creating symlinks and doing the preprocessing step. But it does not require Maven to build any package. So we could still use either Ant or javahelper to build a package, and use maven-repo-helper just to install the POM file and to preprocess it to comply with the repository specification.
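To make the offline repository idea a bit more concrete, here is a minimal sketch of how a POM location could be derived from Maven coordinates. It assumes that the repository under /usr/share/maven-repo follows the standard Maven 2 directory layout (group ID with dots turned into directories, then artifact ID, then version). The class `MavenRepoPath`, the method `pomPath` and the example coordinates are hypothetical, and the talk does not say how versions are named in the Debian repository after preprocessing.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: computes where a POM would be located in a
// repository that uses the standard Maven 2 layout.
public class MavenRepoPath {

    // Repository root as mentioned in the talk.
    static final Path REPO_ROOT = Paths.get("/usr/share/maven-repo");

    // groupId "org.example" becomes the directory "org/example",
    // followed by artifactId, version, and "<artifactId>-<version>.pom".
    static Path pomPath(Path root, String groupId, String artifactId, String version) {
        return root.resolve(groupId.replace('.', '/'))
                   .resolve(artifactId)
                   .resolve(version)
                   .resolve(artifactId + "-" + version + ".pom");
    }

    public static void main(String[] args) {
        // Example coordinates are made up for illustration.
        Path pom = pomPath(REPO_ROOT, "org.example", "demo-lib", "1.0");
        System.out.println(pom + (Files.exists(pom) ? " (installed)" : " (not installed)"));
    }
}
```

A check like this only tells you whether an artifact is present in the local repository. It says nothing about whether the POM has been preprocessed according to the repository specification.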
The tool to really build Debian packages with Maven is maven-debian-helper, the next tool. That's a helper around Maven that calls Maven in the right way, switches on the offline mode, and calls maven-repo-helper to preprocess the POM file. Most of the stuff is quite automatic, so you get really short rules files. And the difference from javahelper is that Maven is used to build the package.
One advantage of Maven is that the directory structure is always the same in all upstream packages. You always find the source code in the src directory, or in the src/main/java directory if it is Java source code, and compiled code is put into a target directory. So the directory structure is always the same, which is quite nice. That means we can use other build tools to build Maven packages without Maven. And this is done with maven-ant-helper. This helper uses Ant to build those packages that are using Maven upstream. We have also used this helper to bootstrap the whole Maven thing, because it's too difficult to build the Maven packages otherwise. That's why we introduced this Ant helper, which can use Ant to build the basic Maven parts.
Yeah, do we have a microphone?
I heard someone say at one point that having XML files configure complicated build logic is very yucky, and I agree. So I was wondering if anyone here had reached out to the people at Gradle, because Gradle is being used by some bigger projects now and lets you do some fairly sophisticated things. So I'm wondering what the relationship is, if any, between Debian and the whole build infrastructure thing and the Gradle team.
We haven't packaged Gradle yet. Nobody has taken on the task.
The motivation behind the maven-debian-helper stuff is that there are a lot of upstream packages that are using Maven, and I think in most cases it makes sense to use the upstream build system, with slight modifications and preprocessing, but more or less the upstream build system. I think that's often the easiest way, because sometimes the Java code is generated from something else we don't know about, or the test code is run in some special way we don't know about. That's why using the upstream build system is the easiest way to build a more complex package. I agree with Matthew that if you have a very simple library, javahelper will do, but not if it's more complex, with submodules and everything.
I was specifically referring to maven-ant-helper, where you had a structure that was known and you didn't feel like using Maven to build it.
It was to bootstrap stuff, also the Maven infrastructure.
Right, and I'm just suggesting that Gradle would be an interesting alternative.
Yeah, maybe. I haven't looked at it.
It's pretty awesome.
I know that Groovy is using Gradle, I think.
And Grails has moved to it as well.
Yeah. Okay, I think that was even the last slide. Something happened to my notebook. So, I'm open for questions.
I was saying that with respect to reaching out to upstream maintainers, we can try to get the Debian publicity team involved and try to make some...
The main things we should be offering these people are bug reports, and even patches and bug fixes.
Yeah, that might be an option. I think we should try to cooperate more with other Linux distributions, because they probably have the same problems as we have. And if there are more than two Linux distributions taking care of Java packages, I think upstream will react to the requests. But we need to find a way to get in contact with Java maintainers from other distributions. That would be an interesting thing. Thank you very much.
Not a question, but it sounded like you had stimulated some pretty good discussion around trying to solve some of these problems. Might it be a good idea to arrange a BoF or something to follow that up and agree on some actions?
Okay.