Hello and welcome. We have some amazing talks going on in parallel about Cassandra and CloudStack and all the new stuff we build our applications with. But you have all decided to come here and learn something about the Apache Commons project, which is nice, and I hope to show you that Apache Commons is still a very important project for the Java ecosystem. This talk is called Apache Commons State of the Union. I'll try to give you an impression of where we are coming from, where the project is now, and where we are going. I won't have any code in the slides, so if you came here to see some code, this presentation is not for you. But we have some Apache Commons related talks later in this room about Commons Text and Commons Crypto. And I myself will give a talk about Apache Commons Lang and StringUtils and that stuff you all probably know, and that one will have more code. So this is more of an organizational talk. OK, who am I? My name is Benedikt Ritter. I'm from Germany, and I work as a software craftsman for a company called codecentric. If you have never heard about software craftsmanship, I advise you to have a look at the material that's on the internet. It's a way of thinking about developing software, a value system for developing high-quality software. If you want to learn more about it, meet me after my talk and I can tell you something about it. I've been a member of the Apache Commons PMC for several years now. Some of you may know me by my Apache ID, which is britter@apache.org. I'm mainly a Java and Scala guy, but I have also done some front-end work with AngularJS and all the stuff that's fancy in the front end now. I work as an IT consultant, so I go around to our customers and help them develop software. And I've recently become a podcaster: I'm hosting a German podcast about software craftsmanship and agility. But for an English audience, that's probably not the right podcast.
But if you're also a podcaster, just come and meet me and we can share experiences about podcasting; that's very interesting. If you want to follow me or see what I'm doing, you can follow me on Twitter or follow my GitHub account. I usually only tweet about tech-related stuff. So do it if you like. Or don't. Here's the short agenda for this talk. I'm going to give a brief introduction to the Apache Commons project. Most of it people will probably know, but maybe there's something new in it for you. Then I'm going through the past: where is the project coming from, and how did it evolve to its current state? I'll talk about the present and the challenges we're seeing at the moment at Apache Commons. And then I'll give a short view of what I think the future is and what the challenges for the project will be. So, Apache Commons, what is that? The project consists of a number of subcomponents. Unlike other projects, for example Cassandra, which only maintains the code base for Cassandra, Apache Commons consists of a number of individual components that don't really have that much in common, other than that they are used by a large group of developers. We are a community of individuals maintaining a number of components, and that's very important for us. I'll talk about that in more detail when I come to the past and why the Apache Jakarta project eventually had to be split up. I've put the mission statement of the project on this slide. This is my interpretation of the mission of Apache Commons; it's not the official mission statement of the project. I think our mission is to provide a place for other ASF projects to come together, to collaborate, and to share code. And I have some examples for you where this has already happened. For example, we have the Commons Compress library, which does exactly what the name says.
It implements compression algorithms in Java. Obviously, that won't be as fast as it could be in plain C or C++, because in Java you always have the index checks when you access arrays, and that's what you do a lot when you compress stuff: you have byte arrays and copy data around. So Compress is a Java-based compression library. And we have, for example, the Apache Maven and Apache Ant projects. They both need compression algorithms because they are both build tools; I think you probably know them. They have to create JAR files, for example, or zips or tarballs with all the distribution files. They don't want to implement the compression algorithms themselves, so they said, OK, let's share this code at the Commons project. That way we have just this one place where we maintain compression algorithms. Makes sense, I think. A very recent example is Commons Crypto. We will have a talk about it later, I think right after this one. Yes, Dapeng Sun will talk about Commons Crypto. It is a library that was originally implemented at Intel. They needed it to optimize, I think, the communication between the nodes of Hadoop and Spark clusters, which they want to secure with cryptographic algorithms. Intel started this project to do really, really optimized cryptography on the JVM, and they brought it to Apache Commons so Hadoop and Spark can use it, and maybe other projects I'm not aware of. Then we have Commons RDF. RDF, the Resource Description Framework, is, I think, a standard in the semantic web area that lets you describe resources on the web. Commons RDF is an integration layer for various semantic web projects. For example, we have the Jena project, the one with the two triangle-y things in its logo, and we have RDF4J, which is an Eclipse project. So Commons RDF is an integration layer for different libraries that implement the RDF standard.
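Commons Compress itself is not shown here. As a rough sketch of the kind of packaging task that Maven and Ant delegate to a compression library (writing a JAR or a distribution zip), the following example uses only the JDK's built-in `java.util.zip` package, a swapped-in stand-in rather than the Commons Compress API, to build a small zip archive in memory and read it back. The class name `ZipSketch` and the file contents are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipSketch {

    // Packs name -> content pairs into an in-memory zip archive (JDK java.util.zip, not Commons Compress).
    static byte[] zip(Map<String, String> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                out.putNextEntry(new ZipEntry(file.getKey()));
                out.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Reads the archive back into name -> content pairs, preserving entry order.
    static Map<String, String> unzip(byte[] archive) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = in.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                files.put(entry.getName(), content.toString(StandardCharsets.UTF_8.name()));
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n");
        files.put("README.txt", "hello");
        System.out.println(unzip(zip(files)));
    }
}
```

The JDK covers zip, gzip, and deflate, but not formats like tar, which is one reason build tools that ship tarballs reach for a dedicated library.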
And as you can see, while all the other examples were about Apache projects that want to share code, with Commons RDF we even share code with the Eclipse Foundation. So the question is: how important is Commons, not only for Apache and Apache projects, but for software development on the JVM in general? I found some statistics for you on the internet. I took the screenshot, I think, yesterday or the day before. It's from search.maven.org, which is a web UI for the Maven Central Repository. Central hosts all the artifacts that Maven downloads, and other tools like Gradle also use it. If you have a look, you can find some of the Apache Commons components among the most downloaded artifacts of the last month. I've taken the liberty of ruling out the stuff that Maven downloads for itself, because when you install Maven, you just have the Maven core, and it needs to download some plugins and some utility stuff. If you rule that out of this statistic, you see that JUnit, the testing library, is the only artifact that has been downloaded more often than Commons Lang and Commons Collections. To be clear, you don't know how much of that is because of transitive dependencies. These numbers don't necessarily come from direct dependencies because people are using Commons Lang; it could also be that Commons Lang was downloaded because a project has a transitive dependency on it. But I think it gives you an impression of how many people use it. Then I have this chart. It's from an analysis of projects on GitHub. They analyzed, I think, 30,000 GitHub repositories. I don't know exactly how they did it; probably they parsed the POM files or whatever. Anyway, they found out which projects are depended on the most on GitHub. So while the first chart gives a Maven-centric, POM-file-centric view of the dependencies, this one is more about which code is referenced from GitHub projects.
As you can see, there are no Maven-related dependencies, just plain old Java libraries. You probably know a lot of them. Obviously, you need logging in almost all applications; that's why we have the SLF4J API on top with 30%. We also have JUnit again. And if you go a little bit down, you see Commons IO, which is a Commons library for input/output processing, and we have Commons Lang again. The analysis of the GitHub repositories found 13 Apache Commons libraries in the top 100 referenced projects. So that's pretty important for the Java ecosystem, I'd say. And here are the sources for the metrics, if you want to look them up. I'm not making this up; it's really out there on the internet. OK, so where are we coming from? How did it all start, and how did this project become so important? Probably most of you have heard about the Jakarta project. Who has heard about the Jakarta project? It pops up every once in a while, and people wonder, OK, what's Jakarta? What's Jakarta Commons? What's Jakarta Commons Lang? Jakarta was founded in 1999, and it was an umbrella project for Java-based projects at the ASF, because at that time they didn't have any Java projects. Java was just on the rise. I don't know exactly when Java was introduced, 1995 or something like that, but I think nobody used it in 1995. I was in elementary school back then. But in 1999, they founded the Apache Jakarta project as an umbrella project for Java-based projects at the ASF. And what the ASF learned from Jakarta is that umbrella projects are bad for communities. You have this one gigantic project with a number of sub-projects, and instead of having one big community that maintains all this stuff, you have a community of sub-communities that don't really interact with each other. That was the reason for the ASF to say, OK, this doesn't work for us.
We have to split this up and make individual top-level projects out of all the Jakarta projects. That split started in 2005, and that's the reason why you may still hear about Jakarta. Jakarta was moved to the Apache Attic in, I think, 2011 or so, so there was a transition phase during which more and more projects got split out. I've put some of the notable Jakarta projects on the slide for you. You know Tomcat; there are a lot of talks at ApacheCon about Tomcat. It's a servlet container written completely in Java. We had Maven and Ant on the other slide. You probably know Log4j. That's the old logo of Log4j, Log4j 1. It has since evolved into its own top-level project, and we now have Log4j 2, which has a shiny new logo. Some of you might know Struts. Who's done Struts development? OK, Christopher is shaking his head; he doesn't like Struts. But at the time, it was a pretty good web framework, I think. So Struts also originated in the Jakarta project. And you all probably know Lucene, which is a full-text search engine. There are a lot of other Java projects that started at Jakarta. I didn't find a logo for POI, which is an excellent Java library for working with Microsoft Office file formats. And there was also Jakarta Commons. Those projects already needed a place to share their code, so they had Jakarta Commons inside of Jakarta, which was what Commons is today: a project maintaining these utility libraries. Back from that time, we already had Jakarta Commons Lang and Jakarta Commons IO. Apache Commons, the project as we know it now, was set up in June 2007 with 20 PMC members, which is a lot, I think. Some of those people are still active today, so when you go on the mailing list, you will find some names that you know from the Jakarta time.
And there were some other components, like the Byte Code Engineering Library (BCEL) and BSF, the Bean Scripting Framework, which originally were Jakarta projects on their own but joined Commons in the end. So they are also Apache Commons projects now. OK, so that's where we're coming from. We had Jakarta and Jakarta Commons; Jakarta had to be split up, and that gave rise to Apache Commons. So where are we standing today? Today, the Apache Commons project is structured in three areas. We have the Apache Commons Sandbox, which is an area where we play around and try new stuff out, kind of like a little Incubator, but without such a restrictive process. Any committer can come along and say, I have a new idea, I want to try something out, and we'll put it in the sandbox. Eventually, when enough people care about that code, we make a proper component out of it. Proper is the area where all the production-ready libraries live. I've already talked about Commons Lang. We have Commons Net for networking, Commons Compress, and Commons Codec for encoding algorithms. They are all what we call proper today. Traditionally, these areas have been separate folders in SVN. But since we are currently migrating to Git, where you can't just put a folder structure into one big repository the way you can with SVN, we have split it up and created a Git repository for each component. So we no longer have this folder-based distinction between sandbox, proper, and dormant; it's just a marker on the website. If you see that something is a sandbox component, don't be too sure that it's production-ready. But the proper components are all battle-tested. We also have dormant, which is our personal attic, I'd say. Sometimes we have components we don't need anymore or that are no longer useful, for example because Java EE standards have made them obsolete.
Then we move them to dormant and say, OK, there are no more releases coming; nobody is working on them anymore. That's dormant. All our components today are Java-based because, as I said, we came from Jakarta Commons, which was a Java project. And all the builds are currently implemented with Maven. But that's not a rule or anything; it's just the way it worked out for us. We have some machinery for Maven: our own build plugin and our own parent POM, so it makes sense to set up a new component with Maven. But I think if anybody said, I'd like to use Gradle, he would have to discuss it for three weeks, and then he could probably do something with Gradle if he liked. But not everything is good and nice at Apache Commons. We have some components in the proper area that have not been released. That's kind of a strange state, because we expect components in proper to be released and updated regularly. For example, we have Commons OGNL, the Object-Graph Navigation Language. It provides an expression language for navigating object graphs. It was brought to Commons a few years back, but for some reason development continued on GitHub. So they have the 3.x release line on GitHub and are working actively on it, while we have a kind of dead repository at Commons for the 4.x release line that nobody's working on. I don't currently know why that's so. We have Commons Imaging, which was previously called Sanselan. I don't know how to pronounce that, to be honest. My feeling was that they had trouble getting through the release process. As I said, at Commons we have libraries that are used by a lot of users, so we have to make sure the stuff works, and that makes the release process a little bit tedious. When I want to make a Commons Lang release, I know, OK, I need a full Saturday morning to prepare everything.
Imaging had some releases in the Incubator and then moved to Apache Commons. They had something like ten release candidates for the 1.0 release and then eventually gave up. But lately, one of the developers started working on that code base again, so you never know; maybe we'll see a release soon. Then we have Commons Functor. I think we rushed into that prematurely. It was set up in, I think, 2011, and the idea was to build a library that helps you do functional programming in Java. You probably know Google Guava; it has some functional concepts, so you could use things that are now in Java 8 before Java 8 existed. People worked on Functor, but we never pushed out a release. And in light of Java 8, the question is: do we want to put effort into that library when we have functional programming in the Java platform itself? So that's an example of a library we would probably move to dormant, because nobody needs it, nobody's working on it, and nobody's asking questions about it on the mailing list. So why would we maintain it? That's a very important part of what I'm currently working on: doing housekeeping and moving stuff to dormant, because in proper we have, I think, 43 components, and not all of them are that important anymore. Things I've already moved to dormant include Betwixt, which was a library for mapping JavaBeans to XML. We have, I think, four or five or six XML libraries at Commons. This is another example of stuff that is already in the platform: with the XML standards in Java, we don't need those utility libraries anymore, because the problem of parsing XML has been solved for Java. Modeler and Primitives are some other examples. They were lying around for years; nobody committed, nobody raised issues or anything. I think that's the time to move that stuff out of the way and make room for new, innovative stuff.
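To illustrate why Java 8 undercut the case for Commons Functor, here is a minimal sketch using only the JDK's `java.util.function` types: composing functions with `andThen` and combining predicates with `and`, the kind of function objects a pre-Java 8 functional library had to provide itself. This is not the Commons Functor API, and the class and constant names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FunctionalSketch {

    static final Function<String, String> TRIM = s -> s.trim();

    // Function composition: trim first, then upper-case the result.
    static final Function<String, String> NORMALIZE = TRIM.andThen(s -> s.toUpperCase());

    static final Predicate<String> NON_EMPTY = s -> !s.isEmpty();

    // Predicate combination: keep only non-empty strings of at most four characters.
    static final Predicate<String> SHORT_AND_NON_EMPTY = NON_EMPTY.and(s -> s.length() <= 4);

    static List<String> process(List<String> input) {
        return input.stream()
                .map(NORMALIZE)
                .filter(SHORT_AND_NON_EMPTY)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [LANG, IO]
        System.out.println(process(Arrays.asList("  lang ", "io", "   ", "collections")));
    }
}
```

Everything here ships with the platform since Java 8, which is the point made above: a separate library for these building blocks has little left to offer.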
So that's the housekeeping we do every once in a while to stay up to date. Another interesting thing we are currently doing is migrating to Git and integrating with GitHub. Other projects would probably laugh about that, because they just said, OK, we do Git, let's do it. But that's not how it works at Apache Commons. We are pretty conservative, so finding consensus among all the developers and for all the components takes time. We usually don't make one-size-fits-all decisions. We don't say, OK, all the components have to move to Git now; we do it on a per-component basis. If the group of people developing, say, Commons Math wants to go to Git, they can migrate that code base to Git. But we don't migrate everything to Git at once, and that's the reason this takes so long for us. All the components have Git mirrors, though, even if they are still on SVN. So if you like Git more than SVN, you can clone our Git mirrors. And everything is mirrored to GitHub, because I think GitHub is a real game changer when it comes to open source development. It has really changed the way you work with your contributors: the workflow, the code review. It's a really important platform, I think, not only for Commons, but for the ASF in general and for open source. To make it more attractive for people to contribute, I found out that it's very important to have a nice README file in the GitHub repository. I don't know if you're using GitHub, but if you put a Markdown file called README.md into the top-level folder of a repository, GitHub will render it as a nice little web page. And if you come across a repository without a README file, you usually think, OK, nobody's working on this; they don't even have a README. So we implemented a README generator in our build plugin, and now we have README files for all the components.
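The talk does not describe how the README generator in the Commons build plugin is implemented. The following is a hypothetical sketch of the idea, assuming the generator takes component metadata (name, description, Maven coordinates) and renders a Markdown README with a dependency snippet. `ReadmeSketch`, `Component`, and the `org.example` coordinates are invented for illustration.

```java
public class ReadmeSketch {

    // Hypothetical metadata for one component; a real generator would read it from the build (e.g. the POM).
    static final class Component {
        final String name;
        final String description;
        final String groupId;
        final String artifactId;
        final String version;

        Component(String name, String description, String groupId, String artifactId, String version) {
            this.name = name;
            this.description = description;
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
        }
    }

    // Renders a minimal README.md: title, description, and a Maven dependency snippet.
    // The snippet uses tilde fences, which CommonMark treats like backtick fences.
    static String render(Component c) {
        return "# " + c.name + "\n\n"
                + c.description + "\n\n"
                + "## Getting started\n\n"
                + "~~~xml\n"
                + "<dependency>\n"
                + "  <groupId>" + c.groupId + "</groupId>\n"
                + "  <artifactId>" + c.artifactId + "</artifactId>\n"
                + "  <version>" + c.version + "</version>\n"
                + "</dependency>\n"
                + "~~~\n";
    }

    public static void main(String[] args) {
        Component example = new Component(
                "Example Component",
                "A hypothetical component used to illustrate README generation.",
                "org.example", "example-component", "1.0");
        System.out.print(render(example));
    }
}
```

Generating the file from build metadata, rather than writing it by hand, keeps the coordinates in the README in step with each release.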
Furthermore, what's really nice about GitHub is that you can put a CONTRIBUTING file into your repository, and it will show up when somebody plans to contribute code. When someone opens a pull request, they get this list of things to look over. We have written into that file that they have to sign the ICLA, look through Jira, create a ticket, and do all the other things you have to do when you want to contribute to an Apache project. A very important thing we did a while back, in 2014, was to open our project to all ASF committers. If you are an ASF committer, you already have write access to all our repositories. I think if we want to be a place for other projects to come together and contribute, we can't lock those people out. We can't say to them, OK, you want to share code, but first you have to show us that you are good enough to bring code to us. I think it's a policy for a lot of ASF projects that people who are already inside the ASF have to earn merit again for each project. I don't really think that makes sense: if I have shown, for example at the Apache Commons project, that I'm a good committer who knows the Apache Way, why do I have to do that again for the next project I want to work on? That's why we discussed whether it would make sense to grant all ASF committers write access to our repositories. The vote obviously passed, we talked to Infra, and they set everything up. So everybody can just come and start working on code if they like. One thing to note is that while we are a commit-then-review community, we like people to announce their plans and say, OK, I'm planning to start working on Commons, here I am, I'm doing this and that. That way we don't see commits on the commit mailing list from people we've never heard of before. Other than that, there are no requirements for ASF committers. Then we had an event, I think last year, that was kind of shocking for the project.
That was the fork of Commons Math. Commons Math is a component that implements mathematical algorithms: algebraic algorithms, optimization, and so on. A group of committers, some of them even PMC members, decided that they didn't want to maintain that code at Commons anymore. They forked the code outside of the ASF into a project called Hipparchus (I hope I pronounced that correctly) and left Commons. We had, I don't know, two or three people left behind who weren't invited, which was kind of strange. The PMC talked about it, and we think it was a collective failure of the PMC, because we didn't see it coming; we were very surprised by this development. But anyway, this is open source, and it's perfectly fine for people to say, I want to fork, I don't want to be at the ASF anymore, I want to make my own project. So of course we let them go, because our license allows that, and that is a good thing. Then we thought, OK, what will we do with the Math project? It's a really big and really complicated code base, and you have to be into this mathematical stuff to really work on it. But the people who were left attracted new developers, which is a good thing. They are currently in the process of splitting Math up, because it has become too big, and we think it's better to have very small, focused components. If you need random number generators, you would probably only want to use Commons RNG and not Commons Math, which brings along all the optimization and statistics stuff you don't really need. They have also split out Commons Numbers, which is a library for, for example, working with complex numbers. The plan is to split out more and more and make things smaller and more focused, because hopefully that will make it easier for people to come to the project and collaborate.
To be honest, my personal opinion on this topic is that all the mathematical components should form their own TLP, their own top-level project, because this stuff is just too specific. We have a lot of components where people work on several things. I, for example, work on Lang, I work on CSV, and I have worked on Collections and a number of other components. But Math was always over my head; I don't understand that stuff. I think it's very specific, and that's why they could easily form their own top-level project and then maintain the code the way they like. Another interesting thing happened, I think also last year or at the end of 2015: the remote code execution vulnerability that was found in a number of Java projects, where Groovy, Spring, and Apache Commons Collections were seen as the root of all evil, if you want to put it like this. A group of researchers, I think, found what they called a gadget chain to remotely execute arbitrary code. They used, for example, Commons Collections, where we have some classes to decorate maps. If you have a map, you can decorate it with a transformer, which transforms a value when you access it. For example, say you have a map of string to string, and you want the length of the string. Then you would need a transformer from string to int that maps the string value to its length, so you effectively get a map of string to int. But you can put any code inside this transformer. For example, you can access the runtime, exec a command, and do whatever you want, like rm -rf /. The problem was that this transformer was serializable, so you could send a serialized instance over the network, and if the other side deserialized it, it would execute this code, because the code was inside it. There was a big fuss about this, and people said everything was going to pieces.
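The mechanism can be sketched without Commons Collections. In this minimal Java example, `TransformingMap` is a hypothetical read-time decorator (not the real Commons Collections class) showing that a transformer is behavior packed inside a data structure. The second half illustrates the talk's point that receivers should sanitize incoming byte streams, using the JDK's `java.io.ObjectInputFilter` (Java 9 or later) to accept only `java.lang` classes and reject everything else. The allow-list pattern and the `Unexpected` class are assumptions chosen for the demo, not a recommended production filter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class DeserializationSketch {

    // Hypothetical decorator (not the Commons Collections API): values are transformed when read.
    static final class TransformingMap<K, V, R> {
        private final Map<K, V> backing;
        private final Function<? super V, ? extends R> transformer;

        TransformingMap(Map<K, V> backing, Function<? super V, ? extends R> transformer) {
            this.backing = backing;
            this.transformer = transformer;
        }

        // Looks up the raw value and runs the transformer on it; the transformer may run arbitrary code.
        R get(K key) {
            V value = backing.get(key);
            return value == null ? null : transformer.apply(value);
        }
    }

    // Stand-in for an unexpected class arriving in an untrusted byte stream.
    static final class Unexpected implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Deserializes only classes accepted by a JDK pattern filter; rejected classes raise InvalidClassException.
    static Object readFiltered(byte[] bytes, String pattern) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.setObjectInputFilter(ObjectInputFilter.Config.createFilter(pattern));
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> words = new HashMap<>();
        words.put("project", "commons");
        TransformingMap<String, String, Integer> lengths = new TransformingMap<>(words, String::length);
        System.out.println(lengths.get("project")); // prints 7

        String allowJavaLangOnly = "java.lang.*;!*";
        System.out.println(readFiltered(serialize(42), allowJavaLangOnly)); // prints 42
        try {
            readFiltered(serialize(new Unexpected()), allowJavaLangOnly);
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A real application would tailor the allow-list to the classes it actually expects. The filter rejects a class when its descriptor is read, before any instance of it is created.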
The world is burning, because Apache Commons Collections is insecure. But our take on that really was: if you accept a binary stream and deserialize it without sanitizing it, that's probably your problem, not ours. What we did anyway was add a feature toggle to deactivate deserialization of the affected code in Commons Collections. But the funny thing is, at the last ApacheCon in Seville, there was a guy from Veracode talking about this topic. He had analyzed who had upgraded to the patched version, and he found out that although we had the patch out, basically nobody had updated. So the vulnerable library is still out there, and people are not using the fixed version. So what can we do? We provide a patch, but nobody's using it. OK, so that's roughly where we are standing now and what we are working on. Let me briefly go over what I think the future of the project is and what our challenges will be. I think we have three main challenges. We need to stay up to date with what is happening in the Java ecosystem. I've put a picture of a jigsaw puzzle on this slide, and you have probably heard about Project Jigsaw in Java 9, which introduces a new module system. If you want to use Apache Commons Lang in your Jigsaw-based application, we need to provide the metadata so that you can use it. We have a very, very lengthy thread on our mailing list about how we need to evolve our components and how we can provide the meta information that is needed, and I don't think we have come to a conclusion yet. At the moment, I don't really know what's going on with Jigsaw, because I think the vote in the Java Community Process has been negative. I don't know what Oracle will do with it. Maybe they'll just do it anyway, because they like to screw things up. Oh, OK. Let me repeat that for the recording: it looks like they voted it down. Red Hat and IBM said they don't support it for Java 9. OK.
Thank you for the comment. But anyway, we have to follow new developments in the Java platform and stay up to date. At the same time, we have to stay stable, because so many other projects rely on us; we can't break things. It's really challenging to introduce features that are only available in Java 9 or Java 10 or whatever, while also supporting projects that are still on Java 7, for example. We even have libraries like Commons Logging that, I think, require only Java 1.2, and we don't have any plans to change that, because it is used in such a wide range of application servers and other projects. So that's kind of tricky. And the last challenge is to stay relevant. As an example, I've put the logo of Scala, an object-functional programming language for the JVM and JavaScript, on the slide. Although we started as a Java community in the Jakarta project, my view, as I said, is that we should be a place for other projects to come together and share code. If you look at the language statistics of the ASF, we don't only have Java there; we also have Scala code, C code, and whatever else. So if we want to be this place for the other projects, we have to open ourselves to other languages, also to stay relevant, because I think Scala and JavaScript are very relevant languages at the moment. So maybe we should be the place for those languages as well, and that may be a chance to attract new people at the ASF who haven't found a place to maintain their code yet. And with that, I come to the end of my presentation. Here is my contact information. If you want to reach out, the best way is to send me a message on Twitter; I usually answer. Thank you for coming here to hear about the Apache Commons project, and I wish you a nice conference. Thank you.