Welcome, all of you, to my talk about Apache Commons beyond StringUtils. Over the course of the day we have already seen some of what lies beyond StringUtils: Rob gave a talk about Commons Text, and Dapeng Sun gave a talk about Commons Crypto. Nevertheless, I'd like to show you what else there is in Apache Commons that you might not know yet. I originally developed this talk for what we called the Apache Roadshow, a tour through Germany that I did with four of my colleagues. We visited several Java user groups and presented our Apache projects, and everybody had a ten-minute lightning talk to explain what their project is about. Since I believe people who go to a Java user group already know what Apache Commons is, I thought it would be a good idea to give it a different twist and talk about the things the average Java developer may not know. Later I extended that talk, and this is what you're hearing now.

So who am I? My name is Benedikt Ritter, I'm from Germany, and I work at an IT consulting company called codecentric. I'm a member of the Apache Commons PMC, and some of you may know me by my Apache ID, britter@apache.org. At work I do Java and Scala, plus a little front-end work, because most of the time you can't get around that. I've also recently become a podcaster: I host a German podcast about software craftsmanship and agility, so if you're into podcasting, meet me after the talk and we can share experiences. If you want to see what I'm doing, you can follow me on Twitter or visit my GitHub profile.

Enough about me. Let's have a quick poll: who has never heard of Apache Commons, or has never used one of the Apache Commons components? That's good. Then I don't have to explain what Apache Commons is, which I also covered in the State of the Union talk about Apache Commons this morning.
So I can keep that short, and we'll dive right into the agenda. I'll give you a very brief introduction to the project, because although most people know it, there are some details about Commons that probably not everybody knows. Then I'll talk about selected components and show you code examples of things you might not know yet, and at the end we'll have some time for questions and answers.

First question: who is this presentation for? Mostly JVM developers, because the Commons libraries are all written in Java. You can of course use them from Java, and also from Scala, Clojure, or any other JVM-based language, but using them from Rust, for example, wouldn't make sense. The talk is for beginners, intermediates, and pros; I think there's something in it for everybody. It's also for potential contributors: if you're already using Commons components and you're thinking about giving something back to the community, meet me after my talk and we can talk about it.

Why is this talk important? Because of what I call Ritter's Law. It's my law, I made it up myself: everything that is not part of our domain or business logic has already been implemented by someone much smarter than us. Every technical plumbing detail has already been implemented. CSV parsing has been implemented; you don't have to write your own CSV parser. You don't have to implement your own networking protocol. At least if you're an application developer, a business application developer as I am, building that stuff is not part of your job. It's already there. Our duty is to solve business problems and to focus on that.
There's also an extended Ritter's Law, which I learned after I got into IT consulting: everything that's not part of your core domain or business logic, or of the crazy legacy system you happen to need to integrate, has already been implemented. Sometimes there are these crazy systems you have to talk to, and usually there's no library for them; that's the case where I'd say, okay, you have to implement the technical plumbing yourself. Other than that, my take is that you should focus on business functionality and not on input/output, codecs, or compression. There are libraries for that, and Apache Commons is a project that maintains a set of such useful libraries.

So let's talk about the Apache Commons project. Where does it come from? I talked about that at length this morning, so here is just a short recap. Originally there was the Jakarta project at Apache, an umbrella project for all the Java-based projects back in the early 2000s; I think 1999 was the year Jakarta was started. Apache looked very different than it does today. If you look at the language statistics today, a good share of the projects are Java or at least JVM-based, but back then there were hardly any Java projects, so the thinking was: what is this Java thing? Let's just make one project and put it all in there. That was Jakarta. Projects like Tomcat, Maven, Ant, and Struts were all Jakarta subprojects. Inside Jakarta, they needed a place to share code among these projects. For example, Ant and Maven both needed compression algorithms, so they created Commons Compress to hold the algorithms for creating tarballs and zip files.
What the ASF realized is that umbrella projects don't work very well, because under an umbrella you get a group of communities that are pretty isolated from each other: there was the Jakarta Tomcat community, the Jakarta Maven community, and so on. Then the question came up: why do we need this umbrella at all? Why isn't Tomcat just Apache Tomcat? That was when Jakarta was split up into several top-level projects, and Jakarta Commons became Apache Commons, its successor.

So what is the mission of Apache Commons? This is my personal mission statement, not the official one on our website; it's my view of what we should do. We should provide a place for other ASF projects to come together, collaborate, and share common code. So if any of you are already committers on some project and you have code that could be useful for other ASF projects, just talk to us, come to our mailing list, and we'll see whether it fits into Commons.

A quick overview of Commons; most of you have probably heard these terms already. At Commons we have an area we call Proper. These are the major components, like Lang, IO, Math, Net, and Codec, all the stuff you already know. If you pick one of these, it is usually pretty battle-tested and you can rely on it. Then we have the Sandbox, a playground for trying out new things. Whenever somebody thinks, "This might be a good idea, I'll just code it up," they can create a sandbox repository and just do it. I think the sandbox dates from a time before GitHub, when you couldn't easily share code and people needed infrastructure to host code and share it with others. Today, when somebody wants to try something out, they usually just create a GitHub repository. But we still have the sandbox.
And there is Dormant, which is kind of the attic of Commons. At the ASF we have the Attic project: whenever a top-level project isn't useful anymore, or nobody is interested in maintaining it, it is moved to the Attic. We have the same thing at Commons, which we call Dormant. We move components that nobody needs anymore to Dormant; after that nobody works on them and there are no more releases. That's the end of the lifecycle of a Commons component.

So it's time to look at code; that's probably why you're all here. I'll start with Commons Lang, which probably every one of you has already used, and with a very simple example: the well-known StringUtils.isEmpty method, which tells you whether a string reference is null or empty. In the first line you see `string == null || string.isEmpty()`. You can replace that with a single call to StringUtils.isEmpty, which is a good thing because it removes duplication. You know the problem with duplication: almost everywhere it looks alike, but then there's the one place where somebody doesn't call `isEmpty()` but checks `length() == 0`, and the next one compares against an empty string, and you don't want that diversity in your code base. So you just use isEmpty.

But let me give you a piece of craftsmanship, or design, advice. If you see heavy use of these isNotEmpty and isNotBlank string methods, that may be a sign of a code smell in your code base. For one thing, it suggests you're encoding a lot of information in strings and working with plain strings a lot, which can be a problem because you're not building higher-level concepts.
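A minimal sketch of the isEmpty comparison above, assuming Commons Lang 3 (`org.apache.commons.lang3`) is on the classpath. The class name and the `isEmptyByHand` helper are made up for illustration; it also shows isBlank, which additionally treats whitespace-only strings as blank.

```java
import org.apache.commons.lang3.StringUtils;

public class IsEmptyExample {

    // The hand-written variant: two conditions that tend to drift apart across a code base
    static boolean isEmptyByHand(String s) {
        return s == null || s.isEmpty();
    }

    public static void main(String[] args) {
        // StringUtils.isEmpty folds the null check and the length check into one call
        System.out.println(StringUtils.isEmpty(null)); // true
        System.out.println(StringUtils.isEmpty(""));   // true
        System.out.println(StringUtils.isEmpty(" "));  // false: a space is not empty
        // isBlank additionally treats whitespace-only strings as blank
        System.out.println(StringUtils.isBlank(" "));  // true
        // The hand-written check and the library call agree
        System.out.println(isEmptyByHand(null) == StringUtils.isEmpty(null)); // true
    }
}
```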
The other problem behind heavy use of these checks might be an anemic domain model: you probably have a lot of beans with getters and setters, and because these beans can't check their own state, you always have to check whether a property is really set. Think of a customer that has a customer ID, which is a string for whatever reason. Because there's a getter and a setter for the ID, every time you get the ID you have to make sure it's really set. I'd call that bad practice. If a customer only makes sense with an ID, it would be better to pass the ID to the constructor and make sure it is always set, so that whenever you get a Customer instance you can be sure it has an ID. Just a piece of advice: when I look at code bases that overuse these methods, it's often a sign of these smells.

Okay, but let's get beyond StringUtils. As you can see, there are a lot of utility classes in Commons Lang already: for a number of the well-known Java classes, there is an accompanying utility class. We have ArrayUtils, for example, which is pretty cool because it has methods like add. In Java, arrays have a fixed size, so if the array is full and you need to append another element, you have to allocate a new array of size plus one, copy everything from the old array into the new one, and put the new element at the end. With ArrayUtils you can do that with one method call. There are several others, for example DateFormatUtils and DateUtils, which I'd say you shouldn't use anymore, because we have the Java 8 time API, which is much, much better. But if you still need java.util.Date because you're talking to some legacy API or another library that needs dates, have a look at DateUtils; hopefully it will make your life easier. We have StringEscapeUtils, which Rob already talked about.
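The array append described above, sketched both by hand and with ArrayUtils.add, assuming Commons Lang 3 on the classpath; the class and method names are illustrative only.

```java
import java.util.Arrays;
import org.apache.commons.lang3.ArrayUtils;

public class ArrayAppendExample {

    // Manual append: allocate an array one slot larger, copy the old contents, set the last slot
    static int[] appendByHand(int[] array, int element) {
        int[] result = Arrays.copyOf(array, array.length + 1);
        result[array.length] = element;
        return result;
    }

    // The same operation with Commons Lang; ArrayUtils.add returns a new array
    static int[] appendWithArrayUtils(int[] array, int element) {
        return ArrayUtils.add(array, element);
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3};
        System.out.println(Arrays.toString(appendByHand(numbers, 4)));         // [1, 2, 3, 4]
        System.out.println(Arrays.toString(appendWithArrayUtils(numbers, 4))); // [1, 2, 3, 4]
        System.out.println(numbers.length); // 3: the original array is untouched
    }
}
```

ArrayUtils.add also accepts a null array and then returns a one-element array, a case the hand-written version does not handle.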
StringEscapeUtils has since been moved to Commons Text: we are in the process of splitting Commons Lang into several more focused libraries, and Commons Text, which contains text-processing algorithms, is the first of them. Has WordUtils already been moved? I think yes... no? We're about to move it to Commons Text. Okay.

So let's look at a utility class that maybe not everybody knows: SystemUtils, which provides very easy access to the Java system properties. When you run a Java program, there is the runtime, represented by the System class, and you can ask it for properties to introspect the environment you're running in, for example which operating system and which architecture you're on. You could ask why you'd need that in Java, since the whole idea of Java is that you don't have to know, but anyway, you can ask the runtime which operating system you're on, and you always have to look it up by a string key: `System.getProperty("os.name")`, for example, returns the name of the operating system. SystemUtils has constants that make this easier. For example, if you want to know whether you're running on Windows 10, there's simply a Boolean constant, so you don't have to do the lookup and get the string right everywhere. Another example is the Java home, the folder of the Java installation. You can look it up with `System.getProperty("java.home")`, but you can also use our constant, which does exactly the same thing. I prefer constants in my code to string keys passed to getProperty. There is also the getJavaHome method, which returns a File: if you're asking where your Java home is, you probably want to access that directory, so it makes sense to get a File object rather than a string.
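A minimal sketch of the SystemUtils lookups described here, plus the isJavaVersionAtLeast check discussed right after, assuming a Commons Lang 3 release recent enough to include `IS_OS_WINDOWS_10`. The `canRunJUnit5Tests` helper is hypothetical and only mirrors the Surefire use case from the talk.

```java
import java.io.File;
import org.apache.commons.lang3.JavaVersion;
import org.apache.commons.lang3.SystemUtils;

public class SystemUtilsExample {

    // Hypothetical guard, e.g. for integration tests that need JUnit 5 (which requires Java 8)
    static boolean canRunJUnit5Tests() {
        return SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_1_8);
    }

    public static void main(String[] args) {
        // Plain JDK: every lookup needs the right string key
        String osNameByKey = System.getProperty("os.name");

        // SystemUtils exposes the same values as constants
        System.out.println("OS: " + SystemUtils.OS_NAME
                + " (matches lookup: " + SystemUtils.OS_NAME.equals(osNameByKey) + ")");
        // Boolean flags for specific platforms, e.g. Windows 10
        System.out.println("Windows 10? " + SystemUtils.IS_OS_WINDOWS_10);

        // JAVA_HOME is the String constant; getJavaHome() returns a File
        System.out.println("Java home: " + SystemUtils.JAVA_HOME);
        File javaHomeDir = SystemUtils.getJavaHome();
        System.out.println("Is a directory? " + javaHomeDir.isDirectory());

        // Version checks without parsing "java.version" by hand
        System.out.println("At least Java 7? " + SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_1_7));
        System.out.println("Can run JUnit 5 tests? " + canRunJUnit5Tests());
    }
}
```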
So that's why we have both the constant and the method. Another useful thing is checking which Java version you're running on. That's often needed when you write libraries that have to support several Java versions: you can ask SystemUtils whether you're running on at least a given version. On the slide I'm asking whether I'm running on at least Java 7, so that I can use the features and classes introduced in Java 7. Why did I need that? The example comes from the integration test project of the Maven Surefire plugin. Maven Surefire is the plugin that runs tests in Maven, and it has an integration test project. Because I'm working on JUnit 5 support in Maven Surefire, I need to make sure that certain tests only run on Java 8, since JUnit 5 requires Java 8, while Maven Surefire itself is built on Java 7, or maybe Java 6, I don't know exactly. Anyway, I can only run those integration tests on Java 8, and this is where the isJavaVersionAtLeast method came in handy for me. If you're writing an internal library that needs to support several Java versions, you could use this method for the check.

Then we have builders for the common methods defined in Object. This is another API that a lot of people probably know, because it helps you implement equals, for example. When you implement equals in Java, you put a preamble in front that makes it more efficient: if the object you're given is null, it can't be equal to me, because I'm not null. If the object you're given is myself, it's equal. And if the object you're given has a different class, it can't be equal to me.
An interesting detail: I'm checking `obj.getClass() != getClass()` instead of using instanceof. Any idea what the difference might be? Yes, you're right, that's exactly the case. The question is whether instances of classes that inherit from mine can be equal to me, given that I can't check the properties they add. With an instanceof check, I would consider a subclass instance equal whenever the subset of properties it inherits from me is equal, and the question is whether that is really equality. I usually use getClass, because I say: if they don't have exactly the same set of properties, they can't be equal to me. But that is a decision you have to make for your own application and domain model.

And now we come to the Commons Lang class I wanted to talk about: EqualsBuilder, to which you pass values and which compares them for you. If you implement that by hand, you have to handle cases such as this name being null while the other name is not, with all sorts of if/else control flow: if both are null they're equal, if only one is null they're not, and so on. All of that is already implemented in EqualsBuilder, and there are builders for hashCode and compareTo as well, so you can use those if you like.
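A minimal sketch of the equals preamble plus EqualsBuilder, HashCodeBuilder, and CompareToBuilder, assuming Commons Lang 3 on the classpath. `Person` is a hypothetical value class, not the class from the slide.

```java
import org.apache.commons.lang3.builder.CompareToBuilder;
import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

// Hypothetical value class used only for illustration
public class Person implements Comparable<Person> {

    private final String name;
    private final Integer age;

    public Person(String name, Integer age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object obj) {
        // The usual preamble: null, identity, then an exact class match instead of instanceof
        if (obj == null) {
            return false;
        }
        if (obj == this) {
            return true;
        }
        if (obj.getClass() != getClass()) {
            return false;
        }
        Person other = (Person) obj;
        // EqualsBuilder compares the fields null-safely, so no hand-written if/else chains
        return new EqualsBuilder()
                .append(name, other.name)
                .append(age, other.age)
                .isEquals();
    }

    @Override
    public int hashCode() {
        // HashCodeBuilder needs two odd, non-zero seed numbers; 17 and 37 are common choices
        return new HashCodeBuilder(17, 37)
                .append(name)
                .append(age)
                .toHashCode();
    }

    @Override
    public int compareTo(Person other) {
        // CompareToBuilder compares field by field and orders null before non-null
        return new CompareToBuilder()
                .append(name, other.name)
                .append(age, other.age)
                .toComparison();
    }
}
```

Because hashCode uses the same fields as equals, equal instances always produce equal hash codes, which is the contract HashMap and HashSet rely on.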
Then we have StrSubstitutor, which Rob has already talked about. We have moved it to Commons Text, so it's no longer part of Commons Lang, but it's still part of Commons and interesting nevertheless. You can think of it as a template engine in one class. You probably all know Maven properties, which you can resolve in your POM file by putting their name into the dollar-and-curly-braces notation, and the string substitutor does exactly that. In this example I'm substituting system properties again, java.version and os.name: when I call the replaceSystemProperties method and pass it a template, it replaces those properties for me. I can also add custom substitutions. Here I'm creating a map with just one key-value pair, a custom key mapped to some value. Then I create a new string substitutor with these values, call replace, and pass it my template, and it replaces the custom key with that value. The second placeholder shows the syntax for fallback values; I think it's the same as in Bash. Any Unix pros here? In Bash you can also use this colon-minus syntax, `${name:-fallback}`, to define fallback values. And what I'm noticing right now is that this slide has an error: the dollar signs are missing. By the way, the markers are configurable in the string substitutor, so if you want other markers, you can configure them.

And we have so much more in Commons Lang than StringUtils. I always get the feeling that people know StringUtils and that's pretty much all they use from Commons Lang; but if you only need that one class, why pull in the whole library? You could just copy it. For example, we have mutable variants of the primitive wrapper types. You know that when you create an Integer object in Java, it is immutable: you can't change it anymore. Maybe you need a mutable variant of that, so we have the
MutableInt. You could ask: why would I need that? It comes in handy, for example, with anonymous implementations, when you want to pass values between the outer scope and the inner scope. Everything you reference from the outside inside an anonymous class has to be final (or, since Java 8, effectively final), so how do you get an integer out of it? What I've often seen people do is use an AtomicInteger, because it's kind of the same thing: a wrapper for an int value that you can set and get. But that feels strange, because AtomicInteger is about concurrency, and I wouldn't expect it where no concurrency is going on. I find it cleaner to use a MutableInt. We also have mutable and immutable pairs and triples. If you know Scala, you've probably worked with tuples a lot; think of a pair as two values that you don't want to give a name. A point in a coordinate system, for example, could be expressed as a pair if you like. Then we have ContextedRuntimeException and ContextedException, which are pretty cool. Often you have code that needs to access an API that throws checked exceptions, for example Java's file IO API, so you always have to catch IOException. Now you're ten levels deep, working with the file, you get this IOException, and you don't know what to do with it. So what do you do?
You log an error, "something bad happened", and return null. The problem is that at that point in the code there's nothing you can do, because you don't know where you have been called from. The code further up in your application, for example the controller that initiated the whole processing, knows how to signal the error, say by returning a 400 or a 500, or by showing something to the user, but the controller doesn't have the details from down below. So you can use a ContextedRuntimeException and attach context values describing the failure you just saw: the file wasn't there, the file wasn't readable, the file system wasn't available, or whatever. Then you let that exception travel up the whole stack. So instead of catching the IOException, logging an error, and returning null, it's better to catch it and throw a ContextedRuntimeException up the stack. The good thing about that is that when your method accessing the IO API returns, you know it has worked, because otherwise the runtime exception would have been thrown. I think that's good style. We also have text translation and escaping, which originated in Commons Lang and has already been moved to Commons Text.

Okay, I need to hurry up a little. What about Commons Lang and Java 8 and 9? We have Commons Lang 3.4, which requires Java 6, and we are working on 3.5, which will require Java 7. Sorry? Ah, okay, sorry: then just add one to each of those versions, so 3.5 requires Java 6 and 3.6 will require Java 7. We have discussed whether we should switch to Java 8, but we think that's a topic for Commons Lang 4.0, so Java 8 is not on the table for Commons Lang at the moment. As for Java 9, the code builds and runs on Java 9, because we are not doing any magic that is no longer permitted in Java 9.

Okay, let's look at another component: Commons CSV, which I have worked on a lot and which I like a lot, because I think all of you have already implemented your own CSV parser, right? Everybody has.
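The Commons Lang and Commons Text helpers discussed before the switch to CSV (the string substitutor, MutableInt, Pair, and the contexted-exception pattern) can be tried out together in one small sketch. It assumes Commons Lang 3 and Commons Text are on the classpath and uses `org.apache.commons.text.StringSubstitutor`, the Commons Text successor of StrSubstitutor. The keys `customKey`, `missingKey`, and all class and method names are made-up placeholders, not the slide's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import org.apache.commons.lang3.exception.ContextedRuntimeException;
import org.apache.commons.lang3.mutable.MutableInt;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.text.StringSubstitutor;

public class LangExtrasSketch {

    // Template substitution: ${key} looks up the map, ${key:-fallback} supplies a default
    static String render(String template, Map<String, String> values) {
        return new StringSubstitutor(values).replace(template);
    }

    // MutableInt lets an anonymous class update a counter owned by the enclosing method;
    // the local variable itself stays (effectively) final
    static int countLongWords(List<String> words, int minLength) {
        final MutableInt count = new MutableInt(0);
        words.forEach(new Consumer<String>() {
            @Override
            public void accept(String word) {
                if (word.length() >= minLength) {
                    count.increment();
                }
            }
        });
        return count.intValue();
    }

    // Instead of logging and returning null, wrap the checked exception with context
    // and let a caller higher up (e.g. a controller) decide how to report it
    static List<String> readLines(Path file) {
        try {
            return Files.readAllLines(file);
        } catch (IOException e) {
            throw new ContextedRuntimeException("Could not read file", e)
                    .addContextValue("file", file)
                    .addContextValue("exists", Files.exists(file));
        }
    }

    public static void main(String[] args) {
        System.out.println(StringSubstitutor.replaceSystemProperties("Java ${java.version} on ${os.name}"));
        System.out.println(render("${customKey} / ${missingKey:-fallback}",
                Collections.singletonMap("customKey", "someValue"))); // someValue / fallback

        System.out.println(countLongWords(Arrays.asList("a", "commons", "lang"), 4)); // 2

        // A pair as an unnamed two-value tuple, e.g. a point (x, y)
        Pair<Integer, Integer> point = Pair.of(3, 4);
        System.out.println(point.getLeft() + "," + point.getRight()); // 3,4

        try {
            readLines(Paths.get("no-such-file.txt"));
        } catch (ContextedRuntimeException e) {
            // The caller sees the failure together with its context values
            System.out.println("Failed for " + e.getFirstContextValue("file"));
        }
    }
}
```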
Usually you hear something like this: "We don't need a library for this, it's so simple, we just split at the comma and we're done." At first it sounds that easy, but the problem is that although CSV is described in RFC 4180, there are a lot of variants. Unlike XML, which is pretty strictly specified, CSV in practice boils down to a delimiter character between entries and a record separator, and that's it. Then Microsoft comes along and says, "We have Excel, and it's different": in some locales it uses a semicolon. Then MySQL comes along and says, "Our export format is this and that." These variants differ in so many aspects. There are different delimiters and different line-break characters. There is quoting: you define a quote character, and when something is enclosed in, say, double quotation marks, all delimiters are ignored until the closing quote character. There is escaping: if there's a backslash before a quote, that quote doesn't start quoting, it's just an escaped character. Some formats have a header record and others don't. All of a sudden a simple split becomes a real nightmare. Another problem with the split-at-the-comma solution is that it isn't built for performance: String.split takes a regular expression, and you don't want to push a one-gigabyte file through regular expressions. What we do in Commons CSV instead is use an extended buffered reader and go through the stream character by character, looking ahead and looking back; that's the trick, and it's pretty fast. We have also predefined some of the well-known formats, like Excel, MySQL, and TDF, which uses tabs to delimit entries.
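To make the point about the naive approach concrete, here is a plain-Java sketch (no Commons dependency; the sample lines are made up) showing two ways `String.split(",")` goes wrong: a quoted field containing the delimiter is torn apart, and trailing empty fields silently disappear.

```java
public class NaiveCsvSplit {

    // The "just split at the comma" approach
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Three fields; the middle one is quoted because it contains the delimiter
        String[] fields = naiveSplit("Smith,\"Berlin, Germany\",42");
        System.out.println(fields.length); // 4, not 3
        System.out.println(fields[1]);     // prints: "Berlin

        // Four fields, the last two empty: split drops trailing empty strings
        System.out.println(naiveSplit("a,b,,").length); // 2, not 4
    }
}
```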
And this is what parsing with Commons CSV looks like: you create a new FileReader, pass it to the appropriate format, and then iterate over the records. An important point is that we don't read the whole file into memory and hand you a collection of all records; we only read as far as you have iterated. Whenever you ask for the next record, we start reading it, which again is a good thing if you think about how big these files can be. We hand you a record, and you can ask it for a value by header name: for example, if you have a column with the header "last name", you can say "give me the value of the last name column", and it will give you a string.

Of course you can customize the formats. You probably have that one system that sends you an export that is, of course, not like any other CSV format. So here's the API for defining your own CSV format: you derive it from one of the predefined formats using a fluent builder API. You can ignore empty lines, for example, or set the header. There's also the possibility of detecting the header: if you call withHeader without any arguments, it uses the first line as the header. A lot of formats include the header, so it makes sense to just use that first line. What I really like is defining the header with an enumeration, because again, I don't like strings in my code, so I'd rather define an enum and use it to access my record values. As you can see, it looks pretty much like the previous example, but instead of getting values by name, we pass an enum value; internally, the enum's toString method is used. So, in conclusion: a task that may seem simple at first can be more complex than we think.
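A minimal sketch of the Commons CSV usage described above, assuming Commons CSV 1.x on the classpath and using its `withXxx` fluent methods (newer releases favour `CSVFormat.Builder` and deprecate these, but they still exist). The `Headers` enum columns and the sample data are hypothetical, and a StringReader stands in for the FileReader from the talk so the sketch is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class CsvHeaderEnumExample {

    // Column names as an enum instead of string keys (hypothetical columns)
    enum Headers { ID, FIRST_NAME, LAST_NAME }

    // A custom format derived from a predefined one: columns named by the enum,
    // the header line in the file skipped, empty lines ignored
    static final CSVFormat FORMAT = CSVFormat.EXCEL
            .withHeader(Headers.class)
            .withSkipHeaderRecord()
            .withIgnoreEmptyLines(true);

    static List<String> lastNames(String csv) {
        List<String> result = new ArrayList<>();
        // The parser reads record by record as we iterate; closing it closes the reader
        try (CSVParser parser = FORMAT.parse(new StringReader(csv))) {
            for (CSVRecord record : parser) {
                result.add(record.get(Headers.LAST_NAME));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String csv = "ID,FIRST_NAME,LAST_NAME\r\n"
                + "1,Ada,Lovelace\r\n"
                + "\r\n"
                + "2,Grace,\"Hopper, Murray\"\r\n";
        // Prints "Lovelace", then "Hopper, Murray" (the quoted comma stays inside the field)
        lastNames(csv).forEach(System.out::println);
    }
}
```

EXCEL is used as the base here because, unlike DEFAULT, it does not ignore empty lines out of the box, so the `withIgnoreEmptyLines(true)` customization actually changes behaviour.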
Remember: don't reinvent the wheel, even for simple things like CSV parsing. Remember my law: our duty is to solve business problems. Apache Commons CSV is easy to use, customizable, and really, really fast. But it has some limitations. You can not only read CSV with the library but also print it, yet printing does not support all the features we support for parsing; for example, we don't support unusual line-break characters there, such as an exclamation mark. We don't have type conversions: you always get a string, because CSV is a text-based format, and if you want to convert a value into something else, for example a number or a date, you have to do the parsing yourself. On top of that, we don't support mapping into Java objects: all we give you is the record, a low-level representation of the file, and whatever you do with it is your job. We have been asked to create some sort of CSV-to-JavaBean mapping, but so far we have decided that it's not part of this library. If anybody wants to create such a library, go ahead: it's Apache-licensed, have fun.

Okay, the last component I'd like to show, which probably not everybody knows, is the command line interface library, Apache Commons CLI. Why do I think it's useful? Say I'm writing a command line tool like Maven, and I want a nice help system, help with parsing the arguments, and nicely formatted output. But what Java gives me is this: as you know, the main method is called by the runtime, and it receives all the strings that follow your program on the command line, basically
in a string array, and then you can start implementing your help system and your command line argument parsing yourself. That's pretty tedious: imagine what you'd have to do to build the command line interface Maven has on the basis of a plain string array. That's pretty hard. So we have Commons CLI for this, and Commons CLI is used inside Maven to implement command line option parsing. What you do is describe which options your program has. Here, for example, I've added a help option with the short name h, so -h prints the help, and a long name that I can call with --help; "display the help" is its description. But I can also create more complex options: the second line shows how to create a log file option that takes an argument named file and has the description "use given file for log". So we have two options: h for help and l for the log file. Then we parse the arguments array we get from the runtime: we hand it to the parser, and we get back an object describing the command line. We can ask this command line whether the user specified the help option. If so, I use HelpFormatter, which is also part of Commons CLI, and tell it to print the help for this program. Otherwise, I can ask whether the log file option is set (the slide says "file", but it should be the log file option, sorry) and then get its value out of the command line. And what does it look like? I've coded it up; you can find it in my GitHub repository.
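A minimal sketch of the option definition and parsing flow described above, assuming Commons CLI 1.3 or later (for `Option.builder` and `DefaultParser`). The long name `logfile`, the program name `cli-example`, and the `parse` helper are assumptions for illustration, not taken from the slide.

```java
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class CliExample {

    static Options buildOptions() {
        Options options = new Options();
        // -h / --help: a flag without an argument
        options.addOption("h", "help", false, "display the help");
        // -l / --logfile <file>: takes one argument ("logfile" is an assumed long name)
        options.addOption(Option.builder("l")
                .longOpt("logfile")
                .hasArg()
                .argName("file")
                .desc("use given file for log")
                .build());
        return options;
    }

    // Turns the raw String[] from main into a CommandLine object
    static CommandLine parse(String... args) {
        try {
            return new DefaultParser().parse(buildOptions(), args);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        CommandLine line = parse(args);
        if (line.hasOption("h")) {
            // HelpFormatter renders the option descriptions as formatted usage text
            new HelpFormatter().printHelp("cli-example", buildOptions());
        } else if (line.hasOption("l")) {
            System.out.println("Logging to " + line.getOptionValue("l"));
        }
    }
}
```

Options can be queried by either short or long name, so `hasOption("help")` and `hasOption("h")` are interchangeable.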
The help output matches the definition exactly. Remember the code example from two slides before: we had the h and help option with "display the help" as its description, and the l and log file option, which takes an argument called file, as you can see in the third line, with "use given file for log" as its description. That's pretty neat, I think.

Okay, final words; I'm a little ahead of time, but that's okay. These were just a handful of the more than forty components we have at Apache Commons, and there is so much more that could be useful for you, so take the time to look through the list and check out what we have. There's Commons Codec, for example, if you need to encode things using different algorithms. There's Commons Crypto, which Dapeng has already talked about: if you need cryptographic algorithms and you need them really, really fast, have a look at it. There's Commons Imaging, a library for image manipulation. There's Commons Pool, the object pooling library underneath Commons DBCP, the database connection pool. There's Commons RDF, if you're in the semantic web sphere and work with the Resource Description Framework. There's Commons RNG, the random number generator library. And there are a lot more components; it's all online at commons.apache.org. Have a look and have fun. And with that, I think this is the end of the first day of ApacheCon. Are there any questions? Okay, no questions? Okay.

The question was: why doesn't CSV implement mapping directly to objects? I think there are several reasons. First of all, nobody has had the time to do it yet. Also, we try to create very focused, fairly low-level libraries, and we believe
that CSV is about parsing text files: if you parse a text file, you get text, not a Java Customer object or whatever. I think that's probably the main reason: we don't feel it belongs in that component. But as I said, it would be no problem to implement it on top. No more questions? Okay, I wish you all a nice night. Have a beer, meet me at the bar, and see you tomorrow.