In this talk, I will review the lessons learned (or re-learned) in measuring and improving the performance of my code. The talk is in two parts. The first part focuses on measuring the code with TAU and on what the analysis showed, covering the overuse of MPI_Barrier and improvements to the communication pattern. The second part focuses on replacing the way solution files were generated. The code wrote a separate file per task per unknown. A test program was created to show how to write a single file, containing all the unknowns, using HDF5 in parallel. Even with a relatively small local size of 50K points per task, the test program was able to write at 1.5 GB/sec on Lonestar. The second part will also cover how to dynamically size the number of writers and the number of stripes to maximize I/O performance.
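
As an illustration of the single-file layout, here is a minimal sketch, assuming each task writes one contiguous slab of a shared one-dimensional dataset. The function name `hyperslab_layout` and the four-task example are hypothetical and are not taken from the code discussed in the talk. In an MPI program the same offsets would typically come from an exclusive prefix sum (for example `MPI_Exscan`), and the write itself would go through a parallel HDF5 hyperslab selection.

```python
def hyperslab_layout(local_sizes):
    """Return (offsets, total) for tasks writing contiguous slabs to one dataset.

    offsets[i] is the index of the first point owned by task i, i.e. the
    exclusive prefix sum of the per-task local sizes. total is the size of
    the shared dataset that holds every task's points.
    """
    offsets = []
    running = 0
    for n in local_sizes:
        offsets.append(running)  # task starts where the previous tasks end
        running += n
    return offsets, running


# Hypothetical example: 4 tasks, each owning 50,000 points.
offsets, total = hyperslab_layout([50_000] * 4)
print(offsets)  # [0, 50000, 100000, 150000]
print(total)    # 200000
```

Each task then needs only its own offset and count to place its data, which is what lets one file replace the separate file per task per unknown.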