I don't know if it's just me, but it seems like there are far fewer people here this morning than there were yesterday. Maybe everybody had too much fun last night. Hi, my name is Emily Stolfo. I work for MongoDB. If you've ever used Ruby with MongoDB, you've probably used a couple of lines of my code, through the gems mongoid, bson, bson_ext, origin, different versions of mongo, and mongo_kerberos. If you ever feel like doing authentication using Kerberos, well, I think all the downloads I have on that gem are me just testing it in my testing environment. I'm coming from Berlin, but I come from New York. I've been in Berlin for three years now. I originally went there to work with the other person who built the Ruby driver. Before I start, I want to thank the organizers for having me. I've never been to Singapore before. I arrived a week ago, and I think I've sweat my weight in water every single day. It's kind of like Bikram yoga. I feel really good at the end of the day, really cleansed. I brought omiyage for the organizers, but I also brought omiyage for you, to thank you for coming here at 10 a.m. on the second day of the conference. I can't bring something for everybody, but I brought this amazing artisanal mustard from Germany, and anybody who's German is disqualified from this competition, by the way. These are little mustards from a little man who has a mustard shop next to my apartment. I've hidden three Sandi Metz quotes in my talk, and if you email me one of those three quotes at Emily at MongoDB, I'll give you a little mustard. So it's the first three people. Basically it's to trick you into paying attention, and also so I can steal someone's quotes without feeling guilty. And they're all different types.
Okay, so this talk is called Refactoring Humpty Dumpty Back Together Again.
So because it's 10 a.m., and there's no better time to talk about physics, I'm going to start with the second law of thermodynamics. Specifically, the second law of thermodynamics accounts for the direction of natural processes. We've all heard of this, right? Okay, no. Well, good thing I'm telling you about it. The law says that it's highly unlikely, though not impossible, to restore a system to a previous state. It accounts for the asymmetry between past and future. In modern times, this law is defined in terms of entropy. We've all heard of entropy, right? Yeah, more so than the second law of thermodynamics. It's kind of abstract, but it's basically the measure of the number of ways in which a system can be arranged. Entropy is taken to be the measure of disorder of a system: the higher the entropy, the higher the disorder. Usually it's depicted like this, where it requires a certain amount of work to take something that's in a high level of disorder and make it orderly, or restore order.
So once upon a time, there was this egg named Humpty Dumpty, and his story was told in this nursery rhyme. Humpty Dumpty sat on a wall, Humpty Dumpty had a great fall, all the king's horses and all the king's women couldn't put Humpty Dumpty back together again. Has anybody heard of this? I bet a lot of British people have heard of this. This particular nursery rhyme is the most well-known nursery rhyme in the English language, and references to it can be found in many works of literature and frequently in popular culture. I think there's a character in Shrek who's Humpty Dumpty, in one of the, like, 15 Shrek movies they've made. But the first recorded version dates from late 18th-century England. Like many traditional stories or poems, it's pretty much impossible to pinpoint what the original version was, what Humpty Dumpty actually was, or to take the poem literally. For example, we have other versions of the poem.
This is the actual first recorded version, published in 1797, but we have no idea if this existed way before 1797 or if people just learned how to write in 1797. Humpty Dumpty sat on a wall, Humpty Dumpty had a great fall, fourscore men and fourscore more couldn't make Humpty Dumpty what he was before. There are clearly many other versions throughout popular culture and throughout history, but what we can't ignore is that Humpty Dumpty is always depicted as an egg, despite the fact that nothing in the poem indicates that he actually was an egg. My favorite is the woman dressed as an egg, who's really chic, sitting on a wall in the corner. It's likely the rhyme was originally a riddle that could have exploited a well-known meaning of the term Humpty Dumpty at the time. For example, the Oxford English Dictionary says that the term Humpty Dumpty refers to a drink of brandy boiled with ale. And I don't know about you, but when I drink my brandy boiled with ale, something magical happens and I start seeing eggs. Perhaps the rhyme was the 17th century's equivalent of "don't drink and drive" propaganda, warning you about sitting on walls after you drink. But still, why an egg? Perhaps it was meant to convey that whatever it was that sat on that wall, it was extremely fragile and virtually impossible to put back together. As I said, there have been many other theories and many other versions, and one that I find kind of funny or absurd was put forth by this scholar. I don't know what he was a scholar of, but I guess he spent his time in the 50s trying to figure out what Humpty Dumpty was, and he said that Humpty Dumpty was in fact a tortoise siege engine, which is a kind of battering-ram machine that was invented by the Romans and used unsuccessfully in the English Civil War in the 1600s. Apparently it was used and the thing broke without breaking the thing it was trying to break. And so they wrote a poem about it.
I don't know about you, but that sounds really silly to me. I think I like the idea of an egg better. This theory was eventually determined to be totally ridiculous, but the idea was incorporated into a children's opera called All the King's Men, so according to popular culture it's just as true as the other theories. Whichever form Humpty Dumpty takes, what can't be ignored is that he's a fragile guy. He's actually become a sort of symbol for the second law of thermodynamics. Humpty Dumpty fell from the wall and subsequently ended up in pieces. As we've discussed, the law says that it's highly unlikely, though not impossible, to restore him to his exact state before the fall, and this is what the poem also emphasizes. As we also discussed, the modern definition of the second law of thermodynamics is in terms of entropy, the measure of the number of ways in which an isolated system can be arranged. Specifically, assuming for simplicity that each of the microscopic configurations is equally probable, the entropy of the system is the natural logarithm of the number of configurations multiplied by the Boltzmann constant k_B (that is, S = k_B ln Ω, with Ω the number of configurations). This is theoretically how we can measure entropy, but you can't really have a system where all the arrangements are equally probable, so this is highly theoretical.
We can also find some examples of things that were broken and that have been returned to their original states with help. The Beauvais Cathedral, which is located in Beauvais, France, 60 kilometers north of Paris, is a symbol of the ambition of Gothic architects. The pet project of a wealthy and disaffected bishop of Nanteuil, the construction of the cathedral may have been partly intended as an act of defiance against the French crown. So basically the bishop was a punk, and he wanted to prove that he was better and more powerful than the crown by building this massive building, and you'll see that it was a total disaster.
The whole project was extremely unrealistic and the cathedral was never finished. Construction started in 1225 and it was meant to be the greatest church in the kingdom, but centuries of construction were marked by structural problems and collapses. Cathedrals are normally shaped like a cross: the nave is the main body, and all that was actually constructed here is the tiny portion at the top, like the head of the cross. If the nave had been constructed, the plans for the cathedral were such that it would have been the tallest building of its time. The foundations, in order to support this massive structure, were in some places 10 meters deep. Even so, in 1284 part of the choir collapsed, which is the front of the cathedral, the part that was actually constructed. Then the transept (I actually don't know what part of the cathedral that is, I forgot to look it up), this other part of the cathedral, was started 150 years later and was completed in 1548. Shortly afterwards the spire and half of the bell tower collapsed on Ascension Day during a service, and apparently nobody was hurt. In 1600 the construction of the nave, that main body of the cathedral, began again, but only the first arch was erected and they gave up. The buildings from this time that exist today were great engineering feats by definition, because they're still around today. This one became a symbol of something else: a look into how these projects can be started and then fail because of ineptitude or overambitious people. So in the 1990s people really wanted to preserve this building. It was determined to be immensely unstable, because the pillars had been measured to have moved 30 centimeters, and they wanted to do something about it so the building could still stand. So why is it so unstable?
Why is it so weak, and why was this project so difficult to realize? The building is a perfect storm: poor architectural plans, different architects hacking on the same building, no real ownership of the project, architects coming and going over the centuries (which, by the way, means they had very different styles), and the fierce gale-force winds that come from the English Channel, which is less than 100 miles away. So basically the cathedral might as well have been made out of papier-mâché. It's on the World Monuments Fund list of 100 most endangered sites, but today the cathedral is more stable than it has ever been, thanks to a team of researchers from Columbia University. So what did they do? They did what you would expect someone to do who needs to repair a weak structure. They studied the structure. In 2001 a team from Columbia University went to Beauvais to acquire 3D range scans and imagery of the cathedral. The goal was to create a 3D model of the cathedral to assist historic preservation efforts, including structural analysis of the cathedral. For 10 days they roamed around the cathedral using instruments to record digital images of its facade and interior by bouncing laser beams off its surface. They returned to New York City with 75 of these scans, each one containing more than a million data points. And remember, this is 2001, so 16 years ago. We could probably do that with our iPhones now, but at the time this was the largest structure ever to be scanned, and it yielded the largest amount of data. This is a combination of all those scans, from the data that they collected. So here's the flyover of the cathedral. This is what the image that they were able to collect looked like, and as you can see it's only a small portion of what the original cathedral was meant to be, but the structure is really large and complex and has a lot of cavities.
It's not just a block, you know. There's a lot going on in this cathedral. And then this is the inside. I did my undergraduate education in art history and computer science, and I actually took this professor's class. He showed us this and I was super excited, because I thought, this is why I'm doing both of these fields: because you can do things like this and preserve cultural heritage. Just as an aside, the reason this cathedral was meant to be so large, or what motivated that, is that part of the principle of Gothic architecture, especially with cathedrals, was to elongate the structure so you felt closer to God and had this sense of being in an infinite space. That's why the bishop was showing particular hubris in doing this: he was trying to bring himself too close to God, so he was flying too close to the sun. Because of the model that the team of researchers was able to create, support beams could be installed in the right places, restoring stability to the cathedral, allowing visitors to appreciate the ambition and engineering of the Gothic builders 700 years ago, and also allowing academics to study how this project was started and failed.
So what do the Beauvais Cathedral and Humpty Dumpty have in common? Both needed to be put back together for stability to be re-established. This system in particular has been restored to better order and stability because, as we said, it's improbable, not impossible. Furthermore, what if we aren't interested in restoring a system to its original state? What if we want to alter it, arranging the pieces to make it even better? What if breaking something allows you to rearrange the pieces so that it can be even more structurally sound? Does this sound familiar to you? Well, it certainly sounds familiar to me, because otherwise I wouldn't be doing this talk, and it's something I've had to think a lot about lately, particularly with Mongoid.
So recently I had to study the structure of this project, break it a little, and then rearrange the pieces into something that was inherently stronger. I'd even argue that I defied the second law of thermodynamics and that the entropy has been decreased in this system. Who would disagree that their project's entropy increases over time? So who thinks their project's entropy decreases over time with no work? Right. So I maintain the Active Record replacement for using MongoDB with Rails. It's called Mongoid. It's actually 10 years old, which is basically 700 years in cathedral years. The first version of Mongoid, version 0.2.5, was released by someone who's now my colleague, Durran Jordan. He's the original author, and by the way, on the original documentation site of Mongoid he says Mongoid was conceived one late night in February somewhere in Florida after five glasses of whiskey, and that's pretty much the theme of how Mongoid was built. Just, like, someone on whiskey. I mean, I love Durran. He's amazing, but we're talking about Durran ten years ago. Version 0.2.5 was released by Durran on October 1st, 2009. Version 0.2.6 was released on October 1st, 2009. Version 0.2.7 was released on October 1st, 2009. Does this sound like any cathedrals you know? The MongoDB server version at that time was less than 1.2.0. I actually don't know what version it was, because in our project tracking tool the earliest version recorded is after Mongoid's first release. For reference, the current MongoDB server version is 3.4. So back then, around 1.2.0, it was still in this phase where we had this feature that it dropped your data. And by the way, MongoDB doesn't drop your data. I don't know if you've run anything in the last five years, but we've solved that problem. Anyway, Mongoid continued to be developed by Durran, who was working at SoundCloud in Berlin, in his free time.
It was a true open source project for many years, in that many people contributed. Many pull requests were opened and merged. Many discussions were had in the GitHub issues list. Many people solved problems approximately, but nobody had the big picture. It was built when the MongoDB server was quite simple compared to what it is now. There weren't many features, or even replica sets, at the time. So the history of this project, and the complexity of the ecosystem built around Mongoid, how it fit into Rails and how it used the driver, is really complex, and it might sound familiar to you if you're working on open source. Following along with this diagram, anything gray is not developed by MongoDB Inc., the company that I work for. Anything in color is. The first versions of Mongoid used MongoDB Inc.'s Ruby driver, the 1.x series. This is the driver that I was hired to work on five years ago. At the time I joined the company, Durran had just built his own driver called Moped, because the official MongoDB driver hadn't developed some features he was hoping to have, and there was some back and forth and some friction. Because the server was kind of simple at that time, he thought, okay, I'm just going to build my own driver so I don't need this extra level of diplomacy to get changes in and move forward with Mongoid. So at that time the Ruby offering, if you were using MongoDB with Rails, was developed entirely outside of MongoDB Inc. and wasn't developed by anybody who was actually paid money to do it. At MongoDB at this time we knew how important Mongoid was to the Ruby community. Basically anybody who wanted to use MongoDB with Rails went through Mongoid, and basically anybody wanting to program in Ruby was, unfortunately, as someone said yesterday, using Rails.
So by the transitive property, anybody wanting to use Ruby with MongoDB would have to go through code that wasn't actually developed by the company, if that makes sense. The company was growing, as were the features of MongoDB and the sophistication and opacity of its behavior. So it was really difficult for someone in the open source community to keep up with what the server was doing, because they didn't have the insight that I have working for the company, the insider knowledge where you know what the roadmap is, what the internal issues are, and what the priorities are. You can walk over to a server engineer's desk and ask him about something specific, because MongoDB has a lot of quirks, and between server versions the implementation of certain features can differ wildly. Sure enough, at that time Mongoid's issue list grew and the project started to lose traction and trust in the community, because it just couldn't react fast enough. In 2014 the 1.x driver needed a rewrite, and we thought it was a great opportunity to approach Durran and say, hey, do you want to come work at MongoDB? We can build a new driver and then get Mongoid to use that new driver, and that takes the burden off your side, because we can maintain the driver. He was up for it. In 2014 he joined us, and he and I worked together to build a new driver, which was kind of my way to get to Berlin. That was the mongo gem, version 2.0. In doing that we decided to bring Mongoid in-house as well, so Mongoid became an official project.
Since then Durran has moved on to work on another team at MongoDB, on Compass. If you're familiar with our products, it's the GUI for navigating your data and collections. I've taken over Mongoid and the driver. As a little aside, just to show you that this is actually a simplified version of the story, there's also a gem called Origin, which is the DSL query language for querying MongoDB. That was a separate gem, but in version 6.0 I brought it into the codebase, even though I realized a lot of people were using it independently, so that's super complicated too. For example, if I need to fix a bug in Mongoid 6 I can do it in Mongoid's codebase, but if I want to backport it I have to go and release a separate version of Origin. So now that Mongoid and the driver are back together again, they're getting along quite well, except for the occasional bickering over who does the dishes. The work is done and the relationship's going well, but a lot of baggage has been brought back into the relationship by Mongoid. At first I was excited about all of this. Everything seemed so clean and centralized, and I was excited to start working on Mongoid and the driver, and that Durran would be moving on to another team so I'd have more responsibility. But I quickly realized that I had inherited a ton of work. Namely, there were 199 problems and they were all Mongoid issues. We imported the GitHub issues list from Mongoid into Jira and it was a disaster. I almost had a heart attack. There were a ton of issues and I didn't think I would ever get through them. I think there actually were 199. A lot was broken, the project was in pieces, and the community was fragmented. How could I bring the project back into good standing with its users and restore trust and communication? How could I restore structure and reduce entropy, hopefully restoring entropy to its original state? Was it possible to make Mongoid even better than it was before?
So I did what the king's men and women tried to do for Humpty Dumpty. I did what France, the World Monuments Fund, and the Columbia computer science team tried to do. I studied the structure, identified the pieces and the weaknesses, and I tried, and I kind of succeeded, to put Mongoid back together again. So how did I do this? I'm going to spend a little bit of time talking to you about how you can take an existing project that's in dire need of a refresh, because I'm sure you all have one, and put it back together. There are many presentations and books on how to refactor. The problem is solved, and there's no need to reinvent the wheel or retell you a lot of things that you can just look up or watch other presentations on. Every type of code smell is identified and recipes are given for refactoring. The definitions can be overwhelming, but who can really apply them perfectly? I read the definitions too, and I would identify some of those things in my code base, but I couldn't really apply them. It's kind of like this equation for the second law of thermodynamics: it's a guide for how to understand the concept, but it can't actually be applied in practice. So I'm going to tell you a much more human story of how I refactored Mongoid and put it back together again, because it's a very real project with very real problems. I'm going to share with you some tricks and things that I applied to my process. We'll look at how I studied the structure, then we'll talk about refactoring, and there's definitely a wrong way to refactor and many better ways to refactor. Finally we'll talk about how to avoid letting a project slip into this state in the future. So regardless of whether you're an open source project maintainer or a maintainer of some legacy code base, some pre-existing project, I think you'll find that a lot of what I'm about to say can be applied to your own projects. I bet you agree that the entropy and disorder of your code base increases over time.
But I do think that we can pause, repair, and restructure our code bases to actually be stronger than they were before we started. Again, the second law of thermodynamics says it's improbable but not impossible to restore a system to its original state. We're engineers, and when we put our minds to something we can make it happen.
So, Mongoid structural analysis. I spent a while addressing bugs in Mongoid one by one, going through those 199 problems, because I didn't have a good sense of how everything worked. At the time that Durran built Mongoid, metaprogramming was really popular, so he did things like... I don't want to, sorry. We can talk about that later. I mean, as I said, this is Durran 10 years ago, so I'm glad he doesn't really watch conference talks that much. But I knew in the back of my mind that I had to build up a familiarity with the structure of the code base, so I took notes, both in the code and in a notebook, literally with a pencil, on how everything worked together. I drew diagrams like an architect. I stepped through the code with pry and wrote down the call stack. As I said before, many solutions were applied that approximately solved problems, because not many people had the full picture. The typical, obvious case is a pull request fixing something very specific. It's really important to have a mental model of how a code base works in order to make high-quality changes. Luckily Durran also had my back in this case. As I said, he was still at the company, so when I was trying to figure out why something was changed and it wasn't good enough to look at git blame, I could look at git blame and say, hey Durran, why did you do this? And he would give me this whole story. Luckily he had a good memory and a lot of stories. I recognize that not everybody has that resource, but it was really good for helping me understand the history of this project.
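For readers who want to try the "step through and write down the call stack" exercise without an interactive session, here is a minimal sketch. It swaps pry for Ruby's built-in TracePoint, which is not what the talk used, and the Book and Author classes and the record_calls helper are hypothetical names invented purely for illustration.

```ruby
# Hypothetical toy classes standing in for an unfamiliar code path.
class Author
  def name
    "unknown"
  end
end

class Book
  def author
    @author ||= Author.new
  end

  def byline
    "by #{author.name}"
  end
end

# Hypothetical helper: runs the given block and returns, in order, every
# Ruby-defined method that was entered while the block ran.
def record_calls
  calls = []
  trace = TracePoint.new(:call) do |tp|
    calls << "#{tp.defined_class}##{tp.method_id}"
  end
  trace.enable { yield }
  calls
end

calls = record_calls { Book.new.byline }
puts calls
```

The resulting list can be pasted into notes next to the code, which is roughly the pencil-and-notebook step described above. Methods implemented in C are not reported by the `:call` event, so the trace shows only the Ruby-level path.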
The one thing that made this refactoring seem possible to me was grouping my issues into categories. If you categorize the issues you can see where the hotspots are and focus on them when rebuilding or repairing the structure. A 3D model of the cathedral was necessary for exactly the same reason. In particular, with Mongoid I realized most of our issues had to do with the behavior of relation objects, so I created an epic in Jira to track all the issues related to relations bugs. When I say relation object: when you define a model and you say, for example, a book has one author, there's a macro that runs and creates this object called a relation and saves it into this global variable on the Book class. That object itself is what caused a lot of problems, and I tried to cluster and categorize my issues around that one thing so that when I focused on refactoring it I knew what its needs were. Stepping through major code paths and taking notes is really important too. Choose code paths that you don't understand and step through them with pry. I know they're scary and it's really, really tedious, but it's really helpful to do that. As I said, there's a lot of metaprogramming, which made it really opaque and really difficult, but I took notes in the code with comments as well, for example if something was an attribute accessor in one file.
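To make the relation macro described above more concrete, here is a minimal sketch of the general pattern: a class-level macro that builds a relation object and stores it on the declaring class. This is not Mongoid's actual code; the Relation class, the Relations module, and the relations registry are all hypothetical names chosen for illustration.

```ruby
# Hypothetical, simplified stand-in for a relation object; not Mongoid's code.
class Relation
  attr_reader :name, :kind

  def initialize(name, kind)
    @name = name
    @kind = kind
  end
end

# Hypothetical module providing a class-level macro.
module Relations
  # Records a Relation on the class that declares it and defines
  # a reader and writer for the related object on instances.
  def has_one(name)
    relations[name] = Relation.new(name, :has_one)
    attr_accessor name
  end

  # Per-class registry of declared relations, created on first use.
  def relations
    @relations ||= {}
  end
end

class Book
  extend Relations
  has_one :author
end

relation = Book.relations[:author]
puts "#{relation.kind} #{relation.name}"
```

Because the registry lives on the class and is consulted everywhere else, any weakness in the stored object spreads through the whole code base, which is why clustering issues around it was useful.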
The document class's structure is made up of behaviors in different modules, so there are a lot of different files that define a lot of different things about this one document class. I peppered the code base with a lot of notes, so if I was following a code path and I saw a variable, I would write something like "this is defined in X module", and that really helped me understand the shape of the code base. And then lastly, draw diagrams for yourself, literally with a pencil, like an architect. It was really helpful to do this as well, and seeing the structure visually helps you. Again, coming back to art history, it's kind of like a sculpture: there is a shape to your code base and you want to understand it. So after I did all of that, what did I identify as a weakness? I'm going to give you a concrete example, making that relations issue that I built the epic around more concrete, so you can follow along and see how I focused on the one element of the code base that was the weakest and on which I spent the most time refactoring. After refactoring this one thing I was able to close about 40 issues, which at the time I was doing this was 50 percent of our issues, so I was really happy about that. I identified that we had one object that contained all the information about the relationship between two models in Mongoid. It was called Metadata and it inherited from Hash, so essentially it was a hash. It's basically the laziest class you could ever have, because it's just keys and values with no specific logic or behavior. So, a nightmare. It was an object created when the model was loaded. When you write the actual relation in the model class, it would create this Metadata object, which was just a hash. So writing "book has one author" would use a macro to create this Metadata object and stick it onto the Book class, and that's what was used throughout all of the code to determine what behavior an instance of a book should
have, or even the class itself if you're querying or whatever. In code smell terms this is a classic bloater smell: this class knew and did way too much. I'm sure there are tons of other code smell terms you could apply to this as well. So this is the Metadata class definition. Does anybody notice something alarming about this comment? "The grand poobah of information about any relation in this class. It contains everything you could ever possibly want to know." And by the way, "possibly" was spelled wrong, which goes back to what I was saying about this being a whiskey project. As you can see, it was basically like a Magic 8-Ball: you just ask it anything, it can give you the answer, and it's totally random. Writing simple code is important, but let's define simplicity. Does simplicity mean that we should have the least number of classes? Does it mean that we should favor one basic object over multiple smaller, different objects? Is having one Metadata object saving all information about every type of relation the simplest and thus best design? Design decisions, I understand, do involve trade-offs. We so frequently chant DRY, don't repeat yourself, but sometimes we need to introduce a little bit of duplication in order to have a simpler design. Prefer duplication over the wrong abstraction. I'll give some examples of how the Metadata object was used, so you'll see how it became obvious what rearranging had to be done. Even without understanding anything about Mongoid, I think you'll recognize what patterns come out of this code and what needed to be done to restore structural stability to Mongoid's code base. But before I do that, I just want to say briefly that Mongoid has two main types of relations, because MongoDB is a document database. It has referenced relations, which is what you would recognize from Active Record, so straight-up referenced relations: there are IDs, foreign keys, saved on objects. Has and belongs to many is has many
through, but there's no join table, because MongoDB has a field that can be an array, so it's just saving arrays of the related objects' IDs on either end, and they're kept in sync. And then there's embedded, which is a pretty self-explanatory moniker. You can have embedded documents in MongoDB, so you have these types which implement the relationship between embedded and parent documents. So this is one example instance method. It's called determine_foreign_key, and actually, when I was reviewing my slides this morning, I noticed I hadn't even seen the first line. It says "Determine the value for the relation's foreign key. Performance improvement." What? I don't know, that's something we'd have to ask him about. At first, for the life of me, I could not understand what was going on. Basically it's a no-op if the relation is embedded. But embedded relations don't save foreign keys, because they're embedded and don't need them. So why would it return a foreign key from its options? Why would it even allow a foreign key option if it really was an embedded relation? This doesn't count as the Sandi Metz quote, by the way, but Sandi Metz says this thing where, if you squint at the code, you can kind of see the structure, and the shape will come out at you. So I kept squinting my eyes at this, thinking maybe something would come out of it, but I didn't really see much besides what was there. But when you're doing a lot of refactoring you get pretty good at recognizing these hidden patterns, and in my refactoring mindset I did notice a couple of things. First of all, if a foreign key option is set, it's returned and nothing else is done. Second, if the object is embedded it returns nil, and that check should probably come before the check for whether a foreign key option is set. And the last thing is that the relation object knows something the Metadata object doesn't, though it uses the Metadata's data, or the Metadata's metadata. And
Also, the other thing I want to add to this list is that this metadata object is supposed to be the relation, but there's also a relation object saved on the metadata class. So why aren't those conflated? There were actually these relation objects that had different behavior, but I didn't see why we needed this metadata class if we could just have objects with their own behavior.
So here's another method, just to give you a sense of how sticky this code base was. It's used to get the names of an inverse relation given a certain relation. The method first checks if the type is polymorphic. I know it's kind of small, but it says: if it's polymorphic, look up inverses, and otherwise, determine inverses. When I looked at those two methods, they had a lot of overlap in logic, so it was really difficult to determine what logic should be extracted and whether some of the checks were repeated after they'd been branched.
But basically the point of showing you this is that it's pretty clear the metadata object was begging to be refactored into smaller objects. The entropy, or disorder, of the system was way too high, any bugs having to do with this code were virtually impossible to fix, and the structure was weak. The most obvious need was for there to be a Referenced and an Embedded namespace, with objects that knew whether they were referenced or embedded and had their own behavior. So I embarked on a journey to refactor the metadata object into different objects under the Referenced and Embedded namespaces.
How did I do this? Did I do it all at once? Did I read a lot of books, learn how to do this perfectly, and then apply those practices? Well, I had a couple of false starts. I bought Martin Fowler's Refactoring book, but I honestly didn't get through much of it. I kind of wanted to learn as I was doing it. I talked to my manager a lot, had some nervous breakdowns, but I
learned that there are a lot of wrong ways to refactor and a couple of right ways, or better ways. So it's really important to do proper refactoring, not random refactoring. I like to think of it in terms of the health of a project. There's something very similar between the way I refactor and work on my code bases and how I design my weekly exercise routine. I always ask myself when making changes to a code base: is this a healthy change? Or is this a quick fix, like a piece of candy or a bag of chips that has a short-term payoff? It's really yummy right now, but I know in the long term this probably isn't good for me.
We all have to refactor at some point. It's really important to have a plan and a design for what you're going to do. Refactoring should require the same effort and process that you apply to building something from scratch. I think sometimes we forget that. Just plowing through and trying to fix everything you can along the way is definitely one of the wrong ways to refactor.
So at this stage in the repair of Mongoid, I had done the structural analysis and identified the weaknesses. The next step was to refactor with a plan. Over the course of my refactoring of Mongoid I learned a lot, and I'm going to share some of the highlights with you. Again, I'm not going to go through recipes or theories, because you can read about that, and it's something we've talked about a lot because it's something we do a lot. But I'm going to show you a couple of things that I did, using this metadata object as an example, because I think it's the classic case of something begging to be refactored. As I said, we can read about this, and I also watched a lot of talks along the way for guidance. I'm not saying you should just dispose of all the theory; I think it's really good to know it. But I'm going to share things that aren't really things I read or heard about. So the things that I learned were
to refactor one piece at a time, use tests at every step, and don't fix bugs. That last one was really important.
So Martin Fowler, we know this, we can probably recite it in our sleep, defines refactoring as the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves the internal structure. So if we rearrange the system so that the external behavior doesn't change, it doesn't matter if we rearrange one corner of the system and then another corner and do it piece by piece, because the outside behavior is not going to change. So what I did first was define a namespace called Referenced and create a class called BelongsTo, which seemed like a really obvious way to refactor this. I returned this object when a model was defined with a belongs_to relationship, and I made sure all the tests passed before moving on to create another object.
The largest benefit of rearranging the system piece by piece is that you can test out different designs and not waste too much time overhauling everything only to realize your new design doesn't work. Agile principles aren't only for building new things. So as I said, you should apply the same practices to refactoring as you do to building something from scratch. I iterated over my refactor design. I tried out different hierarchies. I tried creating classes for things like a builder. A builder is something like this: if you have a book and you build an author, there's this builder thing that does that for you. And if you bind something, it would be book.author = another_author. Those things were objects originally, and I tried out having them be objects, but I thought it would be much better if they were modules, because they're behavior, things that do something once. There's no reason to save instance variables on that builder, because it's created as a side effect or byproduct of building an object or binding it to another one.
Secondly, before you begin to refactor, make sure
you have a solid suite of tests. Tests are the wall at your back. Refactor your tests simultaneously with your code. I can't emphasize this enough: you won't be able to figure out what went wrong with your design if you do all of your refactoring, then run the tests and realize they're not passing. So this is just an example from that look-up-inverses thing that I showed before. I decided that I wanted each object to know what its complements were. You have to port your existing tests, but you also have to write new tests, and this was a new test I had to write, because when I refactored, each relation object had to be able to ask another relation if it was a complement of itself. So that's a test that I wrote for that. It's really important to add those tests in as well.
And the last thing is: don't fix bugs. I was really familiar with this list of bugs in JIRA, and when I was refactoring the code I would sometimes find the sources of these bugs. I was really excited to find these places and I really wanted to fix them, but I had to be really self-disciplined about not fixing them and saving them for later. So this is just one example. When you have a list of embedded documents in a parent document, Mongoid would allow you to append the same document, with the same ID, onto that list. It's a pretty simple bug, nothing really that exciting. But when I was working on this code, this is the binder object, I saw this line while refactoring and I was like, oh, that's where this is happening. But I was like, I'm not going to do it now, I'm going to do it later. I would note down in the JIRA ticket where to go to fix it.
This idea was also crystallized for me by my manager. He maintains the Java driver, so he has a different way of thinking than I do. He has this really booming,
kind of godlike voice that makes anything he says sound extremely significant. But it is significant. We have these one-on-ones every other week, and he knew how much work I was putting into refactoring this project, and he also knew how many bugs I had to get through in the bug list. So this one time he was like, "So Emily, you're doing all this refactoring. I recognize that's a lot of work. Are you fixing bugs as well?" And I got really defensive. I was like, "No, I'm not fixing bugs, I don't have time for it. I really want to finish the refactoring before I do that." And he was like, "Good. You should never fix bugs while you're refactoring." And I was like, okay, I passed that test. He likes to ask these trick questions that make me think really deeply. He's amazing, it's great. So that crucial point was perhaps the one that took the most self-discipline. As I said, I really had to tie my hands behind my back when I was doing a lot of this.
So the last thing that was part of this restoration is not always discussed in books or refactoring presentations, nor is it specific to an open source project. It's that of responsible maintenance and restoring user trust. This is kind of the same idea as sustainable farming, or responsible farming, where farmers don't use chemicals or things that harm the environment for short-term benefit and financial payoff, like redder tomatoes, at the expense of the soil and the longevity of their land. It's the same idea with your project: you want to make sure that everything you do is with the long term and the health of the project in mind. Just like starting a new exercise or eating regime, improving Mongoid didn't mean applying a quick fix, like running for a couple of days, calling that my exercise regime, and then stopping for a while. I had to establish healthy habits going forward for
the code base. This meant properly categorizing issues as they were opened in the project. It meant responding to users right away in order to gather the most relevant information, even if I couldn't fix the issue right away, so that I could reproduce it. That was part of the problem with all the issues that were imported into JIRA from GitHub: a lot of those people didn't even code anymore. It's like going back to 2014, 2013. So I knew these problems existed, but I didn't really know how people got themselves into said hole.
And so release notes, documentation, basically any interface with the community had to be kept up to date so that people knew that Mongoid was alive. For example, I made sure our API docs were linked from the main documentation, because a lot of people were also getting confused by Mongoid's old documentation, which was still around. So I had to make sure the documentation was really centralized and obvious. I release new versions regularly. I make sure that I'm always in step with Rails, and I respond to Rails' movement in our march forward. I follow semantic versioning closely. I make sure to tweet, and I send out announcements on Google Groups so that people know the project is always moving forward, and that they can trust that Mongoid is active, that there's someone working on it, and that it's alive.
And the benefit of working on something that was quiet and passive for so long is that people don't realize I'm paid to work on Mongoid. So when I respond to them right away, they're so happy to get a response, like, "Thanks so much for your work," and I'm like, "I'm paid to do this." No, it's great. I can tell people are really happy that the project is finally being responsibly maintained.
So after all this work, had I succeeded in reducing the entropy of the project? How did it compare to the entropy of the project before these
changes? Entropy is a pretty abstract concept, as we've seen, and measuring it in the context of code bases seems even more intangible. As we heard at the beginning of the presentation, entropy is measured in terms of the number of ways in which a system can be arranged. We can't quite measure this in a code base, but I did need to prove somehow to myself, my community, and my manager that my time spent was time well spent. So I measure entropy in these three ways, and also other ways, and I think about this constantly to make sure that Mongoid is always kept in a stable state.
First, how difficult is it to make changes? When you have a bug, sure, it's difficult to find the source of the bug, but how difficult is it to actually fix it? Do you have to fix it in five different files? Do you fix it in one place, run the tests, cross your fingers, and move on if they pass? Do you understand the structure of the code base well enough to be confident that one place is the only place you need to make that fix?
The other thing is: can you explain the design? This was something I had to do, again involving my manager. We were looking to have someone from inside MongoDB join the Ruby team, someone a little bit newer to coding and to the company. In preparation for talking to this person, he said, "Why don't you write down everything you know working on this project, and everything someone should know joining this project?" And I was like, that's a lot. So I met with the guy who was going to join the team, and I spent about an hour just explaining the complexity of the gem dependencies, what projects I was maintaining, and where the tentacles were between the gems. So I did that, and I was like, I don't know if I can write down everything I know about working on Ruby, but
explaining the Mongoid code base, I can definitely do that now. And I couldn't do it before, because I didn't really have that mental model, and I also didn't think there was much structure.
And then the last thing is: how is performance? Slow performance is another indicator of high entropy in your system. The messier and more inefficient the code paths, the worse performance will be. Before this refactor, our Mongoid test suite would take between three and four minutes. After this refactor, it's taking eight. So I was freaked out for a little bit, and I was like, wow, that was a lot of work for nothing. But it's because I introduced a thousand new tests, and they all dealt with creating classes and creating relationships, which is actually quite code heavy. So it was understandable that the test suite got slower, but it made me realize that I needed to do some benchmarking. Luckily, we had a pretty rigorous benchmarking suite that I was able to use to confirm that I had actually increased performance slightly. So that's really important as well: sure, you can do this refactoring, but make sure you always benchmark before and after.
So in the end, I'm able to say with confidence that I reduced entropy in Mongoid, and I'm particularly happy that this will allow other engineers to easily join the Ruby project and potentially open pull requests, and that it makes the code base less opaque. This work has also shifted my perspective, and I think differently about my projects. So again, my manager likes to ask me trick questions, or impart wisdom on me, and this was about a month and a half ago, again in our one-on-one meeting. I come into our meetings with the things that I've been working on to give him an update, because he works on the Java driver, so he doesn't really track every commit and every JIRA ticket update that I do. So he comes into our
meeting and he's like, "So Emily, what did you do today to make Mongoid better?" And I was like, "I have to... do any of these make Mongoid better?" So now I always think about that. I go into my office in the morning and I ask myself, what am I going to do today, even if it's just a little thing, to make Mongoid better? And then when I leave, I'm like, did I do anything to make Mongoid better? So I encourage you to ask yourselves, when you go to work on Monday, before you start coding: what am I going to do today to make my code base better?
So on that note, I'm going to remind you, and for people who came in late, well, you might have disqualified yourselves, but email me with any Sandi Metz quotes to get mustard. And the other thing is, if I could hijack part of my question session and ask you a question: can everybody take out their phones, put them in the air, turn them on, and sing along? No, just kidding. Can you open the email client on your phone and write Emily at MongoDB as the receiver? You don't actually have to do this if you don't want to. I'm also doing this so I know how long it takes. Okay, so write me as the receiver, and then write me one sentence that sums up: if you've used MongoDB and you no longer use MongoDB, why; if you use MongoDB and you still use MongoDB, why; and if you've never used MongoDB and you don't want to use MongoDB, why. Just one sentence, and that would help me so much.
So on this note of making MongoDB better, that's what I've been tasked with, now that it's easier to fix bugs, it's a healthier project, other people can work on it with me, and it's not so much of a black box anymore. I know that in the Ruby community MongoDB is not super popular. It's not the default database, and I attribute that largely to the mindset that we're all in when we write web apps, which is created for us, or at least imparted on
us, or taught to us, by using Rails. We all think relationally for the most part, and MongoDB challenges that, and makes it kind of confusing, because it's kind of relational in some ways and kind of not. So I know that there's a learning curve, and Mongoid has always followed, for the last 10 years, this philosophy of following exactly what ActiveRecord does, to reduce the friction and the learning curve if you're going from a relational database to MongoDB. But now I'm pretty torn, because I can tell from all of the issues that are logged and the way people ask me questions that having that be the philosophy of Mongoid makes it so that people don't really learn MongoDB and how to use it properly. So they end up building relational schemas with MongoDB, which is not always the best solution, and you end up doing many more requests to the database, because we don't have joins, requests that you wouldn't do if you had a Postgres back end, for example.
So that's one philosophy; that's what it's always been doing, and I know that the switch to MongoDB is not that bad if we follow this path. But now there's this other path, where I think that maybe I can either build a new ODM for Rails, or for Ruby, or adapt Mongoid to make it more modular, so that you're not being put into this relational mindset automatically. So I don't know if I should make it totally its own thing. And I recognize that the learning curve would then be higher. It's a trade-off: I might have fewer users that way, because the path to entry is much harder. I can't determine that on my own, because by definition the only people I hear from are people using Mongoid. So I don't really know why other people, who aren't looking at MongoDB as a perfectly acceptable database option for Rails, aren't using it. I can't determine that without you all telling me why. So
please email me. I have a thick skin, I can handle it. You can be as brutal as you want. I wouldn't be working for MongoDB or standing on this stage if I took everything personally. I've been working for MongoDB for five years, so remember what everybody thought of MongoDB five years ago. So please be honest with me. It'll help me a lot, and it will help me make MongoDB better. So that's my question for you. I don't know if we have time; I wasn't really paying attention to the time for questions.
Yeah, so we're running a bit late. I'm sorry. But we can still take some questions if anyone has them, although I'm happy to take questions later. Okay, if I know how human minds work, you're all doing the single-thread thing where you're replying to the emails right now, so I don't know if you can think of any questions. But you have our email address, so the questions can go via email. Yeah, can I take the questions offline? I think, yeah. Okay, so thank you very much.