Thank you for coming to the talk. I'm Yeni, and today I'm going to talk to you about washing away code smells. This is a nicer picture of me than real life, and I'm originally from Hong Kong. I'm a software engineer at Yelp on the business national team in Hamburg, Germany, where we're scaling advertising tools and also building reporting solutions for our biggest customers. I was previously a speaker at PyConDE, PyDays Vienna, and also the talk PyCon podcast. The reason I want to mention this is that I learned a lot from these kinds of experiences. So I just want to give credit to the people I talked to, the organizers, and also to a lot of things that I learned through the experience of presenting at these occasions.
So yeah, a little bit more about my company. Our mission is really simple: we want to connect people with great local businesses. So you're in this beautiful city of Edinburgh, and one of the really important questions you ask yourself is, where do we get dinner, right? Yelp helps you find exactly that through people's reviews, photos, videos, and also their opinions on which one you should go to. And that's one of the things that I'm thinking about as well. And this is probably what our homepage looks like here. There is a search, and you can just put down the category that you want to search for. So pretty easy.
And some fun facts about our company. Yelpers have written over 155 million reviews. There are a lot of users, as you can see, a lot of monthly unique visitors. And our engineering team is growing very fast. We have over 500 developers now across the offices in San Francisco, London, and also Hamburg, where I'm working. And we have over 300 services, and our monolith, yelp-main, has over 3 million lines of code. And why am I bringing this up to all of you?
It's because when you have so many people working on the same code base, it's actually really important that we keep a standard for what kind of code we're checking into production. And that's why I want to share with you this little message that I learned from my time at Yelp, and hopefully you'll like it.
So that leads us to our agenda. We're going to talk about what code smells are. Why is that something that we should spend time on? Why should we care about this? Why should we spend our engineering effort on clearing out these code smells? And how we can use the technique of refactoring to wash away these code smells. And also, if you're sold on the message that I'm bringing to you, how you can bring refactoring and this technique to your company. So let's get started.
What are code smells? A code smell is a surface indication that usually corresponds to a deeper problem in the system. One thing to note is the "surface indication" here. Sometimes you can have a code smell, but maybe there's nothing wrong with it. If you look deeper into the code, if you do some investigation, maybe that's just what it takes to solve the problem. So it is only a surface indication. I want to bring forward a metaphor here. It's like eating cheese, right? Sometimes you smell really, really pungent cheese and you think there is a problem with it. But then when you eat it, oh, it's actually fine, it's good cheese. So code smells are something like that.
So why do we care about this? The tech lead for Twitter's Engineering Effectiveness group once said: let 1,000 flowers bloom, and then rip 999 of them out by the roots. So what do we mean by that? I think that's a metaphor for how your developers have a lot of ways to do things. And yeah, of course everybody can try out their creativity. We can try out things and see if they work. But then gradually, through experiments, we realize that some of them don't.
So that's the time when we want to rip them out of your code base. And this is why we should do that. First of all, when we just leave it unchecked, it builds up tech debt. What that means is, it's not like if we procrastinate and don't solve this problem in this project, it's gonna go away. It's actually gonna gradually build up and come back to you in the next project, which is actually exactly what is happening to me right now. So that's one problem.
And also, it's not like if you design the code well, like say I'm very good at designing code and I do it right the first time, then the problem is gonna go away. Code actually gradually rots as time goes by. One example that I can give you is, say you have a code base in Python 2.7. You can create everything to the standard of the time. But then over time your code rots, and now we have to migrate to Python 3 because there are new standards. There are new experiments that people do that make things better. So here, we don't wanna let your code rot.
And there is also a line of argument that asks why we need to uphold that kind of quality in code when one day we will throw the code away, because technology moves so fast and we're building so many new products. We might have to get rid of the old products. So why do we have to maintain this standard? Actually, this is a very strong argument for why we have to spend time doing this. It's because only with good code can you identify what to throw away immediately. I can give you an example. I'm sure in your company there is this one file, the admin code, that is 4,000 lines long and nobody knows what it does. And that's not great, because when you have to throw away functionality that you don't want, this is the problem. You don't want your code to be like CSS. You have a long CSS file, you wanna take out one line, and suddenly your page breaks and you don't know why. This is not what you want your code to be like.
And at the same time, I also wanna bring forward that code smells decrease productivity. I have definitely had that experience before. I'm working on legacy code. Well, in my dry run, I kept on saying legendary code, so I have to be very careful here. Legacy code. And yeah, I thought I had figured it out this one day. Okay, I know what's happening. I even tried to write down how it inherits through 10 levels of hierarchy in the Python code. And then I sleep, and the next day I come back to work, and then: what is this? So imagine that times five, because your whole team is struggling and doing the same thing. That really weighs on our productivity.
And there's one story that I wanna share with you briefly. In New York City, there was an area with a lot of crimes. The area was very dirty and there were a lot of drug problems and homeless people. And then they actually sent in some psychologists to check what the origin of the problem was: why is it so dirty, why are there so many problems? And actually, it all originated from a few broken windows, can you imagine? So if you leave these broken windows, or by analogy here your code smells, you're giving other people the message that they can just do whatever to the code. They can check in code that is not of enough quality, code that is not up to the standard. So you don't want your code to be like this.
And at the same time, I also think it's very important for developer happiness. And why is that? So a show of hands, who here just really likes working on legacy code? Well, can I say it's not the majority here? Yeah, so you get the idea. Maybe that's also important for us to retain people, if your day-to-day work is just working out what your Python code with ten levels of inheritance means. Yeah, so probably we don't want that. So we have the technique of refactoring that can actually come to our rescue.
One thing to bear in mind throughout the talk: refactoring is about changing the design of your code, but not the functionality. So the end result of what is being produced shouldn't be altered during the process of refactoring.
So here, let's go through a small example to develop our code nose and see if we can identify some of these code smells. I'll give you one second to look through this. I'm not brave enough to do live coding, but this is the best I can give here. This is a fairly simple program that gets us the cheese we want based on our mood, hunger, and money. So there are a few problems here. This is supposed to be a fairly simple function, but it's quite long. Now, let's see what kind of problems we have here. Can anyone in the audience already see if there is any problem?
It's missing one more than three times.
Yeah, that's one of them.
It's missing tests.
Oh, missing tests. Oh, that's an interesting one. We'll talk about that in a second.
Multiple return points.
Yeah, that's a good point. So let's organize that a little bit. I think you've basically hit all the points here already. First of all, the naming is a little problematic. So mood is bigger than three. What does that even mean? Does that mean you're ecstatic? Does that mean you're sad, angry? There is no way to tell. If it weren't for the comment, we actually wouldn't know what we mean here. And we're using comments as a deodorant to cover up our code smell, right? The problem itself is that we didn't name things correctly. And that's why you need to write a comment to kind of explain what your code is supposed to do. But we don't actually need that, do we? And as you pointed out, we have some dead code that never gets executed. For a simple program like this, it's pretty easy to spot that this never gets executed.
But for, say, the program that I was dealing with, maybe it's much harder to see what is being run and what is not, especially when you're not familiar with what the code is supposed to do. And there is duplicated code, which we can also extract out of the if-else conditions. And also conditional complexity, right? For a simple program like this, we've nested it three levels deep, which is probably the maximum we should have. If you have more than three levels, you really should think about whether there is another way to do things.
But yeah, let's do a simple refactoring here. So magic and, oh, we have something better. In real life, it rarely happens this way. You actually have to put in some effort to do that. But yeah, let's point out the things that we have improved here. First of all, I just want to mention there is a nice docstring that tells you what the function does. That's helpful in most cases. Also, we changed "mood bigger than three" to "is happy". So now you know it actually means happy. And we have introduced something called the guard clause, which is here to get rid of the complexity. One thing to note is that the top four are the cases where it's not supposed to happen, and the last one is the default case that you want to return. So if I have money and I'm hungry and I'm happy, I'll return blue cheese.
And also, as a bonus, this one is also PEP 8 compliant, for people who like PEP 8. Yeah, personally, I don't recommend that you go bug your colleagues and just be like, oh, you're missing a space there, or missing an indentation. There are actually automated tools that you can use for that. But I think it does bring value to your company, because since we're working with more than 500 developers, it's good to have a style guide for how you write code. It just makes things easier to read for everyone. And it also avoids Git merge conflicts. So that's the practical point. So yeah, we have taken a look at a simple exercise.
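The slide code is not reproduced in this transcript, so here is a minimal sketch of the kind of before-and-after described above. The function names `get_cheese_nested` and `get_cheese`, the parameter names, and every return value other than blue cheese are assumptions made for illustration, and the sketch uses three guard clauses rather than the four on the slide.

```python
def get_cheese_nested(mood, hunger, money):
    # Before: magic numbers, nesting three levels deep, duplicated returns.
    # mood > 3 means happy (the comment acts as a deodorant for the bad name)
    if money > 0:
        if hunger > 0:
            if mood > 3:
                return "blue cheese"
            else:
                return "cheddar"
        else:
            return None
    else:
        return None


def get_cheese(is_happy, is_hungry, has_money):
    """Return the cheese to get based on mood, hunger, and money."""
    # After: guard clauses deal with the exceptional cases first...
    if not has_money:
        return None
    if not is_hungry:
        return None
    if not is_happy:
        return "cheddar"
    # ...so the default case can sit at the bottom, un-nested.
    return "blue cheese"
```

Because refactoring should not change behavior, both versions are expected to agree for every combination of inputs, for example `get_cheese(is_happy=True, is_hungry=True, has_money=True)` and `get_cheese_nested(4, 1, 1)` both give blue cheese.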
So now we're gonna dive into some of the points that I wanna mention for refactoring. Just a quick recap of what we've done here for the refactoring. First of all, we named things right. Same for the comments point. We have removed dead code and we have DRYed out the duplicated code. And also we have reduced the conditionals into guard clauses. So I can briefly group these into categories. And these are probably the lower-hanging fruit that you can get basically once you start refactoring. There are definitely many more ways for you to do that, and we can investigate those later. These are the things that I think are easier to do and have the most immediate effects.
So, name it right. Naming is actually one of the hardest problems in programming. Do you know which three they are? Yeah, naming is one of them.
Cache invalidation.
Cache invalidation, then one more.
Threading.
Okay, there we go. Three of them. So I'm happy to talk to you about one of them, which is naming things right. It's a cure for uncommunicative naming. Python itself is dynamically typed. Since you don't have the typing information, we want to name things very clearly so that other people know exactly what's going on. Of course, I would also recommend pushing for type annotations. That is one thing that will help, but naming is also very important for Python. And apart from the variable naming that we have mentioned, there is also function and module naming. Module naming is probably more specific to Python. You probably don't want to do something like "from yelp.business.biz import bizinfo" or something like this that doesn't give people a clear sense of what you're trying to do. That's just asking for bugs. And at the same time there are keyword arguments, that's part of PEP 8 as well, which increase clarity.
So if you have keyword arguments in the code that you're calling, then you don't have to go back to the original function to check: does this one correspond to mood, or is it hunger? It just saves you some time and increases clarity.
At the same time, I want to introduce replacing magic strings and numbers with enums. I think that's a very good practice. So here there is a short how-to that we can go through together. One thing that is good about it: even though "is happy" was better, it's not as explicit as this one. Now you can specify your mood to be exuberant. You can be content, you can be apathetic or melancholic, as specific as you want. And you can specify it right in the enum class Mood. If you're a fan of explicit over implicit, I think this is something that you'll like. And at the same time, it also has pretty good properties: it's iterable and it's also hashable. That means you can use it in a dictionary, you can use it as a key. So it can be a substitute for a plain string or an integer. And that's really great, because consider the situation where this string is used in like 15 places in your code and suddenly your product manager wants you to change it into something else, which happens very often actually. Then you just have to change one place in the code, and you don't have to worry about typos because there is just one place. So that's something that is considered a good practice.
At the same time, I also want to push for getting organized. It sounds pretty simple, but it's actually probably one of the hardest things to do. It's a cure for long functions, classes, and parameter lists. One thing that this wants to uphold is the single responsibility principle: your function or your class is only supposed to do one thing. And how do you know that?
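The enum idea described above can be sketched as follows. The four moods come from the talk; the `Mood` string values, the `CHEESE_BY_MOOD` mapping, and the `get_cheese_for` helper are hypothetical names and values added for illustration.

```python
from enum import Enum


class Mood(Enum):
    # The single place where the underlying strings are spelled out.
    EXUBERANT = "exuberant"
    CONTENT = "content"
    APATHETIC = "apathetic"
    MELANCHOLIC = "melancholic"


# Enum members are hashable, so they can be used as dictionary keys.
# The cheese choices here are made up for the example.
CHEESE_BY_MOOD = {
    Mood.EXUBERANT: "blue cheese",
    Mood.CONTENT: "brie",
    Mood.APATHETIC: "cheddar",
    Mood.MELANCHOLIC: "cream cheese",
}


def get_cheese_for(mood):
    """Look up a cheese for the given Mood member."""
    return CHEESE_BY_MOOD[mood]


# Enum classes are iterable, in definition order.
all_mood_names = [mood.name for mood in Mood]
```

If the product manager wants "content" renamed, only the value inside `Mood` changes; every call site keeps referring to `Mood.CONTENT`.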
Actually, after you write the function and you try to name it, if you find yourself having a very hard time naming that function, it's probably because you're not following the single responsibility principle, because it should be fairly easy to name if it's only doing one thing. So in this case, you might have to extract some of it into other functions or break it down. There is also decomposing conditionals, which is one of the getting-organized methods, and also DRY, which is don't repeat yourself. I also didn't know that at first. I had to look it up when I first learned about this. DRY actually means don't repeat yourself. So yeah.
One of the examples that I want to give here is fixing long parameter lists. I've encountered this when I'm programming as well. So here there is a simple example, identify cheese. And you know identifying cheese is a hard task. We need to pass in the country, smell, touch, city, year, taste, you know, a lot of information for the program to decide what cheese it is. But you know, gradually this list grows, right? And it gets out of hand. You have what, 20 parameters that you want to pass in. That's kind of an eyesore. So what we can do is create named tuples that organize these things together. And you can even add type annotations, which groups them in a more orderly fashion. And hence you end up passing in only two things: cheese production info and cheese attributes. It's also a way to document what exactly you want to pass in. Named tuples to the rescue.
And actually, going on from this point, why did I pick named tuples instead of other data structures? That's also one thing that we need to think about when we're designing. So I want to go through some examples to talk through dictionaries versus named tuples and also lists versus sets.
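Going back to the long parameter list for a moment, the grouping described above might look like the sketch below. The six field names come from the talk; the class names, the type annotations, and the placeholder logic inside `identify_cheese` are assumptions, since the slide is not part of this transcript.

```python
from typing import NamedTuple


class CheeseProductionInfo(NamedTuple):
    country: str
    city: str
    year: int


class CheeseAttributes(NamedTuple):
    smell: str
    touch: str
    taste: str


def identify_cheese(production_info: CheeseProductionInfo,
                    attributes: CheeseAttributes) -> str:
    """Guess a cheese from two grouped arguments instead of six loose ones."""
    # Placeholder rule, purely for illustration.
    if production_info.country == "France" and attributes.smell == "pungent":
        return "blue cheese"
    return "unknown cheese"


cheese = identify_cheese(
    production_info=CheeseProductionInfo(country="France", city="Roquefort", year=2018),
    attributes=CheeseAttributes(smell="pungent", touch="crumbly", taste="sharp"),
)
```

The two tuple classes also act as documentation: the signature now says which pieces of information belong together, and the call site uses keyword arguments so nothing has to be looked up.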
You know, you use these data structures based on your needs, but in some cases maybe one is better than the other. And we'll see why.
So here's what it's like using dictionaries, but this is a wrong case, or a bad case, of using a dictionary. Just one second for you to digest this. We're just doing some cheese math manipulation, I guess. And we're passing in a dictionary as a default value. So I'm passing in cheese counts that consists of green and blue. And we're just doing something simple, like incrementing the blue cheese count by one. But what is happening here that is scary is that Python actually just saves that default into one variable. In the beginning you have what you expected, right? Green and blue are both equal to zero. But after you call it once, what happens? This actually gets mutated, and your default has changed. So depending on how many times you've called your function, your default is different. And that's pretty scary. Imagine having to debug this in our monolith code base. I cannot imagine. So this is one thing that we actively try to avoid: putting a mutable data structure as a default is not recommended.
And here is where named tuples can come to the rescue, because they're immutable. So for the same thing, we can actually pass in cheese counts as a named tuple. And here is how you can specify a default for named tuples. If you're just specifying blue equals two, the default knows that green equals zero and blue equals two. If you don't put anything at all, then you get green equals zero and blue equals zero. And what happens if we put this in the previous function of some cheese? It's gonna shout at you. It's not gonna let you mutate the default. So that's better, because you realize the mistake before it goes to production, where I don't know how long it would take before someone finds out. So there's that. And then there's also the choice between lists and sets.
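Before moving on, the mutable-default pitfall and the named tuple alternative described above can be sketched as follows. The function names are hypothetical, and the `defaults=` argument to `namedtuple` assumes Python 3.7 or later; the slide may well have set the defaults another way.

```python
from collections import namedtuple


def add_blue_cheese(cheese_counts={"green": 0, "blue": 0}):
    # The default dict is created once, when the function is defined,
    # and the same object is shared by every call that omits the argument.
    cheese_counts["blue"] += 1
    return cheese_counts


# dict(...) takes a snapshot, because each call returns the same shared object.
first_call = dict(add_blue_cheese())
second_call = dict(add_blue_cheese())  # the "default" has silently changed

# Immutable alternative: a named tuple with per-field defaults.
CheeseCounts = namedtuple("CheeseCounts", ["green", "blue"], defaults=[0, 0])


def add_blue_cheese_in_place(cheese_counts=CheeseCounts()):
    # Named tuple fields cannot be reassigned, so this raises AttributeError
    # instead of quietly mutating the default.
    cheese_counts.blue += 1
    return cheese_counts


def add_blue_cheese_safely(cheese_counts=CheeseCounts()):
    # _replace returns a new tuple and leaves the default untouched.
    return cheese_counts._replace(blue=cheese_counts.blue + 1)
```

With the dictionary default, the two snapshots differ even though both calls look identical. With the named tuple, the in-place version fails loudly on the first call, and the `_replace` version gives the same answer every time.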
Here is a function that selects my favorite cheese from the catalog. We're passing in two things, the cheese catalog and also my favorite cheeses. And you're basically just doing a loop to see if each thing is in the my-favorite-cheese list, and we return the favorite cheeses picked from the cheese catalog. And yeah, everything looks good. We're passing in blue and cheddar as the cheese catalog. And my favorite cheeses are truffle brie, because of course, and also blue cheese, right? And in the end you return blue cheese, because that's the only one in the catalog. It does what it's supposed to do. And that's good, except it's a very long function for this, because we can do something actually simpler than that. And that's where sets come in, right? So right now, how many lines do we use? One, two, three, four, five lines for what could be a one-liner. You can basically just use intersection to come up with the common set between your cheese catalog and my favorite cheeses. It returns the same result, except now it's a one-liner, which is a more Pythonic way of doing things. But of course, there are some trade-offs between sets and lists. Sometimes sets have performance or memory implications. So that's also something that we need to consider when we're refactoring. But for a general use case, the set comparisons are pretty awesome, no? They are.
And yeah, another suggestion is to check out the standard library. There are actually a lot of gems in there that I feel I personally also don't spend enough time on, especially itertools and collections. If you make something an iterable, you can actually make use of a lot of the tooling that works on it, and also of the different collections like sets and other data structures. So I think those are pretty handy. And yeah, testing. We're missing some tests, so let's talk about testing.
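Going back to the list versus set example for a moment, here is a rough sketch of the two versions. The function and variable names are assumptions; only the cheeses themselves come from the talk.

```python
def select_favorite_cheeses_loop(cheese_catalog, my_favorite_cheeses):
    # Loop version: walk the catalog and keep what is also a favorite.
    favorites_in_catalog = []
    for cheese in cheese_catalog:
        if cheese in my_favorite_cheeses:
            favorites_in_catalog.append(cheese)
    return favorites_in_catalog


def select_favorite_cheeses(cheese_catalog, my_favorite_cheeses):
    # Set version: the intersection is exactly "in the catalog AND a favorite".
    return set(cheese_catalog) & set(my_favorite_cheeses)


catalog = ["blue", "cheddar"]
favorites = ["truffle brie", "blue"]

loop_result = select_favorite_cheeses_loop(catalog, favorites)
set_result = select_favorite_cheeses(catalog, favorites)
```

One design note: the set version returns a set, so ordering and duplicates from the catalog are not preserved. If callers depend on either, the loop (or a list comprehension) may still be the better fit.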
It's a pretty short section, but testing is necessary in the refactoring process, because if you don't write the tests, maybe it's too late already. For the first part, I would recommend writing integration or end-to-end tests for the code to be refactored. And that ties back to one of the slides we talked about, right? Refactoring is changing the design of your code and not the end result or the functionality of your program. So supposedly, if you write an integration test, it should pass throughout the whole process of your refactoring. And that's also your compass to make sure that you're doing the right thing during refactoring and not deviating too far from what the program is supposed to do. So it's kind of like a check for myself. And of course, you're having fun doing all the refactoring. And after that, we can write some unit tests for the refactored code to make sure that the code is actually correct. So this helps steer you in the right direction in the refactoring process. And I think we shouldn't be lazy with the tests. It's a very important part of the refactoring process. So cool.
And so if you're sold on the refactoring idea, on tackling code smells, how do we sell this to the company? How do we talk to your product manager about this? How do we convince other people to also get in the same boat? The secret weapon is code reviews. I remember when I first joined Yelp, I don't know who told me, but someone told me a kind of Boy Scout rule: you should leave the code cleaner than you found it. And I think that was very valuable advice. Every time I'm trying to push some code, even if I'm not trying to reinvent the wheel or refactor to a very deep extent, I try to make it a little bit better. Say I see a variable that is not named correctly, maybe I can make a little change.
I see a test that is not covering the code well, then I add another test to make sure all cases are covered. So things like that. It's a culture that we need to cultivate in the company to make this happen. One thing that we can do is also to encourage refactoring when people are adding code and especially when fixing bugs, right? Because something happened to the code, and refactoring can probably help with that. There are also other things, like writing code review guidelines to tell the code reviewers what to look for when doing code reviews. So these are also pretty good pieces of advice that I've heard.
And I think the harder part is dealing with the product manager, because usually they have a different agenda than engineers. They really want the product to ship fast, and they care a little bit less about the engineering integrity or the engineering quality. One way I would go about that is to break down the tasks and really take maintenance into account. From my personal experience, as I told you, right, I'm working on a legacy code base, and yeah, I have the hardest time trying to understand what is happening there. Afterwards I have to spend a lot of time maintaining this code base, and in retrospect, maybe if we had rewritten some of it, if we had refactored more of it, we could actually have saved a lot of time squashing bugs afterwards and in the maintenance part of it. So a rule of thumb that I tell myself is four weeks with refactoring, including the maintenance effort, and otherwise six weeks, because we have to account for the time that we need to get ourselves back into the not-so-well-done code again. But yeah, if all else fails, maybe one thing we can do is to abstract away the implementation detail. As a good engineer, you can adjust your estimates to include the refactoring and also the tests, and just say this feature takes X. But that's the last resort.
Yeah, so far we've gone through mostly the manual work of refactoring, but there are some things that can be automated, not all of them. We have some open source tools and other things that we use. One of them is called Undebt. It's based on pyparsing, and it can do massive find-and-replace. That's helpful because if you want to deprecate a function, a class, or a package, you can do a massive find-and-replace. And on the same note, we also use a debt tracker called branch debt, and for some of the code reviews we actually include that in the code review itself, so that gives people more pressure. It shows both the reviewer and the person who writes the code how much tech debt they're introducing in the new code that they're deploying. Some of the example metrics that this tool looks at include how many noqa tags you put in the code, how many deprecated functions you are using, and how many lines you are adding to the monolith, yelp-main, because we want to move out of it eventually. Yeah, so these are some things that can put more pressure on yourself to really deal with it now instead of putting it off until later or until another project that you're working on.
Yeah, some of the takeaways, just to wrap it up. We talked about code smells and why they are important, how we use the technique of refactoring to get rid of them, and also some tips to bring that to your company if you're sold on the idea of refactoring and code smells. Most important of all, if you want to work for a company that cares about code quality and lets you spend time on refactoring work, we're hiring. We have offices in Hamburg, that's where I work, London, and also San Francisco. We also have a booth down there, so if you're interested and want to talk to us, we're there. Yeah, thank you.
Before the questions, let me remind you: actual questions. So who has one?
Thank you for your talk.
So one of the things that has puzzled me for some time is this: many of these tips, when you know them, look like, yeah, great idea. But are people likely to just come up with them by themselves, or do they rather need to learn them? Do you have experience with this? Did you see junior developers, for example, looking at such bad code? And do they know what to do with it, or rather...
So you mean how to acquire the skill of training your code nose to spot what is good in code and what is bad?
Yeah, so I'm just wondering, what are your thoughts about it? Is it that people need to learn it, or rather, do they just see what's bad?
Well, I think there is the first gatekeeper, which is the code review, right, where, I guess, more experienced engineers tend to tell newer or less experienced engineers what they have learned over time. But I'm a proponent of, you know, you crash and you burn and you learn. So next time you'll know, because you've seen what didn't work.
Any more questions? Why didn't you see it closer?
Okay, so my question is related to what you said, that you should leave your code in a better state than you found it. Some could argue that goes against having small reviews, because now the reviewer has to look at 10 pages of refactoring instead of just the actual value added, so to say.
So one practice I usually follow is, I see the problem, and I usually tackle it in another code review, because it's a different context. But I would recommend following up on it immediately, otherwise it will get lost in the Jira forest, the Jira jungle. Or, yeah, sometimes you can identify these as good new-hire tickets; that can also be the case.
Anything else? Thank you, Yeni. What's it? What's the top level? Come on. Go ahead.