Hello? Hello! Thank you everyone for crawling out of your Game of Thrones bubbles tonight to come here and listen to some Ruby talks. I am Ted of House Johansson, but you can find me in most places online as Drenmie. Until recently I worked for a Ruby on Rails agency called Tinkerbox. It's a really awesome place and they are currently hiring senior developers. So if you want to work for an awesome agency, you can go to their website and drop them a message. Currently I'm working as a contractor with a product company called Engaged Rocket. They are also hiring senior developers. So if you want to work for an awesome product company, you can go to their website and drop them a message. And I'm also on the RuboCop core team. We are hiring for all positions and we offer a monthly salary of 10 internet points. So if you're interested you can drop me a message after the talk as well. So the title of this talk is Internal Affairs, and basically we're going to look at how you can write your own cop and add it into RuboCop. But before we start I would like to do a show of hands. How many of you use RuboCop on a daily basis? All right, how many of you use it for work? Right. So I'm always interested in hearing about the ways people are using RuboCop. I'm also open to listening to your frustrations if there are some things that are unclear, or to offering tips on how to configure RuboCop to fit better into your project. I realized RuboCop has nearly 16 million downloads on RubyGems, but there are very few resources on how to actually install, configure, and use RuboCop effectively. So my hope is that giving some of these talks will help alleviate that as well. So the goal of this talk is to learn enough about the cop API to be able to implement a custom cop for your own project. But the first question we need to answer is why would you want to do this?
RuboCop by default will help you check for stylistic things, so you can configure it to enforce certain naming of things, certain layouts in terms of indentation, what goes on which lines, and so on, with the style-related cops. It also has some performance cops and lint cops and cops of all sorts. So these work on a very low level to help you maintain a consistent style throughout your code base so that, for example, when you do code reviews, you can focus on the higher-level design issues in the pull request rather than having to nitpick about style and ending up in some endless discussion about whether you should use single quotes or double quotes. Recently, however, in RuboCop itself, we added a cop department called Internal Affairs. And this department, you can't use it if you're just using RuboCop. You can only use it on RuboCop itself. And this department has some cops that give you useful hints about the internal API of RuboCop. If you're using certain methods with certain arguments, it can prompt you to do that in better ways. So essentially this is one level higher than the stylistic inspections, where you almost have a sort of primitive code review while you're writing the code, for the things that there are cops for. So we've sort of codified some of the things that we keep commenting on in the pull requests. So if you find yourself commenting all the time to different people, "oh, you should probably use this method instead of that one", or "this is the default argument, so you don't need to provide it", things like that, you can codify that into RuboCop to eliminate it from the code reviews and also give a shorter feedback loop to the people developing the code base. And this talk comes with a very big disclaimer, which is that this is not a formalized public API of RuboCop. So we will be relying on some implicit behavior in the RuboCop code base to put our custom cop into the cop registry. And this comes with some limitations.
The biggest one is that you can't add configuration options for your custom cop. This is not the biggest problem, because if you're using it in one project, then you probably don't need the ability to customize it. And if you do, you can probably change it in the code itself. But it also prevents you from excluding certain directories, for example. So if you only wanted a custom cop that inspected files in the models directory, for example, you won't be able to do that. It will just inspect everything, all the files, which, if you're unfortunate, can lead to a lot of false positives. I am, however, planning to formalize this API and make it available to people. Partly because I think it's super useful. Since we started using the Internal Affairs cops in RuboCop, it has really improved the experience for contributors as well, because they can have some confidence before they submit the PR that the code conforms to the standards. And partly because some cops are the result of companies doing this. They go and build their own RuboCop cop. And then they come and tell us about it in our GitHub repo, which sometimes leads to us porting that cop into RuboCop core, which is super awesome. So I'm going to talk a bit about the inspection loop in RuboCop. This looks pretty daunting, but we're actually mostly going to focus on point number five. We do, however, need to briefly cover points one to four, because it will aid the understanding of what the cop actually does. So the first step when we do any static analysis is to parse the Ruby code into an abstract syntax tree, which is an unambiguous representation of the code that you have at hand. And if you don't know what an abstract syntax tree is, just think of it as a tree of nodes, and that is pretty much all you need to know. And RuboCop uses a gem called parser for this. So it is a hard dependency of RuboCop. Parser also comes with some convenience tools, command line tools, that we will be looking at in the talk.
The second step, once we have the abstract syntax tree, is to just walk the tree. So we basically iterate over the tree recursively. And for every node, we will emit a callback. So we will call a method with a certain name depending on the type of the node. And this is done by a class called the commissioner, the police commissioner, which delegates the work of actually inspecting to the cops. So each cop inspects for one single type of offense only. And finally, the cops are allowed to inspect the nodes that are emitted. So I'm going to walk through a simple example of how this happens. So I'm taking an example Active Record-ish query where, to the User constant, we send the message active, and then we send the message where with a hash as the argument. Now I'm going to use the command line tool provided by parser, ruby-parse, to show you the abstract syntax tree that is the result of this code. And you can use it with the flag -e, which means you can pass it an expression directly from the command line. And this allows you to really quickly inspect the abstract syntax trees of different snippets of Ruby code. And it's always useful when writing these cops, or working with the abstract syntax tree, to have the abstract syntax tree available. So maybe you can copy-paste it into a comment in your code or something. So the resulting abstract syntax tree looks like this. It contains a total of seven nodes. You can see that by the fact that there are seven lines. So each line begins with a node and you can see the nesting by looking at the indentation. So you can see that there's a single root node and it's a send node, and it is the node that sends the message where, and the first child of the send node is another send node. So that is the receiver, which is the message active. And the receiver of that send node is the constant User.
The second child of the outer send node is the method name, which is where, and the last child is the arguments, which is a hash that contains a pair where the key is a symbol and the value is a string. So if we look a bit at the code itself, the root node ends up being this one, and its first child is this one, and the first child of that one is that one. And then that branch of the tree ends. And the second child of the root node is the hash here, which in turn has one child, which is a pair, and that pair has two children, the key and the value, which are a symbol and a string. So this is a relatively common node pattern. And here is a list of the callbacks that will be called when the commissioner traverses this abstract syntax tree. It's just a depth-first traversal, so you can basically read it from top to bottom. It emits on_send, on_send, on_const, on_hash, on_pair, on_sym, and on_str. And the way we implement the cops for this is we just define the methods on the cop. So say that this cop is interested in method sends, and it's interested in method sends that have blocks. We might define the methods on_send and on_block. And the commissioner will automatically pass every single send node in the code that you're inspecting, which is usually your entire code base, to this method. And inside here, you basically decide, do I want to register an offense or do I not want to register an offense? And what we'll be primarily concerned with in this talk is the API for this cop. You can also see that the entire node is passed to the method. And that node actually has access to its children. So you're actually getting the entire subtree of the abstract syntax tree for the node that is being passed in. So rather quickly, we covered the first four steps. We parsed the code into an abstract syntax tree using parser. We traversed it and we sent a callback for each of the nodes.
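The walk-and-dispatch step described above can be sketched in plain Ruby. This is only a toy model under stated assumptions: Node, ToyCommissioner, RecordingCop and SendOnlyCop are hypothetical names, not RuboCop's or parser's classes, and the hash key and value in the hand-built tree are made up because the talk does not spell them out.

```ruby
# Toy model only: not RuboCop's commissioner, just the same idea in plain Ruby.
Node = Struct.new(:type, :children)

# Hand-built stand-in for a query shaped like User.active.where(status: "active").
# The key :status and the value "active" are assumptions for illustration.
AST =
  Node.new(:send, [
    Node.new(:send, [Node.new(:const, [nil, :User]), :active]),
    :where,
    Node.new(:hash, [
      Node.new(:pair, [
        Node.new(:sym, [:status]),
        Node.new(:str, ['active'])
      ])
    ])
  ])

# Walks the tree depth first and calls on_<type> on every cop that defines it.
class ToyCommissioner
  def initialize(cops)
    @cops = cops
  end

  def investigate(node)
    callback = :"on_#{node.type}"
    @cops.each { |cop| cop.public_send(callback, node) if cop.respond_to?(callback) }
    # Only child nodes are visited; plain values such as nil or :where are skipped.
    node.children.each { |child| investigate(child) if child.is_a?(Node) }
  end
end

# Listens to every node type in the example and records the callback names.
class RecordingCop
  attr_reader :calls

  def initialize
    @calls = []
  end

  %i[send const hash pair sym str].each do |type|
    define_method(:"on_#{type}") { |_node| @calls << :"on_#{type}" }
  end
end

# Only interested in message sends; it receives the whole subtree each time.
class SendOnlyCop
  attr_reader :selectors

  def initialize
    @selectors = []
  end

  def on_send(node)
    @selectors << node.children[1] # second child of a send node is the method name
  end
end

recorder = RecordingCop.new
send_cop = SendOnlyCop.new
ToyCommissioner.new([recorder, send_cop]).investigate(AST)

p recorder.calls
p send_cop.selectors
```

The recording cop sees the callbacks in the same depth-first order as the list in the talk, while the second cop is only ever handed the two send nodes.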
And now we can focus on the part that this talk is really about, which is how to implement the actual cop itself, how to inspect the part of the abstract syntax tree that we end up with inside the cop. So I'm going to do some live coding. The bad news is I recently lost all my stickers in an AppleCare incident, and that's where all my programming mojo was stored. But I've been praying to the demo gods that I will not get a Java update notification during this demo. So let's get started. I can just show you the ruby-parse interface. So say that we have a single method called foo and we ruby-parse it. We get some warnings here and those are unavoidable, unfortunately. And the actual output is here. So we get a, do I need to make this bigger? We get a single send node. It has no receiver, so it gets nil here. The method name is foo and it doesn't have any arguments, so the node just ends there. I can keep that up. I have set up a simple project that we're going to use. Unfortunately, I can't zoom in on the left pane for some reason, but I'll walk you through what the project is. So I have a Gemfile. My Gemfile contains RuboCop, and I'm using the master branch, so I hope no one puts any code there that is breaking just now. And I'm also using RSpec, so we can write some tests for the cop that we're going to implement. Of course, if you use this in your own project, you will have a lot of other stuff here, maybe a Rails stack or something. I have the Ruby version, nothing special. I have the .rubocop.yml configuration file. And the only thing I'm doing in here is requiring my custom cop from the directory where I placed it. And I placed it in a directory called ext/rubocop. And I have a corresponding spec file for the cop as well. Lastly, I have the spec helper. And I would like to point you to a few things here. The first is that I am requiring RuboCop in the spec helper, which we will need to do.
I'm also requiring the RSpec support files, which for one will give us this module RuboCop::RSpec::ExpectOffense, which is a small assertion library that allows us to write very declarative tests for RuboCop cops. And we're going to look at what that looks like in just a minute. Other than that, my project doesn't contain anything. So what we're going to do here is we're going to implement a cop, and we are going to call it a deprecation cop. And we're going to check for usages of the method foo, and we're going to suggest to the user that they use the method bar instead. So I'm going to start by writing a test for this cop. So I just give it some description. Now I'm going to use the expect_offense API. I'm going to pass it a string. And I'm just going to type in the code I want to inspect. In this case just the method foo. And now what this expect_offense method allows me to do is annotate the expectation with the highlight. So I want the cop to highlight the foo method. And I can annotate the message. So I want the message to be "use bar instead of foo". So we're going to try to run our tests. And you can see now we have one failing test, which is what we expected because we haven't implemented anything yet. So we expected to have foo annotated with "use bar instead of foo", but we just got foo back. This is the actual file where we're going to implement the cop. Now you need to put the cop in the correct namespace. The outermost module is called RuboCop, capital C. The next module is called Cop. The third module you can name anything you want. This is the department of the cop. So by default there is the naming department. There is the layout department. And in this case I added a new department called Deprecation. I added a short description of what the cop does, because if I don't, RuboCop will complain about that and you won't be able to see our own complaints. Lastly I just name the class whatever I want to name it.
This will be the name of the cop, and I inherit from the class Cop. Now there's a bit of an unfortunate naming conflict here, in that the module where the cops live is called Cop and then there's also a class Cop. And this is a bit of a legacy that we can't really get rid of just yet. The other unfortunate thing about this is that just doing this will register your cop into the cop registry. But nowhere are we telling the code to register the cop into the cop registry. So there are some implicit things going on here: when you inherit from Cop, it somehow adds it into the cop registry, which we should probably stop doing. So if you remember from earlier, we need to listen for some callback. In this case we are checking for a message send. So we're going to define on_send and we're going to get the send node in as an argument. This method will be called by the commissioner as soon as it traverses a send node, and it will pass us the entire subtree of the abstract syntax tree. In our case we want to return early unless the method that's being called is foo. Every node that you get into the callback is generally decorated with certain methods, which is where this method? method is coming from. And if your editor allows, you can go and check that out inside the gem. So I'm actually in the RuboCop source code now and I can look at the stuff that is available on send nodes. So this is how you can figure out how to interact with the nodes themselves. If we reach all the way here, we're going to add an offense to the node. So now for every send node that we encounter in any inspected file, we check if the method is foo, and if it is, we will add an offense to it. And we get another error message now, which is good. It's not the same error message as before. It's telling us we need to add a message to the cop. And I've already written the message in the test file.
I'm going to go ahead and take that, and by convention the message goes inside a constant called MSG. This is one of the things that is almost impossible to know unless you've worked with RuboCop for a while, and this is generally why we want to have the Internal Affairs cops in the first place. So firstly it will look for a method named message. If it can't find it, it will look for a constant called MSG, and if it doesn't find either of them you will get an error. So we made our test pass, which is a good sign. But our cop is quite primitive, because it only checks for foo, basically in any context and without arguments. So we're going to gradually change this cop in non-meaningful ways to see how we can make it more specific, and introduce some parts of the cop API as we go along. So next we're going to try to register an offense only when foo is sent to a certain receiver. OK, so this is slightly different, and in this case we don't want an offense in case we just call it with an implicit receiver. So there's another matcher, expect_no_offenses. And of course we don't annotate it with messages or highlights because it doesn't have any. So there are a few problems here. The first test is failing because we didn't expect an offense if the receiver is not baz. So we're going to fix that first. Now one of the downsides of the abstract syntax tree is we don't really know what types of nodes we have. So we generally don't know what methods are available. So we need to account for a lot of cases. We need to check that there is actually a receiver. We need to check that that receiver is of send type. And this is a bit unfortunate because you end up with a lot of this, and we're going to look at how we can alleviate that soon. But first I'm going to check if the test is passing. So now we only have a single test failure and it's complaining about the highlights. And I think it's complaining because we are highlighting the wrong part of the node.
So with the second argument to add_offense, you can select which part of the node you want to highlight. So as we saw, the send node has its receiver, its method name, and its arguments. Those are the possible children, and by default it will highlight everything. So it will highlight the receiver and the arguments as well. But we want to be more specific and we want to highlight only the particular method that is offensive. Now we want to get rid of this stuff, because this is pretty fragile and also it doesn't read very well. So I'm going to introduce you to an API which is called node matchers. And you can think of them as regular expressions for abstract syntax trees. And we can define one by using the macro def_node_matcher. And the first argument is the name. Because I don't really know what to name it, I'm just going to name it baz_foo?. And then it takes a string, because it's a pattern. And I'm going to use the ruby-parse command line tool to give me the abstract syntax tree. And this is super useful when doing node matchers. You can just copy the output. Make sure you get all of it. And spaces make no difference in the pattern. And now this will define a method for you, to which you can pass a node. And if the node matches the pattern, it will return true, or you can have it yield to a block as well. So we're going to try that. This pattern looks like it's covering everything. It's covering the receiver being baz. And it's covering the method itself being foo. So I'm going to remove all the old code. I'm going to use the new method and I'm going to use a block, and if nothing went wrong... I used the wrong name. Right, so it looks like our tests are still passing. So this pattern was equivalent to all that Ruby code that we wrote before. And this is very good, partly because it allows you to declaratively show what kind of abstract syntax tree you're trying to match. But it's also cool because you can do stuff that you can do in regular expressions, like wildcards and captures.
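To make the regular-expression comparison concrete, here is a toy matcher in plain Ruby. It is only a sketch under stated assumptions: real node patterns are strings handed to def_node_matcher, whereas this version uses nested arrays for both nodes and patterns, and toy_match?, ANY and REST are hypothetical names. The two wildcards mirror the ones the talk uses: an underscore for exactly one child and an ellipsis for zero or more remaining children.

```ruby
# Toy model only: nodes and patterns are nested arrays of the form [:type, *children].
ANY  = :_      # matches exactly one child, whatever its value
REST = :'...'  # matches zero or more remaining children; only valid as the last element

def toy_match?(pattern, node)
  return true if pattern == ANY
  return pattern == node unless pattern.is_a?(Array) # literals such as nil or :foo
  return false unless node.is_a?(Array)

  if pattern.last == REST
    fixed = pattern[0...-1]
    node.size >= fixed.size &&
      fixed.each_with_index.all? { |sub, i| toy_match?(sub, node[i]) }
  else
    pattern.size == node.size &&
      pattern.each_with_index.all? { |sub, i| toy_match?(sub, node[i]) }
  end
end

# Strict pattern: foo sent to baz, baz sent to nothing, no arguments.
STRICT = [:send, [:send, nil, :baz], :foo].freeze
# Loose pattern: baz may have any receiver, and foo may take any arguments.
LOOSE = [:send, [:send, ANY, :baz], :foo, REST].freeze

baz_foo      = [:send, [:send, nil, :baz], :foo]                            # baz.foo
qux_baz_foo1 = [:send, [:send, [:const, nil, :Qux], :baz], :foo, [:int, 1]] # Qux.baz.foo(1)
bare_foo     = [:send, nil, :foo]                                           # foo

p toy_match?(STRICT, baz_foo)
p toy_match?(STRICT, qux_baz_foo1)
p toy_match?(LOOSE, qux_baz_foo1)
p toy_match?(LOOSE, bare_foo)
```

The strict pattern accepts only the exact tree it was copied from, while the loose one also accepts the call with a receiver on baz and an argument to foo, and still rejects a bare foo.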
So you can actually capture nodes or parts of nodes, and you can wildcard them as well, which we're going to look at now, because we're going to make our cop even more complicated. We want to catch all the cases where foo is sent to baz, regardless of what baz is sent to. So for example, if I send that to a constant, I still want there to be a RuboCop offense on this part. So now our tests are failing again. It did not register an offense there. It did nothing. And if we look at our node pattern, the offender is here. This is the receiver of the baz method. And right now we're explicitly saying it should be nil, that it should be sent to nothing. So we're going to use a wildcard, underscore. Underscore stands for any node, but there has to be one. And that made our test pass again. Now we're not entirely there yet, because we also want to match in the case where we're passing arguments to foo. And let's check our tests. We have a failing test case again, which shows that our cop is not taking into account calls to foo that have arguments. So we need to use a wildcard again. But we can't use underscore here, because if we use underscore, it will fail another test. And that is because underscore always matches something. So then it stops matching foo without arguments. So you need to use another wildcard, which is the ellipsis, which matches zero or any number of parts. And that makes the tests with the arguments pass as well. So we're going to look at one final thing in the cop API, which is one of the coolest features of RuboCop, if you ask me, which is the ability to automatically correct code that is producing an offense. Now this is not always possible, but in a surprising number of scenarios, it is. And it looks like in our scenario, we just want to replace one method call with another. So it should definitely be possible. So we're going to start with writing a test for that. It autocorrects foo to bar.
Unfortunately, we don't have the nice declarative API for autocorrect yet. So we need to put in some manual work. I think this is the method name. We're going to see soon. So we put in some offensive code, and after autocorrecting we assert that it equals the code that we want it to be instead, which is bar. And assuming I got the method names right, we should have a failing autocorrect test. So we expected to get baz.bar after autocorrecting, but we're getting baz.foo, which is the original code. And this makes sense because we have not implemented any autocorrect. The way you implement autocorrect is you define a method, not surprisingly named autocorrect. autocorrect will be passed a node by the commissioner. So what node is being passed to autocorrect? Well, it's not necessarily the node that was sent to the original callback. It is the node that you registered the offense on. And this is a common source of confusion, I think. But it's actually really useful because you get only the part of the syntax tree that is producing an offense. So in our case, it will be the part of the syntax tree that is the foo method call. The way you implement autocorrect is you have the method return a lambda. This lambda will be called and it will be passed a special object called the corrector. The corrector has a bunch of methods, like replace. And here is the part that is not always nice to work with. You see this loc keyword here. This is actually some source maps coming from the parser gem. Unfortunately, these source maps are extremely sparse. And I think the maintainer is not adding any more source maps or adding any more locations onto the source maps. But basically it allows you to target ranges of code inside the abstract syntax tree. So if you wanted to target a particular parenthesis or a particular method name, a particular quotation mark, you can do that.
You might need to work around the sparseness a bit when using these loc source maps, and they're described in the parser documentation. For a method send, the selector, as we saw when we used it as the site of the offense, is the actual method name. So in our case, we want to replace the selector with the actual method we want, which is bar. And now we have a cop that is autocorrecting our code for us. So if we recap and have a high-level look at this cop, this is pretty much the anatomy of any cop in RuboCop. You have the message. You have the callback, or multiple callbacks in the case of a lot of the RuboCop cops. You will add an offense to something if it's considered offensive. You can have the node matchers that will help you match things in the abstract syntax tree. And usually you will have an autocorrect method. And if you go into the actual RuboCop source code and you look at the cops, you won't find many other things except this. That said, most of the cops have a lot of private methods to help them work with the abstract syntax tree or construct really useful messages. So one thing we didn't cover here is that you can match things in your node matcher, extract them, and format your message to give a very dynamic offense message to the user. So it's easier for them to figure out what they need to do, especially in the case where there's no autocorrect for the cop. So one neat side effect of this is that now you all know how to implement a RuboCop cop. You can, if you want, also come and help out in the RuboCop GitHub repository. We're always in need of more help. Not only programmers: if you're interested in improving the documentation, writing guides on how to use RuboCop or how to develop RuboCop, or improving the messages for the offenses, then that's also super helpful. So if you're interested in contributing, you can come and see me after the talk, or you can always ping me in the Ruby SG Slack chat. So that was everything for the talk.
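The autocorrect flow from the demo (the cop returns a lambda, the lambda is handed a corrector, and the corrector replaces the selector range with bar) can be modelled in plain Ruby. This is a toy sketch under stated assumptions: SourceRange, Location, ToyNode, ToyCorrector and ToyDeprecationCop are hypothetical stand-ins that only imitate the shape of what the talk describes, and the character offsets are hard-coded for the source baz.foo rather than coming from a real parser.

```ruby
# Toy model only: these are not RuboCop's or parser's real classes.
SourceRange = Struct.new(:begin_pos, :end_pos) # character offsets into the source
Location    = Struct.new(:selector)            # the node's loc, reduced to the selector
ToyNode     = Struct.new(:loc)

# Collects replacements and applies them to a copy of the source.
class ToyCorrector
  def initialize(source)
    @source = source
    @replacements = []
  end

  def replace(range, text)
    @replacements << [range, text]
  end

  def rewrite
    result = @source.dup
    # Apply from the end of the source so earlier offsets stay valid.
    @replacements.sort_by { |range, _text| -range.begin_pos }.each do |range, text|
      result[range.begin_pos...range.end_pos] = text
    end
    result
  end
end

class ToyDeprecationCop
  MSG = 'Use bar instead of foo.'

  # Receives the node the offense was registered on and returns a lambda
  # that performs the correction when it is given a corrector.
  def autocorrect(node)
    ->(corrector) { corrector.replace(node.loc.selector, 'bar') }
  end
end

source = 'baz.foo'
# In "baz.foo" the method name foo occupies offsets 4 up to (not including) 7.
node = ToyNode.new(Location.new(SourceRange.new(4, 7)))

corrector = ToyCorrector.new(source)
ToyDeprecationCop.new.autocorrect(node).call(corrector)
corrected = corrector.rewrite

puts corrected
```

Returning a lambda instead of editing the source directly keeps the cop free of any knowledge about how or when the text is rewritten; it only says which range should become what.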
If you have questions, we can take them now. Due to time constraints, we have time for one or two questions. Yes, Peter. So one thing that happens when you start getting pulled into the rabbit hole is that there's actually an extended community around parser, around RuboCop, and around a bunch of other static analysis tools. And you inevitably get pulled into these projects as well, because you need to patch something that is not working, that your project is depending on. Or, actually, this expect_offense thing comes from the maintainer of RuboCop RSpec. So he created it for his own project and then we ported it in. But there is really a lot you can do with static analysis. That said, there are some very severe limitations. I'm going on a bit of a tangent, but I should mention this. There is a very severe limitation in that you're trying to statically analyze a dynamic language, which means you can never safely assume what is going to be inside a variable. And because Ruby has things like duck typing, you can't even look at the methods being sent to that thing and assume that it is of a certain type. Yes, sorry for not answering your question directly. I think what happened to me is I started contributing to a lot of the same kinds of projects. That's a great question. There is another method that is not a callback that is sent on each node. There is one called investigate that is sent on each file, and it will pass the undigested source. And then we can do some more primitive checks like, oh, how many spaces are being used? Are there tabs in here? Is there a trailing space in any of the lines? Good question. Thank you, Ted. Thank you. Our next speaker is Dragosh.