Yeah, so my name is Kir Shatrov and my talk today is called "Building a ChatOps framework". A bit about myself: I work on the Developer Acceleration team at Shopify, and I'll talk more about this team later. I live in Canada, which is where Shopify is based. And we may have worked together on some open source projects like Rails, Capistrano, or RubyBench. And that's me with the cat. So let's start with ChatOps. Please raise your hand if you've heard of it. Cool. With ChatOps you can move your technical and business operations into chat, into a conversation with your team. The term was first introduced by GitHub: they started talking about it at conferences, and they built the first ChatOps framework. It's also connected to the term "conversation-driven development". As you've probably heard, there's test-driven development, behavior-driven development, and many other kinds of driven development, and with ChatOps and conversation-driven development, you bring all of that into a chat with your team. A bit about Shopify: we have quite a lot of developers, more than 300. If you don't know Shopify, it's an e-commerce platform for small and medium businesses. When you have that many developers, you need to build tools for them so they can be productive. My team is called Developer Acceleration, and we build internal tools to make our developers more productive. ChatOps and that kind of automation is one of the things the Developer Acceleration team works on. To give you a better idea of how all of that looks, let's start with an example. At Shopify, every developer is responsible for shipping his or her own features. That means we don't have release engineers who push other people's commits. If you made a feature, you're responsible for deploying it and seeing that it works.
And if it doesn't work, to roll it back or do something about it. So imagine you made a pull request and you're about to merge it. You merge it once everything is okay with CI, and in a few minutes you get a message from a chatbot that your feature, your commit in the master branch, is ready to be shipped. You tell the bot: okay, let's ship it. And in the group channel in Slack (we use Slack), everyone will see that you're deploying something, which commits you're deploying, and also the result of the deployment. It usually succeeds, but it can also fail, like on this slide. That's how deploys work. Right after a deploy, or after you've committed something, it sometimes happens that we have an incident. For instance, if signup is down, someone comes into the same chat and starts an incident. An incident is a special procedure for managing something bad that's happening in production, and it includes actions like updating the status page and investigating what's wrong. We have chat commands for all of that. Another example is monitoring the heaviest SQL queries, or the heaviest customers who put a lot of load on our service. And another example of automation is creating new repositories. If you work in a small company, on a small team, you probably have a CTO or someone who is an admin in your GitHub organization and can create a new repo for you. But when you have hundreds of people, there can be no single person whose responsibility it is to create repos for everyone, and as a developer you don't even know who to ask. We also have special events called Hack Days, internal hackathons at Shopify, and on those days we create around 100 new repositories within a couple of days. So this is an action that should be automated as well. And speaking about ChatOps, it's also about the interface.
If we took another path, we would probably build a web interface in Bootstrap or something else, to give developers the ability to trigger all those actions and the scripts that automate them. But ChatOps is just another interface, chat, and it has a lot of advantages. For example, your team sees what's happening and what actions you're taking. Now we come to the next part of my talk, which is about frameworks: the ChatOps frameworks that exist, and the framework we wrote ourselves and the reasons why we wrote it. The first framework is called Hubot. It's the framework from GitHub that I mentioned. Hubot is written in CoffeeScript, which means it runs as JavaScript on Node.js. As Ruby developers, maybe some of you don't like JavaScript, and there are many Ruby developers who don't. But for a ChatOps framework, JavaScript may be a good thing, because it brings a lot of asynchronous support to your code, which is important here: commands have to be asynchronous, and one heavy command shouldn't block commands from other people. Another framework is called Lita. It's written in Ruby and it's very extensible. It's a few years old and a very good framework. It's fair to mention that both of these frameworks have adapters for every chat provider. We use Slack, so the Slack adapter is the only one we use, but if you use some rare chat solution, you can find an existing adapter or write your own. Let's see what the chat scripts and the DSL look like. This is the Lita DSL. You define a small Ruby class with a macro called route. In this macro you describe a regular expression for the command you'd like to trigger. With this handler, if I go to Slack and write "echo something", the bot will catch that phrase and reply with the word that comes after echo.
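To make the routing idea concrete, here is a small self-contained sketch of the regex-dispatch mechanism behind a macro like Lita's route. This is not the real Lita API (a real handler subclasses Lita::Handler and registers routes with the route macro); it only illustrates the mechanics of matching a message against registered regular expressions.

```ruby
# A minimal sketch of regex-based command routing, the idea behind
# Lita's `route` macro. Not the real Lita API.
class TinyRouter
  Route = Struct.new(:pattern, :handler)

  def initialize
    @routes = []
  end

  # Register a regular expression and a block to call on a match.
  def route(pattern, &handler)
    @routes << Route.new(pattern, handler)
  end

  # Find the first route whose pattern matches the incoming message
  # and run its handler; return nil if nothing matched.
  def dispatch(message)
    @routes.each do |r|
      if (match = r.pattern.match(message))
        return r.handler.call(match)
      end
    end
    nil # no route matched; a real bot would stay silent here
  end
end

router = TinyRouter.new
router.route(/^echo\s+(.+)$/) { |match| match[1] } # reply with the text after "echo"

router.dispatch("echo hello world") # => "hello world"
```

The important property is that every handler boils down to "a regular expression plus a callback", which is exactly what the next part of the talk pushes back on.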
And the Hubot syntax is very similar to Lita: you also define the regular expression the bot should wait for and send a reply. If we take a closer look, we see that both of these DSLs are based on regular expressions: you write a regular expression to tell the bot what commands to detect. Why regular expressions? It's the easiest way to tell the bot what command to watch for. But this approach has a few disadvantages. It cannot detect typos; it cannot reply with "this command was not found, maybe you meant something else". It also cannot do input validation: if the command was right but an argument was wrong, that argument may not be matched by the regular expression, and the command won't be found at all. And having regular expressions in your chatbot means that all developers have to be really good at regular expressions. It's always easy to make a mistake, or to write a regular expression that conflicts with another script's regular expression. So we thought that maybe we could do something without regular expressions. Here is an example. The first option is to write the command syntax with a regular expression; the second is to write it with some kind of pattern language. With echo, the difference isn't that big. But with a bigger command like "github add username to teamname", the regular expression becomes quite long, and it's quite easy to make a mistake there, as I said. So we thought maybe we could improve the experience of writing chat handlers. What did we want from that solution? We wanted it to be friendly for both the developer and the user. Friendly for the developer means the developer doesn't need to write a regular expression. Friendly for the user means the bot suggests the right command if the user makes a mistake. We also have a lot of infrastructure code written in Ruby at Shopify, so we decided to stick with Ruby, after trying both Lita and Hubot in production.
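The two styles can be put side by side for the "github add username to teamname" command. The curly-brace pattern syntax below is an assumption about what such a pattern language could look like, not Shopify's exact internal syntax; the regular expression shows how much more there is to get wrong by hand.

```ruby
# Style 1: a raw regular expression -- verbose and easy to get
# subtly wrong (anchors, whitespace, allowed characters).
regex = /^github add (?<user>[\w\-]+) to (?<team>[\w\-]+)$/

# Style 2: a pattern string that a framework compiles for you
# (illustrative syntax, not Shopify's actual DSL).
pattern = "github add {user} to {team}"

match = regex.match("github add alice to deploy-team")
match[:user] # => "alice"
match[:team] # => "deploy-team"
```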
And we wanted a simpler and more powerful DSL that would provide better argument support. Our solution: we decided to build it on top of Lita, with a custom command router and a custom DSL. This is what the DSL looks like. First of all, it's very similar to Lita, but instead of defining a regular expression, you define a special pattern, and you also define a help message. When the pattern is matched, it's dispatched to a Ruby method with keyword arguments. In this case it's a very simple handler: it will reply by echoing the command back. Let's take a look at a slightly more complex handler. This handler displays a chart from New Relic, and it has two arguments. The first variable is the application name, and the second is the format. Format is an enum field: it can take the value daily or hourly. And it's converted into a call to a Ruby method, which is quite simple. This pattern matches all of the following user inputs. The app can be my_app; hourly is the default value for the format variable; you can override it here and here; and you can also pass it explicitly, which is useful when you have more arguments and maybe don't remember their order. So we also wanted to support the explicit format. To be able to work without regular expressions, we tokenize this pattern into different kinds of tokens. First come the static tokens: the user input should start with "newrelic" and "chart". Then there's a simple variable, and then a variable with a default. So this command consists of four tokens. Our next goal is to convert the user input "newrelic chart my_app daily" into a Ruby method call: instantiating the newrelic handler and calling that method with those keyword arguments. This may seem like a difficult task, until we discovered a class in the Ruby standard library called StringScanner.
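The tokenization step described above can be sketched in a few lines. The "{name}" and "{name=default}" syntax is an assumption for illustration, as is the tokenize helper; the point is only that a pattern splits cleanly into static tokens and variable tokens.

```ruby
# Tokenizing a command pattern into static and variable tokens.
# The "{name}" / "{name=default}" syntax is illustrative, not
# Shopify's exact internal DSL.
Static   = Struct.new(:text)
Variable = Struct.new(:name, :default)

def tokenize(pattern)
  pattern.split(/\s+/).map do |word|
    if (m = word.match(/^\{(\w+)(?:=(\w+))?\}$/))
      Variable.new(m[1], m[2]) # a variable, optionally with a default
    else
      Static.new(word)         # a literal word the input must contain
    end
  end
end

tokens = tokenize("newrelic chart {app} {format=hourly}")
# Four tokens: two static ("newrelic", "chart"), one plain variable
# ("app"), and one variable with a default ("format" => "hourly").
```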
Yeah, it's a class in the Ruby standard library. Please raise your hand if you've heard of that class. Yeah, not too many people. StringScanner works as a scanner; I'll show an example now. You instantiate an object with a string, in this case the user input, and there's a method called scan, to which you give a token. If the user input started with something else, it wouldn't scan the string at all: if it started with "github" or some other command, the scan would fail. Then we have the next token, the static token "chart". It's also scanned, so we can go further. Then we scan for a variable, and then the next variable, and we get the values for those variables. It wouldn't be honest to say that we completely got rid of regular expressions: the developer of a handler doesn't have to write one, but we still use regular expressions under the hood. More than that, we have type coercion. When defining a handler, you can declare the type of a variable, for instance the target. This is the command used to tell the infrastructure that some server is going into downtime, meaning maybe we're going to restart the server or repair it somehow. There are two arguments. One of them is a target, which is a Chef node; it should be a valid Chef node address. The other is a duration, which can be one minute, one second, one hour, any duration, and we declare that too. So the first is typed as a Chef node and the second as a duration. And when we call this Ruby method, inside it you'll have the duration as an ActiveSupport::Duration and the target as a valid Chef node, so we can be sure inside the method that both arguments are valid. This command will be valid for the first input, but the second input won't be valid, and it will return an error.
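Here is the scanning sequence from the slides walked through with StringScanner from the Ruby standard library. The command and variable names mirror the talk's example; the exact regular expressions for each token are an assumption, and the chart method is illustrative.

```ruby
require "strscan"

# Matching "newrelic chart my_app daily" token by token.
input = StringScanner.new("newrelic chart my_app daily")

# Static tokens: if the input doesn't start with them, scan returns
# nil and we know this handler doesn't match.
input.scan(/newrelic\s+/) # => "newrelic "
input.scan(/chart\s+/)    # => "chart "

# Variable tokens: capture the values for the keyword arguments.
app = input.scan(/\S+/)   # => "my_app"
input.skip(/\s+/)
format = input.scan(/\S+/) || "hourly" # fall back to the default

# Dispatch to a plain Ruby method with keyword arguments, as the
# talk describes (the method name is illustrative).
def chart(app:, format:)
  "charting #{app} (#{format})"
end

chart(app: app, format: format) # => "charting my_app (daily)"
```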
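The type-coercion step can be sketched too. In the real system the handler receives an ActiveSupport::Duration; to stay dependency-free, this sketch coerces to plain seconds, and the parse_duration helper and its accepted formats are assumptions for illustration.

```ruby
# A sketch of type coercion for a "duration" argument: turn "15m",
# "1h", "30s" into seconds, or raise on bad input so the handler
# method is never called with an invalid argument.
UNITS = { "s" => 1, "m" => 60, "h" => 3600, "d" => 86_400 }

def parse_duration(input)
  m = input.match(/^(\d+)([smhd])$/)
  raise ArgumentError, "invalid duration: #{input}" unless m
  Integer(m[1]) * UNITS[m[2]]
end

parse_duration("15m") # => 900
parse_duration("1h")  # => 3600
# parse_duration("soon") would raise ArgumentError before the
# handler method is ever called.
```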
And this method won't be called at all, so when writing code in that method, you can be sure you get the right input. After shipping this DSL to our developers, many of them could write their own chat handlers to automate their workflows, and we got to more than 200 bot script handlers and more than 600 command invocations on a busy day in Slack. So this became a part of our infrastructure that we had to scale. As I mentioned, we based our framework on Lita; it was just a layer on top of Lita. Lita is written in Ruby and didn't have any support for asynchronous workflows, which meant that if you asked the bot to run a command that takes a minute, the bot was blocked for that minute and couldn't accept commands from other users, which was super bad, especially at our scale. So we decided that, as one option, we could spawn a new thread for every command, to avoid blocking the receipt of new commands from Slack. Ruby threads are not great for most kinds of operations, but in our case most of the chatbot handlers were only making HTTP requests or invoking other systems; they didn't do any calculations on the bot side, they just requested data from other systems. In this case Ruby threads were quite efficient, and this approach helped. But we thought there might be another approach, and we went with a master process and Redis: when the master process received a command from Slack, it pushed that command to Redis, and we had a pool of workers. We could have multiple machines working as workers, so we could scale horizontally, and it works the same way as a Sidekiq or Delayed Job worker queue.
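The worker-pool idea can be sketched with plain Ruby threads: the receiving side pushes commands onto a queue and a pool of workers executes them, so one slow command never blocks the others. At Shopify the queue was Redis and the workers could live on separate machines; here Ruby's thread-safe in-process Queue stands in for Redis.

```ruby
# A sketch of the master/worker design: commands go onto a queue,
# a pool of worker threads drains it concurrently.
queue   = Queue.new
results = Queue.new

workers = 4.times.map do
  Thread.new do
    # Queue#pop returns nil once the queue is closed and empty,
    # which ends the loop.
    while (command = queue.pop)
      results << "ran: #{command}" # a real worker would dispatch the handler
    end
  end
end

# The "master" side: push incoming commands, then signal no more work.
%w[deploy rollback status].each { |c| queue << c }
queue.close
workers.each(&:join)

results.size # => 3
```

Swapping the in-process Queue for Redis gives the same shape, with the extra property that workers can run on other machines and the pool can be scaled horizontally.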
With that approach, we could have active and passive instances of the bot server running: Slack would make a callback to a load balancer with the message for the bot, and the load balancer could determine which machine to route the message to. And that brings us to the availability problem. If you remember, at the beginning of this year GitHub was down for three or four hours. That was a pretty big downtime, and one of the reasons it took so long was that GitHub relies heavily on their ChatOps scripts, but ChatOps was down as well because of a network failure, so they couldn't use any of their ChatOps scripts to recover the system. We know that problem at Shopify too: we've had times when our bot was unavailable or Slack was down, and in those cases we couldn't do anything. So we decided to build a special offline, or rescue, mode into our bot. If you have the bot locally on your laptop, you just launch it with a special bin command, and you get exactly the same interface in your command line as you would have in chat, but it works even if Slack or something else is down. Summary, this is very important. You've probably learned a bit about ChatOps and how it can automate things, and you thought: okay, cool, I'm going to try that on my team, in my company. But I'd like to say that it only makes sense if you have a very big team. When I worked on smaller teams in smaller companies, I'd say we didn't need any of this, simply because we weren't at a scale where automating things with ChatOps paid off. In that case it's really easier to go to your CTO and ask them to create a new repo for you, instead of bringing in more code and more infrastructure to keep the ChatOps running. So if you're interested in working on systems at such a big scale, you're welcome to check out Shopify careers.
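The rescue mode works because the command dispatch is a plain function: the same handlers can be driven by a Slack callback or by a local command-line loop when Slack is down. The handler names and dispatch function below are illustrative, not Shopify's actual commands.

```ruby
# A sketch of rescue mode: dispatch is transport-agnostic, so it can
# be fed from a chat callback or from stdin. Handler names are made up.
HANDLERS = {
  "deploy status" => -> { "last deploy: green" },
  "incident list" => -> { "no open incidents" },
}

def dispatch(line)
  handler = HANDLERS[line.strip]
  handler ? handler.call : "unknown command: #{line.strip}"
end

# In chat, the Slack callback feeds messages into dispatch; in rescue
# mode, a loop over stdin does the same:
#   while (line = $stdin.gets)
#     puts dispatch(line)
#   end
dispatch("deploy status") # => "last deploy: green"
```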
And I've mentioned a lot of Ruby projects, gems, and other things, so you can go to my Twitter; the last tweet is a gist with all the links I mentioned today, and you're welcome to check it out. Thank you. Thank you very much, Kir. Any questions for him? How many of you use some form of bots in your favorite tool? My favorite Slack bot is the flip-table bot: whatever you type in, it flips it back with a raging guy. So if there are no questions for Kir, a round of applause again for him, please.