Thanks for coming. I know there are some other cool talks right now, but you're here, so that's awesome. Let's get started. You're here to learn how to tame Cobras. My name is Jason Sisk. I work at Groupon, and I've been there for a couple of years. I work predominantly on Ruby and Rails systems, backend development, et cetera. And I do not like onions. My name is Avi, and I've been at Groupon for about two years too. Jason and I work on a team that builds backend services, basically managing inventory. And I don't like fruits. Part of what we're going to tell you today is a bit of a history lesson about Groupon's early pain with site outages, et cetera, due to Rails scaling. We want to tell you the story of the developers who actually handled those problems and some of the decisions they made. So that's that. But we want to lead off with one important point. Boom. We don't have to pause for that long, and yeah. So, back around 2007, we were doing what all the other cool kids were doing. We were using a Rails monolith, and to some degree we still are. Rails 2 was a great framework. Who was using Rails 2? All right, you and us. Rails is a great framework. We all love Rails, and we still do; that's why we're here. What's great about it is that it's great for agile teams. For us, it was really simple. We could make decisions quickly, iterate on the product quickly, and ship new features, all with a small team of five to ten devs. We had a single repository, a single test suite, and a single deploy process. Very simple. And most importantly, we had one shared conceptual understanding of the code base. When we wanted to make a change, we knew where to put it, and things stay simple that way. What was great about Rails, and still is, is that integrating components is really easy.
Convention over configuration, model associations, all of that business: you can put things together very quickly and very easily. But we didn't come here to talk to you about Rails. We came here to tell you about Cobras, and how to tame them. At Groupon, we actually have a monolith, and we call it the primary web app. But Jason and I thought that for the purposes of this talk, we'd come up with a more scientifically accurate name for it. Yeah: the Cobra, a Centralized, Omnipotent, Big-Ass Rails Application. Big ass. Okay. So we want to take you back to 2009 for a minute. Groupon was about two years old, give or take, and we were still kicking into gear. People would come into the office in Chicago, wake up, open New Relic, and see stuff like this. As you can see, in the middle of the night it's great; everything's working really well. As soon as people woke up and started using it (damn people), our performance immediately started to drop. Then eight months later, we had about 30,000 requests per minute and everything was on fire. We blame Oprah. It's Oprah's fault. As you do. Oprah crashed Groupon, not once, but at least twice. And the Gap crashed Groupon too. Actually, the truth is that Groupon crashed Groupon. We were not scaling properly. Bad Groupon. The Cobra was getting fatter and fatter. We went from about five to 50 devs. We started with about 300 to 500 commits per month, and within a couple of years, as you can see, we were averaging about 2,000 commits a month. We had a lot of developers developing a lot of things, and this is all one Cobra. We started thinking about SOA at that point. It was already becoming really painful, but we looked the Cobra directly in the eyes and it scared the shit out of us. We had a lot of scoping problems, and a lot of that had to do with model coupling.
One of the biggest things keeping us from extracting services early was that, as the code grew, a lot of natural, convention-driven coupling was happening in the models. This is a bit of an oversimplified example. Let's say you're on the My Groupons page: you want to look at all of the groupons you bought and see the titles for all of them. So when we render the interface, we want to display all of these deal titles. In the Cobra, you might find a set of dependent relationships somewhat like this, where you can see the cyclical dependencies. Building these kinds of associations was fairly commonplace, which was bad in some ways. In this case, you would instantiate a user, which requires a database lookup on the users table (a SELECT *), and then map over that user's orders to get all of the deal titles. That is a Law of Demeter violation, and Demeter violations are bad. It looks clean. I mean, it looks good. But what it does is couple our components. Here is an example of what I was talking about: you have a basically unnecessary lookup on the users table. Now, if you're designing your application well, you can avoid this right out of the gate, but Rails conventions don't encourage you to. Using the ActiveRecord DSL for more advanced queries isn't something people tend to do by default, or at least they didn't in 2009. And things got a lot worse, because our code base, the Cobra, was just getting bigger and bigger. You can see here that it's almost two million lines of code at this point. And oh yeah, we have to stay up 100% of the time, so that's the problem, right? Also, the database is completely on fire. So yeah, we were in quite a pickle. It was painful. Testing sucked. We had to wait something like 45 minutes for a build to run.
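To make the Law of Demeter point concrete, here is a minimal plain-Ruby sketch of the My Groupons example. It assumes a hypothetical `FakeDB` class standing in for ActiveRecord, which simply counts one "query" per `where` call. The chained version mirrors `user.orders.map { |o| o.deal.title }`: it loads the user only to reach the orders, then issues one deals lookup per order. The direct version asks the orders table for deal ids by `user_id` and fetches the deals in one go, never touching users.

```ruby
# Hypothetical in-memory stand-in for a database; counts queries per table.
class FakeDB
  attr_reader :queries

  def initialize(tables)
    @tables = tables
    @queries = Hash.new(0)
  end

  # Each call counts as one query; an Array value acts like SQL IN (...).
  def where(table, **conds)
    @queries[table] += 1
    @tables.fetch(table).select do |row|
      conds.all? { |col, val| val.is_a?(Array) ? val.include?(row[col]) : row[col] == val }
    end
  end
end

def sample_db
  FakeDB.new({
    users:  [{ id: 1, name: "Ada" }],
    orders: [{ id: 10, user_id: 1, deal_id: 100 }, { id: 11, user_id: 1, deal_id: 101 }],
    deals:  [{ id: 100, title: "Pizza" }, { id: 101, title: "Yoga" }]
  })
end

# Coupled: user -> orders -> deal -> title, reaching through three models.
def deal_titles_chained(db, user_id)
  user = db.where(:users, id: user_id).first            # users lookup just to reach orders
  db.where(:orders, user_id: user[:id]).map do |order|
    db.where(:deals, id: order[:deal_id]).first[:title] # one deals query per order
  end
end

# Demeter-friendly: go straight from user_id to the data we need.
def deal_titles_direct(db, user_id)
  deal_ids = db.where(:orders, user_id: user_id).map { |order| order[:deal_id] }
  db.where(:deals, id: deal_ids).map { |deal| deal[:title] }
end
```

In ActiveRecord terms, the direct version roughly corresponds to querying deals joined to orders and filtered by user id; the point is the shape of the dependencies (no trip through `User`), not this particular fake.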
With builds that slow, you basically ran your tests and then found something else to do while you waited. Our release engineering folks devoted a lot of effort to making those tests run faster. Deploys were terrible. The deploy process for the one Cobra took somewhere on the order of three hours. That's a really bad development experience, especially once you have teams that split ownership. They want to iterate on features that matter to their team, and they don't want to be held up by this gigantic monolithic application. And deploys only happened once a week, which really hurts a team that might want to do continuous deployment. It sucked. Development pace was increasing, as you saw. And what's the best place to put the next line of code? As I heard in a talk earlier, it's the place you're already changing. Models got bloated, and there was a lot of cruft. All of these things were terrible. It was very painful. So we decided to move toward service extraction a little more seriously. If there's one big takeaway from this first section, we just want you to remember that Cobras are great. They are great, until they aren't. So we needed to alleviate this pain immediately. We needed to get that code out of there, and we needed a quick extraction. So we decided to extract a new service and build it on top of the current schema. We started with the order service because it was causing a lot of database contention. We had a lot of people buying a lot of groupons, which is a good problem to have, but it was bringing our database down. So we needed to get that code out. Another reason for starting with orders is that orders are going to be a long-lived model in our domain; we know that for sure. So to illustrate, this is what it looks like in the beginning, and this is what we're trying to accomplish.
You have the Cobra, with orders inside it, and we're trying to get to a separate orders code base with its own database. That code base still has read-only access to the Cobra's database, because we didn't focus on completely stopping the order service from reaching back into the Cobra's database. And the Cobra was really sneaky. With Rails callbacks and model associations, it was really tough to find all the ways the components were coupled. So we built some tools to make that easier. This is one of them: the service wall, as we call it. The main goal is separating the concerns of orders within the application. You start by putting your services in a separate directory. Let's take a closer look. The order service lives in its own directory, with its own app, its own lib, and its own specs. To make that work, in the environment.rb file we iterated through these services and added them to the load paths. So the application still looks like one big application, but for our purposes, the code was separate. Here is a small example of how the service wall works. You have a disable_model_access method: you specify the service you want to disable or deprecate, and it figures out that service's models, adds them to a do-not-touch list, and raises violations like these. So if you use disable_model_access, when you run your tests you'll get a message telling you that you don't have access. When a deal tries to access an order, we can find out just by running our tests. If you use the friendlier deprecate_model_access method, it's more permissive and just logs the violation to a file.
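The talk doesn't show the service wall's internals, so here is a hypothetical plain-Ruby sketch of the idea, not Groupon's implementation. It assumes a hard-coded `SERVICES` map from service name to model names (the real tool derived the models from each service's directory) and a `check!` hook that would be called whenever a walled model is touched. Disabling raises, which surfaces violations in the test run; deprecating only logs a warning.

```ruby
require "logger"
require "stringio"

# Hypothetical sketch of a service wall; names and hook point are assumptions.
module ServiceWall
  class Violation < StandardError; end

  # Assumed mapping of service => models it owns.
  SERVICES = { orders: %w[Order OrderItem] }.freeze

  @rules = {}                     # model name => :disable or :deprecate
  @logger = Logger.new($stdout)

  class << self
    attr_accessor :logger

    def disable_model_access(service)
      register(service, :disable)
    end

    def deprecate_model_access(service)
      register(service, :deprecate)
    end

    def reset!
      @rules = {}
    end

    # Would be invoked wherever a walled model is accessed (e.g. a model hook).
    def check!(model_name, caller_name)
      case @rules[model_name]
      when :disable
        raise Violation, "#{caller_name} may not access #{model_name}"
      when :deprecate
        logger.warn("service wall: #{caller_name} accessed #{model_name}")
      end
    end

    private

    # Put every model owned by the service on the do-not-touch list.
    def register(service, mode)
      SERVICES.fetch(service).each { |model| @rules[model] = mode }
    end
  end
end
```

Models outside the walled service fall through the `case` untouched, so only cross-service access is reported.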
You can review that log file in development mode, or turn the check on in staging, and use it to find all the places where you have service wall infractions. You can't do this in production, though, because it causes a serious performance hit. Oh yeah. So this is how you actually use the service wall. At the top of your controller, call disable_model_access or deprecate_model_access, depending on what you want to do, and tell it which service. It even lets you exempt actions you don't want to raise violations on yet. That way you can remove the exemptions one at a time and tackle one action at a time, finding which endpoints are actually reaching over and causing service wall infractions. In addition to the service wall, one other problem with this extraction approach is that because you necessarily fork the code, you get a lot of cruft left over from the old domain. So teams find themselves asking, very often: is this endpoint even used? Do we even care about this code anymore? So a small team of Groupon developers hacked together something called Route 66, which we use internally to track down cruft in both our old Cobra and our new Cobra. It basically answers the question: are these endpoints used? I don't know if you can see this very well, but this is a bit of the UI. What we do is analyze log files, our Splunk logs, to work out which controller actions are being hit and how often. Is this a route that's hit once a week? Once a month? We can de-cruft very aggressively using this tool. All right, so there are definitely pros to this approach. Because you're focusing on just separating the code, you can move quickly and not worry about spinning up a separate database schema, separate naming, all that. You just worry about separating the code, and that focuses the extraction. It also makes it easier to spin up endpoints.
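Route 66 is internal to Groupon and the talk doesn't describe its code, so this is only a hypothetical sketch of the core idea: count hits per `Controller#action` in request logs and flag known routes that never (or rarely) show up. It assumes raw Rails log lines rather than Splunk, and a regex that matches both the Rails 2 style (`Processing OrdersController#show (for ...)`) and the later `Processing by OrdersController#show as HTML` style.

```ruby
# Assumed log formats: Rails 2 ("Processing X#y (for ...)") and
# Rails 3+ ("Processing by X#y as HTML"); namespaced controllers allowed.
ROUTE_LINE = /Processing (?:by )?([\w:]+Controller)#(\w+)/

# Count hits per "Controller#action" across an enumerable of log lines.
def action_hits(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    match = ROUTE_LINE.match(line)
    counts["#{match[1]}##{match[2]}"] += 1 if match
  end
end

# Known routes that never appeared in the logs: de-crufting candidates.
def unused_routes(known_routes, hits)
  known_routes.reject { |route| hits[route] > 0 }
end

# Routes seen at least once but fewer than `threshold` times.
def rarely_used_routes(known_routes, hits, threshold)
  known_routes.select { |route| hits[route].between?(1, threshold - 1) }
end
```

The list of known routes could come from something like the output of `rake routes`; choosing a log window (a week, a month) is what turns "hit once a month" into an actionable report.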
But the con is that you're still tied to the legacy database. That's not such a bad thing if you really need to get the code out of there. But because you're forking this code, and it's now being hit through endpoints, there's still a lot of cruft in the code base, since many of these endpoints aren't being used. So this was the first extraction pattern we used at Groupon to get out of the original Groupon Cobra. But teams own their own tactics, and there are other ways they're doing it as well. Another way service extraction is happening is by building Greenfield services that use a message bus. Sometimes you just need to keep the legacy API running, because there are a lot of client dependencies on it and a lot of dependencies on the structure of the data. But who here likes doing Greenfield work? Raise your hand if you like Greenfield work. That should be all of you, right? No? Whatever. So it is possible to do Greenfield service extraction, and we're doing that as well. Again, we have a similar (whoops, juggling between PowerPoint and Preview) type of situation. You have the Cobra, and then we get to the scenario we're trying to reach with the Greenfield extraction, where, in this case, the red box represents all new code. There's a client gem that runs in the original Cobra. When the new service writes data to its database, it sends a message that the Cobra consumes and writes to its own data store, which keeps satisfying all of the legacy API requirements. What's notable is how everything stays in sync for service cutovers, rollouts, et cetera: a background sync worker runs and syncs one way, from the old database to the new database. There are pros and cons to this approach as well. Some of the better parts are that you can get rid of your legacy data quickly.
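The talk only names the one-way background sync worker, so here is a minimal hypothetical sketch of the general technique: poll the legacy store for rows changed since a checkpoint, upsert them into the new store, and advance the checkpoint. It assumes both stores are plain hashes keyed by id, that every legacy row carries an `updated_at` counter, and adds a naive `drift` check as a stand-in for the validation engine.

```ruby
# Hypothetical one-way sync worker: legacy (source of truth) -> new store.
class SyncWorker
  attr_reader :checkpoint

  def initialize(legacy, target, checkpoint: 0)
    @legacy = legacy          # id => row with :id and :updated_at
    @target = target          # id => row (the new service's store)
    @checkpoint = checkpoint  # highest updated_at already copied
  end

  # One pass: upsert every legacy row changed after the checkpoint.
  # Returns the number of rows copied.
  def run_once
    changed = @legacy.values.select { |row| row[:updated_at] > @checkpoint }
    changed.each { |row| @target[row[:id]] = row.dup } # idempotent upsert
    @checkpoint = changed.map { |row| row[:updated_at] }.max || @checkpoint
    changed.size
  end

  # Naive validation: ids whose rows differ between the two stores.
  def drift
    @legacy.keys.reject { |id| @target[id] == @legacy[id] }
  end
end
```

The strict `>` on `updated_at` illustrates the race conditions mentioned in the talk: a row written with the same timestamp just after a pass would be skipped, which is exactly the kind of gap a separate validation step has to catch.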
Again, devs like Greenfield work; you get to design your own systems. You also minimize the cutover risk with your data sync: you're not splitting a table and needing every API dependency rewritten up front so that nothing fails when you break up the database. So you can phase in your new endpoints and own the timing of when you build out new endpoint features. Some of the cons: it is not trivial to build a synchronization worker, and it is even less trivial to build a validation engine to make sure the data doesn't drift out of sync when you're pulling from the original source. And there are race conditions involved as well. So Jason and I work on a team that manages inventory, as I said earlier. Looking a little further down the road, one of the things we needed to do was get vouchers out of the order service: another service extraction. Vouchers are the things customers actually redeem. Here is a simplified example of what a voucher looks like: we have an ID, which is stored in our database, and the price, which is stored in a legacy database. And Groupon has grown since the orders extraction. We now have an international platform, a code base that serves many different countries. We have offices in Berlin, London, Chennai, Korea, and many more places. But our service's responsibility is to make it seem like none of that matters. Anyone asking for voucher data expects all of it, wherever it lives. Our services need to be global now. So this is what our world looks like, and this is how our service needs to be built on top of it. What helped in managing these different sources of truth was the manager/accessor pattern in our code base. Specifically... oh, let me check if I need to say anything. Yeah, specifically, next slide please.
This is how it helped our code base: in the controller, you just talk to this manager object and say, find me this voucher. (Can you jump to the manager? All right.) It's going to look like a lot of code, but let's go step by step. The manager is where all the complexity lies. You have an accessor for local data, and you have separate accessors for the other sources; all an accessor does is persistence and finding data. So the accessor for the legacy database, the Cobra accessor, gets you the price information, and then you have an international accessor, which could be a database call, or in our case, an HTTP call across the ocean. Then you bring all of that together, wrap it in a model, and return it to the controller. All right, so there are definitely pros and cons to this approach. One pro is that it's easy to incorporate many different data sources. We call it a facade, because the back end is really more complex, but you hide that complexity. A con is that your accessors are bound to schema changes, so our Cobra accessor still has to know about the legacy schema, and making changes there is not trivial. You can also end up using it as a crutch. Someone asks you, "Can you give me this piece of data about a voucher? I really need it," and you want to expose it. You think, well, I do have access to that database, or I could just make a call, and now you're serving that data and you're tied to serving it in your API. The important thing is to be diligent: as soon as you start serving that data, put together a strategy to actually own it. Otherwise the complexity in the manager, which is both a pro and a con, will always be there. The purpose of the manager is to hide that complexity, but as you start owning more data, it should become simpler.
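Here is a minimal plain-Ruby sketch of the manager/accessor pattern described above, under stated assumptions: the class names `HashAccessor`, `RemoteAccessor`, and `VoucherManager` are hypothetical, hashes stand in for the local and Cobra databases, a callable stands in for the HTTP client behind the international accessor, and routing the price lookup by a `region` field is an illustrative choice rather than Groupon's actual logic. Each accessor only finds data; the manager combines the sources into one model.

```ruby
# Hypothetical domain model handed back to the controller.
Voucher = Struct.new(:id, :region, :price_cents, keyword_init: true)

# Accessor over one store; a Hash stands in for a database table.
class HashAccessor
  def initialize(rows)
    @rows = rows
  end

  def find(id)
    @rows.fetch(id) { raise KeyError, "no record #{id}" }
  end
end

# Accessor over a remote service; `fetcher` stands in for an HTTP client.
class RemoteAccessor
  def initialize(fetcher)
    @fetcher = fetcher
  end

  def find(id)
    @fetcher.call(id)
  end
end

# The controller only talks to the manager, which hides who owns what.
class VoucherManager
  def initialize(local:, cobra:, international:)
    @local = local                  # data we own (id, region)
    @cobra = cobra                  # legacy Cobra data (price for NA vouchers)
    @international = international # other regions, over the network
  end

  def find(id)
    own = @local.find(id)
    price_source = own[:region] == "NA" ? @cobra : @international
    Voucher.new(
      id: own[:id],
      region: own[:region],
      price_cents: price_source.find(id)[:price_cents]
    )
  end
end
```

As the service comes to own more of this data, branches like the region check can move out of the manager, which is the talk's point that the manager should get simpler over time.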
So the three extraction patterns we've gone through are just a small sample of what's going on. There are different service extraction patterns in use at Groupon, and probably in your world too. So again, this is just an example of some of the ways we've chosen to do things. There are other interesting talks about this at RailsConf this week, so it would be neat to check those out too, and come talk to us about them if you want. But you should definitely consider letting your teams own the tactics when you're making decisions about SOA, because you might find some neat things you didn't know about. Yeah, so I'm going to stand over here, because I feel like I'm only talking to these folks. There are definitely a lot of things we learned from these different service extractions. Like Jason said, a lot of other service extractions happened at Groupon and continue to happen today. But taming a Cobra is serious business. Like I always say, YAGNI: you probably ain't gonna need it right now. But the tipping point at which you need to start moving toward service-oriented architecture isn't black or white. It's more of an art than a science. Once you start feeling the pains, though, you need to put together a strategy for getting there. Yeah, you don't want to sit around and wait for Oprah to blow your site up. But it's also important to allow your domain to actually evolve. Models you think are important in the beginning won't necessarily be important later on, and that's the big benefit of a Cobra: it lets you iterate quickly. Something else we've learned is that when you're doing service extraction, it's really important to actually have a strategy. Know what you need to break apart. Know what you need to leave in the monolith. These are important things to consider.
Know what the priorities are between those things. It's very tricky to go about service extraction scattershot, without really understanding your business model or what benefits you get from extracting certain pieces over others. You should prefer things that are clearly their own components, things that are particular maintenance problems, or things that represent some sort of legacy design or strange behavior. The other important part of having a strategy is to expect the unexpected. Scope creep will bite you, and as these code bases get bigger, pulling things out of them becomes a much trickier process than you might envision. Another important thing is to think about your entire service stack. You should know your business, or at least conceptualize how all the parts of your business are going to fit together. How does the data flow between them? What are the service agreements between those components? That's all important to know. You're going to need caching between services for load. You're going to need caching for latency requirements, too: if you serve something upstream like a complex algorithm, that algorithm is going to need essentially zero-latency responses from your service. You need to think about all of these things when you're doing service extraction. The way Jason says it makes it seem like one slide in our deck, but each of those topics could be a separate talk, and they are. So there's definitely a lot to learn in that domain. Right. In terms of actual topics, another thing you want to think about is messaging. Inter-service messaging: when you pull these services apart, they need to talk to each other. You should think about what those messages look like. What are their delivery SLAs? Do you guarantee that they're delivered?
Do you guarantee the order they're delivered in? What do the payloads look like? Think about all of this. You also need to concern yourself with authentication and authorization. These are important topics. I think there was a talk about this yesterday. There were two talks about this yesterday. You should know what your users are doing. Your site's getting bigger, and your user base is getting more complicated. Know what they need access to. Know how they get into your services, how they get through your services, and what they can do at each step of the way. And you need to create a supporting environment for services. We were lucky: we had entire teams devoted to building tools that make it easy to spin up services, and a release engineering team that made it easier to deploy them. All of that became really easy for us. But in your company, or in your application, you need to make sure you think about these things and devote tools and time to making them simpler. Also, now is the time to start considering UUIDs. As soon as you start talking about service-oriented architecture, use UUIDs from the start. This immediately decouples your identifiers from any single database, and that's going to be really important, because you're going to be moving data from one source of truth to another. And you need to write code good. It's easy to say that, but it's hard to do. Think about the SOLID principles. Think about where things belong. Ask yourself: am I coupling these two components together, and is that useful enough to justify the pain it's going to cause me later? So when you're writing your code good, you should be thinking about your models. Those models are going to become your APIs, your service APIs. So consider your public methods. What are you putting in the public space of that model? Is it named well?
Does it represent what your service should be doing? While you're building up your Cobras, make sure your models reflect the way you intend your service APIs to look, should you ever need to go down that road. And like I said earlier, avoid tangling those components together. Specifically in Rails, when you introduce associations you're expanding the API Jason was talking about. You're creating ways for developers to reach through these models and get data, and they'll couple the models together and make it harder for you to separate them. Test. Who here writes tests? Anyone test? Not DHH. Nope, he doesn't test anymore. You should be testing, and you should be testing at high levels. Avoid the unit tests if you can, especially because once you start doing service extraction, you will break ass-loads of unit tests. Write your high-level tests first, and make sure you've got solid coverage from high-level, end-to-end tests. Second, as you do service extraction, it's not trivial to spin up other services quickly in order to test end to end, but you should think about how you might do that, because otherwise you're going to be doing a lot of stubbing, which gets very painful and is error prone. When we talked to the developers who had to do some of the tougher service extractions, they said, "I wish we had more integration specs, because we're going to be changing a lot of this stuff and we need to know it works." If you've got a good set of integration tests, you can be a lot more confident making those changes. Yep. Yeah, so you need to communicate. Everyone always says this, but as more teams spin up services, a lot of you are going to run into the same problems. So when you solve a problem, share it.
Make it a gem, write it down, put it in a wiki, and tell people about it. Give talks, because you don't want people solving the same problems over and over. At Groupon we have what's called the core architecture forum. It's basically a group of people who meet, and you can go to them and say, "I'm going to spin up a new service," or "I'm going to solve this problem. Have you seen this before?" They'll help you answer questions like: has someone else solved this already? Is there a similar problem? Is there a particular technology that would help you solve it better? All of those questions are really important to ask so that you don't reinvent the wheel over and over again. What else? Oh yeah, one more thing. One more thing: that sounds like Steve Jobs. One more thing: we have interest leagues at Groupon, which are just internal user groups for Clojure and Java; we even have one for onboarding. Those are really cool, and they're another way to communicate what's happening. Once your company gets big enough, that's really important. So in conclusion: Cobras are great, Rails is great, and Cobras do serve a useful purpose. But beware, it's not so simple. Once you decide you're going to start raising a baby cobra, be ready for what comes next. Yeah, and okay, last part: we're hiring. If you want to come help us solve some of these problems, come talk to us after the talk. There's a booth downstairs, you can go to this website, or tweet at us; I like that. But yeah, join us. And we are standing on other people's shoulders here. A lot of these folks helped with the talk or actually did a lot of the service extraction work. This isn't the complete list, but we definitely wanted to bring attention to these people. Yeah, and people like these gave us a lot of feedback when we did the talk at Groupon, and it matters to have people who will mentor you and spend time helping you understand things.
I mean, that series is not worth a Groupon. Thank you all.