So, I'm Eric Saxby. Today I'm going to be talking about iterating towards service-oriented architecture. I may have time for questions at the end, but I may not. This might go 40 minutes, so I'm just going to get started. If you're like me, you can't really pay attention in a talk about programming unless there are pictures of cats. So really we're going from this to something a little bit more like this: working together as a family.
So why should you care? Why should you listen to me? You may not know this, but I'm kind of a big deal. But really, I've actually not been doing this for that long. I've been doing Rails for about six years, and before and after that I have been using various different technologies. I have been fortunate to work with some very smart people, to be able to learn from them, and to break a lot of things really quickly. And right now I work at Wanelo. Also, I'm trying to collect all of the buzzwords on my resume. I have more than just this.
So why is Wanelo important? We're like a social network for shopping, but really it's because we have many millions of active users and databases with billions of records, and we've gone through the pain of getting there and keeping the site actually running. You can save any product from any store on the internet into your own collections and have your own wish list. That's what we do. More importantly, we've gone from having one main Rails application doing all of this to a central Rails application that's still fairly large but supported by a lot of services. We've open sourced as much as we could. Some of the business domain logic is really hard to open source, but we try as much as possible. We've done almost all of this in Ruby, including some things that people who prefer other languages say can't be done in Ruby. And we've done this with a very, very small team, very quickly.
If you're like me, though, you're really not so interested in the success stories. You're interested in: how did you screw up? How did you break? So let me take you on a journey to another company, one that a friend of mine recently called the Anti-Pattern Goldmine: a completely hypothetical company, not naming any names, that I may or may not have worked for. Some of you in the audience may or may not have worked for it; after some of the story you might think you did.
You come in, it's a startup, it's new, a small team, and you say, wow, for a startup you have a lot of code that's really, really tangled. It's all a Rails 2 code base. If you remember vendoring Rails, we did that. If you remember how vendoring Rails can go wrong, yeah, that was there. And I think a lot of this might have come down to the fact that, at least early on, success of a product, success of a feature, really was launching it as quickly as possible. No, no, no, don't worry about that stuff, don't worry about design. And we have 30 engineers, like five or six teams, all doing that as rapidly as possible, trying to get it into production, and releases were a mess. I'm sure a lot of you can relate to this. Deployments took multiple hours, and with all of these engineers trying to put all of this code in as quickly as possible, deployments invariably went wrong. Eventually this got a little bit faster, from monthly releases to weekly releases. But then we would have code freezes where, for three months, everyone's trying to cram in all the features without deploying, and you can imagine where that goes.
And over time, things are really just getting worse. There are rough estimates that we're trying to give to see how long things are going to take. They come back and say, oh, that means you're deploying on this date. Hey, we told the board. Great, that's awesome, you're going to deploy on this date.
With a deadline like that, the only real way to meet it is to take serious shortcuts. And then as soon as a project finishes, because we're invariably missing those deadlines and there's a next project that is supposed to be out in a week based on the estimates that you were supposed to do, the team gets dispersed and moves on to new projects. And no matter how worst case we are in our estimates, it's just never worst case enough. It just keeps getting worse. So some of you might be familiar with this story.
During the course of this, I think I've learned a lot. Programming is one of the most fun careers that I've had, and when it's not fun, you know something's wrong. We keep reading about service-oriented architecture, and a lot of us latch on to this. This is the solution to all of our problems. And you know DevOps? That's pretty cool over there. DevOps is the answer. That's how we're going to do services. It's a done deal. All we have to do is do services, DevOps-y, and we're done. So around this time I move into the operations... not the operations team, because DevOps. Not the team over there that you just throw things over the wall to and they just make it work. No, no, no, not the operations team at all.
And certainly I, and a number of other people in the engineering team, I could say, really decide that services are the only way forward. Really quickly, product (and I don't mean an individual product manager or anything like that; by product I mean all of the people and teams going into designing and planning the products and how they're going forward) comes to the conclusion that services are really just that thing getting in the way of us cranking out these features, because features are success. But if at first you can't succeed, I've learned there's a few things. You can take a firm stance, breathe deeply, and become the loudest person in the room.
Really, really helps. Also, if there's anything that I've learned from my cats, it's throw shit. Done. Yeah, it really, really helps in situations like this.
So we have a few things coming up, a few projects. These features, these projects, aren't necessarily concurrent. We have a new login feature, and login in this application is as homegrown as you could possibly imagine, in all the ways it could go wrong. It's this core, core login functionality, and we're like, no, no, no. That's not going in this code base. That's going to be a new application. That's going to be a service. That's the only way we're going to do this. Otherwise, we're just not going to do this. It's going to fail. We know it's going to fail, so let's just skip to not doing it.
Also, we have this enterprise software over here, and we have this homegrown Rails application, and all the data really needs to be synced between them in order for the company to actually work. There have been a lot of iterations on this, but this time we're going to do it right, and it's going to be a data service. We're going to have our enterprise software and our Rails app, and that is totally going to make this an enterprise Rails app. It's going to be amazing. And I remember saying this a lot, to a lot of different people: we're going to screw this up. We're going to fail. I know we're going to fail, but it's the only option that we have. That's the only way that we're actually going to be able to do this and succeed at some point in the future.
So, on a side track (hopefully this is not new to very many of you), there are a lot of different ways to do service-oriented architecture. It can mean a lot of different things. Read blog posts and you get a lot of different ideas. It can be synchronous or asynchronous; it can use a message bus; maybe some services require actually hitting them over HTTP or TCP sockets. There's a lot of ways of doing this kind of thing.
But why would you really want to? So, scale the organization. You have a lot of different teams, a lot of different engineers. Maybe you really want this team over here to just have their own code base that they deploy separately. Also, maybe you want to outsource this to a different team, and you really don't want to give all of the code to this other team. You really just want to isolate this and say, you know, you guys over here, here's this. Sorry, I'm actually trying to expunge all gendered pronouns from my vocabulary, but it's really hard.
You can also scale the code base. The workload of this thing over here might be completely different than this thing over here, and that might actually be easier to handle in a service. You can really tune the code to the workload. You might have this really complicated code that for various reasons needs to be complicated. It might be really complicated functionality, but you might be able to hide that behind a clean interface, an API. And tests: usually, not always, a small code base can mean fast tests, and if you're sitting around for hours waiting for tests to complete, that can really eat into your productivity.
But it comes at a cost. All of this comes at a cost, and I think one of the things that I've been learning is that sometimes the cost of not doing this is greater than the cost of doing this. There's the cost of the infrastructure, new servers, how they talk together. It's really complicated and things will go wrong, but sometimes not doing this is going to mean that your productivity is going down and down and down, and it actually is more costly.
So back to these projects. We have this data service. It's sometimes, I think, six engineers, sometimes eight, I don't really remember. And nine months of work. It's really complicated, like state transactions, and this is really critical data that really needs to be done right.
And there might have been, when deploying it, some slight data issues, but they were fixable really quickly. It was fine. There were actually no major data problems. And there were some applications that were built on top of this data service, and depending on who you talk to, they were more or less successful. Some people using these applications are like, oh, thank you, this did exactly what I need, it's actually helping me with my workflow. Some people are like, yeah, okay. But engineering says: you know what? This was really critical data. It was really hard, totally new for us. This was a success. We did not break anything. Product is like: nine months? Eight engineers, nine months? How could you possibly call that a success?
Okay, so, a different application. Depending on the time period, two engineers, four engineers, plus the DevOps team. DevOps. Three months. We were figuring out the systems; the automation was really tangled. In a few weeks, we had some staging servers. And about two months later, someone comes up to me and says, hey, where are the servers? Can we get those? And I'm like, deep breath, deep breath. Which servers? Oh, those staging servers. But we worked it all out. Something that I learned about DevOps: everyone actually needs to be invested, and it's new to people.
And it was released with a feature flag. Only a small number of people were actually running through it, so we had some really good time to break it in production and really figure out how it was going to run. Then we launched it to all our users and it was, I would say, very successful. It worked exactly as intended. It was very resilient to failures. We had to do some unexpected database maintenance, restart master databases, and we're like, you know what? Just do it. Just restart the database. No, no, no, don't turn off the service. It's going to recover. It's going to be fine. Maybe like one user will notice.
And the company is actually sort of figuring out that user metrics might be more important than dates as metrics. And the user metrics were generally successful. So engineering is like, this is good. This is how we do this. For the next project, this is how we're going to have to do it. Product is like: three months? What are you talking about, success? That's not success. We had these other product features that needed to be done, and we had these four engineers sitting in the corner not able to do those things.
Okay. So I would say that this is a really important question to ask: what is success? If engineering says it's a success and product says it's a failure, who really wins? This is actually a trick question, because it's a zero-sum game. Nobody wins in this interaction.
So let's go a little bit deeper and ask why we needed SOA in the first place. This is really complicated. There's lots of moving parts. But some of it, I now think, was because engineering didn't trust product. What that really means is we didn't trust that we could do our jobs well, given the shortcuts that we needed to take to actually do our job and meet our metrics. And product didn't trust engineering to meet the promises we had given, and we couldn't actually meet those promises. And again, over time this was changing, but product was accountable for features, not for quality or for how users actually interacted with them. This is probably the subject of much more discussion, and I would love to continue it and learn more myself. But I think that a lot of this comes down to trust. If you can't trust that you can do your job successfully, how can you actually do your job successfully? If different parts of the company don't trust each other to do their jobs well, how can they actually do their jobs well? So what did we learn?
Fuck it, next time it's SOA from the beginning. Like, before we even do rails new, it's going to be RabbitMQ, we're going to have a cluster, it's going to be amazing. So, no, no, no, not the right answer. I almost went there, and thankfully I worked with some very humble, empathetic people who are also very convincing. But no.
So I think what I have really taken from this is that Agile is not just one sprint after the next. It's like, four miles? It's great. You just break it up into 100-meter increments and each one is a sprint. It'll be fine. What it's really about is iterating: deploying small things really quickly, using the data that you get from that to figure out what to do next, and refactoring. Refactoring is really, really important. You might be able to do a small thing quickly, with some shortcuts, not really needing to know how it needs to be designed, but when you see the pattern you have to fix it. And SOA is not going to solve larger organizational problems. It's not going to fix your code. Basically, it's another tool at our disposal to solve our pain points and our organization's, our company's, pain points. And how we do that is through iteration: small changes deployed quickly, using feature flags so that you can get code into production as soon as possible knowing that it's off and not going to affect users, and prioritizing this when it's necessary.
So when might it be necessary? I would say performance; that's a really big thing. This is actually driving a lot of what we're doing. Code complexity might be less in a service than it would be outside of a service. So if you have these two things and you're like, this is actually going to be easier to do over here than to put in this tangled mess, maybe it is better. And also, maybe sometimes you have a new feature and it is completely unrelated to everything else that you've already done.
With the caveat that in the short term you might be able to put it in with all the rest of that mess, trusting that when it becomes a problem you will be empowered to fix that problem.
So, performance. As I said, this is driving a lot of our work. At Wanelo, it's getting slower, and we're running into some problems. We're discovering that the databases are becoming I/O bound on disk. This is a really, really bad place to be. But our user growth is increasing dramatically. The activity is increasing dramatically. We're starting to see exponential tendencies in our graphs. And if you see exponential tendencies in your user graphs, it's kind of like giving a talk to hundreds of people: you're like, why are you all looking at me? It can be really scary. But we have one table that's really outstripping the others, and that's causing problems. We discover, because of our data, because of our graphs, that it's really one table that's destroying the performance of the rest of the tables. And we're in the cloud, because we have all those buzzwords, so there's really a maximum size of host that we can get. There's an upper limit to how much we can solve this with one database, even after read-write splitting, which we've already done. And pretty soon we realize that the site is going to break. If we don't do anything, we're just not going to have a company anymore. It's just going to fall down.
But we have really, really active, committed users joining us right now. Now is not the time to stop feature development. Now is the time to really learn from what our users are doing, double down on it, tweak our features, and figure out what is going to make our business successful. And we only have 10 engineers at this point. We don't really have that many resources to work with. So our first step of iteration is realizing that this is one problem. How do we solve this one problem?
And this is maybe going to be a service. How do we get to the point where it can be a service? The first step is isolating the data. ActiveRecord gives you these really nice things, associations: has_many, has_one. Things that really make the joins between your tables easier. When you have a service, you don't have joins, so these need to go away. These just don't exist. But it's actually really easy to get rid of these, honestly. A product has saves: you have saved this product into a collection. We can just take that where clause that ActiveRecord was going to do for us with the join and pull it out as a where. Really not that hard, actually.
And you know what? ActiveRecord also gives us ways of talking to a different database, and we can use this to just pretend the table is in a different database. establish_connection allows you to have this model live in this database and all the rest of your stuff live in your main one. It's really not that hard. And one of the key things is that each step of this, each slight change, is deployable. And it's testable. You can deploy to staging. You can click around to see where your test coverage is maybe missing and figure out where it breaks. One thing that I will say is that you might be doubling your database connections at this point, and when your database hits max connections, everything just stops working. We've learned this lesson the hard way.
But now that we have this code deployed that pretends like it's in a different database, we can make it a different database without actually that much work. We use Postgres. We love Postgres. We have a master database, and we spin up a replica over here. And we put the site into maintenance mode. If your company has more critical interactions, maintenance mode might not be possible. Braintree has some really great blog posts, and I think talks, about this.
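As a rough illustration of the association-removal step described above, here is a minimal standalone sketch. It deliberately uses plain Ruby objects rather than ActiveRecord so that it runs on its own; `Save`, `Product`, and `SAVES_TABLE` are hypothetical stand-ins, not the speaker's actual code, and the comments note the ActiveRecord calls they stand in for.

```ruby
# Hypothetical stand-in for a row in the saves table.
Save = Struct.new(:id, :product_id, :user_id)

# Hypothetical stand-in for the saves table itself.
SAVES_TABLE = [
  Save.new(1, 10, 100),
  Save.new(2, 10, 101),
  Save.new(3, 11, 100)
].freeze

class Product
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Before: `has_many :saves` let ActiveRecord build the query, and made it
  # easy to join products to saves.
  # After: an explicit lookup by foreign key. In a Rails model the body
  # would be `Save.where(product_id: id)`. No join is involved, so the saves
  # table is free to move to another database later.
  def saves
    SAVES_TABLE.select { |save| save.product_id == id }
  end
end

# In Rails, pointing one model at its own connection looks roughly like:
#   class Save < ActiveRecord::Base
#     establish_connection :saves   # assumed name of a database.yml entry
#   end
# At first that entry can point at the same physical database.

puts Product.new(10).saves.map(&:id).inspect
```

The point of the sketch is only the shape of the change: callers still ask a product for its saves, but nothing relies on the two tables living side by side.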
But for us, the operational overhead of making it so we could do this without the site being down was way more than it was worth. So we just take the site down and push a new database.yml saying that now this connection is talking to this database. We promote the replica to a master, restart everything, and bring the site back. Now we have two databases, with five minutes of downtime. Not that bad, actually. After the fact, you can clean up. You have a redundant table over here, a bunch of redundant tables over there. You just truncate them and delete them, very carefully. Not that hard.
And actually, at this point, you might not need a service. You might be done. Your site might just be working. It's fine. For us, we knew, based on how things were going, that we were going to have to horizontally shard this data. Now it's in a different database; that is going to have to be many databases. And we want to hide that complexity. We do not want our main application to have any of that code. So we know we're going to have a service.
Now, isolate the interface. By this I mean: how are you actually accessing this data? What is your long-term goal? Where's the sharding? At Wanelo, it's saves, and any time we're actually getting saves we're either looking at them by product (the product has these saves) or by user (a user has saved these things). This is really helpful for planning out what your DSL, what your API, is going to look like. We know that things are going to have to get to it by a product or by a user. And so you have some where clauses. where is ActiveRecord, and we are not going to have ActiveRecord. So instead, Save has a by_product method. Oh, and one thing I will say is that at this point it's really helpful to remove redundancy. If you have different ways of accessing kind of the same data, do you really need that? Can you change the product so that you don't need as many of these finders as you have?
And very soon, things are going to break. So if you don't have tests, this is a really good place to add tests. Now we have a small sort of Ruby DSL. How do we actually pull that out? I would say that right now it actually doesn't need to be a service. Really what you need is a client. And how do you build the client out? That's where adapters really come in.
We used a module. This could be a base class; there were some reasons why we thought we needed a module, but maybe we could have actually done this as a base class. But now a Save is a save client, and that save client is where your finders go. They're class methods. One thing I would point out is that the finder is calling through to an adapter, a database adapter. That's what's wrapping up all your database calls, hiding them from your application. And one of the core pieces of this is that your adapter is your feature flag. It's also deployable in place. You can have this in your lib folder, you can actually start a gem now, and you can just deploy it. Your main application is still talking directly to this other database, but you're starting to build out your client. And later, when you have a service, you can replace it with a different adapter.
So what does that adapter give you back? When you call by_product, you're getting back a relation. This is something that we thought we didn't need, but it turns out ActiveRecord does this for very good reasons. And that's because of threads and because of state. If you say, I want this type of data, and you save it away to a variable, and now you call some other method, order by this, and it changes state, you might do something else with this variable over here, not realizing that you've altered its state. So any time you make a state change on these objects, you really want to get back a new instance, a different instance with different state.
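The point about state can be sketched with a tiny immutable query object. This is only an illustration under assumed names (`Relation`, `where`, `order`, and the `conditions` and `ordering` fields are hypothetical, not the real library's API): every refinement returns a new object and leaves the receiver untouched.

```ruby
# A minimal immutable query description. Names are illustrative only.
class Relation
  attr_reader :conditions, :ordering

  def initialize(conditions = {}, ordering = nil)
    @conditions = conditions.dup.freeze
    @ordering = ordering
    freeze # any accidental in-place mutation now raises FrozenError
  end

  # Returns a new Relation with the extra conditions merged in.
  def where(extra)
    Relation.new(conditions.merge(extra), ordering)
  end

  # Returns a new Relation with the given ordering.
  def order(column)
    Relation.new(conditions, column)
  end
end

base = Relation.new({ product_id: 10 })
sorted = base.order(:created_at)

puts base.ordering.inspect   # the original relation is unchanged
puts sorted.ordering.inspect
```

Because `base` never changes, it is safe to hold on to it in a variable, or share it between threads, while other code keeps refining its own copies.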
And when you call all, first, or pluck, any of these things, what you're really calling it on is your relation instance. The key thing that we learned is that the relation is shareable between all of your adapters. The actual work done to get the data is in your adapters, so the relation delegates back. In our database adapter, the thing actually getting the data is ActiveRecord. We've just moved our ActiveRecord class into the library and hidden it from our application. In this case we were using ActiveRecord; if you have another favorite database library, great. So you call Save.by_product, which calls through to the adapter and gives you back a relation. You call relation.all, and that just delegates back to the adapter, which calls through to ActiveRecord, gets your data, takes the attributes out of it, and gives you back an instance of your core class, because you're hiding ActiveRecord. You don't want to get back an ActiveRecord class.
And I would say it's critical to deploy at this step, because you've made mistakes. I guarantee you've made mistakes. And the cost of fixing them is really low right now, as opposed to spending a lot of time trying to design your server and how that's going to interact, and then realizing, whoa, whoa, whoa, we screwed up the client; all of this work we've done on the service we have to throw away because we did it wrong.
Now you have a client; now you need a server. And it doesn't matter what you use. Whatever you want, it's fine. The cost of writing this is really, really low, because the server is the simplest part of this. And if you did it wrong, if you chose the wrong transport mechanism, you know what? You build a new adapter. That's actually really quite easy. So let me take a moment to just reiterate why we should have deployed by now. I'd say it's because the client is much, much more complicated than the server. So your bugs are in the client.
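The call chain described above (finder, adapter, relation, back to the adapter, then plain objects) might look something like the following. This is a sketch under assumptions: the class names are guesses based on the talk, and an in-memory array of hashes stands in for the hidden ActiveRecord model so the example runs by itself.

```ruby
# The core class the application sees. It is never a database model.
class Save
  attr_reader :id, :product_id, :user_id

  def initialize(id:, product_id:, user_id:)
    @id = id
    @product_id = product_id
    @user_id = user_id
  end

  class << self
    # Swapping this adapter is the "feature flag" the talk describes.
    attr_accessor :adapter

    # Finders live on the client and call through to the adapter.
    def by_product(product_id)
      adapter.by_product(product_id)
    end
  end
end

# Shared by every adapter: it only remembers the query and delegates back.
class Relation
  def initialize(adapter, conditions)
    @adapter = adapter
    @conditions = conditions
  end

  def all
    @adapter.all(@conditions)
  end

  def first
    all.first
  end
end

# Stand-in for the database adapter. In the talk this wraps ActiveRecord;
# here an array of hashes plays the role of the table.
class DatabaseAdapter
  def initialize(rows)
    @rows = rows
  end

  def by_product(product_id)
    Relation.new(self, { product_id: product_id })
  end

  # Fetch matching rows and convert them into core Save objects.
  def all(conditions)
    @rows
      .select { |row| conditions.all? { |key, value| row[key] == value } }
      .map { |row| Save.new(**row) }
  end
end

Save.adapter = DatabaseAdapter.new([
  { id: 1, product_id: 10, user_id: 100 },
  { id: 2, product_id: 11, user_id: 100 }
])

saves = Save.by_product(10).all
puts saves.map(&:id).inspect
```

Nothing outside `DatabaseAdapter` knows how the rows are stored, which is what makes it cheap to replace the adapter later.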
Your bugs are not in the server at this point. And the server is going to be dictated by the choices you've made in the client. So if you've made wrong choices and you build your server, you've built the wrong server. We use Sinatra and Oj, just because it's awesome. It just works; it's small and it's useful. We thought we would need to move away from HTTP, but we've grown a lot and we haven't had to do this yet. It just works. Things that we thought we would have to change immediately, it's almost a year later, and it's just done.
Okay, so now we use the service. And that's really a feature flag. You just write a new adapter that talks to the service instead of the database. So now you call by_product, and it calls through to the HTTP adapter, which gives you back the same type of relation. When you call all on it, it calls adapter.all, which now goes to an HTTP class that actually gets the JSON, takes the attributes out of it, and gives you back your Save class. You're getting back the same object; you're getting back a Save.
So, retrospective. Great. We've isolated the data, we've isolated the interface, we've started to build our DSL. We've pulled that DSL out into a gem. Now that we actually kind of understand what that gem needs to do, we can launch the service and then just build a new adapter to switch to it. I would say that if we had realized that this was the order we needed to do it in, we would have done this in two weeks instead. That first part of it was like a day's worth of work. That second part of it was like three hours' worth of work, and deployed immediately. The harder part was realizing that we needed an adapter. At that point we hadn't really seen anything about hexagonal architecture. This might have been before some of those talks and papers came out. But it's actually really useful.
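To show how little changes when the database adapter is swapped for an HTTP one, here is a hedged sketch. The `/saves` endpoint, its query string, and the JSON shape are invented for illustration, and the standard library's `json` is used where the talk mentions Oj. The `fetch` parameter is an assumption added so the example can run without a network: by default it uses `Net::HTTP.get`, and the usage below injects a canned response instead.

```ruby
require "json"
require "net/http"
require "uri"

# Core class returned to the application, whichever adapter is in use.
Save = Struct.new(:id, :product_id, :user_id, keyword_init: true)

# The same kind of relation as before: remember the query, delegate back.
class Relation
  def initialize(adapter, query)
    @adapter = adapter
    @query = query
  end

  def all
    @adapter.all(@query)
  end
end

class HTTPAdapter
  # `fetch` is any callable that takes a URL string and returns a JSON body.
  def initialize(base_url, fetch: ->(url) { Net::HTTP.get(URI(url)) })
    @base_url = base_url
    @fetch = fetch
  end

  def by_product(product_id)
    Relation.new(self, { product_id: product_id })
  end

  # Ask the service for JSON and turn each entry into a core Save.
  def all(query)
    url = "#{@base_url}/saves?#{URI.encode_www_form(query)}"
    body = @fetch.call(url)
    JSON.parse(body, symbolize_names: true).map { |attrs| Save.new(**attrs) }
  end
end

# Canned response standing in for the real service.
requested = []
canned = lambda do |url|
  requested << url
  '[{"id": 1, "product_id": 10, "user_id": 100}]'
end

adapter = HTTPAdapter.new("http://saves.example", fetch: canned)
saves = adapter.by_product(10).all
puts saves.first.product_id
```

Callers see the same `by_product(...).all` shape and get the same `Save` objects back, which is why switching adapters can act as the feature flag.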
Tests: we use Sunspot for some of our Solr things, and we're already used to spinning up a Solr instance using a gem that wraps it up. You can do that for your integration tests. But for unit tests, we have tests around all of this, so we can have tests around a fake adapter that prove that it behaves the right way. And then that just saves data in memory in your application. And redundant tests: you might say, do I really need this test? Yes, because, for one thing, you can delete redundant tests later, and you want to be really confident that when you do switch over, it's going to work. Foreman and subcontractor are really helpful for this kind of thing. subcontractor is a gem that really says: for this service, change directory over here and run this command in a completely isolated Bundler environment. Because you really don't want to mix your dependencies. You don't want your server dependencies and their versions to be conflicting with your main application dependencies. You want to be able to change those separately.
Okay, so what about a new app? I'm spinning up a completely new thing, not extracting; totally new. How do I do that? How do I iterate on something that doesn't exist yet? I would say that one of the lessons that we've learned from this, and actually from just doing our product development in general, is: iterate. Find a way to get to the smallest deployable thing as quickly as possible. And whatever tool you use to deploy, to spin up infrastructure, one of the heuristics that we've found is to really focus on organizing that code around being able to change things easily and understand how this thing is different from that thing. Chef makes it really easy to define a giant hash of global state over here, and then just run this thing over here and it's magic and it'll just do it. When you actually start to spin up different services, this thing is going to need to be slightly different than that thing.
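Going back to the fake adapter from the testing discussion above, one way to picture it is an adapter that keeps saves in memory, plus a shared contract check that any adapter can be run through. This is a sketch with assumed names: `FakeAdapter`, `create`, and `assert_adapter_contract` are hypothetical, and plain `raise` is used in place of a test framework to keep the example standalone.

```ruby
Save = Struct.new(:id, :product_id, :user_id, keyword_init: true)

# Keeps saves in memory so unit tests need no database and no service.
class FakeAdapter
  def initialize
    @saves = []
  end

  def create(attrs)
    save = Save.new(id: @saves.size + 1, **attrs)
    @saves << save
    save
  end

  def by_product(product_id)
    @saves.select { |save| save.product_id == product_id }
  end
end

# A shared contract: the fake, database, and HTTP adapters should all pass
# the same checks, so switching between them holds no surprises.
def assert_adapter_contract(adapter)
  adapter.create({ product_id: 10, user_id: 100 })
  adapter.create({ product_id: 11, user_id: 100 })

  found = adapter.by_product(10)
  raise "expected one save for product 10" unless found.size == 1
  raise "expected Save instances" unless found.all? { |save| save.is_a?(Save) }
  true
end

puts assert_adapter_contract(FakeAdapter.new)
```

Running the same contract against every adapter is what makes the apparently redundant tests worth keeping until after the switch.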
So how can your code make that as easily understandable as possible?
Feature flags, also: on or off, customers see this or they don't. But you know what? Maybe it's just kind of half on. Maybe for every request that's going to come through this, you use Sidekiq to spin off a job, every time, to hammer your service. And if it breaks, you've learned that before you've launched it to your users. There's a lot of other ways you can do this. Turn it on for these five users, and let's see how we can break it, see how the interaction feels. It's really useful.
Also, one thing that we found is that it's often helpful to integrate very deeply before you go widely. If you have an interaction that goes through all this process, do it once through the whole process without worrying so much about the edge cases or the errors, because you're going to find those; those are going to come up. It's really useful to go all the way through the interaction, knowing that it's kind of sketchy and doesn't do everything, and really let that drive the design. This is also a matter of letting the feature drive the design and figuring out what the pattern is for how you really need to organize the code. Don't just whiteboard everything and say this is going to be perfect; production is going to destroy every single design that you think you have. Also, if something seems clever, it's bad. You're like, no, no, no, it's bad. Complexity is going to come to you; don't seek it out. It's evil.
So if there are any takeaways from this, I would say, you know, hexagonal architecture is really cool, but you don't have to design it from the start. If you have trust, if everyone in your organization has trust that you can do your job and that you're all working together to build an awesome product and an awesome company, you can fix this later. You can actually let the needs of your product determine where your boundaries are.
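The "half on" feature flag mentioned above can be sketched as an adapter that serves every request from the existing path while also exercising the new service and recording any failures. This is an illustration under assumptions: all class names are hypothetical, and the shadow call runs inline here to stay self-contained, whereas the talk describes pushing it to a background job with Sidekiq.

```ruby
# Serves requests from the primary adapter and mirrors them to a shadow
# adapter. Shadow failures are recorded and never reach the caller.
class ShadowingAdapter
  attr_reader :shadow_errors

  def initialize(primary, shadow, shadow_enabled: true)
    @primary = primary
    @shadow = shadow
    @shadow_enabled = shadow_enabled
    @shadow_errors = []
  end

  def by_product(product_id)
    result = @primary.by_product(product_id)
    exercise_shadow(product_id) if @shadow_enabled
    result
  end

  private

  # In production this would be enqueued as a background job instead.
  def exercise_shadow(product_id)
    @shadow.by_product(product_id)
  rescue StandardError => error
    @shadow_errors << error
  end
end

# Stand-in for the existing, working code path.
class WorkingAdapter
  def by_product(product_id)
    ["save for product #{product_id}"]
  end
end

# Stand-in for a new service that is still falling over.
class BrokenServiceAdapter
  def by_product(_product_id)
    raise IOError, "service unavailable"
  end
end

adapter = ShadowingAdapter.new(WorkingAdapter.new, BrokenServiceAdapter.new)
result = adapter.by_product(10)

puts result.inspect
puts adapter.shadow_errors.map(&:message).inspect
```

Users keep getting answers from the old path, while the recorded errors show how the new service behaves under real traffic before anything is switched over.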
So, thank you. I actually have a few minutes to answer questions.