Video equipment rental costs paid for by PeepCode screencasts. My name is Mike Perham, I'm from FiveRuns here in Austin, and I'm going to be talking today about how not to build a service. So who am I? I'm probably like a lot of you: ex-Java, did Java for the last decade, and started playing with Ruby when I picked up Rails a couple of years ago, and loved it. I'm probably best known in the Ruby community for the data_fabric gem, which adds sharding to ActiveRecord. That's now Rails Envy approved, since Jason from Rails Envy said this morning that it was an innovative technology of the year, so thanks to him for that compliment. And FiveRuns is my sixth startup since 1999, so I've seen a little bit of failure and a little bit of success, both. So who is FiveRuns? We're from here, and we're focused on building tools for Rails developers, specifically in the monitoring and performance realms. We have three products right now. Install is a free stack with Apache, MySQL, and Rails integrated. TuneUp is a free developer plugin for your Rails app that puts a little bar at the top of your web pages showing how long the page took to render, which parts of the rendering ran, and how long each took. And our flagship product is Manage, a full-stack monitoring solution that monitors Linux, Apache, MySQL, Rails, Memcached, PostgreSQL: basically any major part that a Rails app requires, it'll monitor. Notice I've got 2.0 up here, because this talk is about the mistakes we made building the 1.0 product and how we fixed them in 2.0, in a sense. So what not to expect: this is not a technical talk. You're not going to see any Ruby code in here, and I'm not going to be talking about web services, so no WSDL, and no XML either. I hope you're not too disappointed. One more caveat I should probably add: I was not a member of the Manage 1.0 development team.
So when I talk about the mistakes they made, there's a lot of speculation and hindsight on my part as to why those mistakes were made, so keep that in mind. I've divided this talk into three parts. First I'll talk generically about failure in the software world and what it means. Then I'll talk about the mistakes we made. And finally I'll wrap up and try to figure out what lessons we can learn generically from the mistakes FiveRuns made. So, failure. Why do startups fail? I can't give you particular reasons for every single startup in the world, but generically you can say that failure is the summation of a series of mistakes. You make one mistake, then another, then another, and that adds up to failure. If you think about something like New Coke or Microsoft Bob, they didn't fail for one particular reason; it was a number of decisions that turned out to be mistaken. That implies there are more ways to fail than to succeed, because you make hundreds and thousands of decisions as part of a project and a company, any of those decisions can be a mistake, and when you add them all up, they can lead to failure. So the question becomes: how do we reduce our chances of failure? In the software world there are a couple of very simple rules of thumb: you hire experienced people and you hire smart people. And honestly, I think when you hire experienced people, what you're really paying for is the failures they've already made in the past. You're paying for people who have failed a lot on someone else's dime, and that's really all experience is. Smarts, on the other hand: I'm not so sure that a smarter person really fails less than somebody who's not as smart.
Maybe they're just quicker to gather data and analogize from a previous situation to the current one. And money, of course, is the third thing. Money just allows you to make more mistakes, because you can afford the time to correct them. So to wrap up, how do we avoid failure? You have to think about the decisions you make when you're writing software and starting a company, and you have to understand that each of those decisions has a cost if it turns out to be a mistake. If you avoid making the most costly mistakes, or you have enough money to correct them, you won't fail. So let's talk about the types of mistakes you can make. Like I said, you want to prioritize the decisions you make to determine which ones are most likely to lead to failure. When I was thinking about the mistakes FiveRuns made, I came up with three categories, plus a fourth that I won't really talk about much. The three categories were business, social, and technical, and I tend to think of them in that order of importance. That is, business decisions have wide-ranging effects if you get one of them wrong, while technical decisions aren't necessarily as important, because usually it's just a man-hour, a man-day, or a man-week to fix a technical problem. So let's talk about business mistakes. Everybody loves that guy. Business mistakes are those decisions that involve the entire company. These are the most deadly, because they can take man-years to fix. And indeed, we made a couple of them. I think the first and most fundamental mistake we made was simply not knowing who our customer was and not knowing what we were building.
When we first started out, we targeted the enterprise systems management space, and we really just used Rails because it was the hot technology at the time and we wanted to get something bootstrapped quickly. Well, it turns out the enterprise systems management space has wildly different needs and wildly different expectations of its software than the Rails development community does. So after we spent half a year building all this software for enterprise IT departments, it turned out Rails developers really didn't value a lot of it. A lot of the parts we rewrote in Manage 2.0 were simply about refocusing and retooling the Manage service so it would be more appropriate for the Rails audience. Once we figured out that we were going to focus on Rails developers, the question was: what do we know about these people? Obviously, we are Rails developers. And that was the fundamental realization: we needed to take our own development team and work the way they want to work, because they work the way the Ruby on Rails market works in general. So all the stuff we did for the enterprise market turned out to be all for naught. The support infrastructure we had, things like a 24-hour 800 number, makes no sense in a Ruby on Rails world; there you need email, Campfire support, forum software, that sort of thing. Pricing is another thing that varies dramatically between those two markets. And marketing in general: taking out magazine ads doesn't make any sense in the Rails world, where you focus your marketing much more on conferences, open source, and blogging. The other realization we came to in the business category was that the trial is absolutely fundamental to getting customers.
The trial is something you start with and then treat organically; that is, you need to constantly be improving it, because every single person who tries your software and doesn't turn into a customer is lost money. You have to understand what value they want to see in the software, and you have to show them that value as soon as possible. Getting the audience to the trial, getting them to click that download button, that's your marketing department; those are marketing decisions you have to make. But once they download it, they have to install it and be able to use it quickly. That's all about development. It's about us writing good code so that the customer immediately says, okay, this looks good, I want to buy it. So, social mistakes. The picture here is misleading; it doesn't represent any sort of drunken hilarity. I think of a social mistake as one involving a single department in your organization. A business mistake, again, involves the entire corporation: if you make one, you've got to get marketing, sales, and development all involved to fix it. With a social mistake, you're talking about a single department: you make a marketing mistake, or you make a development mistake. The first social mistake I think we made was that we started building the system in Rails, but our team had no Rails experience. This led to a number of fundamental problems in the source base. We reinvented the wheel on a lot of technologies. We didn't use Capistrano for deployment, for instance; we used a shell script. And I think that's just a matter of not knowing the ecosystem you're playing in. Experience is not just knowing how to write Ruby; it's also knowing the various tools that are out there to be used. We also misused technology a little bit: we had some headless daemons running 24/7 that were running the full stack of Rails.
Well, that's completely unnecessary and memory-intensive. I think the reason they did that is that they didn't know how to spawn off a plain Ruby daemon, so they just reused Rails and its Mongrel code to spin one off. They also didn't know how to use ActiveRecord by itself, so instead we got daemons running the full stack of Rails. And lastly, and possibly most important, we simply had Java guys writing Ruby code. So we got Ruby code that looked a lot like Java. It had getters and setters. It had all sorts of really, really awesome design patterns that made no sense. Ruby is completely different from Java. It takes six months, a year, two years to really get your head around it and understand how best to write Ruby. It is a different language. I know that sounds dumb to say, but it is a different language. Another social mistake we made was no testing. We had nothing. And I don't know why that is. I'm assuming they thought that to stay agile you couldn't have tests breaking, that all tests do is slow you down. And that may be cruel to say, because these guys were an experienced team. But at the end of the day, everyone knows you've got to have tests. I hope everyone knows you've got to have tests, and I don't need to explain it to anybody. The fundamental problem this caused was that the Manage 1.0 system was impossible to refactor. Remember, I said we had no Ruby experience, so the code oftentimes looked like a shark attack. Well, what do you do with a shark attack? You refactor it, slowly, piece by piece. But if you don't have any tests, what are you going to do? How do you know when stuff's broken? You don't. So we had to throw away a lot of code, because it simply took less time to completely re-implement it than to refactor it. Obviously we're a lot better these days: we've got unit, functional, and integration tests. We don't have any automated browser tests.
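To make that point concrete, here is a minimal sketch of the kind of unit test that makes refactoring safe. The MetricRollup class is hypothetical, standing in for the kind of logic a monitoring service contains; it is not actual FiveRuns code, and I'm using the bundled Minitest framework rather than whatever the 1.0 team would have had available.

```ruby
require "minitest/autorun"

# Hypothetical rollup class, standing in for the kind of logic the
# Manage service contains. Not actual FiveRuns code.
class MetricRollup
  def initialize(samples)
    @samples = samples
  end

  # Average of the collected samples; 0.0 when there are none.
  def average
    return 0.0 if @samples.empty?
    @samples.sum.to_f / @samples.size
  end
end

class MetricRollupTest < Minitest::Test
  def test_averages_samples
    assert_equal 20.0, MetricRollup.new([10, 20, 30]).average
  end

  def test_handles_no_samples
    assert_equal 0.0, MetricRollup.new([]).average
  end
end
```

With a suite like this in place, you can reshape the implementation piece by piece and know immediately when something breaks; without it, re-implementation really is cheaper than refactoring.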
Is anybody using Selenium or automated browser testing? Okay, a couple. Yeah. It seems like you hit diminishing returns once you get into automated browser testing, but your mileage may vary. The last social mistake we made: we didn't eat our own dog food. We are Rails developers, and we are selling development tools to Rails developers, so it stands to reason that we should probably be eating our own dog food. Dog-fooding is the best way to show value to the customer, because you are acting as your own customer. In fact, we were telling people, here's how to monitor your own systems, but we weren't actually using it to monitor our own systems. Well, what happens then? You get what we had before: a product targeted at enterprise systems management, sold to Rails developers, that wasn't showing as much value as it should have. And so we weren't getting the trial conversions. All of this rolls together into a big snowball. Now there's a catch-22 question: how does a monitoring system monitor itself? Well, you don't have your production system monitor the production system. You have the production system monitor your staging systems and your development servers. The point is that you should be doing what you can to eat your own dog food, because otherwise you're not going to show as much value as you really should. Technical mistakes. Like I said, I think of these as probably the least important of the mistakes, simply because we're developers: if we know about a mistake, we just go in and fix it, right? It takes a day or two, no big deal. Ideally, it takes a day or two. There are absolutely times, though, when a technical mistake takes a lot longer to fix, and this is one of them. We had a client. With the Manage service, you download our client, install it on your machine, it detects everything, and then it runs 24/7, collecting data and uploading it to our service.
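A client like that is essentially a daemon, and as described above, writing one without dragging in the full Rails stack is exactly what the 1.0 team struggled with. Here is a minimal sketch of a plain Ruby agent; the function names, log path, and interval are all hypothetical, and this is not the actual Manage client.

```ruby
# collect_load reads the 1-minute load average straight from the proc
# filesystem. The text parameter exists so it can be tested off-Linux.
def collect_load(text = File.read("/proc/loadavg"))
  text.split.first.to_f
end

# run_agent daemonizes and samples on an interval. Nothing here loads
# Rails or ActiveRecord; it's just Ruby's standard library.
def run_agent(interval = 60, log = "/tmp/agent.log")
  Process.daemon                      # detach from the terminal
  running = true
  Signal.trap("TERM") { running = false }
  while running
    # A real agent would queue this sample and upload it to the service.
    File.open(log, "a") { |f| f.puts "load=#{collect_load}" }
    sleep interval
  end
end

# run_agent is deliberately not invoked here; a launcher script would call it.
```

The point of the sketch is how little is needed: daemonizing, a signal trap, and a loop, with no web framework in memory at all.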
Well, the client we originally wrote was written in Java and C. Now, why that is, I don't know. Again, there's a lot of hindsight here, but I'm guessing that because we were going after the enterprise market, there were source obfuscation concerns. Ruby generally runs directly from source code, whereas with Java you have a compilation step, and there are source obfuscators for Java, too. They also wrote parts of it in C, believe it or not, so there would be a thousand lines of C code just to collect a bunch of metrics. Well, that's way, way, way too low-level, and it takes a lot more time than it should. Our new client is written in Ruby. It is obfuscated: it ships as a raw binary, with the source code inside, encrypted and what have you, so it is protected in that sense. But when it collects metrics, all it does is cat the proc filesystem, something as simple as that. It's not going into native binary land and APIs. And the results reflect that improvement: the old client was almost 70,000 lines of code, and the new client is over 90% smaller. It's also 40% more memory-efficient. There is still one design issue we have to deal with every day, though, and that's permissions problems. We haven't fixed everything with the client. Because the client runs as a fiveruns user, and the Rails app may be installed as some other user, our client can't always see the data in the Rails app. So we do have filesystem permission problems that we often have to help the customer fix, and we're still working on getting all the client issues resolved. A couple of others: because of that enterprise focus, we built a bunch of unnecessary features that the Rails world simply does not need. We didn't use SSL. Why not? I don't know. I suspect they thought it would be blocked at the firewall. But instead we wrote our own custom HTTP encryption to go over port 80.
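For comparison, standard TLS through Ruby's Net::HTTP costs almost nothing to set up, which is what makes the custom port-80 encryption such wasted effort. The endpoint below is hypothetical; the talk doesn't give FiveRuns' real upload URL.

```ruby
require "net/http"
require "uri"
require "openssl"

# Hypothetical upload endpoint, for illustration only.
uri = URI("https://metrics.example.com/upload")

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true                          # plain TLS instead of custom crypto
http.verify_mode = OpenSSL::SSL::VERIFY_PEER # verify the server's certificate

# A real client would now build and send the request:
#   request = Net::HTTP::Post.new(uri.path)
#   request.body = serialized_metrics
#   response = http.request(request)
# (Left commented out: this sketch makes no network calls.)
```

Three lines of standard-library configuration replace an entire homegrown encryption scheme, and the traffic is something firewalls and proxies already understand.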
And so that custom encryption was time wasted on a technical detail, time we could have spent providing more value to the customer. We also built a custom proxy, so that if the machines the client was installed on were behind a firewall and couldn't talk to the internet directly, they could have a machine in the DMZ which proxied for the rest of them. And there are a few enterprises that do have their firewalls locked down that much. But in practice, in the Rails world, it's not a concern. We've had one customer with that problem, and in the end it's just not cost-effective for us to build a complex proxying system when we only have one customer who needs it. In a similar sense, support for other subsystems just isn't cost-effective, because in the Rails world they're just not all that popular: JBoss and Tomcat and Oracle, certainly. We do have one or two customers running FreeBSD and one or two running Windows, but again, at $40 per month per server, we can't afford to spend man-weeks building this stuff. Lastly, we didn't have integrated billing. Remember, I talked about how you give them the marketing message, they come to the site, they download the trial, then your development takes over and you show them the value. Now they want to buy it. We've made that process as easy as possible up until now. So they click the Buy link, and it takes them to a page that says, please fax us your credit card details at this number, between 9 and 5. What you've done is introduce another hurdle the user has to jump over, and you're going to lose customers here, too. Even ones who are convinced. If they're on the borderline, thinking, okay, maybe there's enough value here that I'll buy this, then finding they have to call somebody up is just another step they have to go through.
Today, obviously, we've got online buying, so you can try the Manage service and buy it at 4 in the morning without having to talk to any of our fantastic sales guys. So what can we learn from this? Well, FiveRuns is no different from any other software company. You're going to make business mistakes, social mistakes, and technical mistakes. You have to recognize when you're making a decision that could turn into a mistake, and ask yourself: how big is this decision? How important is it? What are the ramifications if I get it wrong? In general, the rule of thumb is: the more knowledge and context you have, the better your chances of making the right decision. So the question is, which level of ignorance can you afford? Here are the levels of ignorance. This is one of my favorite slides, from an ACM article a couple of years ago. I've gotten you to the third level of ignorance now: you're no longer meta-ignorant. But when you're talking about those business decisions that are absolutely critical and take man-years to fix, you've got to be at the zeroth level. You've got to have a business plan. Venture capitalists talk about how important a business plan is, and this is why: the business decisions you make cost man-years to fix if you get them wrong. For something like a social decision, maybe you can get away with the first level, I don't know. That's something you need to recognize and work out for yourself. And remember that the longer it takes you to recognize and fix a mistake, the more costly it is. This goes back to the classic software engineering cost curve: a bad design decision costs a lot more to fix than a simple implementation bug found at the end of your deployment cycle.
And that's because it has fundamentally woven itself into the software you're building. So effectively, by lowering your level of ignorance, by getting more knowledge and more context about a decision you're making, you're taking out insurance. You're paying up front, spending more time, so that you don't make a mistake and don't have to pay a lot more in the future. There are two things I think can lead a software group to make more mistakes than usual. One is groupthink. I think groupthink most often happens when you have a group of developers and one person in the group is seen as the expert in a domain. When you have to make a decision in that domain, that person automatically voices their opinion and people agree with them. However, oftentimes things are conceptually slightly different from their experience. If you have a guy who has worked in rich-client development for 20 years and did performance tuning that entire time, he'll be seen as a performance expert. But if he goes to work on a web application, a lot of his knowledge about performance-tuning rich-client apps makes no sense in a world of browsers, assets, and the stateless HTTP protocol. So what you have to be careful of is taking an expert at face value; think it through yourself to verify that their opinion is indeed correct. The other thing that I think affects software groups is optimism. We're all optimists. There's a reason why, if you're in the Java world, you run Ant or Maven to build your software: because you think it's done. You're an optimist. You're saying, okay, I'm done. Well, unfortunately, the compiler tells you whether your opinion matches reality. The same goes for time estimates: we oftentimes give an estimate that is just how long we think the work will take.
It doesn't include any time to acquire the knowledge, to lower your level of ignorance about the thing you're building. As Fred Brooks says in The Mythical Man-Month, which is an excellent book (if anyone hasn't read it, go read it), you estimate what the work ought to take, not what it actually will take. So what has FiveRuns learned? To recap: we learned that changing markets is extremely expensive. The fundamental question underlying almost every other business decision your company makes is: who is my customer? How do I interact with them? How are they going to use the software? How are we going to support it, price it, market it, all that sort of thing? Everything flows from which market you've chosen. So it stands to reason that if you change markets, it's going to take man-years to fix. And indeed, it's still taking us years, even today. The trial experience is the user's first impression of you, and it's critical to get it right if you're selling software. The trial is not something frozen in time: you put it out there, you see how people use it, you get feedback on why they aren't buying your software after having used the trial, and then you iterate. The trial is software, in the sense that you use agile development on it just like any other piece of software. And lastly, use your own software. If you're selling development tools and you're developing them, you sure as heck had better be using them, so that you know they have value, and therefore you know your customer will find them valuable. So what can you take away from this talk? Obviously, the lessons we learned at FiveRuns won't necessarily apply to you. But generically, as I've said, the bigger decisions require you to gather more knowledge and more context.
You need to take out insurance, as I said, so that you don't pay down the road and possibly cause your startup to fail. Technical problems are the least of your concerns; as I said, one, two, or three people can often fix a technical mistake in a matter of a week or two. It's the customer decisions, the business decisions, that you really need to watch out for. And we're all developers, so we tend to focus on the trees, not the forest. But sometimes you need to step back, look at the forest, and ask: are we doing this right? Finally, we're all developers, and knowledge, what's in our heads, is the only thing we bring to our jobs. So you have to keep learning, keep growing, keep getting experience. And as I said, the best way to get experience is by making mistakes and learning from them. That's what you've got to do. We also made one or two really dumb mistakes that I'm not going to talk about here, but you're welcome to ask me about them afterwards. That's my contact info, and that's all I've got. Thank you.