All right, let's get started then. So, thank you all for coming. Hope you're enjoying DrupalCon Denver so far. Everyone having fun? Yeah. Good. The organizing team put a ton of work into this conference, so it's good to see it paying off. This session is on architectural balance and architectural trade-offs. That's kind of abstract and hand-wavy, but it'll make more sense in a moment. A little bit about me. My name's Larry Garfield. You may know me as Crell online, on Twitter, or most other places. I am a senior architect with Palantir.net, a Drupal development shop based in Chicago. We're about 22, 23 people, and we work mostly with institutional not-for-profits: museums, universities, that sort of thing. I'm also an advisor to the Drupal Association, and formerly a member of its board. For Drupal 8, I am the Web Services and Context Core Initiative owner. So if you've heard about WSCCI, the "whiskey" initiative, that's me. I was a co-author of Drupal 7 Module Development. Who here has actually read that book? I'm sorry. No, really, good book. And I talk about architecture and other abstract concepts at conferences, and they keep inviting me back, so I guess I must be doing something right. And as my colleagues at Palantir will tell you, I'm mean with a Nerf gun. Fortunately, I did not bring one today. But before we actually get into the software part: this is a tech conference, so we have to have the obligatory car analogy. This is the 1964 Chevy Corvair. It was a decently popular car in the 60s, and it ran for a number of years. One of its features is that it used a swing axle design. The basic idea of a swing axle is that instead of a single bar running all the way across to hold the wheels, each side can move independently of the other, so one side can move up without the other. This had a number of different attributes, one of which is that in certain conditions, because you end up with less traction, the car can slip. Now, that's not necessarily a problem. 
In a sports car, if your driver knows what he's doing, having the car slip just a little bit on a turn can be a very good thing. That lets you make tighter turns. It's a lot more fun. If you're driving a family sedan and you don't realize this, your car can spin out. And I don't think most of us want our car to spin out when we've got the kids in the backseat. There are ways to compensate for that, which Chevy did not use in the first couple of years of the Corvair. They did eventually fix it, but not before launching the career of Ralph Nader, who, for those not familiar with him, first achieved public attention by pointing out that the Corvair didn't have the right architecture. The design decision to use a swing axle without a stabilizing bar was not appropriate for that type of vehicle. That was not necessarily a good or bad decision; architectural decisions are not good or bad. They're not right or wrong. They are appropriate or not. In the case of the Corvair, a swing axle design was an inappropriate decision. There are other cars where it's perfectly appropriate. When you're trying to design a car, a website, or a software application, you're thinking about balance. You're not thinking about what is the right way to do something. You're thinking about what is the appropriate way to do something. In an ideal world, when you sit down to write some code, bang out a new Drupal module, whatever you're doing, everything you do would be fast. Everything you do would be nice and flexible. You could exchange it and change it and do all kinds of fun things with it. The code would be easy to read. You'd understand it. Everyone else would understand it. You'd be able to scale it, run it on a blog or on examiner.com or whitehouse.gov or something big like that. All your code would be right. You'd know for a fact that there are no bugs in it. And oh yeah, you'd have a pony too. 
And that is just as unreasonable. You can't actually get all of those things. You can't always get what you want. Who's seen this kind of chart before? Most of you, yeah. For anything you want to do, you can do it right, you can get it done quickly, or you can do it cheaply. You cannot do all three. If you need it done quickly and it has to be high quality, fine, but you're gonna pay for a lot of developers to pull it off. If you don't care how long it takes and you just want it done cheaply, okay, it'll take a while, because you can only afford one developer, or you're gonna do it in a hacked-up kind of way. Over in the business track, they talk about this kind of trade-off. Who here has done project administration of some kind: project management, tech lead? Uh-huh. Who's had that conversation with a client where they know exactly what they want, they know exactly when they need it, and they know exactly how much they're gonna pay you for it, and those don't line up? I see all the same hands going up. Yeah, I've been there too. Again, that does not mean that fixing scope is wrong or that fixing cost is wrong. Fixing all three of them is impossible. You have to decide what is important to you in your case, for your site, for your module, for your product, and decide what you're willing to sacrifice. You just want it to be good? That's wonderful, but there is no such thing as good when you're talking about software. There is only appropriate. There is only: is this correct for these users, for this use case, for this size of site, for where I'm going to install this module, for who's going to maintain it a year from now? Good means something different depending on the answer to those questions. Good is context-sensitive. When we're talking about software (I'm going through this way too fast), I like to break down the various definitions of good into three general categories: engineering axes, human axes, and quality assurance axes. 
So first, engineering. Code can be fast to run. This is the time it takes for a program to execute. That's all. That's one measure of quality in software. It can be low-memory. These two together are performance, but performance is not, strictly speaking, one or the other. Drupal 7, for instance, trades high memory usage for speed. Something Drupal does an awful lot is build up a gigantic lookup array, after which using that array is very simple. Take the theme registry. That's a big array in memory. It's very, very fast; array lookups in PHP are extremely fast. Big arrays are also very memory-intensive. That's why Drupal 7 has a very high memory requirement. Good trade-off or not? Maybe. Code can be scalable, which is not the same thing as performance. Loosely speaking, performance, or speed, is how long it takes to answer a single request. Scalability is how many requests we can handle at the same time. They're related, but they are different things. You can have systems that take a long time to run but can also handle a lot of requests at the same time. Code can be modifiable. That is, it's easy to change the behavior of the system, ideally without much work and ideally without modifying the code: you just plug in a new component and change the behavior. That is not the same thing as extensibility. Extensibility is the ability to add behavior, add functionality. Modifiability is the ability to change functionality. Drupal is extremely extensible. It's actually not as modifiable as we think. It's getting better, but don't mistake the ability to add a module for the system being very modifiable. It's not the same thing. You can't necessarily take functionality out. On the human side: those axes affect code; these affect people. Usability. This is primarily usability for end users, for site builders: how easily can they get their task done? That's different from understandability, which is for a developer. 
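The theme-registry trade-off above can be sketched in Python (the talk's own context is PHP; Python is used here only to illustrate the idea). This is a minimal sketch assuming a hypothetical list of definitions collected from modules: one lookup scans the whole list on every call, the other builds a dictionary once and pays for it in memory. The field names (`hook`, `template`, `module`) are invented and are not Drupal's actual registry format.

```python
"""Memory-for-speed sketch, loosely modeled on the theme registry idea.

All names are hypothetical; this is not Drupal's registry structure.
"""

# Hypothetical definitions, as if collected from many modules.
DEFINITIONS = [
    {"hook": f"item_{i}", "template": f"item-{i}.tpl.php", "module": f"mod_{i % 50}"}
    for i in range(10_000)
]


def find_by_scan(hook):
    """No extra memory: walk every definition on each lookup (O(n) per call)."""
    for definition in DEFINITIONS:
        if definition["hook"] == hook:
            return definition
    return None


def build_registry(definitions):
    """Build a dict keyed by hook name once; costs memory for the extra table."""
    return {definition["hook"]: definition for definition in definitions}


# Built once, then reused for every lookup.
REGISTRY = build_registry(DEFINITIONS)


def find_by_registry(hook):
    """Average O(1) lookup against the precomputed table."""
    return REGISTRY.get(hook)
```

Both functions return the same definition for a known hook. The difference is that `find_by_scan` does work proportional to the number of definitions on every call, while `find_by_registry` keeps a second structure in memory so that each call is a single hash lookup.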
Understandability is: if I'm someone maintaining this code, if I'm reading the code for a module six months or a year later, can I figure out what the heck is going on? Versus a user: can they figure out what they need to do, what buttons they need to push? Learnability applies to both. A system can be very usable but not very learnable. Take the command line, for example. Who here is comfortable on a command line? Well, it's a room full of developers, good. It took a while to learn, didn't it? It took a while to learn all of the git commands, all of the shell commands, all of the fun with grep and find and some other commands even I haven't figured out yet. Once you learn them, they are extremely powerful. The command line is an extremely usable interface. It is not a very learnable interface. A graphical interface, if done well, can be very learnable. It's very easy to sit down at it, pick it up, and figure out what you need to do. It's very easy to ramp up on. But all that moving a mouse around, and moving your hand between the mouse and the keyboard, slows you down and hurts usability. Some problem spaces can't be handled in a very learnable fashion. Who here has tried to do 3D modeling? 3D Studio Max or anything like that? Wow, that's a lot of you. Those are hard programs to use, aren't they? I dare you to make one that someone can just pick up like that and do cool things with. It's an inherently complex problem space. We also need to think about expediency. This is, for the developer, how easy is it to just get this job done? You can take the time to build a wonderfully usable, wonderfully learnable interface, but it takes time. Do you have the time to do that on this project? Maybe, maybe not. Is this code gonna be reusable? Do you care if it's reusable? Maybe, maybe not. Expediency: how easily can I just get the job done? Maintainability: if I come back in six months and need to fix a bug, how easy is it for me to figure out what is going on? 
What was the developer thinking a year ago? What moron wrote this code? Oh wait, me, never mind. Raise your hand if you have not had that experience at some point in your career. I saw one hand go up, and he's probably a new developer. In about six months you'll have that experience. Yeah, maintainability. How easily can I figure out what's going on later? How easily can I read this code? Finally, for quality assurance: testability. How easy is it to write unit tests, not functional tests, unit tests, for this code? How easily can I write tests that verify the code is not broken, and that when I change something, I don't break it without realizing it? That is not the same thing as verifiability. Testability measures your ability to write tests, unit tests, that verify the code is not doing what it's not supposed to do. Verifiability is about proving and confirming that code will do what it is supposed to do. It's a subtle but important distinction: unit tests will not show that your code works correctly in all situations unless you test it with all situations. For verifiability, there are ways to mathematically prove that code is correct, that it will do what it is supposed to do in all circumstances. That takes a very long time, and your code has to be structured in a particular way so that it doesn't take you a century to do it. Is that a worthwhile use of time? If you're writing a website, probably not. Really, no. If you're writing the control system for a nuclear reactor, yes! When bugs are measured in megatons, you take your time. Okay, all of these -ilities, whatever. Larry, why are you talking about this academic nonsense? Your client cares, because they've got a budget. They've got a timeline. They need to not have bugs. Your boss cares when, six months from now, the client is still filing bugs and you need to sit there and fix them and you're not getting paid for it anymore. 
You care in six months when you're the poor sucker who has to fix all of those bugs, or has to fix a performance issue, or a scalability issue, or has to figure out why that nuclear reactor just exploded. This is why these things matter. This is why you should be thinking about them: they directly affect you and your bottom line. Software architecture is good if it is good in ways that matter in this case. In this case. Not in all cases; in this case. Let's look at a couple of examples. Say you're building a website for a small church. I'm gonna make a couple of assumptions here. I'm going to assume it's a fairly low-traffic site. You're not gonna get 1,000 hits a second. You're gonna get mostly anonymous traffic. You're gonna have 50 or 100 pages on the site total, something along those lines. Most of them are gonna be fairly static content: a couple of event nodes, some news articles, some about pages, nothing really extravagant. You're gonna care about how learnable it is for the administrator you're handing it off to. That is going to matter, because they're probably not technically savvy. They may be, but you can't assume that. And they're not gonna be using the site on a daily basis, so they need to be able to ramp up on it again every other week. They're never gonna become an expert user. You care about expediency. Most small nonprofits, small churches, and small community groups don't have a huge budget. They can't afford to pay you to build everything perfectly. They just need it to work for their case and be done with it. Many of them will be running on shared hosts, and shared hosts usually don't run APC. That means Drupal's memory requirements are going to suck. If you can convince them onto a VPS, great. If you can convince them onto Acquia's hosting or Pantheon, great. You probably can't. I've built these sites; we often can't. So doing things in a memory-efficient way matters a lot. On the other hand, scalability. 
I don't expect the site to get more than 100 hits a day, so it's almost never serving more than one request at the same time. So I don't really care whether it's going to scale across multiple database servers. Really, I don't. Verifiability would be nice, but they've got better things to do with their time, and so do you. Not that these are unimportant. They are less important. If the site's not scalable, it's not the end of the world. On the other hand, say you're building a major media site for... give me an example of a media site that uses Drupal. Sony BMG. What's that? Lifetime. Lifetime, there's another good example. You're gonna be getting hundreds of hits a second. You want a lot of social networking features: people commenting, people posting to Facebook and Twitter. You're going to have millions of pages. You're going to have news nodes coming out of your ears. You're gonna have static content everywhere, events all over the place. Your content is changing minute by minute as new events develop. You're covering a sporting event and you're posting updates to a news article every time someone hits the ball. You care about scalability a lot, because you have to serve 100 hits a second. You care about performance. You care about how long it takes a page to load. What is that number from Google? 200, 300 milliseconds and people get bored and go away. Google slowed down their own site by something like 300 milliseconds and saw a significant drop in traffic. You care about performance at that point. Do you care about verifiability? If you're dealing with content where you can get sued if you do it wrong, you care about the verifiability of your publication process. Maybe not the whole system, but you care about the verifiability of certain parts of your workflow. And you should be willing to spend the time and money to do it right, so that you get all of those. 
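One way to connect the media-site numbers above is Little's law, a standard queueing result that the talk itself does not cite. Assuming steady traffic, the average number of requests in flight (L) equals the arrival rate (lambda) times the time spent per request (W). The 100 requests per second and 300 ms used here are the illustrative figures from this example, not measurements.

```latex
L = \lambda W
  = 100~\text{requests/s} \times 0.3~\text{s}
  = 30~\text{requests in flight on average}
```

Cutting W to 0.15 s halves L to 15 at the same traffic, so roughly half as many concurrent PHP processes (and half their combined memory) are needed. That is one concrete sense in which better performance buys scalability; real traffic is bursty, so capacity needs headroom above this average.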
Getting it done this week by five o'clock is not your top priority, or it should not be, if you want to be able to pull off this kind of site. Is the site mostly anonymous traffic or authenticated? If it's mostly anonymous: page caching. And if you spend an extra 10 bucks, put Varnish in front of it. A Drupal site with Varnish in front of it can serve 5,000 requests a second. Who cares what the performance of the Drupal site is? Your caching strategy is Varnish; that solves the problem. Done. Most requests are never even gonna hit Drupal, so what if it takes Drupal 800 milliseconds to generate a page once, if it's then cached and served in three milliseconds for every other person who comes to the site? Okay. Do you care about memory usage? Well, yes, but you're rarely going to have 100 requests waiting to be served by Drupal at the same time. You'll have two or three. So add up the memory requirements of all those requests; you don't need to be as concerned. Scalability? Varnish takes care of it. Now, there's a limit to that. If you have more than 5,000 requests a second, you need to do more than just put Varnish in front of it and go home. But your trade-offs are different. If you have authenticated users, in Drupal 7, Varnish might as well not exist: all authenticated requests go straight through it. Page caching? Straight through it too. That means every single line of PHP will get executed, so you need to be a lot more careful with your caching strategy. Can I cache individual blocks on the page? I probably should. Does this Views query take 300 milliseconds to run, just for that one query? If it's authenticated users, I care. If it's anonymous, maybe I don't. So I need to do Views caching or block caching just on that one block. And again, at that point, take the time to do it right. Expediency doesn't matter as much. To what extent do you care about mobile? Who here has been to something in the mobile track at this conference? 
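A minimal Python sketch of why anonymous traffic is so much cheaper to serve, assuming a hypothetical `render_page()` that stands in for a full Drupal page build. It models the behavior described above (anonymous requests served from a shared cache, authenticated requests passing straight through); it is not Varnish's or Drupal's actual caching code.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    authenticated: bool = False


@dataclass
class PageCache:
    store: dict = field(default_factory=dict)  # path -> rendered HTML
    renders: int = 0  # how many times the expensive path actually ran

    def render_page(self, request):
        # Hypothetical stand-in for the slow (e.g. ~800 ms) full page build.
        self.renders += 1
        return f"<html><body>{request.path}</body></html>"

    def handle(self, request):
        if request.authenticated:
            # Logged-in pages may be personalized, so they bypass the shared cache
            # and every one pays the full rendering cost.
            return self.render_page(request)
        if request.path not in self.store:
            self.store[request.path] = self.render_page(request)
        return self.store[request.path]
```

After 1,000 anonymous requests for the same path, `renders` is 1; each authenticated request adds another full render. That is why per-block or per-query caching becomes the main lever when most users are logged in.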
A fair number. The rest of you should at some point. Responsive design is all the rage these days, but it has limitations. Responsive design can take longer to do, because you have to think about not just one screen size but four screen sizes, or a range of screen sizes, with browsers of different capabilities. You can't design your site to have 50 things in a sidebar and assume that everyone's on a cable modem or a fast corporate connection, because a lot of mobile devices have slower connections. If you have 15 ads on your right rail and you try to load that on a mobile browser for someone who's on a mere 3G connection, ten minutes later they're gonna give up on your page loading and go elsewhere. What is your target audience? Look at your own browser stats. Don't look at the average for the world or for a particular market. Look at your browser stats. Look at your traffic logs. Here's a good example. I run a small RPG club website. Small as in not much traffic: about 200 people use it, but it's about a half million nodes. Looking at our Google Analytics data a while back, I found that less than 5% of our traffic came from mobile devices. So when doing a redesign for it, do I care about mobile? Not a great deal. According to Google Analytics, we had more users on Android, just Android, than on Internet Explorer 6, 7, and 9 combined. Do I care about mobile? There are news organizations saying 30% of their traffic comes from mobile devices right now. That has doubled from six months ago, and the trend looks to be continuing. Do they care about mobile? Yes. Is it worth the time to do a responsive design? The answer is not always yes. The mobile track will yell at me for saying that, but it's not always yes. So, you're building a module. Is this module site-specific? Who here has written a module that exists for one site, and that's it, it doesn't go anywhere else? Yeah, pretty much everyone. 
You don't need to go all out on that. Get it done, go home at five o'clock, or earlier if you can. What you're gonna care about there is: when the client changes their mind about something the day before launch, can I, as the developer, go in and change it quickly? You don't really care about its usability, because it's gonna get configured once. Don't trick out the configuration page for it. In fact, don't use a configuration page for it at all. It's site-specific. If you need to make something configurable, put in a variable and set that variable in settings.php. You won't be able to do exactly that in Drupal 8 with the configuration initiative, but there'll be some equivalent. Don't do a usability study to figure out the best UI for a single admin page for a module no one is ever gonna use except you. Really, it's not worth the effort. And you don't really care how extensible it is, because the client probably is not going to want it to do anything different, and if they do, they're gonna come back to you to change it. Don't try to build in extensibility by guessing what they're gonna need six months from now. Just make it possible for you to fix it later. Are you releasing this module to Drupal.org? Who's written a module for a client site that they have then released? Thank you. If you're doing that, do you want to integrate with Rules? You should be asking that question. Do you integrate with Views? You should be asking that question. Maybe, maybe not. But you should be thinking about whether you want to tie into these other systems. Do I need to make it extensible? Do I need to make it scalable? Once you release code, you do not control where it gets used. Good example here. Who's been around since Drupal 5? Nice. Who remembers node reference in CCK in Drupal 5? Who ever tried having a node reference against a node type where you had 200,000 nodes of that type? You know what I'm talking about. 
So, for those who did not just raise their hand: node reference in Drupal 5, when you hit save, would load up the title and node ID of every single node you could possibly be referencing, to check that you were referencing a valid node. That means that when you hit save and it's checking against 100,000 possible nodes, because that's how many of that node type you have, your memory explodes and people can't save nodes anymore, because no one thought about that level of scalability. If you're releasing a module into the wild, think about that kind of scalability. Think about what happens when someone throws a million records at it. How is it going to degrade? You should make it testable. Write your unit tests, because you're supposed to do that. It makes the code better. It makes it easier for other people to submit patches against your module later without breaking things. You care about that. And again, expediency. Not that expediency is unimportant; you have a client to deal with. But it is not necessarily as important as it would be otherwise, not as important as when you're doing a site-specific module, because you want something that you can extend, that you can use later. Different situation, different needs, different use case, different priorities. Now, your admin experience. Say you have one content editor. You probably don't need complex workflow. You're gonna train one person on how to use Drupal. You probably don't even need to trick out the admin. Give them three hours and train them on the Drupal admin; Drupal 7's admin is quite nice. What you care about there is usability for that one person: can that one person get their job done quickly? You care about saving budget: expediency. As for learnability, you're gonna train one person and be done with it, so how easy it is for a random person to pick it up is not as important. Not unimportant. Not as important. 
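The node reference failure mode can be reproduced in miniature with Python's built-in `sqlite3` module. This is a simplified sketch assuming a hypothetical `node(nid, type, title)` table rather than Drupal 5's real schema or CCK's code: the naive validator pulls every candidate row into memory on each save, while the scalable one asks the database about the single referenced ID.

```python
import sqlite3


def make_db(count):
    """Build an in-memory table with `count` hypothetical article nodes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO node (nid, type, title) VALUES (?, ?, ?)",
        ((i, "article", f"Article {i}") for i in range(1, count + 1)),
    )
    return conn


def is_valid_reference_naive(conn, nid, node_type):
    """Loads every candidate row on each save: memory grows with the table."""
    rows = conn.execute(
        "SELECT nid, title FROM node WHERE type = ?", (node_type,)
    ).fetchall()
    return nid in {row[0] for row in rows}


def is_valid_reference(conn, nid, node_type):
    """Asks about the one referenced node: memory stays flat as the table grows."""
    row = conn.execute(
        "SELECT 1 FROM node WHERE nid = ? AND type = ? LIMIT 1", (nid, node_type)
    ).fetchone()
    return row is not None
```

Both functions give the same answer, but the first one's memory use grows with the number of nodes of that type and the second one's does not. That is the "what happens when someone throws a million records at it" question to ask before releasing a module.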
Now say you have 100 editors editing different parts of the site, and you have business rules about who can publish what. You need some kind of workflow system in place. You need some kind of access control: the Workbench module, Taxonomy Access Control, something like that. You may want to take the time to trick out your editorial process, build some custom admin views, or use a module that has some built in. Think about what happens with unpublished nodes. Who has access to view them? What limitations of Drupal do you have to work around to make that work and fit their business needs? Now you're going to care about the understandability of the code. You're gonna care about how extensible it is when they change their business rules in six months and come back to you and say, okay, we've got 10 bucks for you to finish this and change the rules for us. Raise your hand if you have not had that situation. Yeah, that's what I thought. You're gonna care about testability because, again, if you're dealing with embargoed content where you get sued if something gets published too soon, you need to be careful with your workflow. You need to test it; you need to verify it. Building a module: do you want this behavior, whatever it is, to be overridable by another module? Don't assume yes. That's modifiability, but then you can't assume that data's not going to change out from under you. It's not necessarily worth the time to make something extensible and flexible in a given way. Do you want it overridable by a specific site? Different question. This is potentially on the theme layer. Do I want this piece of HTML overridden in the theme? Knee-jerk response: well, of course, I want my themers to be able to do whatever they want, because they're responsible for the markup. But say you're writing JavaScript that triggers a certain behavior, and that JavaScript depends on certain classes or IDs being present. 
Don't let your themers change those, or they will break your code. Not because themers are bad people; they're wonderful people. But there are certain things that you do not want to be changeable, because you are building assumptions around them. Do you care if this code is fast? I don't know. Is this happening on every node load? Then yes. Is this happening every so often when an admin pushes a button? Maybe. Is this happening on cron? Not really. If you're sending an email and it takes a while to format that email, you do it in cron. You do it in a queue. You don't do it in the actual page delivery process. And then, meh, performance is not a huge deal. It matters, but not as much. If it takes an extra four milliseconds to send an email, or an extra four milliseconds to compile some index, whatever, it's four milliseconds. Even if it's four seconds, I don't care; it's happening on cron. Is this happening every time someone hits the news summary page? Then I care about every single millisecond. You can't have your cake and eat it too. Good, fast, cheap: pick two. If you want something to be modifiable, chances are you have to add abstraction to do that. Abstraction costs. Abstraction costs performance: extra function calls, extra lookups. There are a lot of hoops you have to jump through to make something modifiable. Is that appropriate? Is that worthwhile? Maybe, but you are going to trade performance for modifiability much of the time. Not always, but much of the time. The fastest performance you'll get is from something that is completely hard-coded to your use case and does nothing else, but that is not modifiable at all. Making something modifiable takes a while. Doing it right takes time. Do you have the time to spend to build in the right hooks, to build in all the right theme functions? Do you have the time to think through what all the extension points are? 
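The email-on-cron advice describes a general pattern: do something cheap during the page request and the slow part later. This is a hypothetical Python sketch of that pattern, not Drupal's Queue API or hook_cron; the `limit` parameter is an assumption standing in for the kind of per-run budget a cron job typically needs.

```python
from collections import deque


class MailQueue:
    """Hypothetical sketch: page requests enqueue work; a cron-like job drains it later."""

    def __init__(self):
        self._items = deque()
        self.sent = []  # stands in for handing messages to a real mailer

    def enqueue(self, recipient, subject):
        # Cheap: this is all that runs during the page request.
        self._items.append((recipient, subject))

    def run_cron(self, limit=100):
        # The slow formatting happens here, off the request path,
        # and at most `limit` items are processed per run.
        processed = 0
        while self._items and processed < limit:
            recipient, subject = self._items.popleft()
            self.sent.append(self._format_message(recipient, subject))
            processed += 1
        return processed

    @staticmethod
    def _format_message(recipient, subject):
        return f"To: {recipient}\nSubject: {subject}\n\n(body formatted here)"
```

With 150 queued messages, the first cron run processes 100 and the second processes the remaining 50; a few extra milliseconds per message in `_format_message` never reaches a visitor's page load.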
Maybe, or maybe you just need it done by five o'clock because that's when the site launches, so you don't worry about that. Drupal does not have much in the way of unit tests. Drupal has integration tests that test the entire system. It is very, very hard to write unit tests for Drupal, because Drupal passes bare data structures around (nodes, forms, users, and so forth) and lets hooks, which can live anywhere, modify almost anything. That is extremely powerful for extensibility. Drupal is an incredibly extensible system; that is one of the reasons it's so successful. It is also virtually impossible to unit test properly in our current design. Many forms of extensibility make testability hard, unless you design them in such a way that you can still encapsulate your extensibility, and then you pay that cost of abstraction: there's that extra layer of complexity that you have to work around and other developers have to deal with. Is that a good trade-off? Sometimes. Who here has tried to wrestle through preprocess functions or process functions or render arrays or all that kind of stuff? Yeah, I'm seeing all the hands go up again. hook_page_alter() lets you do anything. Who here actually understands what happens under that part of the code? I certainly don't. When things can happen anywhere, things do happen anywhere, and you cannot keep track of them. When you build in that kind of flexibility, you're trading off not just performance but how easy it is to just read the code and see what goes on. Verifiability is hard. If you're writing code where you actually care about verifiability, you don't do it in PHP. Bottom line, go use a functional language like Erlang or Haskell. Anyone heard of Erlang and Haskell? Anyone written in Erlang or Haskell? Yeah, I see far fewer hands. Who raised their hand only for the second question? You're just weird. 
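To see why hook_page_alter() makes code hard to follow, here is a hypothetical Python sketch of the alter pattern: every registered function receives the same mutable structure and may change anything in it. The names `register_alter` and `alter` are invented for illustration and only mimic the general shape of Drupal's `drupal_alter()`; they are not its implementation.

```python
"""Hypothetical alter-hook sketch: any registered function may modify shared data."""

_alter_hooks = {}  # alter name -> list of functions, in registration order


def register_alter(name, func):
    _alter_hooks.setdefault(name, []).append(func)


def alter(name, data):
    # Each hook mutates `data` in place; the final result depends on all of them
    # and on the order in which they were registered.
    for func in _alter_hooks.get(name, []):
        func(data)
    return data


def add_sidebar(page):
    page["regions"]["sidebar"] = ["Related links"]


def drop_sidebar_for_print(page):
    if page.get("print"):
        page["regions"].pop("sidebar", None)


register_alter("page", add_sidebar)
register_alter("page", drop_sidebar_for_print)
```

Reading `add_sidebar` alone does not tell you whether a sidebar will appear; you have to know every registered alter function and their order. That is the extensibility-versus-understandability trade-off in a few lines.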
These are functional languages, which means that you cannot change the value of a variable once it's been bound. You can still write any program in them, but you have to do it in a different way. Those different ways make it much easier to mathematically verify that your code is correct; proving a program correct with mathematical certainty is far more tractable in them. You can't really do that in PHP. You can confirm some things, but you can't really prove it correct. Please do not use PHP to write a nuclear control system. Has anyone here written a nuclear control system? Okay, no hands, all right. Your choice of tools matters to your situation too. On the broad scale, that's PHP versus Erlang, or Python, or Node.js. At a smaller scale: Views. Very expedient. You push a couple of buttons. But that level of abstraction costs. In some cases you can write much more efficient queries that simply do less in PHP and SQL, by writing your own code with EntityFieldQuery. Is it worth it to write code and then have to go in and modify it yourself by hand when you want to change it? Maybe. Or maybe you're better off just using Views, pushing buttons, and having a nice day. Time to stop being negative. The kind of abstraction we talked about before helps modifiability and testability. Why? Both of these require taking a portion of functionality, drawing a fence around it, and pulling it out. Then you put that piece into your test environment and test just that one piece, or you put a different piece in its place. Think Views plugins, Views style plugins: easy to swap pieces out. And, if done properly, you can also test each individual plugin on its own. Code that is written with a test-driven development approach is inherently modular, because in test-driven development you have to carve out just this one piece that you're going to write unit tests around. 
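Here is the "fence around a piece of functionality" idea as a minimal Python sketch. `StylePlugin`, `ListStyle`, `TableStyle`, and `render_view` are hypothetical names loosely modeled on the idea of Views style plugins, not the Views API: the caller depends only on a small interface, so one style can be swapped for another, and each one can be unit-tested without the rest of the system.

```python
from abc import ABC, abstractmethod
from html import escape


class StylePlugin(ABC):
    """Hypothetical plugin interface: turn result rows into markup."""

    @abstractmethod
    def render(self, rows):
        """Return markup for the given rows."""


class ListStyle(StylePlugin):
    def render(self, rows):
        items = "".join(f"<li>{escape(str(row))}</li>" for row in rows)
        return f"<ul>{items}</ul>"


class TableStyle(StylePlugin):
    def render(self, rows):
        body = "".join(f"<tr><td>{escape(str(row))}</td></tr>" for row in rows)
        return f"<table>{body}</table>"


def render_view(rows, style):
    # The caller only knows the interface, so any plugin can be swapped in.
    return style.render(rows)
```

A test such as `ListStyle().render(["a", "b"]) == "<ul><li>a</li><li>b</li></ul>"` exercises one plugin with no database, no view configuration, and no other plugins involved.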
And that one piece is then much easier to turn into a modular system. Now, when I say modular here, I do not mean Drupal modules. Drupal modules are a form of extensibility. I mean: can you take out the database-backed cache system and put in one that is backed by memcache? Actually, we can do that in Drupal. That's nice. If you want something to be extensible, make it modular, because then not only can you add, you can take out one implementation and put in another one that does more than the first one. Or you can put in a new implementation that combines several implementations together. Who here used the Cache Router module in Drupal 6? That's a case where modifiability and extensibility help each other, because you could swap out the cache system for one that had multiple cache systems inside it. If you do your job right, it's easy to change things later. Modifiability is about how expedient it is to change things later, not now. Do you think you'll be changing this in six months? Is it worth taking the extra six hours now to make it possible to change in six months? Maybe. Testability and modifiability are complementary. Performance, loosely speaking, is measured in seconds per request. If you're taking multiple seconds per request, you're doing something wrong, but that's the measurement. Scalability is the number of requests per second. Again, kind of hand-wavy. A little bit of math: if something performs better, that usually improves scalability. Not always, but usually faster code is more scalable. So if you are writing a system you know has to be scalable, you care about performance more than you would otherwise. Don't get carried away, though. If you're counting function calls, you're probably looking in the wrong place. Not always, but probably, unless XHProf tells you otherwise. Has anyone here used XHProf? Anyone who doesn't have your hand up, go use XHProf. We use Drupal. 
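The cache example can be sketched in Python to show how modifiability and extensibility reinforce each other. This is a minimal sketch with hypothetical class names: `MemoryCache` stands in for any single backend (database- or memcache-backed), and `RoutingCache` is itself a backend that sends each cache bin to a different inner backend. It is loosely inspired by what Cache Router did in Drupal 6, not its actual API.

```python
from abc import ABC, abstractmethod


class CacheBackend(ABC):
    """Hypothetical interface: every backend stores values per (bin, key)."""

    @abstractmethod
    def get(self, bin_name, key):
        """Return the cached value, or None on a miss."""

    @abstractmethod
    def set(self, bin_name, key, value):
        """Store a value in the given bin."""


class MemoryCache(CacheBackend):
    """Stand-in for one real backend (database, memcache, ...): just a dict."""

    def __init__(self):
        self._data = {}

    def get(self, bin_name, key):
        return self._data.get((bin_name, key))

    def set(self, bin_name, key, value):
        self._data[(bin_name, key)] = value


class RoutingCache(CacheBackend):
    """A backend made of backends: each bin is routed to its own inner cache."""

    def __init__(self, routes, default):
        self._routes = routes    # e.g. {"cache_page": fast_backend}
        self._default = default  # used for every bin without an explicit route

    def _backend_for(self, bin_name):
        return self._routes.get(bin_name, self._default)

    def get(self, bin_name, key):
        return self._backend_for(bin_name).get(bin_name, key)

    def set(self, bin_name, key, value):
        self._backend_for(bin_name).set(bin_name, key, value)
```

Because `RoutingCache` implements the same interface as the backends it wraps, calling code cannot tell the difference: swapping it in is a modification, and the routing it adds is an extension.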
Most of the code that we write, most of the code that we deliver to a client, we didn't write. It came from someone else. You want your system to be maintainable. You want the third-party code you're using to be maintainable. You want it to be easy to learn, because you have to learn it: you have to learn how to use this system you're about to leverage. You want it to be reliable. You want to be able to confirm that the code you're about to use is not buggy. That's what makes it expedient for you to use third-party code. We've all used existing code from somewhere before, right? Okay, you're all liars. You've all used Drupal. We've all used third-party code, right? All right. We do it because it's expedient, because it has these attributes: the code is maintainable, learnable, and reliable. This is what makes popular Drupal modules popular. They are maintainable and learnable and reliable, and that makes them expedient for you to use. That's also why Drupal is adopting Symfony. When you try to have your cake and eat it too, when you try to do everything in one site, when you try to do everything in one module, you end up with that. That's disgusting. It probably fails as soon as it touches a wall. It's gonna be terrible in a crash test. Your job as a site architect, your job as a module developer, your job as a consultant, your job as a core developer is balance. It's figuring out what situation you're in and which of these attributes matter. Is this worth my time, or is that worth my time? Is this worth 10,000 lines of code, or is that? Your job is not to write the best code in the world. Your job is to write the most appropriate code in the world. Thank you. We've got about 15 minutes for questions. There's a microphone in the middle of the room here. Please use it so that questions get recorded.
And while people are lining up, please, for this session and all the others, go review sessions and give feedback. As speakers, it's great for us to know what people found valuable and what they didn't. It's also great for the conference organizers to know who to bring back and what they can help people improve on. Questions? Nobody? Yeah, please use the microphone. Here's one. Thank you for the great talk. Up until Drupal 7, we have been using modules with the core technology we've been building on since 4.6. What is the real reason to use Symfony? If our approach has been expedient so far, what is driving the move to Symfony? What makes it expedient? Drupal 7 and earlier is not a REST framework. Drupal assumes, architecturally, that everything is a web page: a complete web page that we return from opening HTML tag to closing HTML tag. Everything else is an edge case done with a hack. Six years ago, that was true. Most of what you returned was going to be a page, and if you returned something else, it was an edge case. That's not true anymore. In today's web, and the web three, five, seven years from now, that's not the case. We want to be able to return parts of pages. We want to return JSON, or SVG. Drupal is going to need to deliver a lot more than just a full HTML page. We spent a lot of time trying to figure out how we would go about modifying Drupal to do that. Then, around last November or December, we'd already decided it was expedient to say: we need better handling of the HTTP request, so let's use Symfony's HTTP library, because that's a pretty good library, and just use it on its own. Then we looked more and realized that the architecture we'd been talking about building for Drupal 8, Symfony has already implemented. They've already got a good system for it. It's already deployed on hundreds of thousands of sites. It's already tested. Use it.
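To make the contrast concrete, here is a small Python sketch of the non-page-centric model: one piece of data, several possible representations, chosen per request from the HTTP `Accept` header. The function names and the `text/x-fragment` media type are hypothetical, and the parsing is deliberately simplified (it ignores `q` weights); it is not Symfony's or Drupal 8's actual routing code.

```python
import json
from typing import Callable, Dict, Tuple


def render_page(data: dict) -> str:
    # The Drupal 7 assumption: every response is a complete HTML page.
    return f"<html><body><h1>{data['title']}</h1></body></html>"


def render_fragment(data: dict) -> str:
    # Only part of a page, e.g. for an Ajax update.
    return f"<h1>{data['title']}</h1>"


def render_json(data: dict) -> str:
    return json.dumps(data)


RENDERERS: Dict[str, Callable[[dict], str]] = {
    "text/html": render_page,
    "application/json": render_json,
    "text/x-fragment": render_fragment,  # hypothetical media type
}


def respond(data: dict, accept: str) -> Tuple[str, str]:
    """Return (content type, body) for the first supported type listed in Accept."""
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type](data)
    # Nothing matched: fall back to a full HTML page.
    return "text/html", render_page(data)
```

For example, `respond({"title": "Hello"}, "application/json")` returns `("application/json", '{"title": "Hello"}')`, while an unsupported type such as `image/png` falls back to the full page.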
What we wanted to do, they've already done. Why waste our time rewriting that exact same code? That's what makes it expedient to use Symfony components for our HTTP handling. We're not looking to change the entity system with that, for instance. The Symfony full-stack framework uses the Doctrine database abstraction layer and object-relational mapper. We're not looking at those; that's not on the table. But for the things we wanted to change, why spend our time re-implementing what Symfony has already done? Let's just use it. It's open source. Cool. Does that answer your question? For more on that, right after lunch in this room, Fabien Potencier, the project lead for Symfony, will be talking about all the pieces of Symfony that Drupal will be using. I do recommend going to that session. Next. I'm new to Drupal. We are moving into a project that will require Drupal. My question is related to scalability. When should we look into adding more servers, or enabling caching? I know there are a lot of options available in Drupal, but where do we start looking for measurements? When should we say we need more database servers, or more web servers, or neither, and we just want to enable caching? Can you give us some guidance on where we should start looking? That's a really complex question, and I can't really give you a rule like "when you hit this many requests per second, you need to do X." In general, I would say that if you expect any significant amount of traffic, put Varnish in front of it and put Memcache behind it. Just start there; that's easy. If you go with someone like Acquia or Pantheon for hosting, they'll have that built in already. A lot of other hosts will do that too, and you can set those up yourself if you're self-hosting. You mentioned Memcache; what was the other one? Sorry. Varnish. Varnish? Who's not familiar with Varnish? Oh, wow. All right, Varnish is a proxy cache server. Short version.
A request comes in from a browser or some other user agent and hits the proxy server, Varnish. Varnish makes a request to Drupal and caches the response in memory. When the next request comes in, Varnish already has that response cached, and it serves it straight out of memory. It's the same idea as Drupal's page cache, but a zillion times faster, basically. If you expect a lot of anonymous traffic, you put Varnish in front of it. At this point, that's not a debatable question. Memcache is a more performant caching back end. Again, it does everything in memory, and it's just more performant than Drupal's default database cache. Beyond that, you'd really have to look at your traffic stats and your targets. Are you running into performance problems with page delivery time? Okay, measure it. Is it the database that's slow, or is it PHP that's slow? Because if your database is nice and fast but PHP is taking a long time, throwing more database servers at it ain't gonna help. On the other hand, building out four web heads won't really help if it's your database server that's crawling. So you really have to look at your particular use case, find exactly what is slow for you, and then figure out what you need to tweak. Next. Do you have any advice for startups on how quickly you can prototype? If you're trying to build something in social media and you expect a sudden increase in traffic, but you didn't think about scalability initially, do you have any advice on how to make our code extensible and scalable really quickly? So, what to do when you get Slashdotted? Right. Or should we keep all that in mind from the start and slow down our prototyping process? If your business is such that you think you're likely to hit that point at some point, plan ahead. As I said, put Varnish and Memcache in place, and that will take care of a lot. Not everything, but a lot.
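The Varnish flow described above can be sketched as a toy read-through cache in Python. This only illustrates the idea and is not how Varnish is implemented or configured: `ProxyCache`, its `ttl` parameter, and the `backend` callable standing in for Drupal are all hypothetical. Every request for a cached path within the TTL is answered from memory, so the backend sees one request instead of many. The toy also ignores cookies entirely, whereas a real Varnish setup typically passes authenticated, cookie-carrying requests straight through to Drupal.

```python
import time
from typing import Callable, Dict, Tuple


class ProxyCache:
    """Toy reverse-proxy cache sitting in front of a slow backend."""

    def __init__(
        self,
        backend: Callable[[str], str],
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._backend = backend  # stands in for a full Drupal page build
        self._ttl = ttl          # seconds a cached response stays valid
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)
        self.backend_calls = 0

    def handle(self, path: str) -> str:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and now < cached[0]:
            return cached[1]  # cache hit: served from memory, backend untouched
        self.backend_calls += 1
        body = self._backend(path)
        self._store[path] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    proxy = ProxyCache(lambda path: f"<html>page for {path}</html>")
    for _ in range(1000):
        proxy.handle("/")
    print(proxy.backend_calls)  # 1
```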
Don't necessarily spend the time to optimize everything, but know in advance: when we need to optimize, here's where we're gonna do it. If you're doing a lot of custom code, design it to be modifiable. You have one thing you plug in here; you're not gonna bother optimizing it, but you know that when this piece gets slow, you can rip it out and put in a new one once you have time to write it. Finding the time to write it is another problem, and that's not a software problem, but that kind of planning ahead, knowing where you're going to tweak when you need to, will help. Okay, thanks. Thank you. Well, thanks, it was a very good talk. I thought what you were saying about the difference between extensibility and modifiability was particularly interesting. I know you could probably spend an entire talk just discussing this, but could you point to a few other design decisions made in the development of Drupal over the past years where trade-offs were made? And if there are problems with those trade-offs, what can be done in future core development to mitigate them? Oh dear. I don't wanna kiss and tell, but I've got a few examples. In Drupal 6, CCK fields lived in an SQL database, and that's the only place they could live, and a lot of optimization was done to play around with the structure of those tables to make them fast, or faster. In Drupal 7, with Fields in core, the database is completely normalized, so it's a very academically pure and modular database structure, but not necessarily all that performant. That was done because in the process we also made it swappable. Examiner.com, for instance, doesn't use SQL storage for its nodes. They put all of it in MongoDB, which is way faster than SQL in their use case. The way they've got their site tricked out, they can serve authenticated users without page caching in 20 milliseconds, something obscenely fast.
That's in part because they could take advantage of the scalability enhancement that made field storage pluggable, and swap in something faster like Mongo, at the cost of performance for the default SQL case. Is that a good trade-off? Depends who you ask. If you ask Examiner, absolutely. If you ask a small church site that doesn't have the budget for something like that, maybe not. Trying to think of other examples. Another good one: the database layer in Drupal 7, which I was one of the lead architects for, has a series of query builders that, if you've worked with them, are much nicer to work with than raw SQL. They are slower than raw SQL, because you have to spend time in PHP compiling the built query down to a string. If you are writing a select query in Drupal 7, pro tip: do not use db_select() unless you actually need its capabilities. It is notably slower than just calling db_query(). If you need db_select(), by all means use it, but it will be slower than calling db_query() with your own query string. How much slower depends on the complexity of your query. There is a memory overhead, and there are something like 30 function calls involved in that process. I don't know that we have solid benchmarks on it, but my general advice is to use db_query() unless you need db_select(). And if you need db_select() but its performance is a problem, you probably have other things to optimize first. So it's not a huge issue, but don't use it gratuitously. Oh, thanks. I think those are some really great examples. They go along with what I was thinking about during Dries's keynote the other day. From my perspective, at least, the architectural trade-offs being made for Drupal moving forward are beneficial to large sites like examiner.com, but for small hobbyist sites they might not be as beneficial, both in terms of performance, given the resources they have access to, and in terms of developer experience.
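The db_select()/db_query() trade-off comes from an extra compile step. Here is a much-simplified, hypothetical Python query builder (`SelectQuery` is made up and does not mirror Drupal's API), run against an in-memory SQLite table. Both paths return the same rows; the builder is easier for other code to modify (another caller could add a condition before it runs), but it must first do work in application code to turn its objects back into a SQL string, which is the overhead described above.

```python
import sqlite3
from typing import List, Tuple


class SelectQuery:
    """Toy query builder: collects pieces, then compiles them to SQL."""

    def __init__(self, table: str) -> None:
        self.table = table
        self.fields: List[str] = []
        self.conditions: List[Tuple[str, object]] = []

    def add_field(self, name: str) -> "SelectQuery":
        self.fields.append(name)
        return self

    def condition(self, field: str, value: object) -> "SelectQuery":
        self.conditions.append((field, value))
        return self

    def compile(self) -> Tuple[str, Tuple[object, ...]]:
        # The extra step a raw query string skips. Identifiers are assumed to be
        # trusted; values are passed separately as placeholders.
        sql = f"SELECT {', '.join(self.fields) or '*'} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(f"{name} = ?" for name, _ in self.conditions)
        return sql, tuple(value for _, value in self.conditions)


def demo() -> Tuple[list, list]:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, status INTEGER)")
    db.executemany(
        "INSERT INTO node (title, status) VALUES (?, ?)", [("Hello", 1), ("Draft", 0)]
    )

    # Builder path: flexible, but compiles before running.
    sql, args = SelectQuery("node").add_field("title").condition("status", 1).compile()
    built = db.execute(sql, args).fetchall()

    # Hand-written path: nothing to compile.
    raw = db.execute("SELECT title FROM node WHERE status = ?", (1,)).fetchall()
    return built, raw


if __name__ == "__main__":
    print(demo())  # ([('Hello',)], [('Hello',)])
```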
So I think that's something we'll have to watch. That's actually another good example of where things can be complementary. Drupal 7 is extremely flexible, but not really learnable anymore at the code level, because nobody uses complex arrays the way we do. Nobody. In Drupal 8, we're shifting a lot of functionality over to more traditional object-oriented approaches, to more established patterns. That actually improves learnability for anyone who's not a self-taught hacker in their parents' basement. Ten years ago, that was the majority of Drupal. Today, it is not. The majority of Drupal developers work at a company, are trained in computer science or whatever, and so shifting to something that's learnable for them is a good trade-off. And hopefully it can help with performance too, by letting us leverage edge-side includes and Varnish more readily. So that's another one of those places where the trade-off is not always obvious. That's good. Thank you. Is there an easier way to clear the Varnish cache while you're still developing? Don't develop with Varnish on unless you're actually testing Varnish. Pantheon, I think, does it by default. I don't know. I haven't actually run a site on Pantheon myself. Oh, okay. So I can't speak to that. Hi. You mentioned Memcache a couple of times. We've been trying to work with Memcache and APC. Is there a preferred one? They do different things. APC is a PHP opcode cache. Normally, PHP works like this: a request comes in, the web server handles it, loads up PHP, reads the code file off disk, compiles it, executes it, and throws away the compiled version. The next request comes in, and it has to read the file off disk and compile it a second time. APC saves that compiled version, so you skip the whole read-off-disk-and-compile process. It also uses shared memory, so the same file used across 100 requests is only stored in memory once. You want to run Drupal with APC.
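The APC idea, compile once and reuse the compiled form, can be illustrated with Python's own `compile()` builtin, which turns source text into a code object. This is only an analogy: `OpcodeCache` is a hypothetical class, not APC's behavior or API, and it keys entries by file path plus modification time so that an edited file gets recompiled.

```python
import os
import tempfile
from types import CodeType
from typing import Dict, Tuple


class OpcodeCache:
    """Keeps compiled code in memory so each file is read and compiled once."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[float, CodeType]] = {}
        self.compiles = 0

    def load(self, path: str) -> CodeType:
        mtime = os.path.getmtime(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]  # hit: no reading of the source, no compile
        with open(path, encoding="utf-8") as handle:
            source = handle.read()
        code = compile(source, path, "exec")
        self.compiles += 1
        self._entries[path] = (mtime, code)
        return code


if __name__ == "__main__":
    cache = OpcodeCache()
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "page.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("result = 6 * 7\n")
        for _ in range(100):  # 100 "requests" for the same file
            namespace: dict = {}
            exec(cache.load(path), namespace)
    print(namespace["result"], cache.compiles)  # 42 1
```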
There is no exception to that rule. You run APC with Drupal. Memcache is a replacement for Drupal's caching system: the page cache, the filter cache, and stuff like that, Drupal's internal caching. Memcache replaces that system. It is also beneficial there, mostly by taking pressure off the database server, which can do the complex stuff, while the relatively trivial but high-volume cache requests go purely to memory in Memcache. So that's the advantage there. But most high-end sites run Varnish, Memcache, and APC. So in a case where you have a multi-site setup using the same code base, APC is sort of, you know... If you're running multi-site, APC is even more important, because the same physical file on disk will be cached only once. So your memory usage with multi-site will be much lower than with the exact same sites in separate instances. Actually, if you look at the Palantir blog, I have an article coming out about multi-site probably in the next week or two, at palantir.net/blog. Thank you. Other questions? Yeah, I actually have two questions, I think. Okay. One of them is: how will caching work with REST services? Will you be able to cache JSON queries and stuff? The current plan there is, again, to do what Symfony does. HTTP has caching logic built into it. Use it. Varnish will do that. If you don't have Varnish, some of the Symfony components we're using include a PHP implementation of an HTTP cache. They basically have Varnish ported to PHP. That means a JSON request is just an HTTP request. You look at the expiration headers on it, you look at the validation headers on it, and the logic for what you're supposed to do there is already established in the HTTP specification itself. So when in doubt, do what the spec says. That's the general plan. I believe Fabien is talking about that a little bit in his talk after lunch. Second question. So you might have a follow-up. Okay.
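Here is a rough Python sketch of the two HTTP mechanisms mentioned: expiration (how long a cached copy may be reused without asking) and validation (asking the origin whether a stored copy is still good, and getting a bodiless 304 Not Modified if it is). It uses `Cache-Control: max-age`, which takes precedence over the older `Expires` header, and an `ETag` compared against `If-None-Match`. The function names are hypothetical, and real caches handle many more directives (this one ignores lists of ETags and `*`); it is not Symfony's HttpCache API.

```python
import hashlib
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)


def build_response(body: str, max_age: int = 300) -> Response:
    # An ETag derived from the body: same body, same validator.
    etag = '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'
    headers = {"Cache-Control": f"public, max-age={max_age}", "ETag": etag}
    return 200, headers, body


def handle_conditional_get(body: str, if_none_match: Optional[str]) -> Response:
    """Origin side: answer 304 with no body if the client's ETag still matches."""
    status, headers, payload = build_response(body)
    if if_none_match == headers["ETag"]:
        return 304, headers, ""
    return status, headers, payload


def is_fresh(current_age: int, headers: Dict[str, str]) -> bool:
    """Cache side: a stored response is fresh while its age is below max-age."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name == "max-age" and value.isdigit():
            return current_age < int(value)
    return False  # no explicit lifetime: this sketch treats it as stale
```

Nothing here is specific to HTML: a JSON body goes through exactly the same checks, which is the point of leaning on HTTP itself.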
Is that browser dependent? A browser that does not implement that part of the HTTP specification is not a browser I'm willing to support, because it means they haven't bothered to implement stuff from the 1990s. We don't support IE6 anymore. We don't support Netscape 1.0 either, and they probably supported that too. So yeah, pretty much everything supports that stuff at this point. Except us. Yeah. And at that point, it becomes the browser's problem if they have a bug. Is there a reason to use MySQL views or stored procedures or anything like that to make Drupal more performant? That's again a very platform-dependent question. MySQL views: who's familiar with SQL views? Okay, for those who aren't: views in SQL, not Views in Drupal. Views in SQL are basically a query that you write and save, and then you can use it as if it were a table. In MySQL in particular, the view doesn't actually save the result set anywhere or keep it up to date. It's just an alias to make your query simpler to write. So using SQL views would not help us in the slightest. You can just put a subquery inside the FROM clause of your query and get the exact same performance. Stored procedures might have advantages, but they are completely non-standard across databases. So Drupal makes the trade-off of not doing database-specific optimization, in favor of database portability. That's a trade-off we've decided to make in most cases. Actually, the database layer in Drupal 7 can do some database-specific optimization, but not a ton. There's also a good argument to be made that business logic, like stored procedures, does not belong in your database in the first place; it belongs in your application code. So I don't see views or stored procedures as something Drupal leverages anytime soon. Are your slides available online? Slides will be available online soon. I'll be tweeting that, and probably posting on my blog as well, garfieldtech.com. All right, enjoy lunch.
Please tip your waitress. See you around the conference.
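Returning to the question about SQL views: the claim that a view is essentially a saved query can be checked with Python's built-in sqlite3 module. SQLite is a stand-in here for MySQL (neither keeps a stored copy of an ordinary view's results), and the `node` table and `published` view are made-up examples. The view and the equivalent subquery in the FROM clause return the same rows, and because the view stores no data of its own, changes to the base table show up in it immediately.

```python
import sqlite3
from typing import List, Tuple


def compare_view_and_subquery() -> Tuple[List[tuple], List[tuple], int]:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, status INTEGER)")
    db.executemany(
        "INSERT INTO node (title, status) VALUES (?, ?)",
        [("Hello", 1), ("Draft", 0)],
    )

    # A view is a named, saved SELECT statement.
    db.execute("CREATE VIEW published AS SELECT nid, title FROM node WHERE status = 1")

    via_view = db.execute("SELECT title FROM published").fetchall()
    # The same query written inline as a derived table in the FROM clause.
    via_subquery = db.execute(
        "SELECT title FROM (SELECT nid, title FROM node WHERE status = 1) AS p"
    ).fetchall()

    # The view is evaluated on demand, so base-table changes appear immediately.
    db.execute("UPDATE node SET status = 1 WHERE title = 'Draft'")
    published_after_update = db.execute("SELECT COUNT(*) FROM published").fetchone()[0]
    return via_view, via_subquery, published_after_update


if __name__ == "__main__":
    print(compare_view_and_subquery())  # ([('Hello',)], [('Hello',)], 2)
```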