So, I want to talk today about a project I worked on at Wayfair. Wayfair is an online furniture and home goods store, and we recently rolled out PHP 7 with minimal incidents or downtime. I want to talk about how we did this. The real story behind this project is how we leveraged testing, so let me start by talking about testing and why we test.

When engineers talk about testing, we usually talk about unit tests or test-driven development. Unit tests, for those who don't know, are a form of automated test. The idea is that by breaking your code into testable units, you can have tests that cover every possible code path. The example on this slide is from the getting-started guide for PHPUnit, a popular testing framework. The idea with test-driven development is that as you write code, you also write tests as a way of ensuring correctness as you complete a project. And since it's all automated, you can run tests quickly and integrate them into build processes, and as you write more and more code, you can ensure that you're not breaking things as you add functionality.

But this isn't an approach that everyone likes. It's certainly something people talk a lot about, but in the wild, it doesn't always get used. If you've ever tried to convince a colleague to write a lot of unit tests, the conversation probably went something like this. Someone says, you know, you should really write unit tests. And the other person responds, well, I already test it myself. It works. I don't need to write automated tests. But unit tests will do it faster, letting you work faster. But I really only have to do it once; it's not that slow. Unit tests will catch regressions. But this code doesn't change that much.

Let's take a step back and describe the underlying model. Software has acceptance criteria. You can codify it, and you can automate it. As we all know, this is why everyone writes unit tests and why software never breaks. The problem with this model is that it supposes a level of knowability that's unrealistic. Software, especially a web application, resembles an organism more than it resembles clockwork machinery. It's not practical to create tests that simulate every way a website is used in the real world. A better model is to think about testing in terms of risk.

To understand this, let's talk about the project we did at Wayfair. PHP 7 has been out for more than a year now. It's commonly understood to be faster, and a handful of companies have published metrics showing this. But if that's news to you, I can show you our charts. Each one shows the server render time for a different page type; yellow is PHP 5, and green is PHP 7. What we found was that page execution time dropped by about 50%, and CPU utilization dropped by about 30%. The real net benefit is that we can serve more traffic with the same amount of hardware. If you're interested in the mechanics of why this happened, it really all boils down to how the PHP variables you create in userland are represented internally, and there are a couple of great blog posts here that describe those changes.

But let's travel back in time to before PHP 7's release, when it wasn't as well understood. At that point in time, most engineers' understanding resembled this cocktail napkin. Very tantalizing, obviously. Let's compare that against what we knew about Wayfair at this point. It's an online retailer that focuses on furniture and home goods.
We get about 2 million visitors every day. We've been around since 2002, and we didn't always use PHP. When we were starting this project, we had 3.5 million lines of code across 28,000 files. It was mostly our own code, but also some third-party Composer packages. Code and conventions spanned several versions of PHP at this point. All of this code relied on 66 extensions. Many of these are officially supported parts of PHP core, but there were also third-party extensions, some third-party extensions we had modified, and some that were totally custom to Wayfair.

What I'm trying to get at is that we had a lot of risk. A simple way of describing risk is what you don't know. We knew the expected benefits of PHP 7, but we didn't really know if they were real. We didn't know how much of our code was compatible with the new version, and we didn't know how much effort we'd have to put in to make it work. If we go back to our napkin, it's safe to say that it's only useful for setting a drink on. If you brought it to your manager and asked her, can we do this project, what do you think she would say? We can make the case for this project by increasing what we know. That is to say, we're going to reduce risk, and we're going to use testing tools to do it.

So what are we going to cover? We're going to talk more about this model of risk as the thing you don't know, and how to identify risk in complex applications. We're going to talk about common sources of risk on big infrastructure projects like this one. And we're going to talk more about this model of testing as a form of risk management, and the different manual and automated tools you have for measuring risk. What I'm trying to poke at, basically, is the idea that the underlying activity of software engineering isn't necessarily building things; it's gathering information.

A lot of what I'm going to talk about is product management strategy. A lot of people think of product managers as the people who figure out which features to build and maybe what they should look like. But really, the underlying goal of a product manager is to make sure that a team is working on the most important thing for the business at a given point in time. That matters just as much on purely technical projects like an upgrade like this. So I'm going to talk about how to identify goals for technical projects, how to kick them off and run them, and how to choose the most important thing to work on in service of that goal. Basically, this is all about figuring out how to use engineering projects to deliver value to the business, which becomes increasingly important as you progress through your career.

Before talking more about tools, I want to talk about common risks with infrastructure projects. Some of the unknowns you juggle aren't just technical; they're organizational. A lot of them stem from the fact that there are other projects going on at the same time. It can be easy to get excited about these projects, usually because people are focusing on the desired outcome. But that's our cocktail napkin: it's the place everyone wants to get to, and we really need to think about how we get there. These projects end up having some other common qualities. One is that the people working on the project are probably working on other things at the same time. Those could be other software projects, or other big infrastructure projects like the one you're working on.
You may have to figure out ways your project could collide or interfere with other big projects, and think about how to manage that. Whatever you're doing, there are probably going to be in-between states for your system. You're not going to snap your fingers and suddenly be on PHP 7; there's going to be a long period as you get your systems there, and you have to accommodate that fact in your planning. Then there's the idea that nothing simply "works"; it "worked once." You can see something succeed in production, but unless you see it succeed consistently and continuously, it's something that only happened once. Because of that, you really live and die by your ability to monitor your production systems. And given all the effort you have to put into projects like this, they might not be worthwhile. It could be that you'd have to do so much work to pull off some transition, migration, upgrade, or deploy that it isn't worth it in the end. All of these things together mean that disaster is really more likely than success when you're taking on a big engineering project. That's a fatalistic worldview, but it's something to be conscious of and to plan around.

How do you manage some of these common concerns? One important thing is to always communicate in terms of value to the business. When you're talking about deploying PHP 7, you don't want to tell people, now we're going to have spaceship operators. They're nice, but probably not directly driving value. You probably don't even want to say PHP 7 is faster, because that's just another fact. What you want to talk about is how performance can save the business money. So you can say, if this turns out to be twice as fast, we can serve more traffic with the same hardware, which means we could require less hardware, or we'll be able to grow the business without investing more in infrastructure. These are the things senior leadership cares about.

That said, you don't want to promise too much too soon. You don't want to start off by saying that we're going to save a ton of money and everything's going to be twice as fast. You want to share as much as you know. You want to talk a lot with the broader organization to understand how this project could collide with others. And throughout the project, you want to be working the room. Share your successes as they emerge. This helps reinforce the idea that your project is on the right track. If you disappear into a hole, people may not trust you or have faith in you; if you share your work as it develops, people will be more comfortable. This reinforces the idea that the project is headed in the right direction and that it's worth continuing to fund.

One thing that helps here is identifying allies at different levels of the organization. People on your team are one form of ally: someone who's doing work to get the project done is an ally in achieving the goal. But your manager, or a chief architect, or your CTO may also be an ally for the project. Find ways to share things with them that speak to what's important to them, so they'll stay interested in your project. Also, look for opportunities as you work through the project to deploy smaller pieces and get value out of the things you're working on.
It's much better to realize value incrementally over a long period than to wait for the entire duration before releasing anything of value.

So, all that said: planning. Where do you begin? We started this project by assembling a big-picture view. We never tried to know everything about PHP 7. Like I said, there's a limit to knowability. We just tried to understand what we didn't know and what was most likely to go wrong. This let us rationalize our priorities. We started this big-picture view by identifying all the dependencies of our code and understanding whether they behave differently on PHP 7. Thankfully, PHP makes this easy. You can run either of these commands to see which extensions are installed and how they're configured: phpinfo() is a function you can call from a script, and php -i does the same from the command line. There are actually a ton of switches on the PHP command line interface that tell you how PHP is set up, which are quite helpful. I'll show a small sketch of this at the end of this section. We then compared this information against the published documents about what changed in PHP 7. This allowed us to build a simple backlog of tasks.

This is the first version of that view, from October 2015. We used a wiki page. However you track this information, the important thing is to maintain a single source of truth about the state of the project. It's easy to waste effort if it's hard for people to understand what has and hasn't been done.

Let's pick this document apart and understand what it says. It might not look like it says a lot. We don't know whether any of these extensions work with PHP 7, but it does tell us something. We based it on the phpinfo() output from our servers, so we knew what was actually in use. It also starts to document where we got this information, which is important for putting the notes in context. It's one thing to say the AMQP extension is compatible with PHP 7 because the project maintainer said so; it's totally different to be able to say it's compatible with a specific use case of a specific application that we tested ourselves. If you're vague about what you know, it's easy to lose track of risk.

Once we had the skeleton, we could start adding detail. This was the first revision of our task list, which we made the same day we made the skeleton. In our wiki page, we stuck it on top of the supporting information. Most of the tasks are still very general; you'll see a lot of things that say "investigate" or "remove." So it's very broad, but note how it's still a lot more specific than our original view. We're not trying to say too much too soon about anything; the open question, the risk, is still quite big. We have hyperlinks so we can document where we learned particular things. In the line about the ereg extension, we recorded where we learned which functions were still in use. We didn't know when someone would pick this up, so we wanted it to be clear that they shouldn't assume too much about what needed to be done. And we explained how we came to each conclusion, to make it possible to reproduce and validate.

So, some of the strategies I've been hinting at. You want to start simple and naive. Don't assume anything until you do the research. As you begin making tasks, don't start with very specific ones; start with similarly broad tasks. It's very easy to take a task like "investigate APC extension" and break it down into smaller tasks. It's much harder to start with a small task and build out the other things you might need to do around it.
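Here's that dependency-enumeration sketch. The functions are standard PHP; the loop is just an illustration of the idea, not the script we actually ran.

```php
<?php
// Equivalent in spirit to running `php -m` or `php -i` on the command line:
// enumerate every loaded extension and its reported version, so the list
// can be compared against the PHP 7 upgrade notes.
foreach (get_loaded_extensions() as $ext) {
    printf("%s %s\n", $ext, phpversion($ext) ?: '(no version reported)');
}
```

Running something like this on each class of server gives you the ground truth for the wiki skeleton I just described.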
Start by over-explaining. Link to sources in case anyone needs to work backwards. Be super clear about everything, because a wide range of people may look at the information you put together, each bringing their own context to understanding it.

The next big question, once you have a list of things to do, is how you spend your time. The first list we generated was unsorted. As I suggested before, though, you want to manage your time to help manage risk. Let me talk about this through one specific example. PHP 7 dropped some extensions, including some we used, so we knew we'd have to find ways to remove those dependencies. ereg and MSSQL were two extensions we knew we'd have to work around. So imagine you're doing this project entirely by yourself and you're trying to decide where to spend your time. Let me describe both extensions. ereg is a small extension that resembled others: it let you do regex matching, mostly identically to the PCRE extension. I'll show what that kind of mechanical conversion looks like in a moment. MSSQL is a bit bigger. It was replaced by PDO, which has very different syntax. We'd actually already started working with PDO, so we knew some of what we'd be getting into; we knew it meant different functionality and different behaviors. There was also the key issue that our website relies on database access, so there was very little we could test without it.

So if you could only do one thing, which would you start on first? Raise your hand if you'd start with the ereg extension. Yeah. Who'd start with MSSQL? Yep. We started with MSSQL, because that's where all the risk was. "All the risk" includes the things that could derail your project and make it not worth working on. You want to learn those first, and you want to invest as little as possible in order to learn them. This was admittedly a very stark example, because of how the MSSQL extension gated our ability to test and reduce more risk. We joked that the project we were doing wasn't the PHP 7 project; it was the finish-migrating-to-PDO project. But on any project, there are lots of questions like this, pretty much every day.

This is what planning is all about: making a better napkin. Notice how, with not too much effort, we went from this to this. What I'm trying to say is that you can get a lot of leverage out of these planning tools. They're very simple, but they can be exceedingly valuable. What's nice about them is that they don't require running or writing any code. That's what I mean when I say there's no engineering time involved. By focusing attention correctly, you can make the most of the engineering time you ultimately use.

One thing I haven't talked about yet is deadlines. Deadlines are very risky to talk about. Work can be very fluid, and you can set up the wrong kind of conversation by pinning a project to dates. You want to talk with your team about what matters to them: solving problems and delivering value. You want to treat your stakeholders like partners, because that's who they are. Give them as much information as you have. If all you know is that something is months away from being done, don't be more specific than that. Share that information, and share what you're working on to get more detail. With your task list, it's essential to be constantly on the lookout for new signals that will affect its ordering and contents. Having a task list doesn't mean you'll only ever cross items off it. One benefit of having a centralized map is that you can review it regularly.
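Since ereg came up: this is roughly the flavor of change removing it involved. A minimal sketch, not code from our codebase; the main mechanical difference is that preg_match() requires pattern delimiters, which ereg() did not.

```php
<?php
$zipCode = '02116';

// Before (the ereg extension was removed in PHP 7):
// if (ereg('^[0-9]{5}$', $zipCode)) { ... }

// After, using the PCRE extension that replaces it:
if (preg_match('/^[0-9]{5}$/', $zipCode)) {
    // Handle a valid five-digit ZIP code.
}
```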
Review your planning document just about every day and try to poke holes in it. If you're not adding at least small notes to it every day, you're probably letting things fall through the cracks. It should be clear enough that anyone on the team can do this on their own. It shouldn't be so opaque or so tightly controlled that leads and managers are the only ones doing it. Everyone on the project will have their own insights, and they should have a path to surfacing them.

I want to talk a bit about the presentation of these lists. The earlier slide with the bullets was formatted exactly as we had it. In case it's not obvious from my slides, I like deliberately unsophisticated presentation formats. One thing I like about the simple bullet list is that it's very easy to read; your eye can go straight down it. You also have to think as you read it, actively interpreting and bringing your own context. Other formats, like Kanban boards and Gantt charts, layer on other information. This makes them more expressive, but it can mean you're prioritizing someone else's assumptions about how the project is going. Bad visualizations are more of an impediment than anything, and the greater information density means you can actually have more wrong information. So rather than spending time tweaking visualizations, I try to direct my problem-solving energy toward the project itself.

In one of our task list reviews, we realized we'd missed language changes. We'd read so much about how PHP 7 wouldn't require you to change your code that we didn't read the upgrading notes closely. This realization took us into a new set of testing tools. But before I get into those, let me describe the basic issue. This text comes straight from the upgrading notes: PHP 7 changed the handling of indirect references. It's easy to eyeball code that might break, but how do you do that with 28,000 files? This is a job for automation. Specifically, this is a job for static analysis.

Static analysis is a whole other suite of testing tools that doesn't require you to run any code. It's a way of checking code for correctness. A compiler performs static analysis: if types are misused, the code won't compile. These tools can be fairly diverse. A very simple one is grep. If you run your code through a regular expression, you can find uses of a function, and that tells you something about your code. If you know a function is deprecated, you can write a regex that finds references to it. You might have noticed the regex on the earlier slide that showed how we looked for functions supplied by ereg. Information like this can help you assemble a punch list of how many references you have to a dependency, which makes it easy to size the effort of removing that dependency. Another helpful thing is PHP's built-in linter. You can run it from the command line, one file at a time, and it will verify the syntax for you; the syntax did change some in PHP 7. I'll show concrete examples of both in a moment. php7mar is a tool that identifies potential compatibility issues like those on the last slide; it's more or less powered by regular expressions. Phan is a more sophisticated tool. It implements real static analysis using the abstract syntax tree generated by PHP 7, a new capability exposed in this version of the language. Phan can do much more sophisticated things, like checking for potential type mismatches and other issues you might not see until runtime.
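Before looking at that output, here are the grep and linter checks I mentioned, as concrete commands. The source path is illustrative; the function list is the POSIX regex family the ereg extension supplied.

```sh
# Build a punch list of call sites for functions the ereg extension supplied.
grep -rEn '\b(ereg|eregi|ereg_replace|eregi_replace|split|spliti)\s*\(' src/

# Check a single file against the PHP 7 parser with the built-in linter.
php -l src/checkout/cart.php

# Checks this cheap can run on every commit, for example from a
# pre-commit hook, which I'll come back to in a moment.
```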
Both php7mar and Phan generate output kind of like this. This is the beginning of the php7mar output for a test file. Much like compiler errors, each issue is identified with a specific line of code. These are tools you can run on demand, but it's also possible to integrate them into a build process. You can even run them as a pre-commit hook to prevent obvious errors from being introduced. One thing with these tools is that engineers always have to interpret the output. Be aware that false positives are possible, simply because it's hard to statically analyze a dynamically typed programming language.

When you're looking at these different tools, you want to think carefully about how much you invest in them. In general, you want to start by running a tool in its weakest mode. The Phan documentation actually has a really nice tutorial on how to go about this. You then want to spot-check the output to see if it's actually helpful. Is it reporting issues that you didn't know about and can do something about? Think about how many issues it generates, whether you'll have time to sort through all of them, and whether it will be helpful or just more noise for your team. Also think about the ownership cost: is the tool something an individual engineer can run on their own, or is it something a team will have to support?

One thing we ran into with Phan: we were very excited about it, because we really liked the idea of static analysis and type checking. The problem we had was that a lot of how it figures out which types to check against is by reading your doc tags. One thing we learned from this was that our documentation was very bad, which was something we already sort of knew, but it wouldn't have helped us much to plug into a system like this, run it on a regular basis, and just be reminded of that. So we tried it out, but it ended up being something we didn't want to use.

So, static analysis. Like I said, the nice thing about the output is that it's essentially a punch list, which helps you figure out how much work you have to do to clean something up. It can also give you confidence that your code is in pretty good shape. This kind of precise scoping was really valuable to us. We had some cleanup to do, driven by php7mar, but nothing that would derail the project.

Planning and static analysis generated a lot of tasks. We got into a state where there was simply a lot to do. That's a good place to be, but it can also be dangerous. Having fewer big questions to answer means you have less risk, or at least you think you do. You have to ask yourself whether that feels right. If your risk is diminishing, that probably means the project is winding down; you'll need to actively resist momentum when that doesn't feel right. What we needed to remember was that our real goal was running real code. Fixing extension dependencies and static analyzer errors was only in service of that. It would have done our project a disservice to work through all our existing tasks before attempting to run real code. Since running code was a big source of unknowns, and therefore risk, that activity could provide important information that could reshape our priorities. So as we worked through tasks, as we reviewed and re-reviewed them, we kept looking for opportunities to run code in a PHP 7 environment. We had a few main options for the code we could run.
The big target, the thing we were really going after, was web requests. But these were the most complicated, because they involve setting up a server and making sure all the dependencies, such as database dependencies, are working correctly. A simpler option was the batch processes we run through PHP. These are essentially command line scripts, but a lot of them have older, less good code, and code with very specific database dependencies. So: attractive, but not actually that valuable for us. The last big target was our different kinds of automated tests: unit tests, integration tests, acceptance tests. These were the best opportunity for us, specifically the unit tests, because they have the fewest dependencies. They gave us an opportunity to run a very small piece of production code and see whether it would run, and also whether it would produce the results we expected, because that's what unit tests do.

We were able to do this in March 2016, and this is what it looked like, or really how the engineer who did the work reported it. Let's look at how this is framed. The engineer sent an email to the core team working on the project, but also to our chief architect. We didn't run all of our tests, but we found some that would run. He described what he did and the difference in runtime for those tests, which was quite significant. We were also able to gather some information about memory usage, which was also interesting. There are also some notes here about all the things he had to do to make the unit tests work. So this was, honestly, kind of a hacked-together exercise, but it told us a fair bit about the state of our code. It showed that we had code that would run on PHP 7, and that it would probably run faster and with less memory. It also showed that our code would run correctly, and this was very good. And the tasks he had to work through to get the code running in the first place helped articulate some of what we'd need to do to complete the rollout. So on its own, running one very specific set of unit tests didn't accomplish much, but we learned a lot from doing it.

The email I shared is actually a good example of managing the common risks I described earlier. This engineer did a lot of the things I called out. He talked about value to the business: he was talking about performance. He didn't promise too much too soon; he said, you know, there are a lot of problems, but things are heading in the right direction. He shared it with a broader group to give some positive feedback, and he shared it with someone outside the immediate team so there would be visibility into the work we were doing.

Automated testing is really an umbrella term. I talked a little before about unit tests, but other automated tests follow a similar playbook. You write software that opens a web browser and clicks through a series of steps. If you have scripts that you use for manual testing, you can probably automate them; this gets you similar results faster. Acceptance tests are what you call those tests that operate a web browser. You can also have integration tests, which try to verify that multiple systems, like a web server and a database server, are communicating correctly. Even though all these tests are pretty different in scope, you can actually use the same framework, PHPUnit, for most of them.
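As a quick refresher, here's roughly the shape of a PHPUnit test, along the lines of the getting-started example I mentioned at the top of the talk. A minimal sketch: the Cart class is invented for illustration, and the namespaced TestCase assumes a recent PHPUnit.

```php
<?php
use PHPUnit\Framework\TestCase;

// A hypothetical unit under test, defined inline to keep the sketch
// self-contained; in a real project it would live in its own file.
class Cart
{
    private $items = [];

    public function add(string $sku, int $qty): void
    {
        $this->items[$sku] = ($this->items[$sku] ?? 0) + $qty;
    }

    public function count(): int
    {
        return array_sum($this->items);
    }
}

// The test calls the unit and asserts on the result.
class CartTest extends TestCase
{
    public function testAddAccumulatesQuantities(): void
    {
        $cart = new Cart();
        $cart->add('SOFA-123', 1);
        $cart->add('SOFA-123', 2);
        $this->assertSame(3, $cart->count());
    }
}
```

You'd run this with the test runner, something like `phpunit CartTest.php`.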
So that's what the code looks like: you have your own code, and then you have test code that calls it and asserts certain results. You run the tests with something called a test runner, which produces output kind of like this. It tells you what it did, and it tells you when things fail; you can attach messages to those failures to make them easier to track down.

Automated tests are generally very seductive, because automation usually is. The key distinction with automated tests, though, is that they're code. That's kind of obvious, I think, but the ramifications are worth considering. Like any other piece of code, you have to maintain them. Some of the thought you put into test design is about how they'll be maintained. Poor tests can come back at you with frequent false positives, and there can be a tuning period when you create more complex tests, to ensure they stay useful. Also, like I said, there's a really wide range of approaches to automated testing. I'm honestly giving them short shrift by not saying much about them, but when you're considering them for your own team and your own projects, my general advice is to ask the questions I've been asking throughout this talk.

Testing is not about correctness, which is impossible. You have to choose your targets. The best way to do this is to identify risks and rate them as quickly as possible. You can leverage testing to get continuous feedback about risk as you build something. This requires knowing what you know and knowing what you don't know. It's especially helpful to create a shared map that the team can refer to; this helps keep people on track.

This strategy served us pretty well. It got us to a place where we thought we were ready to deploy. But late in the game, we started noticing erratic errors from our integration tests in a staging environment. The errors looked like this. This was a big mystery for us. Engineers usually like to brag about scaling problems, but I don't think anyone needs to allocate 140 terabytes of memory on a web server. This was an issue we hadn't seen in our development environment; like I said, it only started appearing in our staging environment. It was a new class of problem for us, something we didn't understand and didn't know we'd encounter. Some people call these unknown unknowns. To get to the bottom of it, we had to use new strategies and new tools.

First, strategy. With these problems, you have to start with broader strokes, simply because there's a lot more that you don't know. What I said before about not prematurely discarding risk applies here. You want to think, in general terms, about where the problem could be located. Is this something to do with the web server, the database server, or how they talk to each other? Within the web server, is it in your userland code? Could it be an extension? Could it be another process running on the same application server? When you think about all these places, you want to work through them from the most plausible to the least plausible, again in service of not getting rid of risk too soon. You want to start thinking of ways to test whether each of these general areas is where the problem is occurring. And you want to look for ways to consistently reproduce the error. In the case of this error, consistency was actually a clue for us.
If this were an error we always saw, that would have suggested the problem was in code we wrote, maybe some kind of runaway for loop that decided some data structure had to be extremely big. But because it was inconsistent, and because it involved memory allocation, we thought a lot about memory management inside PHP extensions. We had a few broad hypotheses for what could be involved. One API change in PHP 7 was in how the engine internally represents the size of strings and other values: it went from using longs to something called size_t, which was meant to make PHP more platform-independent. A side effect was that it was not unusual for people to miss changing these, and we saw occasional fixes for exactly this in early minor releases of PHP 7. We looked through extensions, like PCRE, that we expected to be a little fussier in this regard, but we didn't find anything.

A broader possibility was general memory mismanagement, the kind of thing you can identify with an analysis tool called Valgrind. We had used Valgrind before and hadn't found anything with it, so if this turned out to be the issue, it would suggest that our test coverage was poor. The broadest hypothesis was some kind of shared memory corruption, which is a very big target to try to hit. That said, we'd seen things like it with PCRE in PHP 7, where there's now a JIT for compiling regular expressions, which makes them much faster to execute. There were some early releases with problems related to it, so we had actually turned it off. The problem with this kind of issue is that you more or less have to inspect the memory corruption as it's occurring. To do that, the best tool is really GDB, which you can attach to a binary or a running process to inspect memory. That also requires you to reproduce the issue, so it was a hard approach, which I'll talk about more in a moment.

First, I want to talk about catching these memory issues. This was something we could approach using automated tests, actually. PHP extensions have their own framework for automated testing, which applies here. What we wanted was a controlled environment where we could run these tests, make sure our test coverage was good, and run the tests under Valgrind. Running them is actually a make target when you're building an individual extension. It kicks off a test runner that's kind of similar to PHPUnit: it records information about what passed, what failed, and why, and it produces artifacts about the errors. One particularly handy thing is that when a test fails, it spits out a shell script that lets you rerun that test and only that test. If you have to bring that code into GDB to debug something very specific, that's pretty valuable. There's a website, qa.php.net, that documents this tool more exhaustively, and you can really just read the source code of the test runner, run-tests.php, which sits in the root of the PHP source repo and of any extension, if you want to understand how it works. The tests look something like this: they're single files with a .phpt extension.
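Here's a minimal sketch of one, in the standard .phpt format. The function it exercises is just a built-in, purely for illustration.

```
--TEST--
strtoupper() handles an ASCII SKU string
--FILE--
<?php
var_dump(strtoupper('sofa-123'));
?>
--EXPECT--
string(8) "SOFA-123"
```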
A test is mostly centered around the --FILE-- section, which is a PHP script to run, but there are also sections that describe what the test is, what it's supposed to do, and the kind of output that should be expected. The test runner produces output that looks kind of like this: similar enough to PHPUnit, but with some more information about the environment it's using. It also gives you the option of running tests with Valgrind, which means you get slightly different output. It tells you at the top that it ran with Valgrind, and if there are any leaks, it identifies which tests leaked. An important thing to know about memory leaks is that code can leak memory and still work. So you may have tests that pass but are then identified as leaky when you run them under Valgrind. Valgrind produces output that looks more or less like this. When you run it with these extension tests, it generates an artifact file with a .mem extension recording whatever happened. It can be pretty low-level, but if you're comfortable enough with C, it can point you in the right direction to fix your code.

The other tool we used here is something called gcov, a GNU tool for measuring code coverage of C code. To capture this information, you have to modify the build steps for a PHP extension slightly. You start the same way, by running the phpize script. When you run the configure script, you add some special switches so that when the binary is built, it also produces artifacts that record code coverage. Then there's a tool called lcov, which reads those artifacts and produces an info file you can analyze and visualize in different ways. You use lcov to reset any counters that may have been generated, run your tests to generate the artifacts showing what code was executed, and then run lcov again to capture the difference. Following that, you can take that file, which here I've just called coverage.info, and run it through a tool that generates a visualization. An easy one is genhtml, which produces output that looks kind of like this: it shows the source file, gives you general statistics on which lines were touched, and color-codes them. In this case, the top of the file is all comments, which is why there's nothing to see there.

This approach showed us that our own extensions were actually in pretty good shape. While it didn't get us closer to a solution, it did leave us with a more robust suite of tests for our custom code. That meant our last hope was digging in with GDB and trying to reproduce the error state. GDB is a pretty complicated tool, so I don't want to say too much about it; there's a great tutorial by Richard Stallman, linked at the bottom, that explains how to use it. The main thing to keep in mind is that you want to find the simplest means of reproducing an issue, so you can run GDB pointed at a specific PHP script. If you have a core file produced by a segfault or another error, you can load that into GDB. You can also attach GDB to a live PHP-FPM or other process if you need to set breakpoints and see how things behave on the server as a request is being served. When you're doing this kind of debugging, it's important to use a special build of PHP.
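Before I get to that build, here's the coverage workflow from a moment ago, condensed into commands. A sketch under stated assumptions: `--coverage` is the generic GCC switch for emitting gcov data, and exact flags vary by extension and toolchain; the lcov and genhtml invocations are the standard ones.

```sh
# Build the extension with coverage instrumentation (assumed flags).
phpize
CFLAGS="--coverage" LDFLAGS="--coverage" ./configure
make

# Reset counters, run the .phpt suite, then capture what was executed.
lcov --zerocounters --directory .
make test
lcov --capture --directory . --output-file coverage.info

# Optionally rerun the suite under Valgrind to flag leaky tests
# (run-tests.php understands -m; TESTS passes it extra arguments).
make test TESTS="-m tests/"

# Render an HTML report with per-line color coding.
genhtml coverage.info --output-directory coverage-report
```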
That special build is a debug build, and it's different in two ways. One is that it produces the debug symbols GDB needs to tell you the names of the functions and code being executed at different points; otherwise it just gives you memory addresses, which is not very helpful. The other is that it disables compiler optimizations, which ensures you get a true line-by-line execution of your source code. Compiler optimizations can otherwise rearrange things in unpredictable ways. Something to keep in mind with debug builds is that because you're disabling those optimizations, the code may actually behave differently, which can introduce noise into the testing you're trying to do. Another nice thing to know about GDB is that you can script it. If there are things you're doing repeatedly, you can record them into a file, load it, and run it every time you're trying to see how things work. If you're making small changes to a test plan to understand where something is breaking, this can be a helpful way of maintaining sanity.

So, after chugging along on this for a little while, we found the culprit. It was not one of our extensions, or rather, not an extension we had modified ourselves, but another non-core extension: in this case, APCu. We found a bug in it: there was an invalid write to a memory region containing the size of a PHP string, and that's why, at some point, PHP decided it needed to allocate so much memory. It turned out, though, that this bug had already been fixed upstream. So there wasn't really any work for us to do besides building new packages and deploying them. Kind of anticlimactic, but sometimes that happens.

Some things we concluded from this whole exercise. We could have been more thorough about reviewing known issues with third-party extensions; we were relatively confident that's where the issue was, but we were, I'd say, low-level and rudimentary in how we investigated them. We could have focused our energy sooner on extensions that operate on shared memory. We'd already had that kind of issue with PCRE, and we assumed the APCu extension was safe because we didn't write it, so that was kind of a big blind spot for us. A very simple thing we could have done before getting deep into debugging was making sure we'd deployed the latest version of every extension. We didn't have a good sense of how often they were being updated, so we assumed that wasn't something we needed to look into.

One thing I want to touch on a little is this idea of correctness, and whether we should have identified sooner that we could have been more correct in our understanding of what our code did. This is kind of a hindsight-is-20/20 sort of thing. Ideally we would have known sooner that this kind of issue existed, but a lot of the choices we made in this project were based on as much information as we had at a given point in time. It's easy to say you did things wrong after the fact, but usually you don't know that until after the fact.

After we identified this issue, we did a final round of testing using a technique called replay testing. One concern we had was that the testing we'd been doing didn't work through a broad enough set of use cases for our website. We were doing the things engineers knew how to do, which is really a pretty specific walk through the flows of the website.
So, to get a better understanding of what real people did, we looked at access logs. You can take your web access logs, which record all the URLs people hit and all the parameters passed to them, and feed them to curl or another tool that can make HTTP requests; then you examine your logs and see whether any errors are generated. This is pretty helpful, and since it's something you can script, you can burn through a lot of use cases quite quickly. Another thing to consider with replay testing is combining it with stress testing. Apache Bench is a good tool for this. It lets you point at a URL and say, request this however many times, at a given level of concurrency. Since web servers have dependencies and are usually handling more than one request at once, this can help expose issues that might be present, including the kind of shared memory issues that had bitten us before. I'll show a small sketch of both at the end of this section. Thankfully, after working through the replay testing, we were still pretty confident in how our systems were looking.

So, to sum up unknown unknowns: the main thing to focus on is reducing their scope. More specific problems and more consistent problems are usually easier to solve. Take lots of notes as you work through these issues; you never know what information will be relevant. Our initial hypotheses were driven by things we'd noted earlier in the project. Over-communicating ensures you get regular sanity checks on what you're trying. It also gives you opportunities to spot the familiar in what you see. Finding something familiar gives you an idea of what you're looking for and what you're trying to latch onto, which might help you crack the bigger thing that's harder to explain. It's helpful to vary your approach as you go. Tools have different characteristics: different things they're good at and different results they're likely to produce. When you don't know what you're looking for, getting a mix of characteristics improves the odds that you'll find it. Similarly, try solving these problems with a partner; different people have different strategies and solutions.

These problems can introduce a lot of stress into projects, and that's something you want to be very sensitive to. There can be doubt, inside and outside the team, that things are going in the right direction. So one thing to think about is reframing them. A lot of the time, the day-to-day of being a software engineer can be very rote and routine; you do things you're comfortable with. These kinds of problems are opportunities to break out of that comfort zone, and hopefully that's something that's exciting to the engineers on your team and not, you know, a distraction. Look for indications that the problem is getting resolved and that you're making progress on it. When problems are very vague, it can feel like you're falling through quicksand, so you want to look for anything that tells you you're learning more about the problem. And throughout these problems, it's just as important, if not more important, to work the room and give people confidence that the project is headed in the right direction. It can be very easy to disappear while working on a problem like this, and that can reduce the confidence your stakeholders have in how you're doing.
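Here's the replay-and-stress sketch I promised. It assumes logs in the common combined format, where the request path is the seventh field, and the staging host name is obviously invented.

```sh
# Replay the first thousand logged GET paths against staging and
# print the status code for each (assumes combined log format).
awk '$6 == "\"GET" {print $7}' access.log | head -n 1000 |
while read -r path; do
    curl -s -o /dev/null -w "%{http_code} $path\n" "https://staging.example.com$path"
done

# Stress a single URL with Apache Bench: 10,000 requests, 50 at a time.
ab -n 10000 -c 50 "https://staging.example.com/"
```

The concurrency is what matters in the second command: hammering one URL from fifty parallel connections is a cheap way to shake out the shared-memory class of bug I just described.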
Finally, you want to be willing to say when you've invested too much in a problem. There's a point where the ROI tips over and something stops being worth working on, and it's quite possible for this to happen very late in the life cycle of a project. The goals of your team ultimately have to take precedence over any one project. We ended up learning quite a bit from solving this problem. We made it easier to investigate these kinds of problems in production, found new processes for capturing information about production systems, and thought about new ways to monitor for and automate solutions around these kinds of issues. Painful as it was, it ended up being pretty valuable for our team; the next time we see a problem like this, it will be much easier to resolve. This group of tools falls under something called system testing, which is, very generally, poking something big and seeing how it reacts. These tools are expensive, which is why they're a last resort, but sometimes that's what you need.

At this point, we thought we were ready to do a rollout. Some considerations we had. As I said before, there will probably be other big projects going on at the same time. In our case, we were moving hardware from one data center to a new data center. This was a very important, very visible project, meant to make sure that over the next year and beyond we'd be able to serve the amount of traffic we were expecting. It was a project that absolutely could not be disrupted. Another issue was that our testing had focused exclusively on our customer-facing website, but we had lots of other code separate from that. And finally, there was the general concern that no one wants to impact revenue; no one wants to flip a switch and cause an outage or another kind of incident.

What we did to navigate this: we were very careful about scheduling. We scheduled with lead time around the hard dates of the data center move, and we were very clear that we'd be ready to pause this project if the two collided. We were willing to say we might have to wait a few months to continue, because it was more important that the data center move succeed than that the PHP 7 migration succeed. We started with non-production environments. Wayfair has three primary environments: a development environment for the day-to-day work of engineers, a staging environment, which is a production-like testing environment, and our customer-facing production environment. We worked through the development environment, then the staging environment, before we thought about putting any production traffic on servers running PHP 7. And by the time we got to moving that traffic over, we did it incrementally: we started with a handful of servers at a time before we moved through entire data centers. Our focus throughout was specifically on the customer-facing website. We were very careful to make sure our code would work on both PHP 5 and PHP 7. The most value for us was in getting customer-facing traffic onto PHP 7 sooner, because that's the group of people who would benefit most from the speedup. When we got to actually moving production traffic over, we started by agreeing on a monitoring plan: we identified the things we were going to watch and who was going to watch them.
We made changes from what we called a war room, a conference room we booked out where people from the different infrastructure and software teams working on this project could co-locate, work together, and shout out issues as they saw them. We let changes bake in: we rolled out a few servers at a time, made sure there were no errors, rolled out a few more the next day, and stepped through very slowly that way to minimize risk. Throughout this very incremental process, we really, really over-communicated our plan to the broader team. Not impacting others is different from not surprising them. It's one thing if we cause some kind of disruption; it would be another thing entirely if no one knew we were making a change that could cause that kind of disruption.

A big thing here is that with these incremental updates, there's going to be an in-between state: from every server running PHP 5, to some running PHP 5 and some running PHP 7, to everything running PHP 7. You want to have in the back of your head how you're going to plan around this, and how to make sure your code will run in both places. One big piece for us was the migration from the MSSQL extension to PDO, which was a project we could do independently of the PHP 7 move. It's something we finished and deployed while a lot of our servers were still running PHP 5, and detaching those two concerns made it possible to separate those workflows and get things done faster, with less risk.

Because of those simultaneous states, we also had simultaneous metrics. We were running traffic through two groups of servers at the same time, and this was actually a real asset for us, because it allowed us to do a side-by-side comparison. It's a lot more compelling to look at a chart with two series than a single series that starts high and then goes low.

At the end of the rollout, one thing we wanted to do was confirm our metrics. We were actually kind of suspicious of them at first, because they seemed a little too good to be true. But we looked at things from a few different angles and felt comfortable that there were no customer-facing issues and that the numbers we saw were real. We celebrated: we sent release notes, patted ourselves on the back, and shared our successes with the broader team. Then we continued our slow roll across the rest of the customer-facing traffic and other services. The project started in October 2015 and only totally wrapped up a couple of weeks ago. Most of our production systems were running PHP 7 by the end of last year, but there were some holdouts in different places that meant we weren't finally rolled over until February.

So, to sum up again: testing is about continuously challenging your world view. You can use tests to learn more about your systems, and you can create feedback loops around your tests to make your systems more reliable. Testing is, and should be, a continuous activity. You don't just do it once and walk away; that only means something worked once. That stayed true even after we deployed and started seeing those positive numbers coming in. To sum up what we covered: risk is the thing you don't know. We talked about the different ways you can identify and measure risk in your applications, and we talked a lot about product management strategies. Hopefully this will all be helpful when you're doing projects like this at your own companies. Thank you. Any questions?
Audience: You indicated that there might have been some differences in memory management, or the way it's handled, between PHP 5 and 7. I was wondering if you had any side-by-side comparisons of memory usage on the servers before and during the transition phase you described, any comparison of the memory usage of the PHP scripts.

Speaker: I don't know the precise numbers offhand. We knew that it was lower, and what's really driving that is that the structure in the C internals that stores userland variables got a lot smaller.

Audience: You also mentioned, in the PHPUnit test results, that the initial report said it used a bit more memory compared to PHP 5. I guess maybe that was just for that unit test run, then.

Speaker: Yep.

Audience: Okay. Thanks.

Speaker: Is that it? Thanks.