I'm a senior Agile consultant with a company called Centaire. We do turnkey software development, Agile transformation and consulting, along with DevOps consulting work, in the Milwaukee, Madison and Chicago areas as well as other parts of the country. I've been doing this for about 14 years. I started off life actually in the IT operations consulting space, moved into software, been an architect, been a developer, do all this consulting, love to train people, and so I kind of straddle the fence of understanding both the ops realm and the development realm. I love to try to present both sides of the coin to people to help them understand what's really going on in our DevOps world. I've also been involved a lot with Microsoft and Windows, and you're going to see a lot of that today. I know for you Linux people that's going to scare you a little bit. Don't worry, you'll be fine. A lot of the stuff I'm going through today actually applies to any technology. I'll be picking specific examples, but don't treat them like gospel; you can substitute any tool you like in this space. What matters is understanding how all the concepts and pieces fit together and why they're important.

All right. So welcome to Parts Unlimited. I'd like to start with the tale of a slightly imaginary team. If the name seems slightly less than imaginary and you recognize the book on the cover, you might understand why. This team started out in this type of a development cycle: I write my code, I build it locally, I test it locally, I copy a bunch of stuff over to the server, I hack up a bunch of settings to make sure it matches the environment, and I go wait for somebody to yell. Okay. The reality was that this process wasn't nearly as straightforward a loop as people thought it was. Oh wait, somebody yelled. Whoops, forgot that setting. Oh wait, forgot to copy that over. Well, built that completely wrong, put the wrong DLL in there, and you know what? I developed the whole thing wrong, so let's start over.

There's a much better way to get a feedback cycle in here: develop, use a continuous integration server to build, use a continuous integration server to test, package up your code and treat it as an artifact, deploy it, get feedback. This is kind of where they're at. I'm not gonna dig a ton into how you set up this chain today. If you are interested in talking about that, find me later. But there are some key principles underneath it that make it work. Small work batches: checking in often, and building every single check-in that goes into your system, regardless of what it is, is a good baseline. Automating those quality gates when you actually do the build and test: I wanna make sure that I'm running my unit tests, I'm running code coverage, and any other quality checkers I have, every single time, constantly giving me feedback on what's going on. Most importantly, I want a repeatable process. I wanna make sure that during this whole cycle I'm not introducing a ton of manual steps that are all error-prone. So if there's a goal in here, the goal is no manual steps in my process; I wanna be able to click a button and have everything just work.

There's an interesting problem, though, when we start creating what we call the shared sandbox. How many of you develop in an environment where you are not the only development team deploying to that environment? Okay, good. Here's the problem with that.
We start developing stuff, and we start deploying code and putting it all in different spots, and then somebody else comes in and deploys, and another team comes in and starts checking in code, and suddenly we don't know what's on and what's off. And about halfway through the development cycle our product release team says, you know what, let's ship that. We're all sitting here like, that's not a good idea. So what we do is we start branching, right? We start building all these branches, but then setting up a CI server for a branch is kind of a pain, and then we have to figure out how to deploy them all simultaneously, and then we have to figure out how to get them into that environment, and all of that's just a pain. So we're like, we'll just work on branches until we're ready to go. And now months and months and months go by before we actually get any feedback on that software, because we actually aren't building it. An interesting sub-concept here, and I'll just leave it for you to ponder: if you actually get to a point where you can build an entire environment from scratch every time you check in a line of code, how many pre-production environments do you really need? Think about that.

So here's kind of the typical thing. I'm working along on my trunk, and I create release one, and I merge some code into there because I have a bug fix. Then I create release two and merge back release one because I figure that one really isn't needed anymore. Then I create release 2.1 because I find there's something wrong with version 2.0, and I merge that one back into the trunk because I think I'm done with it, and somewhere along the line I lose release two. Not great. Here's what I'd like to see us get to. (No more animations, I promise.)

One way to do that is with feature toggles. The idea behind feature toggles is that we are now hiding our code with code. The decision between deploying a release and actually exposing it to the end user is now a configuration inside of our code rather than a full deployment process. So just because I deploy doesn't necessarily mean that that functionality is available to the end user at that point. Modular coding practices are absolutely critical to making this successful. Without modular coding practices, the stuff that I'm gonna talk about today is gonna be very difficult to do in a way that gives you confidence, when you're rolling out that release, that things work with and without your toggles, and that the way you're writing your code is actually testable and maintainable over time.

The key next thing is we need to think about automated testing as a baseline here. The reason I highlight "baseline" is that it will not test everything. It will not check everything. But it's like when you turn on your car and all the little lights come on on your dashboard, all the dummy lights, they call them. You wanna make sure that all those lights turn off before you start driving your car, because I for one do not wanna drive my car without my airbag working. Same thing with automated testing. The automated tests are those dummy lights on your car: when you start the car and all the lights turn off, you at least know that all the baseline functionality is working correctly. I wanna pause for a moment and talk about what happens to an actual team when you start introducing things like this.
I pulled data from one of the teams that I've been coaching for the last six months, just all of their CI metrics from their server. Nothing advanced, nothing fancy; almost every CI server out there can generate the data that I'm showing you now. But I wanna show their progression over time. In the beginning you've got them stabilizing and learning how to use all the tools. They're mucking around with all the code quality analysis, figuring out testing, trying to figure out where they are. You can kinda see down here they actually increased their code coverage. For people that can't see in the back, the green line is code coverage, the red line is the number of code inspection errors they have, and the yellow is the number of code warnings. There are a couple of interesting things that start happening here, though. They have these weird spikes where all of a sudden they get a lot more warnings, a lot more errors, and their code coverage starts dropping. Same kind of pattern here, same kind of pattern here. Without a lot of context it's kind of unusual: hmm, what's actually happening? This adds a little bit more context. These are releases shipping to prod. This is when their business teams basically said, don't worry about code quality, don't worry about making sure the software is actually right, just push it out the door. And so the team did that. They ignored quality, they actually turned off the things that make the build fail when these kinds of quality gates drop, and they just said, okay, we'll deploy. And they did. And they even tried, in a couple of spots like there, to recover their code quality. In fact, that was one of their lead developers literally spending nothing but two days writing unit tests to try to bring their code quality back up. And oops, they lose it all again on the next deployment.

So what would happen if I could just say I'm deploying all the time? It's an interesting thought. And this team is actually starting that real experiment this week, where they are starting to release every single check-in into production daily. The system goes through all the gates, and if all the gates pass, it actually releases the software, with all the toggles turned off, into production. I can't tell you what's gonna happen, I don't have that data yet, but it'll be an interesting thing to see.

So how are we trying to do that? Feature toggles, again, are fundamentally a way to separate features from deployment at runtime. It's oftentimes done at the service or UI layer, but I'm gonna talk about some other fundamental pieces that help build into that. A lot of people will come to me and say, okay, well, that's really scary. I don't like this whole idea of features and toggles and flips embedded inside my code. And I come back with this: every time you build user authentication into your system, you've built a feature toggle. It's just that the toggle is based on the authentication of the user and not a configuration setting. Toggles are a funny thing in terms of how they work, because they really, really need to be explicit. The typical .NET example is you use ConfigurationManager, which is basically: I'm reading my app.config file, I'm gonna grab a setting out of it, I'm gonna see if it's true or false, and this string is my toggle name. The problem is that this whole setup's really brittle.
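Here's a minimal sketch of that brittle pattern; the key name and surrounding code are illustrative, not the actual sample from the talk:

```csharp
using System.Configuration;

// The toggle name is a magic string. Rename the key in app.config, or
// copy this line into three places with a typo, and nothing fails at
// compile time; the check just silently reads as "off".
bool showRecommendations = bool.TryParse(
    ConfigurationManager.AppSettings["ShowRecommendations"], out var on) && on;

if (showRecommendations)
{
    // ... new code path ...
}
```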
If I change the name of the string, or I change the name of the setting, or I copy this in three different places, then if you're a .NET developer you realize this is a really, really horrible way to check if something's true or false. You can have a lot of flaws in this kind of code. Here's a better setup. I have some provider that tells me I can go get configuration settings, and I can swap it out based on what I need in my application. If I wanna drive this with a database, if I wanna drive this with a config file, if I wanna drive this with some other deployment tool telling me what's on or off, I can plug that in without affecting my actual code base. The other thing you'll notice is I'm using classes, enums, whatever you wanna use: an actual piece of code as my toggle. The real advantage of this is that I can now refactor this much, much more easily and find every usage of it in my application, without doing a global string search and hoping I stumble across all the usages. If I drive all my toggles off of something in code, I can easily find wherever that code is used. I can also find other things, like: have I nested more than one toggle inside of another? Have I made one toggle implicitly dependent on another? I can check that by running code analysis. There's a whole bunch of things I can start doing now, because I actually have a codified representation of that toggle inside of my code.

This is the big driving-home point that I always make, and people don't really seem to realize it: toggles are absolutely technical debt. When you put toggles into code, you are making a decision to leave both the old code and the new code in your application. And people don't wanna accept this. They say, oh, I'll just keep adding toggles. I've seen this, and it looks like the flip-switch panel of a 747 cockpit when they're all done, because they don't know what the switches turn on and off and what they're dependent upon. So they flip this one on and it's like, does this do something? I don't know. What happens when I turn these two on together? I'm not really sure. You end up with all these permutations and combinations that can really be a problem. Or people go right back to the branching problem from before. And toggles have an impact. Every time we flip a toggle, we are effectively changing the code path inside of an application. They do stuff, and they are going to have a performance impact in your app. So if you are going to use feature toggles, you've got to have something at runtime that helps correlate this: make sure you're putting a spike into your monitoring to say, hey, I've changed a feature toggle, I need to make sure that my app's still running the way I expect it to. So make sure that if you're flipping toggles, they aren't just hidden off in some database where a user can go and arbitrarily turn something on. Cause, putting my ops hat on for a second, I was on that side of the fence. Let me tell you, when the app's CPU goes through the roof and your database operations go to three times normal, and the dev team says, well, we didn't deploy anything... you turned on a toggle. So effectively, yeah, you did.
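As a rough sketch of that codified setup (the names ISettingsProvider, Toggle and AppSettingsProvider are illustrative, not the talk's actual code):

```csharp
// The contract: swap in app.config, a database, or your deployment
// tool as the source of toggle values without touching call sites.
public interface ISettingsProvider
{
    bool IsEnabled(Toggle toggle);
}

// The toggle is a real piece of code, not a magic string, so
// "find all usages" and refactoring tools work on it, and code
// analysis can spot nested or interdependent toggles.
public enum Toggle
{
    ShowRecommendations,
    UseCloudStorage
}

// One possible provider: reads the toggle value from app settings.
public class AppSettingsProvider : ISettingsProvider
{
    public bool IsEnabled(Toggle toggle) => bool.TryParse(
        System.Configuration.ConfigurationManager.AppSettings[toggle.ToString()],
        out var on) && on;
}
```

Because every check now goes through IsEnabled(Toggle.ShowRecommendations), finding every place a toggle touches is one "find usages" click instead of a string hunt.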
All right, here's one way to practice this as a team, and I love doing this with teams 'cause it's a lot of fun, actually: try a deployment kata. How many of you are familiar with the term kata as an exercise? Most of you? Okay. For those of you that aren't, kata is a Japanese martial arts word for practicing something, going through and rehearsing it. So try a deployment kata with your team. Take a small application, it doesn't need to be difficult or complex, and walk through the deployment process with your team end to end, including turning toggles on and off, including build, deploy, testing, the whole thing. And make sure it's done as a cross-functional team. Your apps people need to be there, your database people need to be there, your network engineers, your developers. And avoid what I like to call problem clusters: if there's a problem with the database, the DBAs can't just stay there while everybody else goes and gets coffee and comes back when it's fixed, and then, oh, there's something wrong with the web servers, so the web guys stay around and everybody else goes and gets coffee. A, you end up drinking a lot of coffee during the day, and B, it really doesn't help build the cross-functional domain knowledge and the empathy that's needed for the team to understand what's going on.

A lot of people get to this toggle stage and say, you know, this just looks really complicated, and what I want in the end is this. My daughter loves driving one of these. It's real simple: you've got a horn, you've got a little thing, and it steers. Real simple to use. And most people's applications actually tend to look like this. I'd like to point out, this is the simplified version.

So here's one step toward it: look at inversion of control as a first step inside your application, for code. I'm actually gonna show an example of this in just a second. Most of you, I'm sure, are familiar with the idea, but for those that aren't: I can basically take dependencies and swap out their provider. All I define is an interface, a contract, inside my code that says, I need something to store files for me, and I can make that the file system, I can make that cloud storage, I can make that a NAS. Really, whatever I want to substitute in to make things work correctly. There's a detail here with feature toggles, though, that gets kind of interesting: you can actually make your container build up your stack based on a feature toggle. I can say, you know what? If this feature toggle is turned on, go use this cloud provider. If it's turned off, substitute an entirely different provider. And the cool part is most of your runtime operations tools can pick those kinds of things up and say, oh, okay, you're using this one, and you can see your code running through it versus the other one. The nice part about this is that if I'm swapping out providers, I don't have to put as many if-blocks in my code. I can just say, you know what, I'm gonna build a brand-new implementation of this so I can go to the cloud, and I'm gonna shove that off in its own class with its own implementation. I can test it separately, and it'll work just as well as the one that's currently running. But now, with the toggle, I can choose which one I wanna use, and I don't have to recompile my application every single time I wanna make the decision of where to deploy my app.

I'm gonna stop for a second and take a look at a sample app that I put together for this. It's a product called Parts Unlimited, so, the team we were talking about. And I've done the same kind of things with feature toggles here.
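A sketch of that toggle-driven composition, reusing the hypothetical ISettingsProvider and Toggle from above; the container (or a plain factory, as here) picks the provider, so neither implementation needs if-blocks inside it:

```csharp
public interface IFileStore
{
    void Save(string name, byte[] data);
}

public class FileSystemStore : IFileStore
{
    public void Save(string name, byte[] data) =>
        System.IO.File.WriteAllBytes(name, data);
}

public class CloudStore : IFileStore
{
    public void Save(string name, byte[] data)
    {
        // call your cloud storage SDK here
    }
}

public static class CompositionRoot
{
    // Build the stack off the toggle: no recompile, no scattered ifs,
    // and each implementation stays independently testable.
    public static IFileStore CreateFileStore(ISettingsProvider settings) =>
        settings.IsEnabled(Toggle.UseCloudStorage)
            ? new CloudStore()
            : (IFileStore)new FileSystemStore();
}
```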
So you'll notice that I've got a feature toggle in my code, and there's really nothing in it. It's completely empty except for the name of the toggle and how it's linked to my runtime. In this case I'm saying I'm gonna use app settings to determine whether this toggle is on or off. Really simple. The nice part, though, is that when I use it in my code, it's really clean to figure out what I'm doing. In this case I'm saying, all right, I wanna be able to get recommendations, but I want this to work whether the feature's turned on or off. So when the feature's turned off, I'm just gonna have the system return an empty collection back to say, yep, don't have any. The code is really deployed and the system really works, but when the toggle's turned off, the system's just gonna pretend like it doesn't know anything about that new code. When it's turned on, the system's actually gonna run through, get recommendations, link them back to the products, and return a view of those recommended products (there's a sketch of this below).

So if we're looking at how this actually works in production: we've got our recommendations website, the toggle's turned on, and when I go and click on this, it's going to show this. I'm gonna leave this up for a few seconds, because it actually takes a few seconds to run. So what did I do? I introduced a performance problem in my code. In fact, I might have introduced a performance problem so bad it doesn't work at all. Oh, nope, there it goes. So, hey, now I have stuff. But that didn't seem to work too well, did it? So let me go and look at the performance of my system and talk about the pipeline the team built. They built a system where I basically have a build: I check in code, and as soon as my build runs, I build the solution, I compile it, I run all my unit tests, and I package up the application, all the necessary bits, all the files, the websites, the config settings, all those kinds of things, and I deploy that artifact and publish it out to my deployment server. Now, my deployment server is kind of neat, because it can do a couple of different things. I've got it so that it can deploy a release and tell me exactly where each revision of a release is in the pipeline, and it also has a pretty ingenious process built into it. It goes and does a couple of pretty simple tasks. It's gonna get the package out of the repository. The nice part is that this package is a sealed unit that can't be modified. It's basically like a vault, so now your IT and ops and audit people can all say: this hasn't been modified since it was built. Nobody's touched it, nobody's modified it, so from a verification standpoint I know it's the code that I checked in. Then I'm gonna deploy the package, so I'm pushing this one out to a Windows Azure website, which saves me a lot of the setup and infrastructure time. But then I'm gonna ping New Relic and say, I just completed a deployment. And what New Relic does, and most of the operations monitoring tools can do this, is compare before and after and say, oh, hmm, your performance looks different from the last time; maybe you wanna take a look at that. In a second, it's also gonna start a test run. When I get into testing I'm gonna talk about this a little bit more, but basically it reaches back into the CI server and says, all right, start running integration tests on this to make sure that everything works the way I thought it did.
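As mentioned above, the usage side could look roughly like this sketch; it reuses the hypothetical ISettingsProvider and Toggle from earlier, and the engine and catalog dependencies are invented for illustration (the real Parts Unlimited sample differs in its details):

```csharp
using System.Collections.Generic;

public class Product { /* ... */ }

public interface IRecommendationEngine { IList<int> GetRecommendedProductIds(string userId); }
public interface IProductCatalog { IList<Product> GetProducts(IList<int> ids); }

public class RecommendationsService
{
    private readonly ISettingsProvider settings;
    private readonly IRecommendationEngine engine;
    private readonly IProductCatalog catalog;

    public RecommendationsService(ISettingsProvider settings,
                                  IRecommendationEngine engine,
                                  IProductCatalog catalog)
    {
        this.settings = settings;
        this.engine = engine;
        this.catalog = catalog;
    }

    public IList<Product> GetRecommendations(string userId)
    {
        // Toggle off: hand back a valid empty collection, so the page
        // renders and the new code path simply never runs.
        if (!settings.IsEnabled(Toggle.ShowRecommendations))
            return new List<Product>();

        // Toggle on: get recommendations and link them back to products.
        var ids = engine.GetRecommendedProductIds(userId);
        return catalog.GetProducts(ids);
    }
}
```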
You can kind of see here that as I started going in my browser, I've actually got throughput, I've got events, I can see what's happening, and I can also see from earlier this morning, when I did deployments, exactly when I deployed the code and what changed. So it's pretty powerful in terms of being able to see (oops, can't hit New Relic) what's going on in my application and what its performance is as a result of me turning a feature toggle on or off.

Things can get a little bit more complicated than that, though. Like, what happens when I have new services, new database updates (sorry, not new toggles: new database updates, new services) and I'm trying to aggregate all of those together? Command Query Responsibility Segregation, CQRS, is one of the great ways to do this, and one of the great guides I've found that describes it as a story and talks through all the different details is a slightly older book from Microsoft patterns & practices on CQRS and event sourcing. It talks through their journey of building a CQRS-based website: all the infrastructure, all the challenges of doing upgrades, how you build event stores. I would recommend it as a guide for anybody regardless of what technology you develop in, because while the examples are .NET, I think they're applicable to any development team out there in terms of how do I build an event store, what kinds of patterns do I use to make my app responsive, and how do I build a high-volume application that's scalable in the end.

Testing. So yes, you have to do testing. Let's talk a little bit about what happens when we get to the integration testing level, because I think most of us are pretty competent with: I'm gonna write unit tests, I'm gonna test with a mock for the feature toggle, I'm gonna test it in the on-state and the off-state so that I've covered both of them. That's all pretty rudimentary for most development teams out there. But what happens when I actually want to test an end-to-end transaction in the system? People usually go to a specification language called Gherkin, from Cucumber, to help write specs so that their business people can understand what those end-to-end transactions are. I think there's huge power in that, but I think there are also some risks. This is what a typical business analyst or somebody like that would write as a spec: I'm gonna add an airplane, I'm gonna check that it's on the page, I'm gonna view it, I'm gonna add it to the cart, I'm gonna be on the cart page, and then I'm gonna see if it contains the bomber. But you end up with this weird scenario where you've got a bunch of phrases that all mean the same thing. "Selects" and "presses" mean the same thing; "am on" and "should be on" mean the same thing; "web page" and "page" mean the same thing. You get all this redundant language, and when you actually bind this up in the back end, you've got all these duplicate steps that effectively mean the same thing, which your developers now need to wire up to a functional system. Each differently worded step needs a new method, and because they're all different, I now have to implement all of these over and over and over again. So one way to deal with it is with something called SpecBind. It's out there, it's available for .NET today, and we're looking at writing versions for Java and Ruby as well.
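To make that duplication concrete, here's a sketch of the hand-written SpecFlow bindings you can end up with when the spec uses three phrasings for one action; the step text is illustrative, while [Binding] and the When attribute are real SpecFlow:

```csharp
using TechTalk.SpecFlow;

[Binding]
public class CartSteps
{
    // Three phrasings of the same action means three bindings to
    // write, wire up, and maintain.
    [When(@"I press the ""(.*)"" button")]
    public void WhenIPressTheButton(string name) => Click(name);

    [When(@"I select ""(.*)""")]
    public void WhenISelect(string name) => Click(name);

    [When(@"I click on ""(.*)""")]
    public void WhenIClickOn(string name) => Click(name);

    private void Click(string name)
    {
        // drive the browser through your web automation framework
    }
}
```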
Basically, what it is is a binding layer between a Cucumber/SpecFlow-style description language and a web automation framework. The idea is that I can now use a consistent, common language of steps on the business side, and a page model on the development side, to lower the development overhead of creating integration tests. And I can create a common language for my teams to use, which means my functional teams can actually work ahead of my development teams and define these specs before the code exists; the specs just light up as things get implemented. So it creates a common language. A good example: I'm gonna go add something to the cart. Every time I click, add, select, or do something on a page that involves that kind of action, I use "I choose" as the language step. The nice part is that it's actually tense-correct. In a Gherkin test you actually want your given steps to be past tense, because they're something you've already done to get to what you're testing, and then the when and the then at the end are supposed to be present tense, because they're what you're doing and what you expect as a result. So the idea is all of these pieces are lined up and correct, so that it runs.

Here's an example of the same idea, but now written with those common steps: I navigate to the home page, I choose the model, I'm on the product list page, I select the first item on the product list page, I click view, I'm on the product details page, I click add to cart, I'm on the shopping cart page, and I see that the first item in the list is a B10 bomber and the quantity equals one. Still very readable, a little bit more code-y, but now I have something that's very easy to use. All the steps are reusable, and I actually didn't have to write any new step bindings to make this work correctly.

So then the question becomes: how do we link something like this to feature toggles? There are a lot of mechanisms behind the scenes you can use, but I like to link it back to the deployment tool. If we look here in our deployment tool for the product, you can see in the variables that I have my feature toggles defined for each environment, along with my database server. So I can say: in dev, I want show recommendations to be true, and in every other environment I want it to be false. Real easy to see, real easy to find, real easy to read. And I can't actually change this value without pushing out a new release; I have to do a deployment to turn this feature toggle on. The interesting part is that the system is smart enough to not do a full redeployment: it's gonna update the setting and leave all the other pieces it knows are the same alone, but it has now put that spike back into the monitoring. And because my integration tests drive off of this data, they go and ask, hey, what feature toggles are turned on? and run the appropriate integration scenarios based on whether those toggles are enabled. You can see a little bit of that going on back on the integration test: the build that gets kicked off on this part, if you look at the actual test run, shows down here the one test that actually ran because of the feature toggle being turned on. Here's the actual run in detail.
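One simple way to wire that up, sketched here with NUnit and the hypothetical ISettingsProvider from earlier (the talk's actual mechanism lives in the deployment tool; this is a simplified stand-in): each scenario checks the toggle it depends on and skips itself when that toggle is off in the target environment.

```csharp
using NUnit.Framework;

[TestFixture]
public class RecommendationScenarios
{
    // Hypothetical hook that reads the same per-environment variables
    // the deployment tool uses to set the toggles.
    private static readonly ISettingsProvider Settings = TestEnvironment.Settings;

    [Test]
    public void Recommended_products_appear_on_the_details_page()
    {
        // Skip rather than fail when the feature is off in this environment.
        if (!Settings.IsEnabled(Toggle.ShowRecommendations))
            Assert.Ignore("ShowRecommendations is off in this environment.");

        // ... drive the browser through the recommendation scenario ...
    }
}
```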
I can actually see in here, in the details, how many tests were run, what the pass rate was, the details of the given tests, and what the results were. And I know that when Jerry orders a free product, because he gets the promo code and the recommendations are turned on, that actually works.

So finally, as I've been saying, getting feedback is critical to this process. Being able to run those integration tests, see what feedback your system gives, and see the impact on performance is critical to making sure your applications actually run correctly. There are a ton of good monitoring tools out there for it; I'm not gonna promote one over the other. I would say that if you're looking for a tool, make sure it's pulling user analytics next to the runtime data and combining those two pieces of data for you. It's really, really critical to understand what your users are doing alongside the runtime data, because if those two pieces of data are separate, it can often be really difficult to figure out what the user is doing that's actually impacting the performance of my system. Finally, it's really easy to dig deeper for data in these situations. Almost all of these monitoring providers have an API that you can work with inside of your code. So once you've picked a tool and decided that's what you wanna settle on, go deeper into their API and figure out what else you can do with it. If the main KPI for your business is making sure people are buying products on your website, instrument that, so you can figure out how many purchases are happening and whether the front end of the website being down is actually impacting your business KPI. Netflix does this. They basically have one KPI for their whole business, and that's movie starts per second. If they have an outage at three in the morning on a Saturday, they don't really care, because it's not impacting many people's ability to start watching a movie. If they have an outage at nine p.m. on a Monday, they've blown their KPI for the week, because they've impacted a ton of people's ability to watch a movie. There's an acceptance level, especially when you get to toggles, of understanding that everything is going to be down sooner or later. So you kind of accept that, roll forward, and understand how that impacts your ability to make customers happy.
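As a sketch of that kind of business-KPI instrumentation; IMetricsClient is a made-up stand-in for whatever API your monitoring tool actually exposes:

```csharp
// Hypothetical stand-in for your monitoring tool's API client.
public interface IMetricsClient
{
    void Increment(string counterName);
    void Gauge(string gaugeName, double value);
}

public class Order
{
    public decimal Total { get; set; }
}

public class CheckoutService
{
    private readonly IMetricsClient metrics;

    public CheckoutService(IMetricsClient metrics) => this.metrics = metrics;

    public void CompleteOrder(Order order)
    {
        // ... payment, fulfillment, etc. ...

        // Record the business KPI, not just server health, so you can
        // tell whether an outage or a toggle flip actually hurt sales.
        metrics.Increment("orders.completed");
        metrics.Gauge("orders.value", (double)order.Total);
    }
}
```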
So, some take-away concepts. If you're gonna work with feature toggles, make sure that you're codifying them and accepting that they are technical debt. It's absolutely critical to understand what you're accepting, and to build a cycle into your process that removes that technical debt, that removes those feature toggles, once you know the feature is stable and is what the users actually wanted from the experiment. Go deeper than the UI if you need to. A lot of times people will just stay at the UI layer and say, you know what, that's not really working for me; I can't build something into my product that does that. Fine, but when you do go deeper, make sure you look at how all those layers compose together, and build those tests on top of one another, to verify that what you're toggling on and off works correctly in that environment. Automate the testing of toggles; this is the number one baseline. If you're going to do this, make sure you have a testing platform in place, or at least that you're starting on one, that can cover all of those baselines for you so that you aren't manually retesting them over and over. There's a good quote from Google that says automated testing turns fear into boredom. I really like that, because I think in the end, adding tests to an application helps you build that layer of confidence that what I'm changing doesn't affect the rest of the application. I know it's stable; I know my users aren't going to experience anything horrific as they're working in the app. Finally, remember that feature toggles are a way to separate deployments from releases. So if you're still working towards just being able to get a deployment out there without it taking a seven-hour window, don't start on feature toggles. Don't go for the complicated stuff until you've learned to crawl; crawl before you walk, same kind of concept. Don't work on feature toggles until you've figured out how to build your code automatically, until you've figured out how to deploy it automatically and gone through those basic katas so that your team is comfortable, because otherwise adding toggles into your application is just going to add a level of complexity that's going to overwhelm them. That's all I have this morning. Thank you.

So if you've got two toggles, you may need to run four different sets of tests to cover all the toggle combinations, and if you have eight toggles it gets much larger. How do you have to think about testing differently?

Thinking about toggles in terms of testing can be complex, because you do have more permutations, with toggles on and off, and there are going to be more scenarios to run (there's a quick sketch of this after this exchange). The automation does provide a baseline, but you do have to think about impact areas when it comes to manual testing as well. You won't be able to test everything, and that's also why it's important to make a business decision around how many toggles you're introducing into your application suite in general. If you can accept that risk, and you're building that automation base as a reliable foundation, you can do most of that testing automatically and have testers working through scenarios for on and off. For most teams, I don't recommend any more than four or five feature toggles at a time, just to avoid the explosion of permutations and the nesting problems that can result from that.

Have you ever used this for anything that isn't software as a service? Where you don't have a website that you can just hide toggles in; say it's packaged software, or something that maybe comes out every month or two, and people could pick to never upgrade again.

So the never-upgrade, shipped-software scenario: yes, feature toggles work with this. Two things have to happen. You typically have to hide your toggles a little bit better, because you don't want your users accidentally discovering them and then playing with all the flip switches, but there are ways to do it. Actually, a perfect case in point: Team Foundation Server 2015 actually has feature toggles inside of it, including the version that's getting shrink-wrapped and shipped out to customers for on-prem deployments. The same stuff that's going out in the cloud, toggles and all, is the same stuff that goes on site. So it is possible.
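On the permutations point from the first question above: independent on/off toggles multiply, so n toggles means 2^n configurations (two toggles is four, eight is 256). A quick sketch of enumerating them for a test matrix:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TogglePermutations
{
    // n toggles -> 2^n on/off combinations: 2 -> 4, 8 -> 256.
    public static IEnumerable<Dictionary<string, bool>> All(IReadOnlyList<string> toggles)
    {
        for (var mask = 0; mask < (1 << toggles.Count); mask++)
        {
            yield return toggles
                .Select((name, i) => (name, on: (mask & (1 << i)) != 0))
                .ToDictionary(t => t.name, t => t.on);
        }
    }
}

// Usage: every configuration a full regression pass would need.
// foreach (var config in TogglePermutations.All(
//         new[] { "ShowRecommendations", "UseCloudStorage" }))
//     RunScenarioSuite(config);
```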
The example you showed with the tests, I assume you meant that those would run in production as you deploy the application to production? Yes. But that example is very simple, right? You want to add an item to a cart. How do you test a checkout in production?

That's a good question. Tests do get more complex as you're working through production scenarios. What I've seen successful teams do is build accounts, or isolated areas, into the product that they can run through those paths without hitting any major blockers. So they segment off their system: hey, run under this specialized account, and they basically hard-code into the account that when you hit the checkout process, it accepts any credit card but doesn't actually run a transaction. So there's a shortcut loop in the system. It does get more complex, in that at some point you can't actually test real transactions, and that's what real people are there to test. But again, if you combine that with the monitoring metrics, you have a pretty good balance: tests as a baseline, plus analytics to make sure that what users are really doing is actually working correctly. Cool. Thank you, Dan. Thank you.
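A sketch of that kind of synthetic-account shortcut; all of the types here are invented for illustration:

```csharp
public class CreditCard { public string Number { get; set; } }
public class Account { public bool IsSyntheticTestAccount { get; set; } }
public enum PaymentResult { Approved, Declined }

public interface IPaymentGateway
{
    PaymentResult Charge(CreditCard card, decimal amount);
}

public class PaymentProcessor
{
    private readonly IPaymentGateway gateway;

    public PaymentProcessor(IPaymentGateway gateway) => this.gateway = gateway;

    public PaymentResult Charge(Account account, CreditCard card, decimal amount)
    {
        // Synthetic test accounts short-circuit the gateway: any card is
        // "accepted" but no real money moves, so an end-to-end checkout
        // test can run safely in production.
        if (account.IsSyntheticTestAccount)
            return PaymentResult.Approved;

        return gateway.Charge(card, amount);
    }
}
```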