OK, hey, everyone. Thank you for coming to my talk, "They Didn't Stop to Think If They Should: How to Prevent the XY Problem in Your OpenStack Cloud." My name is Sean Carlow. I'm an OpenStack architect with Rackspace, and I've been involved with OpenStack for six years now. All right, so let's get started.
So I'd like to start with this famous quote by Dr. Ian Malcolm: "Your scientists were so preoccupied with whether they could, they didn't stop to think if they should." I think we all know this famous quote from the first Jurassic Park movie, kind of a foreshadowing of the first movie and, well, the series in general. And as engineers and architects of private clouds, I think most of us have been in that spot where a business unit or an application owner comes to you with a proposed solution to a problem that they have. You get tunnel vision. You start focusing on the solution that they've presented to you. You start working on it, figuring out the kinks, how to implement it. And you never stop to think, well, should I be doing this? So in this talk, we're going to discuss some examples of this, why it's bad, and how to work with your customers to come up with a solution that works for them and for the platform.
OK, so to start, just a basic definition. What is the XY problem? The XY problem is asking about your attempted solution rather than your actual problem. This leads to enormous amounts of wasted time and energy, both on the part of people asking for help and on the part of those providing help. This talk is actually about a variation of the XY problem. In this case, the user wants to solve a business problem X. They present you with solution Y. You spend time and energy researching Y and getting it implemented, only to discover the business was really trying to solve for X.
So, a couple of baseline examples. Somebody asking you, "Why isn't config Y working?" instead of asking, "How do I enable X feature?"
A couple of actual examples that I have been asked in the past. Somebody asking, "How do I forward my SSH key into an instance?" instead of asking, "How do I SSH into an instance?" And lastly, "How do I set persistent iptables rules?" instead of asking, "Why isn't my machine receiving a lease from the DHCP server?"
Now, some examples in the cloud or enterprise in general. Going with a specific storage technology because it's cheap or because an executive knows the owner, instead of asking for a storage technology that meets business needs. I think we've all seen or heard about this one. While budget absolutely plays a part in determining a proper storage solution, it can't be the only determining factor. So going with a storage platform just because it's cheap may not work out well for you in the end. And if you go with the storage technology whose owner an executive is buddies with, that may not work out so well either.
Asking for higher over-commit ratios instead of practicing good stewardship of the cloud. We all know the cloud is not infinite, especially when the business is responsible for all aspects of that cloud, from the hardware all the way up. So it's up to users and cloud admins to practice good stewardship: deleting resources that aren't needed anymore, things of that nature. Now, increasing over-commit ratios may sound like a good idea at the time. However, this could result in deadlocked processes and out-of-memory situations where the hypervisor's out-of-memory killer will end up just mercilessly destroying VMs to try to save itself.
And lastly, migrating a legacy application that cannot tolerate any downtime to the cloud without retooling. Now, moving legacy applications to the cloud is fine. It's what we want. However, doing it without any sort of rework to take advantage of the features and the power that the cloud offers you, that's not going to end well. I know it's an old saying, but OpenStack is not cheap VMware.
So here are some actual real-world examples.
Obviously, these companies' names have been obscured. So let's start with company A. Company A's executives started an open source initiative. They mandated that the company get rid of any closed source platforms or software in their environments and switch entirely to open source. As a result of this, they pivoted off of a known good closed platform storage product in favor of Swift and Ceph. Unfortunately, that resulted in stability issues, performance issues, and ultimately, a lot of downtime. Now, this is not a knock on Swift or Ceph. They just weren't the right solution for this problem.
Company B needed a file server in their environment. So they spun up a virtual machine, created a Cinder LVM volume, attached it, and called it good. Unfortunately, this created a single point of failure in their environment, because all of their applications relied on this file server. So whenever the hypervisor went down or the Cinder node hosting the backing volume went down, their entire cloud was effectively down.
Now we have company C. Company C had an initiative to move everything to the cloud. They said, all must be moved to the cloud by X date. Unfortunately, this resulted in a lot of just legacy migrations, like I mentioned previously, and applications started treating commodity storage like an enterprise grade storage solution. Now, the problem encountered here was that, depending on the application, some applications had stringent snapshot requirements. Some had to take a new snapshot once a day, sometimes multiple times a day, and then X number of snapshots had to be retained at any given time. With LVM, the issue is that performance degrades drastically with each additional snapshot layer. As a result of this, this customer ended up having volumes going into a deactivated state during additional snapshot attempts. Sometimes the volume group that was backing those logical volumes would just stop working entirely, causing the box to fall over.
And of course, this would require out-of-band management in the end, because the entire LVM system would stop functioning correctly, and the box would not even boot.
OK, now I've given some examples. So let's talk about why this is bad. So why is this bad for the company? Start with wasted money. A company spends money on a new solution: software, contracts, hardware. All of that adds up. And if it's the wrong solution, that company will never see its return on investment. Next, negative customer perception. If this ends up being part of a customer-facing offering and it doesn't work correctly or there are stability issues, that could cause negative perception from your customers. You may lose customers due to these problems.
So why is it bad for the customer? Loss of productivity, and that's beyond simply having to learn a new feature or product. Perhaps it doesn't do everything they need. Certain features and functionality may be missing, so they have to figure out workarounds. Stability issues could cause outages or just performance problems. And that leads to frustration. When things don't work the way we expect them to, we get frustrated. So that's not a good thing either.
And lastly, for you, the engineer, this could result in long hours, because you end up having to try to make a product that's not really broken work the way the users expect it to. You could end up working late nights. Perhaps that configuration change they had you implement resulted in some sort of downtime. So be ready to be called at 2 AM because a critical application is down. And lastly, angry conference calls. When things don't work, your users escalate to their boss and to their boss's boss. And you end up on angry phone calls where you're yelled at.
OK, so we've talked about examples and why this is bad. Now let's look at some factors as to why a solution may be presented to you.
And in this case, why it may be bad for the platform and a bad solution in general. So for starters, you could be dealing with developers who have no operations experience. Not a knock on developers. It's just that uptime and stability are generally outside of their view, so they may not be factoring these things in.
Next: "We've always deployed the application that way." To quote Grace Hopper, the most dangerous phrase in the language is, "We've always done it this way." In the case of the legacy application migration I mentioned previously, back in the legacy days, they may have deployed it a certain way. They assume from an orthodoxy perspective that they should do it the same way in the cloud. And that's not necessarily the case.
Or the solution presented may not be bad, but can be wrong for the circumstances. Take the file server I mentioned previously. It's not necessarily a bad solution, in that it does get the job done. And if this was done on dedicated hardware with redundant all the things, this would be fine. But just spinning up a single VM and a single volume in the cloud, it's not going to cut it.
Okay, so let's look at situations where the solution presented may be right for the application or the business, but not necessarily for the platform. So they have context that you don't, right? There may be business requirements that have sort of forced them to go with one solution. There may be only one solution that does this thing. Maybe the solution isn't stable yet. So maybe the solution is right for the business and for the platform, but it's just not there yet. It's just not mature yet.
Maybe it doesn't really matter. In this case, you could be dealing with, say for example, a monthly billing report, right? As long as that runs once a month, they don't care about stability. They don't care about performance. As long as that report is generated, they don't care.
And in this case, most likely what'll happen is you'll never hear about it, because they'll just spin something up and work with it. Lastly, maybe the SLA or risk of downtime is a predetermined risk associated with the solution. So they may be taking a calculated risk and they're fine with it. Or maybe an executive made the call and you just don't have a choice.
Okay, so how do we solve this? Well, let's start by asking questions. What's the question they're trying to answer? Basically, what's the problem they're trying to solve? What are their use cases? How do they envision the end users interacting with the solution? Do they have performance requirements? If so, what are they? If they're replacing a legacy application, for example, try to find out what performance requirements the legacy application currently has and work from there. Now, if this is a new solution, it could be up to your discretion. And lastly, why do they think this will solve their problem? Never be afraid to ask why. You're not attacking them, you're just trying to understand where they're coming from.
Next, approach the situation with an open mind. The end users have thought of things you haven't. So just keep that in mind. Always approach with an open mind. Be ready to listen.
Approach with the middle ground in mind. The business unit has the application context. You have the platform context. Try to find a solution that meets somewhere in the middle and works for both.
Lastly, understand your customer's predicament. There are right ways and wrong ways of talking to your customer. Just be sure that you don't approach them with a sort of attitude of, I know what I'm talking about, I know the right solution, and you need to go with this solution because that's what I said.
And that's it. Thank y'all.