Okay. Well, hi everybody. My name is Adam Vinch. I'm with Time Warner Cable, and I'm going to talk about configuring Swift with puppet-swift. First things first: I have all these slides available here, so get a picture of this one. You can download them; I left the speaker notes in so they're easy to follow at home.

I've worked on deploying and operating Swift for about four years now. I'm part of a DevOps team at Time Warner Cable that owns the design, deployment, and operation of an OpenStack cloud. It mainly supports internal customers, along with some external, customer-facing operations. This talk is really meant to be a blueprint both for anyone considering Puppet for an initial deployment of Swift and for anyone who wants to transition away from another tool set to a Puppet-managed one.

A little background on the TWC production cloud: it's four replicas across two regions, only about 24 nodes per region, with many expansions planned and some in progress right now. It was set up for high availability and disaster recovery, with two replicas in each region, so customers can still access their data even if one of the regions blew up or something. It handles a really broad range of object sizes and throughput; it's not necessarily a performance-tuned cluster. For example, it backs a cloud drive on employee desktops so people can share files that way, Cinder volume backups land on there, and it serves some customer-facing web pages. So it sees all ranges of traffic: small, big, videos, whatever. On the side, we're also working on standing up an erasure-coded cluster. That uses some new features in puppet-swift that I'm developing, and we'll probably use it for a more high-performance setup, maybe for some video content.

At TWC we use Puppet, Ansible, Jenkins, and Docker to deploy our OpenStack cloud, and I found that Puppet was a pretty good fit for deploying Swift. The founding members of our team, who are in this room, Clayton and Matt, had already set up a pretty good Puppet infrastructure and deployment pipeline, so it was a natural fit to move from another solution to a Puppet solution for Swift.

A lot of the Puppet OpenStack modules were started by a guy named Dan Bode, who did much of the early development. Since then there have been 50 or more active contributors to the Puppet OpenStack modules, and within the last year they were moved under the OpenStack Big Tent, so they're the real deal. They support the Ubuntu, Debian, Red Hat, and CentOS operating systems, and I know there are some other variants people run them on. The Puppet OpenStack team is led by Emilien Macchi; he's the PTL, and he helps drive the weekly meetings, the design objectives, and the interaction with the rest of the OpenStack community.

When I began looking at the puppet-swift module, I found that a lot of the examples were really generic and there was no real guidance on how to piece it all together, because Swift is so flexible and can fit so many different applications. So this talk is really meant to be a blueprint for how you would piece Swift together, plus the supporting services on those nodes that monitor Swift and make sure it's operating correctly. I'll start off by describing what I call deployment patterns, and then we'll step through a few more sections that build out a complete deployment.
This all assumes, though, that you have what I'd call a base node profile: something on your nodes that provides package repos, IPMI tools, kdump, network configuration, just general sysadmin-type stuff. From there, Swift can be pieced together.

I consider two main patterns here that cover quite a few use cases. The first is a general deployment pattern, where the proxy node runs just the proxy services and the object node runs the account, container, and object services. Then there's a slightly more tuned one I call the performance deployment. That's for a situation where you have many small objects, container lookups are slow, and you'd like to speed them up. In that case, you move the account and container servers over to the proxy node and host them on an SSD drive. Swift will support however you want to piece this together; I'm just going to step through these two main cases, since they cover the majority of situations you'd hit.

puppet-swift itself contains a lot of different classes and defined types you can use to configure the Swift services, so we'll step through those. First, it's good to understand the Puppet role and profile design paradigm. To quote Gary Larizza of Puppet Labs: a profile is simply a wrapper class that groups Hiera lookups and class declarations into one functional unit. A profile exists to give you a single class you can include that will set up all the necessary bits for that piece of technology, whereas a role is just the mapping of host names to what the nodes actually do; you include profiles in a role. This talk assumes you're using a similar pattern. I think most people are these days, but it's good to look at.

So we're going to spend some time stepping through each of these classes. They're the core of the Swift functionality you'd be configuring on your nodes, and then we'll look at supporting services to back that up. These are abbreviated profiles for the object and proxy nodes, and I'm going to step through and tell you what they include. Some of them install packages for you, some set up configuration files. You'll notice I didn't leave in every attribute I'm configuring these with, because that's very specific to your environment. However, if you look at each of these classes in the Git repo, they describe each parameter's default and what it does. I'd encourage you to step through that and find what you need, or find me on IRC and I can help you with it.

On the proxy node we start by including the swift class, which installs the distro-specific Swift package as well as swift.conf. It's required by all the other classes, so it's the first thing you include in your profile. Next is the swift::proxy class, which installs the Swift proxy package and manages proxy-server.conf, and this works on Red Hat or Ubuntu or whatever; it abstracts that away from you. On a proxy node you also need a lot of different middleware to provide things like dynamic large objects or the auth token middleware, and those are configured in individual classes. Each one is its own separate class that you include in this proxy manifest and set some attributes on, and they all end up in proxy-server.conf to give you that functionality.
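To make the shape of that concrete, here's a minimal sketch of a proxy-node profile and role in that style. The class names come from puppet-swift, but exact parameter names vary by module version (older releases call the hash parameter swift_hash_suffix), and the pipeline contents, the profile::base class, and the values shown are placeholders you'd fill in from Hiera for your own environment. The memcached and Keystone pieces come next.

    # Sketch of a proxy-node profile; values are placeholders, not recommendations.
    class profile::swift::proxy {

      # Installs the distro Swift package and manages /etc/swift/swift.conf.
      class { '::swift':
        swift_hash_path_suffix => 'changeme',   # shared secret for the whole cluster
      }

      # Installs the proxy package and manages proxy-server.conf.
      class { '::swift::proxy':
        proxy_local_net_ip => $::ipaddress,
        pipeline           => [
          'catch_errors', 'healthcheck', 'cache',
          'authtoken', 'keystone', 'dlo', 'proxy-server',
        ],
        workers            => 8,
      }

      # Each piece of middleware in the pipeline has its own class.
      include ::swift::proxy::catch_errors
      include ::swift::proxy::healthcheck
      include ::swift::proxy::dlo
    }

    # The role just maps what a host is to which profiles it gets.
    class role::swift_proxy {
      include ::profile::base          # assumed base node profile (repos, IPMI, etc.)
      include ::profile::swift::proxy
    }

The point is just that the profile is the single class a role includes, and everything environment-specific flows in through class parameters and Hiera.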
An outside module we have to include next, which isn't part of puppet-swift, is memcached. That's used to cache auth tokens on the proxy nodes and speed up account lookups under high load. To go with memcached, you then include swift::proxy::cache, which takes a list of IPs, basically all the proxy nodes in the cluster that have memcached running on them, so they can talk to each other. This also assumes you're using Keystone; if you're not, there are other modules you can use here. swift::proxy::keystone configures the keystone middleware itself, including which roles are treated as Swift operators, and swift::proxy::authtoken configures the service user and the Keystone endpoint that Swift should validate incoming tokens against. So that has your proxy server up and running, validating tokens, and sending requests out to the object nodes. Next in line is some monitoring we'll look at in a bit: swift::dispersion, a class that configures swift-dispersion-populate as well as the conf file needed to run it. And then there's a generic section I call "sync rings" that we'll also dive into, since there are a whole lot of directions you can go with it; I've just abbreviated it here.

On the object nodes, there's similar stuff included. We include the swift class to get the Swift package. The next thing, though, is a wrapper class I'll also dive into later that I call mount_drives. What it does is mount the ring devices to drives on your node and do that mapping for you. In the general deployment pattern we want all of the storage servers, object, account, and container, on the same node, and there's a class, swift::storage::all, that will do that for you. You can further customize each of those servers by declaring swift::storage::server with the port as the resource name; the port is how you reference which server you're tuning. If you look into swift::storage::all, you'll see the default ports it uses, which are pretty common across Swift deployments. The final two things on the object node are drive-audit.conf and a cron for swift-recon. These are important monitoring tools to have, and we'll look at them more in the Icinga section.

Stepping ahead to what's different in the performance deployment: here you don't include swift::storage::all on either node, because you're splitting the account and container servers out from the object server. It's very similar; instead you instantiate each server with the port number, and it's required that you pass in the server type. So on the proxy node I've now put type container and type account, and it's going to run those servers and have those packages, configuration files, and so on installed. The object node now runs just the object server. That's how you split it out, and again, this is where you take your account and container rings and run them on SSD drives in the proxy nodes, so they have more direct read access and it's a lot faster. In benchmarking I've done, it's a significant improvement.
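Roughly, the two patterns look like this on the storage side. This is a sketch: the ports are Swift's defaults, and things like the devices path and the IP parameter are placeholders to check against the swift::storage::all and swift::storage::server documentation for your module version.

    # General pattern: account, container, and object servers all on the
    # object node, using Swift's default ports.
    class { '::swift::storage::all':
      storage_local_net_ip => $::ipaddress,
      devices              => '/srv/node',
    }

    # Performance pattern instead splits the servers out.
    # Proxy-node profile: account and container servers on SSD-backed devices.
    swift::storage::server { '6002':
      type                 => 'account',
      storage_local_net_ip => $::ipaddress,
      devices              => '/srv/node',
    }
    swift::storage::server { '6001':
      type                 => 'container',
      storage_local_net_ip => $::ipaddress,
      devices              => '/srv/node',
    }

    # Object-node profile: only the object server runs here now.
    swift::storage::server { '6000':
      type                 => 'object',
      storage_local_net_ip => $::ipaddress,
      devices              => '/srv/node',
    }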
So how you handle ring deployments is also very specific to your environment and the situation you're in. puppet-swift does contain a bunch of classes that can dynamically build and rebalance a ring for you. A lot of this is done with exported resources: you declare on a node that it has a ring device, with this drive and this name; Puppet runs, the resource is exported, and the ring-builder node then collects it, builds a new ring, and pushes it out. That's helpful for a small development cluster, maybe a virtual cluster you're testing code on. Some people do use it more widely, but I really prefer to make very strict, calculated changes to my rings outside of the cluster. So I use swift-ring-builder, which is a core Swift tool: I modify the rings, I put them up on a blob server in our environment, and then I use the wget Puppet module to pull those rings down to the nodes. puppet-swift will notify any Swift services that need to restart when those rings change, and that works well. I do key this, though: rings are keyed on the cluster name and a ring version that I store in Hiera, which controls which ring to pull down and when. An even more secure method, which I've done in previous deployments and would like to do in Puppet, is using the MD5 sum of the ring and only loading the rings if their MD5 sums match. That really ensures you're not accidentally pulling down the wrong file, which could be catastrophic depending on how you have things set up.

The device-to-drive mapping was a little tricky. I had inherited a cluster where device names were pretty random; some were sequential, some were all over the place. So if I'm re-imaging a node with Puppet, how do I tell it which drive IDs to mount? What I did is create a little wrapper class around swift::storage::xfs. I then use a YAML file in Hiera, keyed by the object node names (or the proxy nodes, if they have account and container drives), and pass in a hash of the drive name to mount on and the device ID associated with it in the ring. This now becomes a source of truth for the ring in this environment; I actually generate new rings from this YAML file by feeding it in. It allows me to take Puppet and drop it in place of whatever was managing the cluster before. A more sane method, if you're starting a new cluster, is obviously some dynamic device naming: the device name is the object node plus a device number, 0 to n, and you just store those names in the ring. That way you never have to do any mapping; it's assumed that drives are mounted based on the node name and an incremental drive number, and that saves you the trouble of mapping it out later. But I just want to point out that this is a pretty simple way to do it; it's worked well for us in production and we'll continue to use it.
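To make that mapping concrete, here's a rough sketch of what the Hiera data and the wrapper class could look like. This is illustrative only: the Hiera key name, the hypothetical mount_drives wrapper, the drive names, and the exact parameters accepted by swift::storage::xfs should all be checked against your own module version.

    # Hiera data for one object node (example ring device IDs and drives only):
    #
    #   profile::swift::mount_drives::drives:
    #     d0:
    #       device: '/dev/sdb'
    #     d1:
    #       device: '/dev/sdc'
    #     d2:
    #       device: '/dev/sdd'
    #
    # The wrapper class walks that hash and hands each ring device name and
    # physical drive to swift::storage::xfs, which formats the drive and
    # mounts it under /srv/node/<ring device name>.
    class profile::swift::mount_drives (
      $drives = {},
    ) {
      create_resources('swift::storage::xfs', $drives)
    }

Because the same YAML is also what I generate rings from, the ring and the mounts can't drift apart.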
Next, let's look at some basic monitoring. So you've got Swift running on your nodes, you've got rings on there, and you don't have whatever monitoring was previously on those nodes, because you ripped it out. So you need some basic Icinga checks to sanely monitor this cluster. At TWC we use Icinga to pump warnings into HipChat, while critical alerts also go into PagerDuty, which calls the on-call person, maybe calls me, to fix whatever's going on.

First on the list is swift-drive-audit, which I feel is even more critical than the last check on the list, even though that last one matters more to your customers. swift-drive-audit is a core Swift tool that scans kernel logs for XFS corruption or XFS error messages. For Icinga, you basically make a wrapper check around it that runs it, scans the logs, and reports back if a drive has shown any corruption. So you know, oh, I have a failing drive on this node, and maybe, if you have a larger cluster, that generates a work order for data center ops to go replace that drive. The other part of drive audit is that it can also be used to unmount a failed drive so that Swift can work around it a little better, and that's also a core tool.

So a good thing to add to Icinga is a check for unmounted drives. You do this with swift-recon, again another core Swift tool; the recon endpoint is exposed on the object nodes and gives you a bunch of different data: unmounted drives, disk space usage, almost anything you could really want. And you can pump that right into Icinga, so that's the check for unmounted drives.

The Swift dispersion report works by checking whether deliberately distributed containers and objects are currently in their proper places within the cluster. It's a core Swift tool, puppet-swift helps you configure it on the proxy nodes, and I have it report back up to Icinga. It's a nice high-level view of whether our objects are where they should be. If they're not, you might consider that some file system issue is causing replicas to not land where they should. Just a good top-level view of, hey, is everything okay?

Async pendings are also important, probably more important to track over time than as a high-level view, but if you look at Icinga and notice, hey, usually I have zero async pendings and now there are 10,000, you'd better get into the logs and find out what's going on. Then there's a bunch of operating-system-level checks you'd have on basically any node: file descriptors, is Puppet running properly, what's the load like, disk space. And then, most importantly from your customers' perspective, is a basic Swift CRUD check: every 30 seconds, create, upload, and delete an object. A check like this failing will be one of the earliest indicators that something's gone wrong.

This is what it looks like in one of our environments for an object node. You can see the Puppet agent, hey, it's running fine; check drives for errors, no errors found; all devices are mounted. All right, this node's pretty happy, nothing looks crazy, so that's helpful. Same thing for a proxy node, only you can also see the Swift dispersion report output on there.

The next step, which I'll be working on over the next few months, is that we actually have a Monasca installation set up in our environment. I'll work on pumping metrics in there like time to PUT an object, error rates, basically tracking any sort of request to Swift. Then I can better model the performance of the cluster and find out what's slowing it down or speeding it up at different times of day. That's a whole other bucket of work, really a whole other presentation.

The final part coming up is performance tuning, and that's performance tuning the operating system that Swift runs on as well as actually tuning Swift itself to better fit your environment. Tuning the operating system is a very complex thing, so what I've done here is put together a list of the top values I've found, from searching email threads, message boards, and the actual OpenStack Swift deployment guide. Some of them are pretty critical: your proxy under high load might run out of connections, and now that you're deploying this with Puppet, you're in charge of ensuring the node has what it needs.
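To give a flavor of what's on that list, here's a minimal sketch of managing a couple of the commonly cited proxy-node values with Puppet. It assumes a third-party sysctl module that exposes a sysctl defined type (for example thias/sysctl); the settings are ones the Swift deployment guide mentions, but the values are examples to benchmark for your own environment, not to copy blindly.

    # Sketch only: assumes a sysctl module providing a 'sysctl' defined type.

    # Let the proxy reuse TIME_WAIT sockets so it doesn't run out of
    # connections under high load.
    sysctl { 'net.ipv4.tcp_tw_reuse':
      value => '1',
    }

    # Widen the local port range available for connections from the proxy
    # out to the storage nodes.
    sysctl { 'net.ipv4.ip_local_port_range':
      value => '1024 65000',
    }

    # The Swift deployment guide recommends disabling TCP syncookies on
    # the storage network.
    sysctl { 'net.ipv4.tcp_syncookies':
      value => '0',
    }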
So this is a couple of days of work, but you sit down and read through these, understand how they apply to your environment, adjust them, benchmark, and find out whether they help you. It's just good to be aware of them. And again, download the slides, read through these, and see if they help you out at home.

Next, though, you can performance tune at the Swift level, and there's a whole lot of fun stuff you can do here if you're benchmarking your cluster. I found, for example, that on an object node with about 34 disks, I got a significant boost in performance going from eight object workers to 16. I later learned you can tune that up even further by raising the threads-per-disk value, and I learned that from the OpenStack Swift book by Joe Arnold and members of the SwiftStack team. Really, if you're going to run a production Swift cluster, you should have read this entire book and understood it pretty thoroughly; otherwise there will just be a bunch of blind spots you're not aware of in operating Swift. It's a great one, they hand it out for free a lot of the time if you just ask, it's really well written with good info, and those guys have been working on Swift since the very beginning. The important part, though, is that in those class declarations in the profile manifests earlier, I left out all the different parameters you can pass in. If you look into those classes, you'll find you can tune almost any of these Swift settings through them, so you have the ability to tweak the cluster however you'd like to match your environment.

The final piece you can't forget, now that you're running Swift yourself, is logging and log rotation. That's a pretty important part that gets overlooked. By default, Swift just logs to syslog; it's kind of messy and you can't really split out what's going on. Puppet has rsyslog and logrotate modules you can depend on and use, and these are some examples and links to the example files. With rsyslog, you can basically break logging out by server type, an account server, an object server, and log each to its own file, so you can come in and track problems on that server or narrow them down further. But that's not enough: you'll also need logrotate in place so you don't fill up your root disk or your log partition and cause a bigger issue. Maybe rotate on a number of days or a log size, something like that.

Now some work-in-progress stuff in puppet-swift. The first one has been a lot of work; it's had a lot of great feedback from the community and I think it's pretty close to being merged. It's the concept of managing Swift with a swift-init-based service provider. Right now in Puppet, there are distribution-specific providers that answer: is the service started, is it running, I need to restart the service, and they're not very graceful for Swift. swift-init is a tool built by Swift for starting and managing Swift processes. I implemented this as a custom provider for the service type, and it does things such as gracefully reloading a process, allowing connections to bleed off before it restarts, and it will also let you expand further into running dedicated replication networks, where you need to start Swift servers from more than one configuration file. The module currently can't support that, and this review should help pave the way to get it in there.

I mentioned we're also running an erasure-coded cluster. That requires storage policy support, which isn't really in puppet-swift yet. You can kind of hack it in with swift_config if you need to.
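As a rough sketch of that hack: puppet-swift's swift_config resource writes individual settings into /etc/swift/swift.conf, so you can lay down policy sections with it. The policy names and erasure-coding parameters below are example values only, and an EC policy also needs its own object ring (object-1.ring.gz) built separately.

    # Sketch: storage policies written directly to swift.conf via swift_config.
    swift_config { 'storage-policy:0/name':
      value => 'Policy-0',
    }
    swift_config { 'storage-policy:0/default':
      value => 'yes',
    }

    # An example erasure-coded policy; values are illustrative.
    swift_config { 'storage-policy:1/name':
      value => 'ec104',
    }
    swift_config { 'storage-policy:1/policy_type':
      value => 'erasure_coding',
    }
    swift_config { 'storage-policy:1/ec_type':
      value => 'liberasurecode_rs_vand',
    }
    swift_config { 'storage-policy:1/ec_num_data_fragments':
      value => '10',
    }
    swift_config { 'storage-policy:1/ec_num_parity_fragments':
      value => '4',
    }

It works, but it's exactly the kind of thing I'd rather see as first-class parameters in the module.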
I would like to get that in as a core feature in the next few months. And then, in an erasure-coded cluster, you actually use the Swift object reconstructor instead of the object replicator, so we need to put support in for that and test it out.

So, for more info: you can look upstream at the puppet-swift work. You can find me on IRC, in the OpenStack operators channel as well as the Puppet OpenStack room, as Vinch, and I can help you through this stuff. There are a ton of great reviewers on the Puppet work, so if you have a change, put it up; there's a lot of really good feedback, and there are people who work for Puppet Labs and work on OpenStack, all that. It's a really good community. So, any questions? Everything up until this is upstream. Yeah. Okay, thanks.