From Austin, Texas, it's theCUBE, covering OpenStack Summit 2016, brought to you by the OpenStack Foundation and headline sponsors Red Hat and Cisco. Now here are your hosts, Stu Miniman and Brian Gracely. Welcome back to theCUBE, SiliconANGLE Media's flagship program. We go out to the shows and help extract the signal from the noise. This is OpenStack Summit 2016. Really happy to have on the program first-time guest Rich Hague, who's the Global Head of Reliability and Operations with Paddy Power Betfair, a Superuser here and a first-time attendee at the OpenStack show. Rich, thank you so much for joining us. It's great to be here. All right, so Rich, Global Head of Reliability and Operations. Tell us a little bit about what's your job? Paddy Power Betfair, what do they do? Sure, so it's the longest job title in the world, I think. It's broadly looking after two areas. So I have a site reliability engineering team, and they're really looking at maintaining the site, keeping it available. And I have the operations side, which is everything above the hardware that runs all of our Linux production estate. So between those two teams, we keep the lights on, we keep the site going, but we also look at getting software from our developers through to production as quickly as we can. All right, and the company? So Paddy Power Betfair has just merged from Paddy Power and Betfair quite recently. So if I talk about Betfair occasionally, I mean Paddy Power Betfair, apologies in advance. The company's one of the world's largest online gaming companies, gambling companies. And it's been about 15 years for Betfair, a little bit longer for Paddy Power, but it's very technically led across both brands. All right, so yeah, to totally understand, if the site's down, you guys aren't making money.
So talk a little bit about what led you to OpenStack then, how that fits in. You've obviously got websites, but what is cloud to you versus websites, and where does OpenStack fit? So the business is very technically led. It's growing very aggressively and has done since its inception. So what led us to cloud really was something that would give us that scale, that ability to react to the demand that we ourselves are creating by growing that business and growing that product. Coming to OpenStack was a bit of a wider question, I guess. So we were looking around to replace and bring up to date a lot of our infrastructure. It had served us pretty well up to a certain point, but as things kept growing, we needed to be able to put the next generation in. So we looked around for how we would do that. We're a regulated business, so we couldn't just go out to an AWS or somewhere public; we had to be able to actually point to the systems and to the boxes ourselves. Long story short, we ended up looking at an OpenStack solution: so KVM, OpenStack on top, and a whole host of orchestration and tooling on top of that, which would give us the ability to have that production workload and continue to run it at the scale that it needed, but also give us a chance to increase and improve our automation and our tooling into the next generation, to speed up all of the code through to production that we were trying to do. Give us a sense, obviously rules and laws are different here in the States versus where they are, but give us a sense of the scope of what people can make bets on. Is it just sports? Is it all aspects of what's going on, presidential elections and so forth? And then how have things like mobile impacted that, or people getting information online, interacting with the site online? Help us understand the scale and scope a little bit.
Yeah, sure, so the business is primarily sports betting, although there are some special markets that will allow you to bet on the weather, or who's going to win the next presidential election, or who's going to win some TV show. There's a few of those as well, but it's predominantly sports, and most of the business is in the UK and Ireland and around Europe. It's been quite important for us to go for regulated areas, so that we can make sure that the experience that we put in place is right and proper. So there are some places, like the States, where you can't gamble on our site. The sporting industry is quite interesting. We get some periods where we have great big events, and we have an annual event in Europe called the Grand National, which is, I guess, a bit like the Kentucky Derby; it's the biggest horse race around. So actually trying to put in place a system that can scale to meet those kinds of demands has its own challenges as well, especially when you can't take a public cloud and just burst up and burst down your capacity for that one day. So these things, that kind of sporting side of it plus the occasional big events as well, it's quite an interesting challenge. You sort of get this lumpiness; how do you deal with that? For sure. Well, one good thing is, because it's a sporting event, you can see it coming from a long way away, so we can plot this on a calendar. Dealing with it is quite interesting; you spend a lot of time testing at that scale to make sure that you can cope with that kind of demand. And there are lessons that you learn from each event that you'll then roll into the next one as well. So, Rich, if you can, explain a little bit about the OpenStack environment, what projects you're using, what partners you used to put the thing together.
And one of the things we've been looking at is, unlike some technologies where it was like, oh, let's stick it in a little corner and test it and do something that might not be important, we've been saying you should do something that's important to the business; it has to be something bigger so that it can succeed. So, how's that all set up? Please share a little bit about it. So, the products we're using: we've got KVM virtualization at the bottom, we have Red Hat's OpenStack Platform on top of that, and we have a whole host of tooling, a lot of which we put together ourselves from various open source projects, to give us that delivery side as well. We've partnered not only with Red Hat, but with Nuage Networks, and Nuage brought with them the software-defined networking, which was quite a pivotal part of the project, but quite an interesting decision at the start of the project. It was something we hadn't done before; it was quite new for us. And in fact, we talked to a lot of different analysts and looked at a lot of research to try and work out whether we should take advantage of this software-defined networking when we started the project, which is about a year ago now. And as many came back and said you definitely should as came back and said you definitely shouldn't. So, we spent a lot of time talking to Red Hat about how we could use this, how we could make sure we were picking the right partner. And between ourselves and Red Hat, we chose and partnered up with Nuage as well. And in fact, it's worked remarkably well. The software-defined networking gives me the ability to deploy and to mutate the network at a pace that suits the developers, but it also gives us a performance boost over traditional networking, where perhaps we had to hop out of the network to a firewall or some security device and back in again. Now we have these distributed firewalls that sit around the hypervisors and give us the performance improvement as well.
Inside the OpenStack stack itself, do you know how many projects you're using? Gosh, no, I don't. I would have to get one of our technical guys to list them all out, yeah. You talked a little bit about being in a regulated industry, which people tend to think, okay, it's regulated, it's going to move fairly slowly and people are able to do manual tasks in IT. But at the same time, you've got software development, you've got continuous integration. Talk a little bit about how that fast-moving software development aligns with also having to do compliance and maintain the network, and how do you find that right balance? Yeah, I guess it's a tricky balance. And when I look around at other players in similar industries, it's certainly something they struggle with, how they can give the right level of compliance in a way that people are happy with. For me, it all comes back to being able to automate it. I've got a saying I'm trying to coin at the moment, which is dot, dot, dot as code. I'm trying to do everything as code, trying to encode everything. So whether it's deploying our application, whether it's deploying a part of the infrastructure, or whether it's embedding some of the governance, some of the checks and audits that we need to do, if I can, I try and encode those. We try and put them into pipelines; we try and make it so that, instead of testing once in a while, why not test every time? If we can encode it, if we can put it into a pipeline and test every time software goes through that pipeline, why would we not do that? And the compliance side, the auditing side, likes that as well, because what you're left with is a fantastic log of activity that is absolute and complete, not a point-in-time check. Yeah, that's a great point.
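The "dot, dot, dot as code" idea Rich describes — governance checks embedded in the delivery pipeline so they run on every change and leave a complete audit trail — could be sketched roughly like this. All check names, fields, and rules here are invented for illustration; they are not Paddy Power Betfair's actual pipeline:

```python
import datetime
import json

# Hypothetical pipeline-gate compliance checks. Each runs on every
# deployment, and every result is appended to an audit log, giving
# auditors a complete record rather than a point-in-time sample.

def check_owner_assigned(artifact):
    return bool(artifact.get("owner"))

def check_security_scan_passed(artifact):
    return artifact.get("security_scan") == "passed"

def check_change_ticket_linked(artifact):
    return artifact.get("change_ticket", "").startswith("CHG-")

CHECKS = [check_owner_assigned, check_security_scan_passed,
          check_change_ticket_linked]

def run_compliance_gate(artifact, audit_log):
    """Run every check; record each result; gate on all passing."""
    results = {}
    for check in CHECKS:
        passed = check(artifact)
        results[check.__name__] = passed
        audit_log.append({
            "artifact": artifact["name"],
            "check": check.__name__,
            "passed": passed,
            "at": datetime.datetime.utcnow().isoformat(),
        })
    return all(results.values())

audit_log = []
ok = run_compliance_gate(
    {"name": "betting-api-1.4.2", "owner": "team-sports",
     "security_scan": "passed", "change_ticket": "CHG-1234"},
    audit_log)
print(ok, json.dumps([e["check"] for e in audit_log]))
```

The point of the shape is the audit log: because the gate runs on every pipeline pass, the log is the "absolute and complete" record of activity the auditors like, rather than a periodic spot check.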
For a long time, people thought, well, the clouds are less secure, I don't know how I'm going to do auditing and compliance, and at the end of the day, you say, well, if I have records of everything we did, and they're done repeatedly the same way every single time, that's an auditor's dream, because they know where the information is, and they know what the processes and tasks were. Not just repeatedly, but reproducibly, so it doesn't matter who's doing it, it's still that same output every time. So you can come here for a few days and the stuff will keep working, yeah. Yeah, absolutely, so that's been very helpful. We put a bit of time into looking at ways that we can do this internally as well for our own audits, so that we can embed things like security hardening checks, whether they're lesson-learned type checks, or whether they're hardening-type checks for patching, and how we can encode that and make sure that it goes through its own delivery pipelines. So we want to get to a point where every time a developer spins up a VM, the base image of that OS is something that is already hardened, already patched, already up to date, and that's happening behind the scenes for them. They just take the latest gold-standard OS build that's ready to go and take that through, and we're happy that it has been through all of those processes and is hardened and is safe and secure. Right, so Rich, for peers of yours that might be saying, hey, this sounds kind of interesting, I'm looking at OpenStack, can you give a little guidance on a few things: timeframe, how long you spent planning it, how long the rollout took, did that meet expectations, budget, and the other thing, the operations side, your people, what kind of training, re-skilling, moving people around. I know it's a big question. Yeah, it is a big question, and it's been a big project.
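Picking up the hardened "gold" base-image flow described above — patch and harden behind the scenes so developers simply consume the latest approved build — a very rough sketch follows. The step names and registry shape are illustrative assumptions, not the actual tooling:

```python
import datetime

# Illustrative hardening steps an image pipeline might apply before
# an OS build is published as the latest "gold" image.
HARDENING_STEPS = [
    "apply_latest_patches",
    "disable_unused_services",
    "enforce_ssh_key_only_login",
    "install_audit_agent",
]

def build_gold_image(base_os, registry):
    """Apply every hardening step, record them, publish the result."""
    applied = []
    for step in HARDENING_STEPS:
        # In a real pipeline each step would invoke a tool (e.g. a
        # config-management playbook); here we only record the step.
        applied.append(step)
    image = {
        "base": base_os,
        "hardening": applied,
        "built_at": datetime.date.today().isoformat(),
        "tag": "gold-latest",
    }
    # Publishing replaces the previous gold image, so every new VM a
    # developer spins up starts from the hardened, patched build.
    registry["gold-latest"] = image
    return image

registry = {}
image = build_gold_image("rhel-7.2", registry)
print(image["tag"], len(image["hardening"]))
```

The design choice worth noting is that hardening happens once, in the image pipeline, rather than per-VM after boot; developers never interact with the hardening steps at all.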
In terms of timeframes, one of Betfair's values was pace, and it's something that we've lived up to for this project. So I think it was probably around a year ago we started talking about, should we be looking at an OpenStack project, could we do this, could we get it to scale and work, could we get some of these advantages from it. After engaging the vendors, after choosing the partners that we went to work with, we went into a proof of concept, and we time-boxed that to four weeks. In four weeks we wanted to create a very base OpenStack setup, based on the hardware that we chose and the software stack we were putting together, and just run a small set of functional and performance tests against it to make sure that the output of the RFP, we could see it for real. And that was fun, doing that in four weeks. A lot of the vendors we were working with, their eyes kind of went wide when we said, hey, we want to try and do this in a month, but we achieved it, and it also got the partnership working as well, getting the people working at that pace. We brought some of the guys on site with us, they sat with us, they co-located, and in four weeks' time we had proven that we could do this and it looked fine, and that kind of unlocked the budget and the steps to then go to the next phase. That's our pilot phase, which was growing the seeds of what would become the production infrastructure. This is now in two data centers; it's fit for purpose in terms of all the tooling, all the monitoring and everything else that sits on top of this, and we were aiming for around six months to go from zero to ready for production, with 100 to 150 hypervisor-type nodes making that up. So that itself was quite aggressive. Our pilot would end up being bigger than a lot of our reference sites' entire production estates, but we had the confidence from the initial proof of concept that we could do it, and we knew what we were going to do to get there.
So we're just at the end of that pilot phase now. We've just put some workload live, we're just starting to ramp that up, and we're into the third project, which is the migration project, which is probably the longest of those; it's probably going to stretch around 18 months, and we will now have to take around 200 different applications and move them across from our old legacy estate into the new estate. And it's not just a lift and shift, because as part of moving them across, some of them will have some architectural changes to make, to either run active-active or to be able to be deployed immutably, rather than just continuously deploying on top of an existing box. But we're also using this as a chance for all of them to adopt the delivery tooling, to make best use of the pipelining, the monitoring, and all of the checks and tests that we're doing as part of that. So not only is it a migration project, but it's also a very nice cleanup of the estate, getting us into a very fit state going forward. And then the fourth project, which will be a decommissioning project, will then help us take down and take away all of that old estate, one, to get rid of it, but also to free up some room in our data centers for the new estate as it grows out. So it's a fairly long period of time, but we've been going at it with some pace. Okay, so on pricing, it's open source, so everything is free, right? Ah, for sure, yeah. No, it doesn't work like that. So I'm not going to go into too much about the budgets, of course, but for us, the real benefits were around trying to provide that pace of delivery for our development teams and providing a reliable infrastructure for the business. Those were really the key things that we were looking at. I guess the thing I want to poke at, without going into too much detail, is, where you are right now, were there overruns? Is it priced about where you expected? Any surprises that people should look out for?
Yeah, so no financial surprises at the moment. In fact, if anything, we had a nice surprise. The distributed firewalling inside Nuage meant that I had some risk budget put aside for some fairly fat firewall devices to sit on the outside, and actually we've realized we don't need them; in fact, the security model inside Nuage is good enough for what we want. So pricing-wise, it's been one of the nicer projects. And completing this arc, talk about the impact on personnel and operations skill sets. Yeah, that's an interesting one, because it's one thing landing the technology, but trying to land the cultural change is much more complicated. Now, we're quite ahead, I think, at Paddy Power Betfair. We're quite a mature company in terms of the DevOps approach, and in terms of developers owning their own products from inception right the way through to production and to eventually being decommissioned, we do quite a good job. What we've done is we've taken that model and we're putting it into the world of infrastructure as well. So this comes back to the dot, dot, dot as code. Our network engineers, our storage guys, the guys that keep the lights on on the physical estates, they are now also starting to be able to make use of the tooling and that kind of DevOps approach, whereby we start to treat these things no differently than any other software application we might have, and those teams get as much say and as much ability to change and plan what they're doing as the development teams would. So yeah, there are some changes. I think it's for the better. Hopefully, just like in the software world, people will stop doing the monotonous tasks over and over. We can encode those, we can give them tools to do that, and they can spend their time doing the much more interesting work. You gave a great example. You sort of went from a pilot, you wanted to sort of force yourselves and Red Hat to work a certain way.
You reached a milestone where you said the budget got unleashed to go bigger. You're still in the process of building some of this out, operating and building. Have you figured out a new set of language or a new set of metrics to talk to the business and say, this is what it's doing for us? Or are there things where they go, oh, this is bigger, better, faster, it met what they were looking for? So probably the most impactful is where we are looking at the time from checking in a piece of code to it landing on the production estate. That, in the previous estate, really varied, and it varied on the tool chains that those teams were using, whether they needed specific exotic hardware, et cetera, et cetera. What we've been able to get to now is a place where, to get yourself a new machine, to roll your code to it, and to put it through a series of testing, if you exclude, let's say, the bulk of the test packs, which could be, I guess, a variable size, we're down to minutes to be able to do that. And previously that was perhaps days, or even on some occasions weeks. So what I'm hoping, the biggest impact we'll find on the business, will be that developers will be able to much more quickly take their code from check-in through to production, and be able to do that repeatedly, without having to raise tickets or wait in line or go and have to talk to someone to try and understand their requirements. Yeah, so you could literally now connect the dots between business idea and execution in measurable ways. Yeah, and what I'm hoping we'll do is we will start to track these. Paddy Power Betfair is quite good at measuring everything we can. So whether it's, I don't know, CPU spikes, or whether it's some of the processes we're doing, we will try and look at the data and drive our decisions based on that.
So what I'm hoping we'll do is we'll start to be able to track these delivery times, the testing times, and that will become a set of metrics that we can then use to try and make these processes even more efficient. Whether that's changing tooling, or changing the way we use the tooling, or running stuff in parallel, we'll have all of these options open. So Rich, how would you characterize the maturity, performance, and scalability of the OpenStack environment today? So from what we've seen so far, it looks pretty good. We are at the start of migrating things into production, and I'm sure there are challenges to come. Some of the infrastructure that we run is very high performance; there are thousands of transactions per second per node. So we have to make sure that we put in place something that can cope with that. But we're looking forward to things like the Ironic project, where we should be able to provide bare metal provisioning using that same tool chain and all the same processes, but give us a bare metal box that can cope with some of those more exotic requirements. And then that opens itself up to perhaps containerized solutions in the future as well, which can sit on top of that. So I'm happy that we've got something that's fit for purpose now, but I'm also happy the roadmap going forward seems to be going in the right place and will provide us with the capabilities we need. Yeah, I have a term I use sometimes called data feedback loops. I mean, you're now reaching a point where you can move software faster. You're going to get feedback from the site reliability team on things that are working. Can you talk at all about how you use data to help you make decisions and who it gets shared with? Is it just within your team? Does it share up to the business? One of the things I'm particularly keen on is transparency of data. So when we are measuring things or looking at things inside the business, we try and share it with anyone who would like it.
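The delivery-time metrics Rich mentions — check-in-to-production lead time, with per-stage testing times broken out — could be tracked as simply as recording timestamps per pipeline stage and deriving durations. The data, field names, and stage names below are hypothetical, purely to show the shape of the calculation:

```python
from datetime import datetime
from statistics import median

# Hypothetical pipeline-run records: timestamps from code check-in
# through test completion to landing in production.
runs = [
    {"checkin": datetime(2016, 4, 25, 9, 0),
     "tests_done": datetime(2016, 4, 25, 9, 6),
     "in_production": datetime(2016, 4, 25, 9, 9)},
    {"checkin": datetime(2016, 4, 25, 10, 0),
     "tests_done": datetime(2016, 4, 25, 10, 12),
     "in_production": datetime(2016, 4, 25, 10, 20)},
]

def stage_minutes(run, start, end):
    """Duration of one pipeline stage, in minutes."""
    return (run[end] - run[start]).total_seconds() / 60

test_times = [stage_minutes(r, "checkin", "tests_done") for r in runs]
lead_times = [stage_minutes(r, "checkin", "in_production") for r in runs]

# Median check-in-to-production time is the headline number; the
# per-stage breakdown shows where to optimise (tooling, parallelism).
print("median test minutes:", median(test_times))
print("median lead-time minutes:", median(lead_times))
```

Once these numbers exist per run, the efficiency options he lists (changing tooling, running stages in parallel) become directly measurable rather than anecdotal.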
Sometimes people will think of a use for it that we hadn't thought of, so they should have it. It should be open for them. In terms of actual metrics, I'll give you a good example. We run a time-series monitoring system called OpenTSDB. It's another open source project. And this looks at all of the time-series data across our production estate. And to put it into scale, we consume something like 100 to 120,000 points per second, every second of the day, every day of the year. And we have absolute granularity of that, per second, all the way back for about four and a half years now, from when that project started. Now that wealth of data is fantastic, not only for how the systems are running now, but forensically, if I want to look at something that blipped, or I want to look at a key sporting event last year to see what the scaling factors were, so we can forward-apply them to this year. So it really is very data-driven, all the way down to that kind of per-second stuff. And we use it daily, and it's shared across as many teams as we can get it to. Yeah, that's fantastic. And that's really interesting, because now you're going to have a common language you can talk about, right? Consistent things you can look at, and what's relevant to your team might be different than what's relevant to your sales and marketing team around an event or something like that. Absolutely. Interesting, interesting. So Rich, with hindsight now, I know you're not completely done with the full rollout here. What advice would you give to your peers? What would you say, you know, I would change a little bit, or move a little faster, or wish I had done this, or involve these people? What did you learn that you might do a little bit differently? That's a good question. I'm fairly happy with what we've done so far. As for some advice for my peers, I think you can't underestimate the importance of the people side of a project like this.
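For context on the OpenTSDB setup mentioned above: OpenTSDB ingests data points over a simple HTTP API (`POST /api/put`), each point carrying a metric name, Unix timestamp, numeric value, and a set of tags. A minimal sketch of building a batch in that wire format might look like this; the metric names, tag keys, and host address are invented for illustration:

```python
import json
import time

def make_datapoint(metric, value, tags, timestamp=None):
    """Build one OpenTSDB data point in the /api/put JSON format."""
    return {
        "metric": metric,
        "timestamp": int(timestamp if timestamp is not None
                         else time.time()),
        "value": value,
        "tags": tags,  # e.g. host, datacenter -- queried/filtered on later
    }

# Batch one second's worth of points; at the volume described in the
# interview, a collector would run this continuously, batching on the
# order of 100,000 points per second across the estate.
batch = [
    make_datapoint("site.bets.placed", 412,
                   {"dc": "dc1", "host": "web-03"}, timestamp=1461600000),
    make_datapoint("site.response.ms", 38,
                   {"dc": "dc2", "host": "web-07"}, timestamp=1461600000),
]

payload = json.dumps(batch)
# A real sender would POST this payload to the TSD endpoint, e.g.
# http://tsdb-host:4242/api/put; here we only show the wire format.
print(payload[:60])
```

Because every point carries its own timestamp and tags, per-second granularity kept for years is just a storage question, which is what makes the forensic "what happened during last year's Grand National" queries possible.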
And that's not just the people inside your organization, but also the people that are partnering with you, and your suppliers. One thing that we've tried very much to do, once we'd chosen these partners in Red Hat and Nuage, is that we wanted to treat them as if they were our own. So we've tried to reduce that kind of vendor-client barrier as much as we can. And a really good example of that is some of the times where we've maybe had a technical issue and we've all been sat around trying to work out what happened, why did it happen. Paddy Power Betfair does a very good post-incident review, a blameless post-mortem, where everyone very openly speaks around what they think happened. There's no finger pointing; it's no one's fault. We always use it as a way to try and learn, to help us going forward. And what's been quite interesting is bringing the vendors along for that ride, getting the partners involved in this as well, and watching their teams go from a very consultant-client approach at the start to actually now rolling up their sleeves, getting involved, and being part of this kind of blameless post-mortem stuff. But I think that kind of goes down into the bigger question of, if you're thinking of embarking on this journey and you're thinking of picking some partners to help you do it, try and involve them as soon as you can, and involve them completely. Bring them into your business, have them spend time with your people, and really treat them like they're part of your business. All right, so Rich, this is your first time coming to the OpenStack show. What's your experience been? I don't know if you've been to Austin before, but I'd love to hear about the show, your peers, the city. Yeah, it's my first time in Austin. It's my first time at the OpenStack Summit. It's big, is my reaction. Everything's bigger in Texas. Wow, yeah. I was amazed at the scale of this place, actually. Yeah, Betfair has some scale, but this is big. It's been really great.
So the different tracks, the sessions, the keynotes, it's been fantastic walking around and being able to cherry-pick between various things that you're interested in. But it's not just the sessions that are going on that you can attend; it's the guys in the booths, it's the other clients, the other people that are using OpenStack, spending some time with them and sharing their stories and their discussions around what they've done and the problems they've had, but also what they're looking forward to in the next few releases. That's been fantastic. That's great. Sort of a last word in terms of where you see this going. I mean, do you guys expect to actively participate as part of OpenStack? Do you feel like, look, in working with Red Hat, they sort of act as your proxy on behalf of you? What do you feel is the right interaction between you and the community, whether it's in terms of just working in the open space or maybe eventually writing code? Yeah, so we would like to get very involved. Paddy Power Betfair has a very good history of being involved in open source communities, and I don't see this as being any different. Yes, we want to commit stuff back. In fact, I think we've recently put around 40 or 45 Ansible modules for some of the Nuage networking stuff back out to the community, so we'll continue to work on providing answers to problems that we see or issues that we've found, and sharing those with the community. That's all part of the process of taking on something open source. But it's also, I'm quite happy that we can talk about what we're doing, that we're happy to be here at places like theCUBE talking around our experiences in this process, so that the rest of the community can learn from us, but also maybe listen and think, hey, okay, I'm doing something similar to that guy. I've got something that might be able to help. Let me go and have a chat with him.
So on all of those things, I'm very happy that we are part of this bigger community, and I look forward to being an active contributor going forward. That's great. I know we get to a lot of meetups, we get to a lot of events. I mean, we could have played buzzword bingo. We were talking about DevOps and infrastructure as code. It's great to see it implemented, see it in production, see it at a non-unicorn, non-Silicon Valley company. You know, everyone sort of worries, is that just within Silicon Valley? It's great to see what you guys are doing in the rest of the world, taking advantage of this interaction between your business, vendors, and the open community. It's great to see it in practice. Yeah, it's great to be a part of it. All right, well Rich, really appreciate you taking the time. One of the Superusers here at the OpenStack show. We'll be right back with more coverage of OpenStack Summit 2016. You're watching theCUBE.