Hi, my name is Jeff Bozinski. I'm a software architect at Yahoo, and up here on the stage with me is Joshua Harlow, a lead programmer on the OpenStack project at Yahoo. Today I'm going to talk to you a little bit about how we're using OpenStack at Yahoo, give you some background on our use cases and some of the challenges we've encountered, and tell you where we're going with the project.

First of all, I want to set some dimensions for the problem we're tackling. When we talk about infrastructure at Yahoo, we're talking about some fairly large numbers to support this kind of audience. We're the top-ranked internet audience: over 800 million monthly users, global, across many countries and many languages. And we're mobile: 390 million mobile users. In terms of the scale behind that, if you look at any one of the web properties or mobile experiences, you're talking huge numbers, always in the millions: a huge amount of storage, a huge amount of user activity. In order to get OpenStack deployed across our entire infrastructure, it needs to be able to scale to these dimensions.

And not only scale, but grow. We're growing rapidly: a 25% jump in on-page interactions, and that's kind of typical across the breadth of our properties and mobile applications. We're seeing tremendous growth, and it's not only growth in traffic. It's also growth in the developer talent behind all of those experiences. We've been making a lot of acquisitions for talent over the last year, and we've been hiring a tremendous number of developers. Those developers are what makes it all work. And what those developers want, of course, is power, more of it: more network bandwidth, more compute power, more storage. It's never enough, right?
Yeah, never. Whenever you're a developer, you always want more resources, and the company can never give you enough, even at Yahoo. We're always hoarding hardware and that sort of thing.

And that's where OpenStack comes in. We've deployed OpenStack on three continents, in over a half dozen data centers, across many clusters. With OpenStack, we're providing that compute resource, that power, to those developers so they can bring new experiences to the world.

In terms of our use cases, the place we started was a thing called Open House. Every developer at Yahoo today gets automatically onboarded to one of our development clusters, which are spread across the world: Asia, Europe, North America. A developer can acquire resources and do development on any of those clusters. This has been huge.

Right, I can give a little more detail. We set this up with Horizon; as you can see, the first page there looks like Horizon. There's a certain quota for the developers we give access to. They've been very responsive to it. It gives them instant access to virtual machines, which is something they never really had before, so they've been very happy about being able to get rid of desktops, in a way. They're able to provision these VMs in seconds when they want to try out a new application or a new package. It's something that Yahoo has tried before, and it has really succeeded this time, and we've gotten a lot of great feedback. So this website really helps with the initial steps into the developer lifecycle.

Yeah, and developers love it.
They've been thrilled. It used to be that when you came into Yahoo, you'd get this underpowered Linux box underneath your desk. By the time you got it all hooked up, it was already obsolete: 2 gigs of RAM, dual core, and whatever OS was on it when you got it was kind of where it stuck. You'd be surprised: something as simple as this has been transformational, really. The most common request from developers now is, "How can I get more quota? I just want more." And of course, the way people like Josh used to hoard hardware, now they're doing that with VMs. But that's great. That's what it's there for, and we're pretty happy about that.

Some of the other use cases we initially tackled: CI is a natural. With the variable loads as developers come into work, check things in, and need builds on commit and so forth, OpenStack is a natural for that, with the elasticity and flexibility to accommodate these workloads. Another one: you can imagine that we need to do a lot of browser validation. Whenever one of those new web experiences is created, there's a tremendous number of browsers that we have to do backward-compatibility testing on, and that's something where OpenStack has been a great fit. We've plugged Windows into the OpenStack infrastructure, and we've got a lot of validation testing going on effectively, so that's been awesome.

And finally, aside from the internal tools, there are the data use cases.
We've had a lot of those. On the data side, it's still early days, and the same goes for production. So now when you go to Yahoo, you're potentially going through an OpenStack cluster when you get that user experience. It's still early days, and we've been applying it to things like peak and seasonal loads. But the goal, as we get more experience and more confidence in the system, is to roll it out across the entire Yahoo infrastructure. You can imagine the scale of that, and some of the important characteristics we're going to need as we do it. That's where I'm going to talk about the challenges a little bit.

When you go to a web page on Yahoo, you might imagine you're just getting some content from a web server, but of course it's much more complicated than that. An application, say Finance or Sports, is comprised, just in a single cluster, of many thousands of hosts, and your request will spawn off many other requests to a deep and wide stack of systems. So OpenStack needs to handle that kind of application cluster. A couple of thousand hosts is about the size of a lot of people's OpenStack clusters in general, so we're trying to build to a much larger scale, and we've had some challenges along the way.

As part of that scale, you've also got a dimension of elasticity. You can imagine that traffic spikes when there's a big news event might result in a need for significantly more resources. As we get further into the production use cases, these are some of the things we've needed to solve and that we're working on.

Reliability is huge.
I think everybody feels that way. But when you're sitting in front of a lot of consumers and your revenue model is based around advertising, when you give a user a 500 or a "site not available", you're effectively not able to pay the bills. So reliability is really crucial for us. And going back to the elasticity part of the equation, if we're trying to spin up a lot of instances to meet some type of surge in demand, the system needs to be able to respond as well. We have had some challenges in reliability, and we can talk a little bit about that.

Sure. There are some interesting situations that we've had to handle, and probably most people here have as well. If your cluster is backed by qcow images, you have to deal with power failures. We've had some issues with qcow corruption, which I think are being worked through in the community as well, and we've had to build automated tools to help repair those images. Consistency of state is a big thing that I've been working on, along with others at Yahoo and elsewhere, to help improve the reliability of the whole system. We're trying to build out some libraries; I've been working in the Havana and Icehouse cycles on that kind of stuff. There's going to be a session on that later as well.
So that will, in the end, help make this a really highly reliable system that everybody can depend on, because we definitely need it to be.

And the reason why it's important: think about the way we're running OpenStack. We're an infrastructure team. We provide this compute resource for people, and how they're using it, we effectively have no idea. You can kind of think of Yahoo as a large collection of startups. You've got a lot of independent entities that are all iterating on their application stacks independently, so it's not like we can go and tell them, "Oh, we're shutting down the cloud, go use the other data center." Yes, the applications are built for reliability and high availability and all that good stuff, but the foundation has got to be a solid infrastructure. We're coming from a hardware world, and there are expectations built into a lot of the applications; it's just legacy. Obviously we have to be at that same level with cloud. It's kind of a contract, and you may not want to set those expectations, but still, reliability is very important. We can potentially impact a lot of users if we have infrastructure problems that are due to our implementation.

And then operability. Again, with so many data centers, so many clusters in those data centers, and the fact that we can deploy OpenStack at a sort of constrained size in terms of hypervisors, that gives us some challenges as well. Some of our ops folks are sitting here in the front row, and they're probably not exactly thrilled with the number of clusters we have. It's a pretty big burden on them to do some of the upgrades. Maybe you want to talk a little bit about some of the challenges we've focused on? They've done a great job, so thank you to the front row.
Thank you. So there are a lot of things you have to adapt to when you're running an OpenStack cluster, and virtualization in general, and it's been challenging for a lot of us. Things like how to tune KVM, and how to deal with memory restrictions and quotas. When Linux isn't very happy, when it's pushing its memory boundaries, it will eventually just kill one of the VMs, almost at random. So there are certain aspects of the system that are new to us, and new to people who have been used to bare metal. You have a lot of knobs to turn: you can tune your KVM performance, you can tune your memory performance. There are a lot of new things we're learning as we go through this process. There are quota consistency issues that we've hit, and I think others have too; it's been getting better in Havana and Icehouse. There are lots of different backends for the virtualization drivers. There are all these moving parts. It's good in a way, in that it helps a very diverse community, but once you have all those knobs to turn, you need to come to a standard: what is the best set of knobs? Those are issues we've had to deal with. Our ops folks have been very helpful in understanding that, and we as developers have also tried to make it easy for people to use, I hope.

Another thing we've hit is upgrades. We've been running about three or four versions of OpenStack, and we've done live upgrades, which means we're not turning off VMs and shifting over to a whole other cluster. That brings in a whole set of packaging concerns: how do you upgrade your packages without corrupting the VM state? How does it migrate the database correctly?
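To make the memory-pressure point concrete, here is a minimal sketch of a headroom check, assuming a hypervisor should keep some RAM in reserve so the Linux OOM killer never has to pick a guest. The names `Host`, `reserved_mb`, `free_for_guests`, and `can_place` are hypothetical and are not part of OpenStack or of anything described in the talk.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Host:
    total_mb: int     # physical RAM on the hypervisor
    reserved_mb: int  # hypothetical headroom kept for the host OS and KVM overhead


def free_for_guests(host: Host) -> int:
    """RAM that can be handed to guests without touching the reserve."""
    return max(host.total_mb - host.reserved_mb, 0)


def can_place(host: Host, guest_mb: Sequence[int], new_guest_mb: int) -> bool:
    """True if the new guest fits without overcommitting memory."""
    return sum(guest_mb) + new_guest_mb <= free_for_guests(host)


host = Host(total_mb=65536, reserved_mb=4096)
running = [8192] * 7  # seven 8 GB guests already placed (57344 MB in total)

print(can_place(host, running, 4096))  # 57344 + 4096 fits inside 61440
print(can_place(host, running, 8192))  # 57344 + 8192 exceeds 61440
```

The reserve is one of the "knobs" in this sketch: set it too low and the host can run out of memory under load, set it too high and capacity is wasted.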
So there's been great work in the community to make that possible, and it's been very interesting to push that boundary, at least internally, to try to establish all these processes around RPM upgrades and package upgrades. It's involved a lot of testing from our QE folks as well, to make sure that when we do an upgrade, say from Essex to Folsom or Folsom to Grizzly, nothing is actually destroyed in the process. We've been doing well so far, so I'm proud of that. So those are the kinds of challenges we've had to work through. I think the community is starting to realize that it needs to become a little bit easier. I'm glad that's happening, and I'm really thankful to everybody for making that possible.

Yeah, so in the long term what we'd really like to get to is a nice balance: the right number of clusters in a given data center, not too many, not too few. There are good reasons to have more than one, but we don't want a hundred, let's say. That's something we're very interested in working with people to solve, and we'll be doing some work to solve it ourselves. I'll talk a little bit about that in a sec.

Some other challenges: simple things like cloud transformation. We're a fairly mature company. People have the contract, or the life cycle, of hardware in mind when they get VMs. It sounds kind of crazy, but it's true, so trying to change that mindset is a bit of a challenge. Also, a lot of the infrastructure tools, the things we put around OpenStack, were designed with hardware life cycles in mind. You sort of do batch things to hardware, and that doesn't quite work in an on-demand world. We've been working to change some of that internal stuff at Yahoo.

And of course we love the community. We've had great success with Hadoop at Yahoo. We're sort of
synonymous with Hadoop in a way. But in the OpenStack community, we're still learning how to be effective. We are contributing, though. We've been very active in the foundation. Sean Roberts here is leading the charge for us, both in the foundation and on community. We hold meetups at Yahoo regularly, with large user groups, and we want to help build that community. We think that's healthy for OpenStack, and if it's good for OpenStack, it's good for us. And of course we're contributing code, maybe not as much as we'd like to. We'd like to contribute more, and we're working pretty hard to do that. We're building out and growing our teams, and we'll be doing more code commits.

Here at the summit, we've got a lot of people present. I think we've got 15 or 20 Yahoos here, and we're in purple shirts. You should go and harass some of them and ask them about Yahoo. I'm sure they can tell you a little bit more about what we're up to. And folks like Josh are doing some talks this week. Before you talk about that, let's move forward and talk about where we're going, because I'm running out of time.

First off, the way we view OpenStack is that it's not just for VMs. We're very interested in bare metal. We're leveraging some of the bare metal stuff, and that's pretty important. There are always use cases that are very performance sensitive, where bare metal allocation makes sense, so we're already using that stuff. Lightweight containers,
I think, are huge. In a way, we're not a public cloud. We're not selling a product per se, and we don't have untrusted workloads. Lightweight containers represent a pretty appealing place to run compute jobs without much of the overhead, so that's pretty important to us. And then there are some of these other things, like load balancing to give us that elasticity, and database. Maybe Josh, you can talk about a couple of these other things.

Oh, yeah. For at least the last six months I've been doing this project you may have heard about, TaskFlow, which is trying to organize workflows for different parts of OpenStack. It's having some success there. I think there's a session about it with Glance right after this, a talk tomorrow, and a couple of other design sessions, so you can check those out.

We've also done some work internally on fault detection and the self-healing concept. Instead of having customers complain to us via bugs or other contact points, we want to try to predict a problem before they open a bug to say there's a problem. So there are these automated tools that we're building, and hopefully open sourcing, so that we can do some of this self-healing or early fault detection on behalf of our users, instead of having to wait for users to report it.

Upgrades are sort of expected. There's been a lot of work that we've done, and that others have done, around the whole upgrade of OpenStack and the processes involved. Hopefully we can get more public with that, at least I'd like to, and we can try to make sure the community is healthy in that area.
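The idea of organizing work as a workflow, as mentioned above, can be illustrated with a small sketch. This is plain Python and deliberately does not use the TaskFlow library or its API; the step names and the `run_flow` helper are hypothetical. It assumes the useful property of such a workflow is that each step knows how to undo itself, so a failure partway through leaves consistent state behind.

```python
from typing import Callable, List, Tuple

# A step is (name, execute, revert). All names here are illustrative only.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_flow(steps: List[Step]) -> bool:
    """Run steps in order; on failure, revert completed steps in reverse order."""
    done: List[Step] = []
    for step in steps:
        _name, execute, _revert = step
        try:
            execute()
        except Exception:
            # Undo whatever already succeeded, newest first.
            for _n, _e, revert in reversed(done):
                revert()
            return False
        done.append(step)
    return True


log: List[str] = []


def fail() -> None:
    raise RuntimeError("simulated failure")


flow: List[Step] = [
    ("allocate_ip", lambda: log.append("allocate_ip"), lambda: log.append("release_ip")),
    ("create_volume", lambda: log.append("create_volume"), lambda: log.append("delete_volume")),
    ("boot_instance", fail, lambda: log.append("never_called")),
]

ok = run_flow(flow)
print(ok, log)
```

In this run the third step fails, so the two completed steps are reverted newest first, and the failed step's own revert is never invoked because it never completed.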
So yeah, there's lots of stuff going on.

Yeah, I mean, obviously we'd love to work with the community on a lot of this stuff as well. Josh is already working with a lot of people on the TaskFlow stuff. We try to share whatever we think applies or is worth sharing with the larger community, especially on upgrades. I think this is a pretty critical area. Anybody who's running a cloud needs to manage the chaos of upgrades, and with the agile mindset of today, you want to be doing more frequent releases. I know Rackspace has kind of solved this; I'm not quite sure sometimes how they do it. We're getting there, but we think there's a lot that needs to go into the system to help make that possible.

So it's kind of a short talk, only 20 minutes, a little bit of a teaser. We're going to try to take a couple of questions here, so if you want to go up there and ask a question, that'd be great. Otherwise, please attend one of our other talks or design sessions, harass the Yahoos that are here in the front row and ask them about Yahoo, and drop by the booth. We'd love to talk to you.

Anything else, Josh? I think we're good. We'll open up for Q&A. Does anybody have any questions? You can put us on the spot; tough ones are okay. Maybe a shy audience.

Oh, one more thing I did want to mention while I'm up here. If you stay in the hall after this, Yahoo Japan is going to be here talking about some work they're doing with Brocade on LBaaS, that's load balancing as a service, so that should be pretty interesting. And if there are no questions, I think we're going to sign off. Anybody? Okay. Well, thank you. Thank you, guys.