Hi, thank you for coming. We are going to present on running a private cloud, OpenStack, as a business. Welcome, everybody. We're going to start off with a couple of introductions, and then we'll have some talking points to go through. We are not very slide heavy. We thought we would talk through a lot of the information and then open it up for questions. We want this to be as interactive as possible, so feel free to ask us questions during the presentation if there are things you're wondering about or if you're looking for some clarity.

So my name is Megan Rosetti, and I am with the OpenStack operations team at Walmart. I joined OpenStack back in the Juno release cycle, and I have been working on the program management side of the operations team.

I'm Andrew Mitry. I'm currently one of the leads on Walmart's cloud team. I've been doing cloud for large companies for about four years, both at CSC on a Department of State contract, as well as at Comcast, and now at Walmart. The premise, or the genesis, of this talk is that private cloud usually starts with a small, technology-focused engineering team: let's get a solution out. But as the cloud starts to scale and grow, we found that it's not just about throwing a bunch of engineers into a team and saying, go deploy cloud. An ecosystem starts to grow in the larger cloud environments. So we wanted to share a little bit today about those lessons learned, how we scaled the team, and what types of people we brought into the team to be able to service our internal customers.
And I had an awesome boss previously who used to say, hey, we need to run our private cloud, run OpenStack, as a business within the company. So that was the premise and the genesis for this talk. Speaking of our last gig at Comcast: when I started there, the team was about four people. We started off with four engineers, and the manager was me at the time, saying, let's knock this out, let's build private cloud out at Comcast. Within about three years we grew that to over 30 people, and a lot of different types of people. So we're going to talk about some of the growth considerations as we grew that out at Comcast.

With the growth considerations, as Andrew was saying, it's not just engineering. It's not just throwing more engineers or developers or architects at the issue. What we ended up doing was really focusing on the big picture. This evolves very quickly. Typically, you're not tripling your team within just a few years, so with that comes a lot of consideration. You need the business support to go along with that engineering team, and project management certainly plays a key factor. You also need to look at internal evangelism as the cloud grows. We ended up having a few people who would literally travel to talk with customers that were looking at moving big projects onto the cloud, making sure that applications are cloud ready from the start, rather than having something come on board, finding that maybe that wasn't the best solution, and then trying to troubleshoot while it's in production. And then management grows as well. You look at your team growth: do your teams need to be realigned?
Do they need to be repurposed or refocused? All of that tends to happen very quickly as well.

As we started off doing things at Comcast, one of the things we found out quickly was that cloud, for a lot of people within an existing organization, is challenging and hard to understand. So how can we provide the best level of support? One of the things we stood up that was a natural fit was an internal IRC channel back then. We said, hey, we're going to staff this with some of the best of our team, come join us, we can talk about best practices. And it's a public forum, right? It's a group chat, so a lot of the time people are just sitting in there learning from what other people are talking about. Over time we migrated that to Slack, but the idea of having this group place where everybody can land and have real-time conversation was a huge win. One of the things we found over time, though, was that we also needed to scale that with people. There were so many questions and so many conversations happening, right?
So we started to actually have to assign people to a position: hey, for these next few days, or this time frame, you're going to be, quote-unquote, responsible for conversations, or for making sure connections are happening there. One of the interesting things we also did recently is that we joined not just support discussions but also strategy discussions and technical discussions all in that same room, so it becomes kind of a learning atmosphere.

As we started to scale that out, we also realized the need for internal events: internal cloud summits, calls, and learning. So we ended up onboarding an internal evangelist resource. Mark Collier made the reference on Tuesday about how Accenture had a call about cloud and 9,000 joined. Now, we never had 9,000 join, and that's pretty impressive, but we did have hundreds join when we would do a call or an internal summit. They were among the most popular events internally. We would highlight some of our users. Our big data team, which spoke earlier today here, was one of the key users, and they would give examples of how to use the cloud. I think that helped a lot with adoption within Comcast, in terms of onboarding those workloads and making sure those workloads were successful from the get-go instead of trying to troubleshoot further on.
Yeah. So we hit the next slide? Yes. Evolution of a production team, from technical to... there we go. Growth considerations. You want to handle that? Certainly.

As your cloud becomes successful, and this is quite frankly a very, very good problem to have, but one that you need to solve very quickly, you run into capacity and deployment. As your cloud grows and becomes more and more popular, people want to be on it. Word travels very quickly in the company about how successful the platform is and how well it meets the needs of a variety of projects. It doesn't just fit one particular type of project; it's pretty wide-ranging. So you have to continuously look at where your capacity is. You want to make certain that you leave enough room. You never want to build out to a hundred percent on a platform. You leave about 20 percent to make certain that as people are rolling applications on, spinning VMs up, and taking VMs down, there is room for them to do that. Along with that you need a deployment schedule. It's tough to do, but you really have to look at trend analysis. You have to plan out some of the bigger projects that you have coming on. You talk to your customers about their growth plans: what are they looking for in six months, in a year?
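[Editor's sketch] The 20 percent headroom rule and the trend analysis described above can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names, the use of vCPUs as the capacity unit, and the simple straight-line growth projection are illustrative choices, not something the speakers described using.

```python
import math


def headroom_ok(used_vcpus, total_vcpus, reserve_pct=20):
    """Return True if at least `reserve_pct` percent of capacity is still free."""
    ceiling = total_vcpus * (100 - reserve_pct) / 100
    return used_vcpus <= ceiling


def weeks_until_reserve_breached(weekly_used, total_vcpus, reserve_pct=20):
    """Project a straight-line trend from weekly usage samples (hypothetical data).

    Returns the number of whole weeks until usage reaches the reserve ceiling,
    0 if usage is already at or over it, or None if usage is not growing.
    """
    if len(weekly_used) < 2:
        raise ValueError("need at least two weekly samples to estimate a trend")
    ceiling = total_vcpus * (100 - reserve_pct) / 100
    current = weekly_used[-1]
    if current >= ceiling:
        return 0
    # Average growth per week between the first and last sample.
    growth = (weekly_used[-1] - weekly_used[0]) / (len(weekly_used) - 1)
    if growth <= 0:
        return None
    return math.ceil((ceiling - current) / growth)
```

With made-up samples such as `[500, 550, 600, 650]` on a 1,000 vCPU platform, the projection says the 80 percent ceiling is reached in three weeks, which is the kind of signal that would feed a deployment schedule. A real plan would also fold in the large projects customers say are coming, which a trend line cannot see.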
And you need to work out a deployment schedule along with that. That also goes hand in hand with upgrades. You're looking at not just version upgrades but also security updates, so for the overall health of the platform you literally have a rolling schedule of what is going to fit the customers' needs without being intrusive to the different applications that you have running.

So that brings up a good point. We had hundreds of internal customers, and it got to the point where we couldn't do this ad hoc, and even scaling a ticket system to meet those needs was a challenge. So we ended up standing up a CRM and actually hiring somebody, kind of a sales engineer type of person, to start managing the conversation with these hundreds of customers, asking about their roadmap and their capacity planning, and tracking the conversations, because as the team scaled, conversations were getting lost: hey, I talked to this person, and whatnot. That's where this idea came from that we need to evolve as a business. Maybe a lot of startups start off with just a couple of technical people, but as they grow they need these types of resources, and we found that we needed similar resources. We needed evangelism. We needed somebody to help with the reporting and analytics and all that type of communication out to the customer and out to leadership within the organization.

One of the other challenges was that as we matured, everybody was asking, well, how much does this cost? Initially it was like, hey, it's cloud, come get the cloud, come get hooked. Here's the Kool-Aid, it's free, just come get it. But as you scale, and especially with larger projects, everybody asks, how much does it cost?
We ended up deploying Ceilometer. We ended up using some showback tools to actually generate a bill for our customers so they could see how much it cost. Now of course, as in any large organization, getting to the point of chargeback is quite a challenge. It's probably years out for most organizations. But at least for larger projects you could come back and say, a petabyte of storage is going to cost this much, 1,000 XL VMs are going to cost this much, and so on. You could get a big picture of what your cost was, and then maybe do some trading of budgets.

Here's the challenge we actually found, though: while we could estimate how much this infrastructure cost, we might perform at slightly different levels when comparing it to VMware or public cloud. So that's where we came to looking at the total cost, including the labor, the infrastructure, and whatnot, and how well we run a specific workload. We might run those workloads better on OpenStack than you would on a public cloud because we're less oversubscribed, and that can greatly affect your TCO. Those are the types of things we started to have to model out. We actually built a team that would take all that data and simulate a workload. One of the key workloads there was running the X1 set-top box. We'd simulate that workload across different environments, or look at the production workloads: what does it take to serve that number of customers? And then tie that all the way back down to infrastructure cost. Well, that is staffing: as the cloud starts to grow out, everybody wants to know, what's my TCO to run this workload on that cloud versus on public or on a different private cloud? So that was another growth area within the team. How can we model that? How can we do that?
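[Editor's sketch] The two ideas above, a showback bill built from usage and a TCO figure tied back to how many customers a workload actually serves, can be sketched as follows. Everything here is an assumption made for illustration: the `Rates` type, the function names, and all rates are hypothetical, and this does not reproduce the Ceilometer API or the model the team built.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical monthly unit rates used only for illustration."""
    per_vm_month: float
    per_tb_month: float


def showback_bill(vm_count, storage_tb, rates):
    """Monthly showback figure: usage multiplied by unit rates."""
    return vm_count * rates.per_vm_month + storage_tb * rates.per_tb_month


def cost_per_customer(infra_cost, labor_cost, customers_served):
    """Total cost (infrastructure plus labor) divided by customers served.

    Dividing by the work actually delivered is what lets platforms with
    different performance or oversubscription levels be compared fairly.
    """
    if customers_served <= 0:
        raise ValueError("customers_served must be positive")
    return (infra_cost + labor_cost) / customers_served
```

For example, with made-up rates of 50 per VM and 20 per terabyte, 1,000 VMs plus a petabyte (1,000 TB) shows back as 70,000 a month. If that platform plus 30,000 of labor serves 200,000 customers, the cost per customer is 0.5, while a platform with a smaller bill that serves only half as many customers from the same workload can still come out more expensive per customer.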
Certainly a big topic of discussion with any type of team growth, and especially with the rapid expansion of cloud adoption, has been being able to sustain team growth. And again, that's not just hiring engineers or developers or managers. It's looking at how we do this and keep the cohesion of a team. How do we keep the information where it's been, which is amongst the team, instead of getting scattered or pulled in many different directions? Along with that, you can look at adopting additional tools, or further automation that relieves some of the stress of routine tasks. Maybe those should be automated. Maybe there's another way to look at some of these. So you're constantly reevaluating your workload.

And then you're building out a team as well, and very quickly. You want to ensure that the team is on board. It's going to be rapid growth, you're bringing in a lot of new people, and you're really looking for that team involvement. When you bring anybody new on board, you want to make sure that the people there are behind them and supportive of that new team member. The way we went about this was through a very team-oriented hiring process. All team members reviewed any and all resumes that came through. Team members were involved in phone screens. We didn't have the entire team on them, so we weren't having a phone screen with 15 or 20 people, but we'd select at least two to three team members to be on those phone screens to talk to those individuals. Then the feedback went back to the entire team, and there was discussion. What's the next step? Do we bring this person in for an in-person interview?
If they came in, then they met with the entire team. Granted, that can be very overwhelming for someone coming in to interview, but it also keeps the entire team on board. If for any reason somebody wasn't able to attend one of the interviews, they could put out to the team: make sure that you ask this question, there's this follow-up. So again, there was that ongoing discussion. Afterwards the entire team made the decision: do we move forward, do we bring this person on?

And that wasn't the end of it. When somebody came onto the team, they were involved in a full team training, meaning that there was a schedule put together for training individuals, and it was rotated throughout the entire team. So it wasn't the same individuals always training on the same subjects. We always made sure that newer team members trained as well. You're not skipping over anybody in the team, because again, it's a process. What we really found is that through this complete transparency with the team, there was buy-in for somebody new starting. There was never a time when somebody showed up and nobody knew who they were, or where they should be, or what they might be working on, or what background they had, or even whether they were in the right department. When somebody came on board, they had a full layout of, okay, here we go.
This is what we're going to tackle. The other part of having a team schedule is that, especially in an operations environment, you're dealing with a lot of interrupt-driven work, and it almost never fails that you're going to have something unforeseen happen. You'll have something, you know, never an outage, never an incident, never. But because the schedule is posted, there's room to shuffle things around if you need to. This was something that we found to be very successful in doubling our team at that time.

Then you take it another step further and look at the entire training process with the team. I think different companies have different names for this: personal development days. I don't know how many companies may or may not do that, but it means taking an eight-hour period once every couple of weeks and allowing team members to spend that time on something relevant to the cloud team, but something new to them. For instance, somebody wasn't taking eight hours to study the ins and outs of Ceilometer if they were already an SME in that field. But somebody could take a day to learn more about Ansible if they didn't have that experience, or about other features and services. We know that there are always new projects coming out, and being able to take that time to really dive into them lets us evaluate: is this something that we want to bring on, is this something that we want to offer to customers?

We also found value in rotating schedules: rotating on-call and rotating customer support, whether that's customer tickets or, as Andrew was referring to, the customer support channels. We tried very diligently not to have the same person always answering or always responding. And to be very honest, we also found that by setting a schedule, we always had somebody; we knew there was coverage.
There wasn't a question of, okay, maybe I'll just be quiet for five minutes and see who else picks it up. There was that accountability, which was very important in moving everybody through as well.

And then training as a whole: training is huge. You have new training and you have ongoing training. New training is certainly for somebody coming into the team, and also for the different services and features that are coming out. So you decide, okay, maybe somebody wants to be more of an SME in this field. You send some people to that type of training, and then they bring it back to the team in the form of a brown bag or in the form of shadowing, especially on the operations side. With engineering as well, we definitely tried to do a lot of cross-training. So if we had somebody in operations who was also interested in the storage side, we set up time for those people to work together. We found that cross-training really helped throughout builds, throughout future planning, and if there were ever any outages or anything along those lines. And then also making sure that the newer individuals are involved in giving training as well. You certainly go through another level of understanding when you have to turn around and make certain that somebody you're working with understands it. That's a whole other level of ensuring that you understand it yourself.

So the other opportunity that we had was that as more customers were coming on the calls, they were asking for deeper dives around, how do we onboard onto the cloud and how do we become cloud native? So we started to put together resources that would focus on what we loosely called sales engineering, or consulting, or enablement, who could actually travel out and meet with these teams and spend days saying, hey, this is what your application should look like, this
is how you should do it, or these are what we've seen as best practices and what works. What was interesting is that we'd also take that information back and say, well, these are the things we need to do on cloud to better support these workloads, whether from a performance or a feature perspective. That would turn into the backlog that we had on the team for delivering out to cloud.

So, as part of this story: Megan and I were at Comcast. You're probably wondering why we're talking about Comcast while wearing Walmart vests. We just recently joined Walmart, and we're excited about the challenge of doing something similar for the rest of the Walmart team. We would love for any of you to join us, so ping us. We're hiring. It's an exciting challenge at the Fortune 1 company, the largest company in the world, and we're excited to hopefully take it to the next level there.

So what we'd like to do is really take the time to start answering some questions and dive into areas that you're wondering about, or even some problems that you've experienced in trying to build out more as a business. There is a mic up here. I don't know if it's going to be easier to try and pass it around, or we can repeat the question.

So the question is, who were the customers that were not suitable for cloud, who wasn't cloud native, and why not?
And I think that's a great question. I would almost challenge it and say that, from an application perspective, almost everybody has a part of their application that might be cloud native, but maybe not the whole stack. One of the approaches we took that I thought was very successful was this: the leadership did a great job saying, hey, cloud first, think cloud, go cloud native. People would come and say, well, my technology stack will never run on cloud. And there might be parts of it for which that's true. But almost everybody's got a caching tier, a web tier, an app tier, some kind of tier within that application that is a decent fit for moving to cloud. Maybe it isn't persistent, or it can be slightly re-architected to work in cloud. So what we would do is work with teams to identify those things and say, hey, we're looking at the residential email platform, and you know what, caching can work on cloud just fine. And they actually needed to add a ton of it. Instead of buying a bunch more physical boxes to do that, why don't you try doing it on cloud? Hey, it worked. Let's scale that up. They start to scale it up. Well, what about moving over this next piece? If we just make one little change here, now we can move that part of the stack over. Maybe the core doesn't move, but all the other parts of their stack move. And then when that team starts to look at the next-gen platform, because they want to update what they're delivering, they already understand cloud concepts, because they've deployed, say, one third of their stack on cloud successfully. They've managed it. They know how to operate it, they know how to monitor it, they know how to do all those things. So when they go to the vendor, or decide to design it in-house, or however
they want to approach that problem, they say, well, we need something that fits the cloud native model. They no longer need somebody from outside to come and say, this is how you go cloud native. They get it, because they see those benefits of cloud. So to be honest, we weren't onboarding things like Oracle RAC workloads onto the cloud. We were looking for things that were cloud native.

Excellent question. So the question was, was this cloud team formed within existing IT, or was it outside existing IT? Initially this team was formed within a group called product engineering at the time, a group that built products, and it was probably outside what you would define as existing IT. It was chartered with being able to deliver infrastructure faster for those products. Over time the success of that team enabled it to be unified and become the de facto platform across all of Comcast, but it still operated under a group called, at the time, platform technologies. The idea being that this has become the de facto platform that we deliver our product services, and even some of the back-office IT, on top of.

Excellent question. So the question was regarding TCO: how scientific were we about it, in how much detail, how did that affect the output, and how did it compare with other cloud options? TCO can be incredibly complex, and I think our challenge actually might have been not being too scientific, because once you go down that route it might take months to get an answer, versus trying to estimate some of those. I would say most of our challenges in calculating cost were around which parts of labor we include.
And separating those things out: if there's a team working on new features that aren't necessarily being leveraged yet, do we count that as part of the TCO? Things like that, because labor is obviously a very big part of the cost of delivery for cloud. Over time we iterated on that model, trying to be accurate. We counted everything from data center space to network to labor, trying to bucket them into the right categories. We were monitoring and measuring actual workloads deployed across those different clouds. I don't think it was precise to the nth degree, but it was a pretty good model. It did take a significant amount of resources to build that model and get it out there, but I think it was worthwhile, because in the end it did show that the investment was worth it. I don't know that I can give numbers, but it did show that it was a worthwhile investment.

No, it was cheaper, that much I can say. Yeah, it was cheaper. I think I can say that safely. Yeah, significantly cheaper.

Any other questions? Well, thank you, everybody. If you have direct questions, we'll be available here.