Hello. Welcome, everybody. Today we're giving a talk on a paradigm shift: leveraging private cloud to encourage scale and resiliency at the app layer. I'm Andrew Mitry from Walmart. I'm Shreya Dharabasin, I'm also from Walmart. And Rick Malik, also from Walmart. Thank you all for being here. We're going to dive right in, and then we're going to leave about 10 to 15 minutes at the end for questions, and we can go deep after that. We'll be available afterwards as well. This talk is designed to be a little more intro-level. Shreya and I come from Comcast previously, where we built out applications to scale on top of cloud, so we're going to talk about a few of those use cases today, with permission from Comcast as well as Walmart. We wanted to start off with a few definitions, some basics, for those who are new to the paradigms that we encourage for applications as they onboard onto the cloud. One of the first things is the idea of horizontal scale-out. We want, as load increases, that you add additional units and have them act in concert. We want to be able to split those workloads across those units. We want to offer a promise of linear, infinite scale. And we want to start small. One of the things that was interesting, that we did quite a bit at Comcast, is that instead of going to an application team, let's say the residential email team, and saying, let's move the entire residential email platform over, we'd say, why don't you start by moving over your app, maybe your web tier, maybe some caching tiers, things that work well with the cloud paradigms.
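To make the scale-out idea concrete, here is a minimal sketch of our own (not code from any of the systems discussed in the talk): load is split across interchangeable units, and capacity grows by adding units rather than by making any one unit bigger. All names and numbers are illustrative.

```python
# Minimal sketch of horizontal scale-out: work is partitioned across
# interchangeable units, and aggregate capacity grows linearly by
# adding units, not by growing any single unit.

def partition(items, n_workers):
    """Split a workload across n identical workers (round-robin)."""
    shards = [[] for _ in range(n_workers)]
    for i, item in enumerate(items):
        shards[i % n_workers].append(item)
    return shards

def capacity(n_workers, per_worker_rps=100):
    """The scale-out promise: capacity is linear in worker count."""
    return n_workers * per_worker_rps

jobs = list(range(10))
shards = partition(jobs, 4)   # four units acting in concert
```

Doubling the number of units doubles the aggregate capacity, which is the "linear, infinite scale" promise in miniature.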
And that gets those teams, those DevOps teams or whatever type of team, familiar with how to use cloud and how to understand that paradigm, so that as they undergo transformation in their own application, they guide and lead things in the right direction for the parts of those apps that aren't cloud-friendly or cloud-native today. And that actually worked really well as a model for onboarding new applications. It's not: take on the whole world at once, and we've got to fix five million things and redesign everything. Let's start with what works, move that piece over, and then chip away at the application. One of the other things that we found early on at Comcast is that having a scalable commodity block and object store was key to our effort. And I think we do have a Ceph talk on Thursday going into some of that as well. Another key tenet in the paradigm, do you want to go over elasticity? One of the things that we require of tenants is the ability to grow and scale their environments as load grows. This required that resiliency be shifted to the app, versus the infrastructure being able to provide that same sort of resiliency. Some apps were inherently better suited for this, and other apps had to be worked on to get there. Elasticity also meant that the app layer had to be able to scale as demand for the service grew, and it had to do so dynamically, so that you're not wasting infrastructure holding onto a big footprint and then using it only once in a while. Andrew already covered the promise of linear, infinite scale. The other thing we drove our app teams to think about is failure of the infrastructure itself, and moving some of that resiliency into the app tier. So if you're in the cloud, you need to be able to keep running your service even when an entire region or an OpenStack data center goes down. And this drove the principle of plan for failure.
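The elasticity requirement described here, grow as demand grows and shrink when it falls, boils down to a reconcile loop. Below is a hedged sketch with invented thresholds; a real autoscaler would use measured metrics, cooldown periods, and actual cloud API calls rather than these toy numbers.

```python
# Toy elasticity loop: size the fleet to current demand instead of
# holding a peak-sized footprint all year. Thresholds are invented.

def desired_instances(current_rps, rps_per_instance=500,
                      min_instances=2, max_instances=100):
    """Instances needed for the current load, within floor/ceiling."""
    needed = -(-current_rps // rps_per_instance)  # ceiling division
    return max(min_instances, min(max_instances, needed))

def reconcile(running, current_rps):
    """Return (action, count) to converge on the desired fleet size."""
    target = desired_instances(current_rps)
    if target > running:
        return ("scale_out", target - running)
    if target < running:
        return ("scale_in", running - target)
    return ("steady", 0)
```

The same loop runs continuously: at low demand the fleet contracts to the floor, so infrastructure isn't wasted between peaks.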
So teams were constantly required to think about how they would react when the infrastructure wasn't resilient in the first place. Yeah, and some of the things that we did in that space: we did war gaming, where we would simulate data center and network failures. And we also had a different support model for cloud applications. So for example, losing a hypervisor node was actually not a pageable event in our cloud operations. As an SLA, we pushed that up to our application owners, saying: you should be able to survive losing a hypervisor or two. And in our example this actually did happen during peak, and the application team came back and noted that they didn't even realize they had lost it. The application self-healed, scaled out more VMs, and was able to continue operating. So maybe we'll dive into the case study a little bit. At Comcast, we had a group called X1 Apps. They were chartered at the beginning of 2014 to build out an app that would stream all the various video feeds for the Winter Olympics. The Winter Olympics were in February of 2014, and they got this charter at the beginning of January. So they had basically a month to not only build the app, but scale the infrastructure. And they were able to go ahead and deploy an app in a month, scale up linearly based on the demand from users for the Olympics, and then kill that infrastructure over a period of time. Part of the way they did this is they built kind of an internal PaaS tool leveraging jclouds to orchestrate OpenStack, and it was then extended to bring elasticity to VMware too. That's some of the stuff they had done at Comcast. But I think one of the most important things, from a previous keynote from Disney, the guy there said: nowadays everything has to be fast, fast, fast. Time means money. And being able to make that infrastructure available to our applications so that they can deploy elastically was just key.
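The self-healing behavior described in this case study, where a lost VM is noticed and replaced automatically, can be sketched as a small control loop. This is an illustration only: a plain Python set stands in for the coordination-service membership (ZooKeeper-style ephemeral registration), and the `spawn` callback stands in for a cloud "boot a VM" API call. All names are invented.

```python
# Illustrative self-healing pool: compare live membership against the
# desired count and replace anything missing, as the talk describes
# the application doing after losing a hypervisor.

class SelfHealingPool:
    def __init__(self, desired, spawn):
        self.desired = desired    # how many VMs we want alive
        self.spawn = spawn        # callback that boots a replacement VM
        self.members = set()

    def register(self, vm_id):
        self.members.add(vm_id)       # VM joined the pool

    def fail(self, vm_id):
        self.members.discard(vm_id)   # VM lost; its registration vanishes

    def heal(self):
        """Replace any lost members; return the ids of replacements."""
        replacements = []
        while len(self.members) < self.desired:
            vm_id = self.spawn()
            self.register(vm_id)
            replacements.append(vm_id)
        return replacements

counter = iter(range(1000))
pool = SelfHealingPool(desired=3, spawn=lambda: f"vm-{next(counter)}")
pool.heal()            # initial scale-up to three VMs
pool.fail("vm-1")      # simulate losing a hypervisor during peak
healed = pool.heal()   # the pool replaces the lost VM on its own
```

Run this loop on a timer (or on a membership-change watch) and losing a node stops being a pageable event: the pool converges back to the desired size without operator action.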
As I mentioned earlier, they had deployed across all these VMs, and they were using various load balancing techniques so that they could lose any one of those VMs. ZooKeeper would detect that they had lost a VM and spin up a new VM somewhere else. This team was already deploying into AWS, and they loved getting API access to OpenStack and took to it immediately. Most of their app workloads were the front-end web tier and some caching tier. They built a jclouds-based orchestration tool to work with our OpenStack APIs, and also AWS and VMware. And with that, we'll hand it off to Rick to talk about the next case study. Let's shuffle chairs. Sure. Oh. Good morning, everybody. How do you like our vests? Who's here from the United States? So you know the vest. For those who aren't from the United States, we wanted to give you the complete Walmart experience. We actually had to work pretty hard to get hold of these vests. If you work in Walmart IT, you don't have to wear the vest, so this is just for fun. Maybe we should give a little context on what Walmart is, for those who don't know. Yeah, I've got a slide for that; I'll talk a little at the end about the company. But right now, I want to talk about moving applications that are not cloud native onto a virtualized environment, or onto a cloud. How many of us have moved legacy, traditional n-tier applications onto OpenStack? Just a couple? Three, four, five? OK, good, because I was worried about whether I'd be too technical or too general. It sounds like we're all pretty new to this. I wanted to share with you our experience moving a couple of multi-billion-dollar sites that are not cloud native onto OpenStack, what we learned by that, and why we did it. So Walmart.com in the US is nearly 100% cloud native.
And the way that happened was: as we were working on building out our cloud capabilities, the compute nodes, the management nodes, all of the OpenStack infrastructure, the data centers, the development teams in parallel were working on rewriting the site from top to bottom to take a service-oriented architecture approach to the way they develop the websites, which is different from the way they developed in the past. So this was a new experience for them, and it took a lot of time; it took a couple of years for them to do the rewrite. Like a lot of other companies, I think we tried several things with cloud first. We tried our own APIs; we were going to have the Walmart API. We tried to do it ourselves. We looked at a couple of vendor products, one from Microsoft, I believe, and from another vendor as well. Finally, at that time, OpenStack was really starting to gain some traction, around the Grizzly and Havana timeframe. So we acquired a company that I'll talk about later, called OneOps, that had a tight integration with OpenStack. That's how we landed on OpenStack, and that's what we were doing while the application teams were rewriting their applications to be service-oriented architectures. But what's true for .com in the US is not true across the Walmart enterprise. It's a 50-year-old company, and there are a lot of best-in-class solutions from their day deployed into the data centers. We have it on our roadmap going forward to try to turn the ship, work with the application teams to get more apps rewritten to be cloud native, and see how we can go faster. We were really pleased to have the folks from the Comcast team come over and help us do that, and I think we're well positioned for next year. So transformation is really, in my view, an iterative process. It's not black and white. I think Andrew alluded to that, too, with some of his experiences at Comcast: you start with what you can and then move forward from there.
So, as in nature, I said migration is a journey, and I think we'll all experience that. I want to talk about a hypothetical business, but it's based on a real business. If you're in your late 40s like me and you grew up in the United States, you grew up watching the Bugs Bunny Road Runner Hour on Saturday morning. And as Wile E. Coyote was trying to get the Road Runner, he would order various products from the Acme Company. So in the US, Acme has come to be a generic term for a company, and I wanted to give you that context. Acme is a pretend company we're going to talk about, but its story is based on real life. So they had a five-year-old ATG e-commerce platform for their members, running on bare-metal Solaris SPARC hardware. What they wanted to do was migrate the 35 applications that made up this website, which, again, were n-tier, monolithic enterprise applications, to Red Hat Enterprise Linux on x86, put them on a private cloud, and start to become cloud native. They had some business goals, the typical ones of smaller, faster, cheaper, better: a 25% increase in operational efficiency, in other words provisioning in minutes or hours versus weeks or months; a 50% increase in site performance, which can include things like page response times as well as checkout; and a 75% decrease in hardware and data center costs. The footprint of Solaris SPARC was quite large in the old data centers, and moving to cloud opened up opportunities to use that space, or those servers, for something else. There were a couple of constraints that the architects had starting out. One, our approach for the company's private cloud is that VMs are ephemeral and short-lived and can be easily thrown away and replaced without any degradation or harm to the business. Two, block storage was not available to Acme.
They didn't have the funds, or chose at this time not to invest in it. So those were the two constraints on the architects starting out. Their approach was to fully leverage OpenStack, along with a tool called OneOps, which I'll talk about in a minute, to transition and start transforming to cloud native. They created a new VLAN in each data center for the Acme e-commerce site. They put every application behind a VIP, as if it were a service; there was no host-to-host communication at all. They had a policy in the company that every application had to be deployed into two clusters, or what we call regions: two cloud regions in each of two data centers. I did a little bit of math and worked out that eight VMs would be the smallest application footprint, if they had two VMs in two cloud regions in two data centers: so 2, 4, 6, 8. We found as a best practice that it's slightly better to over-provision non-native applications in your private cloud, to begin to force the problem of infrastructure stability up higher into the application stack. They ran into a couple of challenges, of course. These are old legacy apps, and apps that required only a single instance were eliminated right off the bat; they didn't even try to put those onto OpenStack. There were two out of the 35 applications that required a single instance. What the team found, though, was that if they had decided to invest in block storage, one of those apps could have migrated, maybe by running it in an active-passive type of configuration and leveraging dedicated hypervisors using host aggregates in OpenStack. They didn't explore that, but it was an option they realized after the fact, in hindsight. Their e-store application used sticky sessions on the local VMs.
The impact of that is that if a VM dies while a customer is browsing or checking out, they're obviously going to lose their session and have to log in again. The business accepted that risk for the short term, but that obviously points to another best practice we can think about: developers must move session management outside of local VMs, because those VMs are ephemeral. Use some sort of key-value store or other solution outside of the ephemeral VM. So, the way I look at it, I spoke about this being an incremental journey: crawl before you walk, that kind of thing. So what did they accomplish? We got to celebrate some wins by getting these sites onto the cloud. They migrated from Solaris to Linux. They migrated from physical to virtual. By leveraging OpenStack, they have self-service, agile infrastructure. By leveraging OneOps, they have a cloud platform with application and service abstraction. The development team is making fewer assumptions about their deployment environments, which is what we want them to do. We don't want them to always count on a server with IP address blah-blah-blah always being available to them and for them; we need them to start thinking differently. The team started thinking, obviously, more in terms of elastic and ephemeral VMs. Historically, last year during holiday, Acme's e-commerce site was not scalable at all. They went into holiday locked and loaded with their best estimates of what they thought they would need and how they would perform. But now, being on the cloud, they do have horizontal scalability, and they're seeing a 4x improvement in site stability versus the old Solaris SPARC physical infrastructure. So they know that they've now got to work on decomposing their site into services which can be deployed and scaled independently.
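The session best practice called out above, keeping session state in a store outside the ephemeral VM, might look like the following sketch. The dict here merely stands in for an external key-value store such as Redis or memcached, and the class and field names are invented for illustration; a real deployment would swap in an actual client library.

```python
# Sketch: sessions live in a shared external store, so any web-tier
# VM can serve any session, and a dead VM doesn't log customers out.

import uuid

class ExternalSessionStore:
    def __init__(self, backend):
        self.backend = backend     # stand-in for Redis/memcached

    def create(self, user, cart=None):
        sid = str(uuid.uuid4())
        self.backend[sid] = {"user": user, "cart": cart or []}
        return sid

    def get(self, sid):
        return self.backend.get(sid)

shared = {}                          # lives outside the web VMs
vm_a = ExternalSessionStore(shared)  # web-tier instance A
vm_b = ExternalSessionStore(shared)  # web-tier instance B
sid = vm_a.create("alice", cart=["tv"])
# Instance A "dies", but instance B can still serve the session:
session = vm_b.get(sid)
```

With sticky sessions on local VMs, losing A would have dropped the cart; with the externalized store, B picks it up seamlessly.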
So one way to do that is to sort of freeze the code base as it exists today and start building microservices around the edges of this monolith, if you will. They're thinking about design patterns to decouple the new from the old, creating API contracts that make the legacy stuff look like microservices to the new code they're writing. And over time, I think what they'll see is that they've totally surrounded this legacy monolithic code base, and it can be retired at some point. The process of rewriting just can't happen overnight; the business still has to keep moving forward and meeting business goals and objectives. This is one way that some have approached it, and I think our Acme company is on course to do a similar thing. There are a lot of cultural things that also have to happen. I've spoken about a couple of them, but another one is that we need teams to align to a DevOps culture: cross-functional teams that own the product from development all the way through operations. This is how the company is trying to align its development teams. I think with all of these things together, Acme is now on a course where they're beginning to address these, and these things, plus some others, are going to get them to cloud native. But you've got to start small and continue to iterate. Now I'll talk a little bit about OneOps. Has anybody heard of OneOps? Not very many. OneOps is part of our platform-as-a-service offering at Walmart. It sits on top of OpenStack. Today we have 3,000 developers at Walmart leveraging OneOps, deploying 30,000 new or updated services per month, and around 3,500 applications are hosted within OneOps. So that's pretty impressive. What the company would like to do, now that it's reached this point in maturity, is release it to the open source community.
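The "surround the monolith" approach described a moment ago, API contracts that make the legacy code look like services so endpoints can be cut over one at a time, can be sketched as a thin routing facade. Everything here (endpoint names, handler functions) is invented for illustration; it is not Acme's actual routing layer.

```python
# Strangler-style facade: present a service API, route each endpoint
# either to a new service or to the legacy monolith, and migrate
# endpoints one at a time until the monolith can be retired.

def legacy_monolith(endpoint, payload):
    return {"handled_by": "monolith", "endpoint": endpoint}

def new_catalog_service(endpoint, payload):
    return {"handled_by": "catalog-service", "endpoint": endpoint}

class StranglerFacade:
    def __init__(self, fallback):
        self.routes = {}          # endpoints already cut over
        self.fallback = fallback  # everything else goes to the monolith

    def migrate(self, endpoint, handler):
        """Cut one endpoint over to a newly written service."""
        self.routes[endpoint] = handler

    def handle(self, endpoint, payload=None):
        handler = self.routes.get(endpoint, self.fallback)
        return handler(endpoint, payload)

facade = StranglerFacade(fallback=legacy_monolith)
facade.migrate("/catalog/search", new_catalog_service)
```

Callers see one consistent contract throughout; as more endpoints are migrated, the fallback handles less and less until it can be removed.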
Because we've seen success in the Walmart enterprise, and we feel there's an opportunity for success in other enterprises as well. Having this out in open source will be not only the right thing to do, but a good thing to do for the community. So what does OneOps do? It delivers continuous, cloud-based application lifecycles and empowers the enterprise to take on new projects and drive growth. It's collaborative and visual. It's model-driven. It's a library of best practices. It's cloud platform abstraction. It's self-service, agile infrastructure. It's a platform for rapid, repeatable, consistent provisioning of application environments, and backing services as well. So it really enables continuous lifecycle management of complex, business-critical application workloads on any cloud-based infrastructure. But of course, we're concerned with OpenStack here. As I mentioned before, at Walmart OneOps is logically placed in the PaaS layer in the same way that OpenStack is logically placed in our IaaS layer. And because of that, OneOps abstracts both our platform-as-a-service offerings and our infrastructure-as-a-service offerings for our developers. Therefore, OneOps is platform-agnostic. In terms of lifecycle, you define application workloads based on architectural and application requirements. You provision environments by mapping the design output against operational requirements. You then monitor and control those environments to maintain the required operational levels. So OneOps will help the application horizontally scale, contract, and replace VMs that die for some reason and need to be replaced. So it's really that kind of tool, and more, in that it also helps the business begin working in a consistent way. In other words, you don't have developers going to the Horizon dashboard doing things one way.
You don't have developers with direct access to hypervisors or VMs, deploying their own environments and then pushing code manually, or something like that. You're trying to get the whole company to follow a consistent process, and that's where OneOps can help. Workloads: OneOps provides design catalogs for applications. You can create custom designs and save them in a private catalog. You can share them across your enterprise to have architectural consistency, and, going forward, you can share them with the open source community. OneOps provides operational best practices for many platforms, including relational databases, NoSQL databases, messaging systems, and others. You can create your own custom packs for operational best practices and, again, share them across your enterprise or with the open source community. And OneOps provides a library of components. These components encapsulate lifecycle management for many infrastructure resources, not only servers and storage, but also software artifacts such as OS packages, repositories, and many others. You can create and package custom components. You can integrate with many cloud services. And you can share custom components with the open source community. So that's really the promise here, and that's where we're hoping to see this tool grow and find its place in the community. As for portability, out of the box it's going to support three cloud platforms: OpenStack (and OpenStack cloud providers), Azure, and AWS, and then we'll see how it gets extended. I don't recognize this slide. So, holiday 2015: by the end of the year, we want to have this in the hands of the open source community. You can go to oneops.com and keep an eye on the blog posting there if you're interested in learning more about it and checking it out when it's available. So, a little bit about Walmart. There are many divisions around the world, many subsidiaries.
The Seiyu Group here in Japan is Walmart in Japan. Vital statistics: a huge retail footprint, 11,500 stores worldwide, selling a wide range of merchandise, focused most recently on grocery home shopping; revenues of $486 billion; and 2.2 million employees. Now, I say all that to say this, and this will be my final slide: there's room for a 2,200,001st associate. Namely, we're hiring, and we're looking for people just like you. Who here has an easy time hiring and finding OpenStack resources? So then you know why I'm putting up this slide; I have to take advantage of this time. A huge company, with challenging problems, exciting problems to solve for next year. If you're interested, talk to any one of us, or go to our careers portal and search for cloud or search for OpenStack; you'll find jobs that way. Or you can reach out to any of us and we'll help you through the process. We're always looking for talent just like what's at this conference. So we're going to be the first to deliver a seamless shopping experience at scale for 260 million customers around the world. We're going to integrate the physical stores with the e-commerce presences and with mobile, and give the customer a seamless shopping experience that saves their time as well as taking advantage of Walmart's everyday low prices. And private cloud, honestly, is integral to all of that. For that reason, we're looking for people just like you. With that being said, here's some contact information. We have the presentation posted right now under the event on sched.org. I'm assuming shortly after the presentation the video will be uploaded to YouTube; I'll put a link to the presentation there as well. And feel free to reach out to any of us at any time if you have any questions or would like to explore any other topic. So thank you very much for your time this morning.
We'll open up for questions, and we'll be available outside afterwards, too. Sorry. No. The concept is, what we did was create for ourselves that which we spoke of, for our developers, for our private cloud, which is OpenStack-based. Because this was a company that we acquired, they already had hooks into AWS. We knew we wanted to open source this product at some point, and as they were ramping up to get ready for open sourcing it, they added the hooks for Azure. So that's sort of how it progressed, but internally, for our private cloud, it's OpenStack. I would think that's a good analogy, actually, because, as I mentioned, we've got 3,000 developers leveraging this process for 3,500 applications, 30,000 times a month. So I would think that's a good analogy. If that's the culture and that's the way the process works and flows, it further minimizes the chance of shadow IT. I think that's a good point. Another question in the back? That's a good question. I don't know that I have a good answer, because I don't know that I'm personally that close to it. I can talk to it from the Comcast side a little bit. It has been a culture shift on the Comcast side. Teams are, I think, adopting wider skill sets in terms of what they're able to support. The developers have to have a deeper understanding of what the stack looks like. Of course, there are different attitudes toward that culture shift, but I think in the end they find it empowering, because there's nothing more frustrating than not being able to deploy what you've developed on a reasonable time frame. And I think the wins they've created encourage the developers to take some of that ownership on. It also encourages them to code for scale and for failure and things like that, so that they're not getting paged in the middle of the night, right?
And of course, we do experiment some with hybrid models, where there's still an outside tier one that does some of the basic monitoring before escalating, things like that, so they're not necessarily getting paged for every event. But it is a culture change within the company. I wanted to mention one more quick thing before I let you guys go. There's this book, and it's only 50 pages, called Migrating to Cloud-Native Application Architectures, and it's free from Pivotal, if they're here. I don't know if Matt Stine is here; I didn't ask him if I could plug his book, but I read it a couple of weeks ago, and literally on every page of this book I'm like, either we're doing that or we're talking about doing that. And it goes really deep. So I would recommend it to anybody in this session thinking about these things: Migrating to Cloud-Native Application Architectures, and the author's name is Matt Stine. Are different VMs or containers running different services? Well, there's another level of abstraction in OneOps. They're called assemblies, application assemblies. Each service becomes an assembly in OneOps that can be deployed independently of the others. So you really have a continual deployment pipeline happening at that point, if you've got that many services: 3,500 in the system that can always, 30,000 times a month, be updated or have new services added. So an app has, let's say, 10 assemblies, and each of them is a VM, like the runtime manifestation? Probably multiple; each of them is multiple VMs. The assembly is a combination of resources, not just VMs. You can define, think of it as packs: what software goes on the VM, what load balancer you want to use, which places you want to deploy to. So there's a lot of metadata there. The deployment running right now is on VMs, but it's not necessarily tied to VMs.
So OneOps could support different deployment models, possibly containers or bare metal, in the future as well. Which could be managed by OpenStack, right? That's right. Yeah, we are actually actively exploring those options today. I actually hear two things: you use kind of a microservice approach, in which you abstract the failure to the layer below, and you also do this for the applications themselves, where self-healing capabilities are built into the application to support that. Yeah, I might have misspoken earlier; I might have said microservices when I meant to say services. So what do you think is the true benefit of having the self-healing capabilities added to the service itself, as opposed to having it in a cluster-based technology such as Mesos or something like that? Well, that, interestingly, is covered in the book, so I imagine there are probably lots of different points of view on it. But the idea that you push as much to the client as possible, which includes things like load balancing and horizontal scaling, seems to be part of the definition of cloud native, at least as far as I understood it in the books that I've read. So to me, that means microservices; I mean, that means containers and other things. But that's just my opinion. I think one of the things that we get with self-healing in the application layer, as we were seeing before, is that the application developer is the one that has the best understanding of how to manage those sessions or other clients or whatnot. And once you start to push that further down in the stack, the less understanding the infrastructure can have to deal with that type of failure. So we want to put the intelligence on how to deal with resiliency or elasticity or scale as far up the stack as possible. Great. Yes. At Walmart, was this a very top-down-driven approach to get all those silos working together? Yes.
So based on that experience, I would say that you need the CIO to help push this through with his or her VPs, because it's a massive cultural shift. Responsibilities that used to live in the infrastructure are being pushed up into the application layer. Teams are having to think about how to code differently in order to be successful with cloud. And teams are having to take ownership of things they've not had to own in the past, by aligning their services to business capabilities and then really owning those capabilities from top to bottom. So there's a lot of change happening, and we're not 100% there yet ourselves, but I think we're on a path that's going to align with cloud-native application architectures. All right, any other questions? Thank you guys very much. Yeah, thank you. We'll be available after. It's been fun. Thank you.