OK, we need to present our little experiments with orchestration in the OpenStack cloud. We'll get to the technical problem statement in a little bit, but to begin with, for those of you who may not know about PayPal (OK, we'll go old-fashioned and just press buttons on the keyboard here): we are a publicly traded company. We were part of eBay until mid-last year, when we split away. Second quarter 2016 results: $2.5 billion in revenue. This quarter, and I believe the results just came out last week, we're a little above that. The 29 transactions per account went up to 30 transactions per account, which is a little low, so I encourage all of you to use PayPal. You can see all the PayPal properties here: Braintree, Venmo, Xoom. These were all acquired by PayPal at some point and are an integral part of the PayPal business now. We had 188 million active customer accounts in Q2; in Q3 that went up to 192 million. Essentially, the company is strong, big, growing, and healthy. I think the stock also went up by about 10% after the earnings report.

Going to the infrastructure: we have thousands of engineers working on tens of millions of lines of Java and C/C++ code. We do over 1,000 releases every year and thousands of deploys literally every day. Our cloud, for those of you who haven't seen a PayPal presentation before, is deployed in three different regions, all in the western United States: Salt Lake City, Phoenix, and Las Vegas. We have 10-plus availability zones distributed across those three regions, divided into what we call the production cloud and what we call the dev/QA cloud. Dev/QA is where practically all developers directly interact with OpenStack, through Horizon or the CLI or the API; they'll create their VMs, create their block volumes, and write into Swift. The production zones are strictly controlled: the PaaS layer or a release engineering team deploys applications in there. Within the production zones we have sort of an active-passive deployment model, and right now we are moving towards active-active, so you have applications deployed across availability zones and across regions for maximum availability.

Within those three regions and their availability zones, we have close to 500k cores deployed, 10,000-plus physical servers, and 100k-plus VMs (I think the number was closer to 150k the last time I checked), plus close to 10 petabytes of storage in the system. We have somewhere around 2,000 to 3,000 PayPal applications, all running in our private cloud, not a public cloud. The key thing is that our web-tier and mid-tier applications are practically all running in our OpenStack cloud; the data tier is still outside the cloud. I made a presentation in New York at OpenStack Days East: we are looking to start moving our Kafka, Elasticsearch, and NoSQL databases into the cloud starting in 2017, and as we make more progress, maybe at the next summit in Boston, we'll talk more about that.

A quick overview of the journey: we started in 2012 or so with 16 servers. These were decommissioned servers that IT was going to throw away; we said, hey, can we have those? We played around with them and deployed OpenStack Essex. This was purely a POC. And by the time we were done in 2013, we were actually taking 40% of our holiday traffic on an OpenStack cloud, with our web and API tiers deployed there.
In 2014 and 2015 (and remember, we were still part of eBay until mid-2015), all of our platform-as-a-service layer, which used to run on bare metal, started deploying on the OpenStack cloud, with complete PDLC support; all of our web, API, and mid-tier is now running on OpenStack. And of course, when we split away from eBay and had to move all the PayPal workloads from eBay data centers into PayPal data centers, we built the world's largest availability zone running OpenStack: 2,000-plus servers, three Nova cells. How we operationalized it, we could give a separate talk about that, but it's three Nova cells, each with about 800 or 900 hypervisors, so 2,000-plus hypervisors running in a single availability zone. In 2016 and 2017, like I said, we're moving stateful applications into our cloud. We are focusing a lot on efficiency and reliability, and going along with efficiency and reliability, automation becomes key. So there's a lot of focus on automation, making sure there is as little manual intervention and as little manual deployment and remediation as possible. And that's what this talk is about: how we are orchestrating the cloud in a more automated, more scalable way.

Before I go further: if there are any questions, we can take them at the end, and if you want to interrupt, that's fine too; we can answer questions in the middle as well. I have a feeling we'll run through these slides very fast, so there will be a lot of time for Q&A at the end.

So this is the state of affairs we had at the beginning of this year, and I'll go over what this picture means. All of these nodes on the right side of the screen are your compute servers. Now, in order for these servers to realize that they are compute servers, or controllers, or MySQL nodes, or whatever have you, they go to a puppet master through a load balancer. Two nodes are running the puppet master. The puppet master in turn goes to Foreman to pick up the config for each of these nodes. The puppet master has a database that shows, I have these many nodes in the system, and for, say, a controller profile, this is the config for those profiles. So when one of these nodes comes over to the puppet master, the master goes over to Foreman, grabs the config, and then compiles the entire puppet profile for that node.

Now, this becomes a bottleneck real fast, for multiple reasons. In order for us to deploy a complete AZ (and we went through that as we were upgrading from one OpenStack version to the other), with literally 1,000 nodes in the system, it would take hours for all the puppet config to be deployed onto each of those systems. We would be done with the code upgrade in less than an hour, and then we would wait hours for each node to upgrade to the latest config. On the next slide, I highlight the problem. The master, of course, does not scale; we have issues with large numbers of servers because it's compiling the puppet code for each of those nodes. We have complex dependencies: wherever you have multiple different subsystems within each node, the dependencies where the RabbitMQ profile is linked to the Nova profile, and Nova is linked to Neutron, all of those things have to be compiled by the puppet master, and it can get out of sync very fast.
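To put that scaling problem in rough numbers: the talk only says it took "hours" for 1,000 nodes, so the per-catalog compile time and master concurrency below are illustrative assumptions, not measured figures, in a minimal back-of-envelope sketch.

```python
# Back-of-envelope estimate of the central-master bottleneck for one AZ.
# The compile time and concurrency values are assumptions for illustration only.
NODES = 1000              # nodes in the availability zone
COMPILE_SECONDS = 45      # assumed catalog compile time per node on the master
CONCURRENT_COMPILES = 8   # assumed parallel compiles the master pair can sustain

hours = NODES * COMPILE_SECONDS / CONCURRENT_COMPILES / 3600
print(f"~{hours:.1f} hours just to recompile every catalog once")
# Even before agent run time, retries, or failed compilations, the masters
# alone account for on the order of an hour or two per full-AZ rollout.
```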
Monolithic deployment: the master has to compile the entire puppet config for each node, so when we upgrade any one module, we are literally changing everything and redeploying it on each node, right? Phased rollout across servers was also a pain. Again, this is not necessarily an issue with the puppet master, but it was something we didn't have a really good mechanism for: when we would roll out something using puppet, it would actually roll out to everything in the entire AZ, which is not such a good practice. And any time we had a failure, if any one of the nodes failed or the puppet compilation failed on the puppet master, we would have drift, because now we had nodes sitting with different configurations when we wanted them all to be identical. Managing the puppet infrastructure itself was also a pain: if the puppet master failed, or Foreman failed, we had essentially created a single point of failure.

The solution lay in deploying a masterless puppet system, and we eventually went with that. I was talking to Yuri, who used to be at PayPal, and he said, oh yeah, it took you four years, but you got there. It took us some time, and we went through a lot of challenges, but finally we've deployed it. What we have is no puppet master; let's move on to the next slide, where we actually describe what the system looks like.

So this is the end state: there is no puppet master and no Foreman in the infrastructure at all. Each node, when you onboard it using Ironic, gets a puppet environment package installed on it. The node runs puppet locally; there's a puppet fact that's deployed onto the node at onboarding time, and at runtime it basically gets the appropriate packages and config from the wrappers. So it's very simple and scales very, very easily. You can do phased rollouts by changing the puppet fact for a select set of nodes: instead of deploying everywhere, you deploy to 1%, 5%, 10%, 20%, 50%, and do a phased rollout. You can control each availability zone; you don't have to roll out to all 10-plus availability zones at the same time. And the data itself is included in the package and is very hierarchical, which allows us to manage those dependencies we were talking about earlier, the ones that get complicated very fast. With that, I'm going to hand over to Venkat, who's going to take you through the rest of the presentation.

Hello, everyone. So where are we with the current journey? We have already rolled this out onto four AZs, and we have started seeing the benefits; things are working pretty well. A little bit of insight into the acronyms used on this slide: the WISB that you see here is "what it should be," and the WIRI is "what it really is." Basically, the idea of WISB and WIRI is to capture the drift that we have in the system. Last year we had an outage, and most of the problem was that we had different patches and drift in the system. This WISB and WIRI model enables us to catch all that drift, so we will have a system with no drift. Another key idea of this masterless puppet project is to have the config data also under version control. Earlier, in the Foreman model we had, when we changed the config details we didn't have a history of what had been changed. But when you move this config data into git, then you have version control over what is getting changed.
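A minimal sketch of the node-local flow Anand described, assuming the environment package drops manifests and hiera data onto the node and a custom external fact controls the rollout phase. The paths, fact names, and file layout here are illustrative assumptions, not PayPal's actual tooling.

```python
# Masterless Puppet run on a single node: the environment package was delivered
# at onboarding (via Ironic), so every run is a local "puppet apply" against
# local manifests and hiera data. No master, no Foreman.
import json
import subprocess

PUPPET_ENV_DIR = "/etc/puppetlabs/code/environments/production"  # assumed layout
FACTS_FILE = "/etc/facter/facts.d/rollout.json"                  # hypothetical external fact

def read_rollout_phase() -> str:
    """Read the custom fact set at onboarding (or flipped later to stage a rollout)."""
    try:
        with open(FACTS_FILE) as f:
            return json.load(f).get("rollout_phase", "stable")
    except FileNotFoundError:
        return "stable"

def apply_local_catalog() -> int:
    """Compile and apply the catalog entirely on the node."""
    cmd = [
        "puppet", "apply",
        "--environment", "production",
        "--hiera_config", f"{PUPPET_ENV_DIR}/hiera.yaml",
        f"{PUPPET_ENV_DIR}/manifests/site.pp",
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    print(f"rollout phase: {read_rollout_phase()}")
    raise SystemExit(apply_local_catalog())
```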
And you have a track of the changes. That's one of the key ideas of moving to this masterless puppet and having the config data also go under version control.

Moving on to the next slide, we have already started seeing the results of this masterless puppet, especially with the build and deployment time: earlier it used to be hours, and now it has been reduced to minutes. As Anand mentioned, we get easier rollout of changes. This is not specific to this model, but as part of doing this infrastructure change we also started modularizing the components so that we can roll out in a phased manner. And also, as he said, in the earlier model we used to roll out changes to all the computes in one shot, which was not a good strategy. In this model we have the control and the leverage to roll out in a phased manner, like one percent, five percent; if things are not working well, we have better control to roll back, and if everything is looking good, we can roll out to all the computes. As Anand also mentioned, we have fewer infrastructure components: we got rid of Foreman, we got rid of the puppet masters, so we don't have to maintain those servers anymore. And as I mentioned about the drift management system, this WISB-versus-WIRI model lets us capture the drift, and we are working on building dashboards to capture all the drift details.

For the next steps: since we have already seen the benefits in four AZs, we plan to roll this out to all the other AZs in the next couple of months, and we are also working on building dashboards to capture the drift details across AZs. Currently we have hypervisor-level config data captured as part of the WISB/WIRI details. We also plan to capture application-configuration drift in the WISB/WIRI data, so that, for example, for a Nova configuration change, whether we have rolled out 3.2.1 to all the computes or whether some still have 3.2.0, we can capture that kind of application-level drift as well using these drift dashboards. The integration mentioned here is with the configuration management system, a system that was developed in-house and open-sourced later. This configuration management system is where we define the WISB models, and then the computes update the WIRI data using puppet facts. That is, when puppet runs on the computes or on the controller nodes, they gather the puppet fact information and publish it to this configuration management system DB. The key thing I would like to emphasize here is treating the config data as code. Just changing the config data in some key-value database or in a flat file, without keeping track of those changes, is not a good thing to do. We wanted to be able to track the config changes too, and it's a good idea to have them under version control as well.

Here are the contact details; if you have any further questions, feel free to reach out to us. Like I said, we were going to run through these slides pretty fast, so there's a lot of time for Q&A. No questions?

[Audience question.] So like I said in the beginning: Kafka, Elasticsearch, NoSQL databases, all of those run outside of the cloud today, and we are looking to move them into the cloud. We can talk more about that offline, because this is really not the topic here, but I'll be happy to sync up with you offline.
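A small sketch of the WISB/WIRI drift check described above: WISB ("what it should be") comes from the version-controlled config data, and WIRI ("what it really is") is what each node reports via puppet facts into the configuration management DB. The field names and the flat-dictionary shape are illustrative assumptions; the Nova 3.2.1 versus 3.2.0 example mirrors the one mentioned in the talk.

```python
# Compare desired state (WISB) against reported state (WIRI) for one node.
from typing import Dict, Tuple

def drift(wisb: Dict[str, str], wiri: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return every key whose desired (WISB) and reported (WIRI) values differ."""
    keys = set(wisb) | set(wiri)
    return {
        k: (wisb.get(k, "<unset>"), wiri.get(k, "<unset>"))
        for k in keys
        if wisb.get(k) != wiri.get(k)
    }

# Desired state for a hypervisor profile vs. what two nodes actually report.
wisb = {"nova_package": "3.2.1", "kernel_patch": "kp-2016.10"}
node_a = {"nova_package": "3.2.1", "kernel_patch": "kp-2016.10"}   # clean
node_b = {"nova_package": "3.2.0", "kernel_patch": "kp-2016.10"}   # drifted

print(drift(wisb, node_a))   # {}  -> no drift
print(drift(wisb, node_b))   # {'nova_package': ('3.2.1', '3.2.0')}
```

A dashboard like the one the team describes would aggregate these per-node differences across AZs, so any node whose WIRI facts stop matching the WISB data in git shows up as drift.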
[Audience question.] So that's where I sort of pull out my product manager card and say, I don't know. What we'll do is, Ritesh and Raj and PK, who worked on this stuff, will know the more technical details, and we can get back to you on that.

[Audience question: how are you orchestrating the execution of the 1%, 5%, 10% rollout?] I believe there is some Ansible scripting that happens that allows you to change the hiera data for specific nodes and roll that out in that phased manner. Thank you. All righty, everybody gets 20 minutes back. Thank you. Thank you.
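On that last point about phased rollouts, here is a minimal sketch of how a deterministic 1% / 5% / 10% node selection might work, assuming hostnames are hashed into stable buckets. The helper names and host naming are illustrative assumptions, not the actual Ansible tooling the team uses.

```python
# Deterministic phase selection: hash each hostname into a stable bucket in
# [0, 100) and include the host once the rollout percentage passes its bucket.
import hashlib

def bucket(hostname: str) -> int:
    """Hash a hostname into a stable bucket in [0, 100)."""
    return int(hashlib.sha256(hostname.encode()).hexdigest(), 16) % 100

def in_phase(hostname: str, percent: int) -> bool:
    """A host is in the rollout once the rollout percentage passes its bucket."""
    return bucket(hostname) < percent

# Hypothetical fleet of 2,000 hypervisors in one AZ.
hosts = [f"compute{i:04d}.az1.example" for i in range(2000)]
for pct in (1, 5, 10, 20, 50, 100):
    selected = sum(in_phase(h, pct) for h in hosts)
    print(f"{pct:>3}% phase -> {selected} hosts")
# Each phase is a superset of the previous one, so a host never flips between
# the canary group and the stable group as the rollout widens.
```

In practice the selection could also live in the Ansible play itself, for example via its serial batching, but a stable hash like this keeps phase membership consistent across repeated runs while the hiera data is flipped for the selected nodes.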