Okay. Welcome everybody. Welcome to the session on the Amadeus OpenStack journey. What we're going to talk about today is how Amadeus built an enterprise cloud using VMware Integrated OpenStack and NSX. So when we had the opportunity to do a session here, we were thinking about what would be a good way to tell our experience with OpenStack from VMware and NSX. What we thought would be good and valuable to the community was to share a good story, and this is the Amadeus story of how an enterprise is using OpenStack and VMware to run a business-critical application on VMware infrastructure. So the way we structured this whole presentation is around how an organization took a key set of business drivers, turned them into requirements, how those requirements actually translated into a product architecture, and how they were able to use this set of technologies to meet their business goals. That's roughly the scope of the session, right? Quick introduction: my name is Sai, Sai Chaitanya, I'm a product manager on the NSX team, and I'd like Arthur to introduce himself.

I'm Arthur Knoppel from Amadeus, introducing VMware Integrated OpenStack at Amadeus.

Cool. So that's the agenda that we have in mind, and I'll actually step back and let Arthur drive, because he's been very, very closely involved in this whole journey for Amadeus with OpenStack.

Okay. Thank you, Sai. Before we dive into VMware Integrated OpenStack or our private enterprise cloud solution, a bit about Amadeus: we are a service provider to the travel industry. So if you're booking a flight via Kayak, Expedia or something like that, it's likely that you booked it through us. We provide flight management, departure control, reservation and ticketing for airlines, and for hotels, cars, and rail. We make about 3 billion euros of revenue a year. We are an internationally located company: we are in Nice, where our research and development is, our data center is close to Munich, and our headquarters are in Spain. We have sites in Miami, Boston, Sydney, Bangalore, London. Business model: we get a transaction fee when there is a booking. We do about half a billion transactions per day, about 10,000 requests per second, and we have a 99.99% SLA uptime. So B2B knows us, the public doesn't know us, but in this world, in the travel world, we are known.

What were the business drivers for our undertaking to get to a private cloud, to an enterprise cloud? There were internal drivers and there were external drivers. The internal drivers are basically part of our digital transformation, our digital roadmap. How do we respond to what we need to do beyond compute virtualization? That was one question we had to answer. How do we provide resources and services quicker to our end users? How do we provide infrastructure as a service? How do we cater for a CI/CD pipeline, which is used by DevOps and R&D, which basically runs in parallel to what we do with infrastructure as a service and exploits the features we can provide with OpenStack? Also, we wanted to provide a platform for our next-gen cloud-native applications. So those are the internal drivers. We also had immediate drivers from customers who came to us and said, okay, yes, you're good to provide our next-gen services, but they have a very high SLA. They have an SLA of five nines, and then we said, okay, how can we do that? We have to do it with a different approach to how the applications are built and how the applications behave.
They have to be fault tolerant to hardware failures, and in turn that requires a platform like OpenStack infrastructure as a service. What we will do, and have already done, is set up two data centers in the US, in Santa Clara in California and in Ashburn, Virginia. We are currently doing user acceptance testing with the customer, and we go live in production in January next year. The whole undertaking started in June last year, and now we are at the stage of doing UAT.

So the requirements for the business: in general, we have to have a faster turnaround. We have to satisfy our end users better. They want services and resources now; they don't want them in weeks or months, they want them now. And also the CI/CD requirement of R&D, and the uptime, that's the business requirement I just mentioned before. Technically, what were the requirements? Getting to a fault-resilient application architecture, automatic recovery on failure, starting up new what we call isolation zones across availability zones in case of failure. We are using container-based technology, so we're using Kubernetes wrapped into OpenShift. The whole thing is basically model-driven, so you can put down your application topology in Heat templates, or in the future in Terraform templates, deploy your environment onto your platform, and make changes to it in a standardized and automated manner.

This is a picture of the layers we have. The lower half, the blueish bit you can see here, is VMware Integrated OpenStack, basically this bit here. We're using VMware hypervisors, ESX, for compute, NSX for network, vSAN for data stores, vSphere managed by vCenter. We are also using VMware technology to operate the thing: we're using Log Insight as the log manager for all this infrastructure, and we're using operational dashboards also from VMware. So all of this lower layer is basically a VMware-provided solution. On top, this can be considered the PaaS layer. This is a distribution of open source components, which are compiled by our R&D into what we call Amadeus Cloud Services: we have Artifactory, we're using Consul as the service registry, we're building containers, we're using OpenShift and Kubernetes, we have database as a service, and we have our own instrumentation for the application. So this is all part of the CI/CD pipeline and the application bit, the platform as a service, and the lower bit, which is what we're talking about here, is VMware Integrated OpenStack.

The way this is put down, we have a topology where four hosts carry all the management, the control plane: VIO itself, vCenter, NSX, and our monitoring like Log Insight and vROps. Then we have the data plane, which is the payload cluster, about 33 of those hosts, and we have an edge cluster of three hosts. So that's how the lower bits you saw before are distributed.

Then, what does that mean for a developer? A developer, if he has to make a change, clones a project, tests on his laptop, commits to Stash, creates a pull request, builds the change, and at the point when, for instance, a topology change is required for the application, he is basically independent in a way he didn't use to be. He doesn't need to call ops anymore, a sysadmin or a network guy or a firewall guy; he basically does it himself, so that's a huge advantage.
And in the future, which we are not at yet, we want the QA, the regression testing, and the further push into a user acceptance test phase or a production phase to be automated as well. And this will be handled by DevOps.

What we are using for putting the application topology together is a set of modular Heat templates, YAML files. You basically have a huge integration stack which offers everything we have: any type of cluster available in our environment, networks, subnets, distributed routers, firewalls. This is all put down in this integration stack, and Amadeus-specific rules, for instance permissive or restrictive rules for firewalls, are in individual YAML or Heat files. This is then offered to DevOps, and they take the elements they need to build the application topology. And if you think about what we do now: we basically create a project on OpenStack with a given quota, and then an end user can deploy, using these YAML files, these Heat templates, its entire application topology: the number of networks it has, the networks it is connected to, databases and payload servers, firewalls, et cetera. He can deploy this, run it against Heat, and this takes, in our case, about 50 minutes. The project creation itself is a matter of seconds. The deployment of the application topology, of the stack, is a matter of 50 minutes, application deployment included. So that's 50 minutes. If you compare it to what we used to do before: okay, you have to have a business case, you have this as well; then you purchase a server, you get it in the house, you rack it up, you cable it up, you put an OS on it, you put it on the network, you set up the firewall, et cetera. That is a matter of weeks and months, in our case. With the solution we have at hand at the moment, this has already been significantly reduced, literally down to 50 minutes, which is quite good.

I mentioned before that one customer requirement was an SLA of five nines, where again our R&D people said, okay, we have to use a different type of technology. What we are doing now is using a so-called circuit breaker technology from Netflix, I don't know if you know that, which is built on dependency graphs. In case of a fault in one of what we call isolation zones, that isolation zone is just stopped and built anew in another availability zone. An isolation zone is basically all the elements you need to have an application running: its network, its payload, its data stores. In case a failure is detected by our instrumentation, it's just shut down, and as a next step it will be started anew on ESX hosts which are available.

When we looked at this requirement initially coming from the customer, we said, okay, can we do this with a public provider? Can we even do it in a public cloud? But for granting this very high SLA, the cost, even where providers were willing to do it, was 40% higher than doing it ourselves. We also looked at multiple OpenStack distributions; OpenStack or not OpenStack was never really so much a question for us. So we looked at Mirantis, we looked at Red Hat, we looked at various distributions, and in the end we decided to go with VMware. I have another slide on this, because we have a long-standing relationship and some other considerations. But I'm not trying to say that other distributions are bad; it was just a fit in our case.
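To make the Heat template approach concrete, here is a minimal sketch of the kind of topology template Arthur is describing. This is not Amadeus's actual integration stack; it is an illustrative HOT template in Kilo-era syntax with hypothetical names, showing one network, a subnet, a router uplinked to an external network, and a single payload server wired together. With the NSX-backed Neutron that VIO provides, the router becomes a logical router and the network a VXLAN-backed logical switch.

```yaml
heat_template_version: 2015-04-30

description: >
  Illustrative application topology (hypothetical names), not Amadeus's
  actual template: one network, one subnet, a router to an external
  network, and a single payload server.

parameters:
  image:
    type: string                  # e.g. an OpenShift node image in Glance
  flavor:
    type: string
  external_network:
    type: string                  # name or ID of the external/provider network

resources:
  app_net:
    type: OS::Neutron::Net
    properties:
      name: app-net

  app_subnet:
    type: OS::Neutron::Subnet
    properties:
      network: { get_resource: app_net }
      cidr: 10.10.0.0/24

  app_router:
    type: OS::Neutron::Router
    properties:
      external_gateway_info:
        network: { get_param: external_network }

  app_router_if:
    type: OS::Neutron::RouterInterface
    properties:
      router: { get_resource: app_router }
      subnet: { get_resource: app_subnet }

  payload_server:
    type: OS::Nova::Server
    properties:
      image: { get_param: image }
      flavor: { get_param: flavor }
      networks:
        - network: { get_resource: app_net }
```

A stack like this is launched against the pre-created project with something like `heat stack-create app-topology -f topology.yaml` on a Kilo-era client (or `openstack stack create` on newer ones), which is the split Arthur mentions: creating the project and its quota takes seconds, while deploying the full stack plus the application takes on the order of 50 minutes.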
We're also using NSX as our SDN. In my context, this is a bit of a byproduct, because we are also looking at it independently of this project: we are looking at what the next-gen network in our data center would look like. We had a proof of concept with Arista, Palo Alto Networks, and VMware using NSX, and as such we also used it in this implementation. We are using some features of NSX in this cloud context. We're using, for instance, distributed logical routing, where we saw immediate benefits in reducing east-west traffic for collateral calls, because basically you stay on the same ESX host, or you go via VXLAN to another ESX host, and you don't go to the edge anymore with this implementation. Other than that, I don't know if you want to talk about the benefits of NSX.

Sure. So the key feature here is distributed logical routing. Distributed logical routing is a feature where you push the routing function to every hypervisor. If you have two VMs that are on two different networks, what happened in the past was that all traffic used to go all the way to an upstream router and then come back to the host. So for a simple example, with a web VM and an app VM, the traffic would always go up to some physical router, right? With NSX, the routing function is pushed down closest to the VM. So if two VMs, say a web VM and an app VM, are on the same ESX host, the traffic never leaves the host. What you get as a byproduct is a very good improvement in performance, which is what Arthur was explaining just now. So distributed routing is one function which we push down to the host and distribute. A similar function is the distributed firewall, which is the capability to take OpenStack security groups and implement them as a stateful firewall on every hypervisor. In addition to that, I'll just touch on the last point there: NSX has the ability to extend logical networks across data centers. I think that's a very, very powerful feature, and it's part of the roadmap where you can do disaster recovery across data centers.

Right, I mean, yes. We did do a proof of concept where we switched between data centers using NSX. But in this context, that was not the main point.
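To tie the distributed firewall point back to the templates: an OpenStack security group is just an ordinary Neutron resource in the Heat files, and with the NSX backend its rules are enforced statefully at each hypervisor, alongside the distributed routing. The snippet below is an illustrative sketch with hypothetical names and rules, not an Amadeus rule set; it shows the kind of restrictive firewall building block Arthur described keeping in its own YAML file.

```yaml
heat_template_version: 2015-04-30

description: >
  Illustrative restrictive security group (hypothetical rules). With NSX
  as the Neutron backend, these rules are applied as a stateful
  distributed firewall on every hypervisor, so web-to-app traffic between
  VMs on the same ESX host is routed and filtered locally.

parameters:
  web_cidr:
    type: string
    default: 10.10.0.0/24         # illustrative subnet of the web tier

resources:
  app_sg:
    type: OS::Neutron::SecurityGroup
    properties:
      name: app-sg
      description: Only the web tier may reach the app tier on 8080
      rules:
        - direction: ingress
          protocol: tcp
          port_range_min: 8080
          port_range_max: 8080
          remote_ip_prefix: { get_param: web_cidr }

outputs:
  security_group_id:
    description: ID consumed by server or parent templates
    value: { get_resource: app_sg }
```

Because the rules are stateful and applied at the hypervisor, the same template behaves the same whether the web and app VMs share a host or not, which is part of the benefit Sai described.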
But why did we use VMware? If you remember the talk from this morning, I think the lady from Gartner talked about this bimodal IT, where you have your critical services managed in an ITIL type of way, and where your DevOps type of new approach would be handled in the cloud. And in our case, also if you remember what AT&T has done at a very large scale, they must have a quite qualified workforce, and not only two or three people but many; otherwise they wouldn't be able to pull this off. In our case, we are still pretty much concentrating on our mission-critical services. I mentioned departure control before, et cetera. If this was not working, air traffic would literally stand still. So we have to be really cautious there, which also means that we don't have so much personnel, or qualified personnel, available to implement something like an enterprise cloud solely by ourselves. So that was one reason why we partnered up with VMware, who are long-standing partners of ours anyway. We have been running our workloads on vSphere since, I think, 2000-something, so for over 10 years.

So that was one decisive factor, that we basically get all the hypervisors from VMware, and now the new pieces like NSX and vSAN, plus we also have the tooling delivered by VMware, like the log manager and the operational dashboards. So that's why we decided on VMware. We came up with a design, we had this design counter-checked by a competitor, really, by Mirantis, and the design we came up with for the two sites in the U.S., and also for the one cloud we have in Munich, was approved as a valid solution.

So how did we do this? How shall I say, we were pretty much under pressure to deliver, with customer constraints, the customer commitment dates, at our back. Like I said, in June last year we started off with VIO 1.0, which was on Icehouse. Then, until the end of the year, over Christmas actually, we integrated VIO 2.0.1, which was on Kilo and which also had a lot of fixes incorporated for bugs we had found in the meantime. That was our blueprint, and this is what we implemented here in the U.S. and also in our production cloud in Munich. The next things we are now looking at are stability, operability, and performance. So these are roughly the phases we are going through.

As mentioned already, we started off with 1.0 Icehouse. We found about 27 bugs, and this happened mainly when people were creating and deleting stacks. We cannot say that there is a specific pattern; we cannot say, oh, this is all on VIO or all on NSX or all on OpenStack. It was everywhere. We had NSX bugs, we had VIO plugin bugs, we had minor OpenStack bugs. We had ambiguous stack configurations. We had insufficient underlying hardware, meaning not enough capacity, running with only one edge host at one point in time, for instance. So the important point is that we cannot say there was just one thing to look at; it was many. But as far as the software is concerned, VIO and NSX, that has all been consolidated, and I can say that we have really been in a stable situation since we loaded VIO 2.0.2.

As for the stakeholders, the people working on this, it's a tiny team really. Boris from Mirantis said there is this enthusiastic sysadmin; well, we have this one guy. We have a really good network guy. We have a guy who understands Puppet, Ansible, YAML files, who understands cloud. We have a regular Linux sysadmin, and that is about it. So these five people, together with VMware, basically built the lower layer you saw before. The consumers of this cloud are a bigger group: we have a DevOps team of 12 who are doing the application change management, if you wish, and we have about 70 guys who are composing and building this Amadeus Cloud Services layer, this CI/CD pipeline with Artifactory and the service registry, building the containers, deploying the containers.

As I said before, we have about 40 hypervisors: three for the edge, four for the control plane, and 33 for compute. In production, we will have five racks per site with 16 blades for compute, one rack each for Hadoop and Oracle, and two for Couchbase. This is our specific setup. Two management servers and three edge servers, and we will be looking at approximately 1,400 instances, VMs.

Best practices: the way we did it, we were pretty much driven by external circumstances, so to speak. I think it would be prudent to develop a cloud strategy where you consider: what is your digital roadmap?
Which workloads do you want to put in the cloud? Would these workloads have to undergo a transformation of the application itself, or would you place them one-to-one? What is your region concept? Would you be globally distributed? Things like that. And obviously, you also have to understand what your shop is in terms of this bimodal IT. Then I would start, and this is what we did, by using just the things you really need. After all, what do you need? You need compute, storage, network, security. So we have Nova, we have Neutron, we have Cinder, we have Keystone, we have Glance, obviously, for the images, and we have Heat for the orchestration. We're looking at chargeback in the future, so we might use Ceilometer. But at the start, don't overcomplicate things. Just use what you need, what you really need. And maybe it's not so good to start with project dependencies at your back, because then you lack the time to really prepare and think about how you want to build out, or implement, your strategy.

What we are going to do as next steps is quota management, recharge to business units, supply pool management, onboarding, all these things. We further have to improve our end-to-end operability. Often, if you have a problem, what do you do? You look at the Neutron database, you look at NSX, then you look at vCenter, then you look at Log Insight, and integrating all that is a bit of a challenge; this is where we're working together with VMware as well. And we're looking at how we want to set out our regions. At the moment, we have two regions in the U.S., one on the East Coast and one on the West Coast, plus one in Munich, but we want to expand to the greater Munich area, or maybe within Germany, so that we have more sites and a true regional concept. So this is pretty much what I have to say. Should I sum it up?

Sounds good. So I started the session by saying that we want to share with you a story of how an enterprise customer is running a business-critical application on OpenStack, and Amadeus was one good story. Being VMware, we get to see a lot of customers doing OpenStack, right? Some of them go very well, some of them go reasonably well, and some of them don't do that well. What's really important for success with OpenStack is having a clear driver for what you're trying to do, and then translating that into clear requirements. So if you go back and think through what this presentation was about: there is a net-new application, a cloud-native application, it was OpenShift in this case, and the cloud-native application needs to be built in a highly resilient fashion, five nines. Those are the clear business requirements, right? And the team was able to come up with a very nice architecture, in this case VIO and NSX, to support this application. So I think having that clear understanding of requirements and a clear driver is very important for execution. The second aspect that was also very striking with this team was how they structured their execution. They first started with a clear qualification, and then they went through stabilizing the base platform before going ahead and trying to do something like stress and performance testing. So when you have a very clear execution plan and break it into phases, your chance of success is much higher.
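One way to picture Arthur's point about starting simple and composing from modular Heat files is Heat's nested-stack pattern: a parent template pulls in child templates as resource types and wires their outputs together, so DevOps can take only the building blocks they need. The sketch below is hypothetical; the child file names, parameters, image, and flavor names are illustrative, not Amadeus's actual catalog, and the child templates are assumed to expose network_id and security_group_id as outputs.

```yaml
heat_template_version: 2015-04-30

description: >
  Hypothetical parent template composing reusable child templates
  (file names are illustrative). The child templates are assumed to
  expose network_id and security_group_id as outputs.

resources:
  app_network:
    type: app_network.yaml          # child template: net, subnet, router interface
    properties:
      cidr: 10.20.0.0/24

  app_firewall:
    type: restrictive_rules.yaml    # child template: security group, like the earlier sketch
    properties:
      web_cidr: 10.20.0.0/24

  payload:
    type: OS::Heat::ResourceGroup   # three identical payload servers
    properties:
      count: 3
      resource_def:
        type: OS::Nova::Server
        properties:
          image: rhel7-node         # illustrative image name
          flavor: m1.large          # illustrative flavor name
          networks:
            - network: { get_attr: [app_network, network_id] }
          security_groups:
            - { get_attr: [app_firewall, security_group_id] }
```

Changing the topology then means editing or swapping a child template and updating the stack, rather than calling ops, which is the developer independence described earlier.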
So I think that sums up pretty much our story for this session, where we have people running these so-called cloud-native applications already in production on OpenStack, on VIO and NSX. So at this point, I'd open it up to any questions that you guys have. Do you want to go to the mic and...

Thanks for your insights. My question is actually related to NSX's story with OpenStack. How are you guys able to manage other networking tools, like NSX, or, for example, some of the open source tools OpenStack works with, like OpenDaylight? How do the other networking tools fit with OpenStack in your experience?

So are you trying to manage NSX with OpenStack Neutron, or with OpenStack in general? How do those two fit together?

Well, we manage NSX with NSX. It's part of this.

So in this solution, what Amadeus has done is they've run OpenStack, which is VMware Integrated OpenStack, and OpenStack obviously requires Neutron, right? I mean, you can choose to do nova-network instead of Neutron, but for their use case they required Neutron, and the Neutron backend was provided by NSX in their case. It was completely provided by NSX. And the other key thing to touch on, to complete your point: Arthur spoke about those Heat templates. The Heat templates capture the application blueprint, right? They capture it both at the infrastructure level and at the application level. What's important to point out is that the Heat templates they're using day to day also capture the networking constructs. What that translates to is that every developer change that gets made is actually tested in the same manner in which it's going to be deployed in production. So it's tested with the same set of networks, the same set of load balancers and routers. So that's to answer your question.

Thank you. You mentioned you had 40 hypervisors. Are they all in one big cluster or multiple smaller clusters?

No, they're distinct clusters.

Okay, and the second question: you mentioned vSAN. Is the storage all vSAN, or do you have any external storage, and what type of storage?

Yeah, okay, we have three distinct clusters. One for the control plane, for management, where we have a cluster of four. For the edge, we have a cluster of three, and the rest, in the case of 40, is 33 for compute. So these are three distinct clusters. vSAN we are using only on the control plane, on the management cluster. What we're using for the data stores is local storage. So for Couchbase, Oracle, Hadoop, we are using local storage. We created a flavor where, when you specify the server type, you automatically use local storage. And for the compute, which has very little need of data store, we use Tintri, Tintri shared storage.

Any more questions? Okay, sure.

NSX, are you using VXLAN or STT?

VXLAN, yeah, it's VXLAN.

Is there a particular reason you're using VXLAN? Is it because you're using the distributed logical routing?

So the whole solution uses VXLAN. In VIO and NSX, the whole network virtualization is provided by VXLAN.

Okay, what was the reason to choose VXLAN?

So the product was kind of built with VXLAN, because the NSX product has been in the market for more than three, three and a half years now, right?
And at that point, almost five years ago, people were working on a draft, and the draft happened to be VXLAN, right? So the product was implemented with that draft, and VXLAN is the standard that all NSX deployments are running today. However, there is work in the upstream communities on next-generation headers, and that is Geneve.

Thank you. Do we have any more questions? Okay, thank you. Thank you for taking the time to attend this session. Thank you.