Okay. Good afternoon, everyone. I'm Huai from the eBay cloud team, and I'm going to give an introduction to the evolution of our dev environment setup at eBay. If you want to take a nap after a heavy lunch, I can understand.

So this is my agenda: why we need a specific topic on how to set up the dev environment, what challenges we faced setting it up, the history of how this process evolved, and then some lessons learned and the challenges we have now.

So why do we have a specific topic on setting up the test and dev environment? First, look at the scale. We are running a pretty big OpenStack distribution on the eBay side. Even after the split with PayPal, the scale is still pretty big, and we have more than 20 components running for this whole infrastructure as a service, including the core OpenStack services (Nova, Neutron, Keystone) and other services like monitoring and logging. So we want to find one pattern for setting up all these services for the dev and test environments.

Testing is pretty critical for the PDs and the QEs, the developers and the quality engineers. For example, when you have some issue running on our side, how can you reproduce it? You probably want the same environment as you have in production. Or you want some new features, some cool stuff, but how can you make sure that what you have running in the dev environment can also run in the production environment? And OpenStack releases twice per year, so how can we make sure the versions are backward compatible during the upgrade? We think this is pretty critical for us, especially when you are running a live site and you have to make sure your services are running all the time.

So we want a mini environment that mimics production, set up the same way as production. As I said, we have a lot of services, not only the core OpenStack services but also the monitoring services. We also have some plugins, for example the Nova scheduler plugins, that run outside. So DevStack is not enough for us.

We are using Puppet to deploy all the services on our side for the cloud infrastructure service, and we want to use the same for our dev environments. We put a lot of effort into building the whole Puppet thing up, so we want to use the same thing. That way you have the same way to deploy your service in dev and in production, and meanwhile you are also testing your Puppet code for the deployment, so you can be sure: okay, I can upgrade this Puppet code, this Puppet manifest, and it will work when I upgrade production.

We also want to use the same package builds for all the components, in the same topology, because we are running a Nova cell for each availability zone. We also want to use OVS: we are running a fairly large SDN solution on our side, so we want the same thing in the dev environment.

And this has to be simple. For example, when you have a new hire, you don't want him to go through a lot of documents to set his own dev environment up. And it should be repeatable: say you have one test environment or one dev environment and you want to set up another one, that should be a repeatable process. And the whole process should still work when, for example, you upgrade to the Kilo version.

So these are the challenges we had. At first we were using a shared hardware lab for our QA and developer environment. Before that, let me introduce the deployment process at eBay. As I said, we are using Puppet for all the service deployment for our cloud services. For each service we have a Puppet module, all these modules are deployed to the Puppet master, and we are using Foreman as the ENC server for our services. So how many people here are familiar with Foreman and the Puppet master? Can you show your hands? Okay, let me go into a little bit of detail on Foreman and the Puppet master.

We want our cloud services to be resilient to failure, right, because every piece of hardware can fail.
You don't want a box to fail today and then need some manual process to bring things back up. So what we have is Puppet and Foreman to do this. For example, say I have a new Nova node to add to my environment, and I already have this box from some provisioning process. What I need to do is add this node to the host group in Foreman, so Foreman then knows that this node belongs to a Nova host group. In Foreman we also associate the Nova Puppet class with this host group. So when the node comes up, it asks "hey, what should I do?" and sends a request to the Puppet master. The Puppet master then sends an ENC request to Foreman, and Foreman returns: you belong to this Nova API host group and you should apply this Puppet class. The Puppet master then does some compilation and sends back to the node the catalog it should apply. On the node, the Puppet agent then downloads the Nova API package, sets up all the configuration, and starts the Nova service. So this is how we use Puppet for the deployment. So, back to my previous question.
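The lookup described above (node, to host group, to Puppet classes) can be illustrated with a small in-memory sketch. This is not Foreman's API: the host group names, class names, and node names are hypothetical, and the returned dictionary only mirrors the general shape of an ENC answer (a list of classes plus parameters).

```python
# Hypothetical data: host groups, class names, and node names are
# illustrative only, not eBay's actual Foreman configuration.
HOSTGROUP_CLASSES = {
    "nova-api": ["nova::api"],
    "neutron-server": ["neutron::server"],
}
NODE_HOSTGROUP = {}  # node name -> host group name


def register_node(node, hostgroup):
    """Add a node to a host group (the one manual step in the flow)."""
    if hostgroup not in HOSTGROUP_CLASSES:
        raise KeyError(f"unknown host group: {hostgroup}")
    NODE_HOSTGROUP[node] = hostgroup


def enc_lookup(node):
    """Answer the master's ENC query: which classes should this node apply?"""
    hostgroup = NODE_HOSTGROUP[node]
    return {
        "classes": list(HOSTGROUP_CLASSES[hostgroup]),
        "parameters": {"hostgroup": hostgroup},
    }


register_node("nova-node-1", "nova-api")
# A replacement box only needs to be registered to the same host group.
register_node("nova-node-2", "nova-api")
print(enc_lookup("nova-node-2"))
```

The point of the design is that the node itself carries no role information: everything it should become is derived from the host group it was registered to.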
When you have a hardware failure, what we need to do is onboard a new box and register it to the same Foreman host group; it will then automatically take over and start up the services, for example Nova or Neutron. So this is how we do the deployment on our side.

At first, in our dev environment, we used the same thing: bare-metal machines, with Foreman and the Puppet master as the setup, and some manual population of the Puppet parameters in Foreman (for example, a version) in the MySQL database. The good part is that we are using the same configuration management as in production. But the bad things everybody can imagine. The hardware has limitations: every time I want something new I need new hardware, and hardware resources are limited. It's a pretty heavy process and hard to repeat. And if everyone is using a shared environment, where I'm a Nova developer and some other guy is a Neutron developer, everyone only focuses on his own part, and some days later we find the whole dev environment is broken because nobody is maintaining it well, right?
After a period of time, we need to do some cleanup.

So we have another solution we call Pinocchio, which is cloud based: it is cloud over cloud. We thought about it this way: we already have a cloud running, so we can ask that cloud for some compute and network resources and set up another cloud on top of it, in our dev VPC, the dev private cloud. Pinocchio is actually the orchestration engine in this diagram. It takes its input from JSON. In the JSON we specify which versions you want (the Nova version, the Neutron version, the monitoring service version) and the topology: do you want to put Neutron and Nova together, do you want to use a Nova cell, how many Nova computes do you want, things like that. Taking this as input, it then invokes the Nova API of the production cloud environment to get some VMs in the dev VPC. So you get a bunch of VMs, and then it runs some scripts to invoke the Puppet classes. In this diagram we are not using Foreman and the Puppet master, because we are using a local apply of Puppet. So after you get the VMs, Pinocchio triggers the Puppet run, and every VM then follows the topology we defined previously to set up the Nova service, the Neutron service, and the other services in your cloud.
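To make the JSON input concrete, here is a minimal sketch of how such a spec might be turned into a list of VMs to request. The field names, VM names, role names, and version strings are all assumptions for illustration; the talk does not show Pinocchio's real schema.

```python
import json

# Hypothetical spec: every key and value here is an assumed example,
# not Pinocchio's actual input format.
SPEC_JSON = """
{
  "versions": {"nova": "juno", "neutron": "juno", "monitoring": "1.0"},
  "topology": {
    "colocate_nova_neutron": true,
    "use_nova_cell": false,
    "nova_computes": 2
  }
}
"""


def plan_vms(spec):
    """Return (vm_name, roles) pairs for the requested topology."""
    topo = spec["topology"]
    plan = []
    if topo["colocate_nova_neutron"]:
        plan.append(("controller-1", ["nova-api", "neutron-server"]))
    else:
        plan.append(("nova-1", ["nova-api"]))
        plan.append(("neutron-1", ["neutron-server"]))
    if topo["use_nova_cell"]:
        plan.append(("cell-1", ["nova-cells"]))
    for i in range(1, topo["nova_computes"] + 1):
        plan.append((f"compute-{i}", ["nova-compute"]))
    return plan


spec = json.loads(SPEC_JSON)
# spec["versions"] would later be handed to the Puppet run on each VM.
for name, roles in plan_vms(spec):
    print(name, roles)
```

In a real run, each entry in the plan would become one VM request against the production cloud, followed by a local Puppet apply for the listed roles.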
So it is It is better than previous with what we had for the hardware share the lab It is we still using the same configure mentioned to using the same puppy code to do the department process as as we do for the Production environment and it's all automated So what you need to do is using a command to do and the input input define the topology configuration and the reverse define the versions you want and done several Pretty I guess ten minutes later you can get your own dev cloud But it is still heavy because I Need to wait a bunch of minutes to win to to wait my dev cloud out Up and it's not flexible the the meaning of the not flexible is For example, I don't currently I have a dev cloud starting up Without the Nova cell but later on I want to test some future with Nova cell what I need to do What I need to do you need to another dev cloud with a new configured topology the RCI is currently on this model and The previous the previous the first the three Components is at same as upstream. We have a Garrett review. 
We have Zoe and we have we have a check-ins so it's the same as the Country the community did community does the difference isn't on the behind So it will trigger a pinocchio run in the approval job of Garrett if you are then you develop starting up a dev cloud all of ems cloud on cloud and Then setting this up and then after that it will trigger a 10-piece drone So the tempest actually running against a cloud that actually has the same topology and the configurations in our side So as I said this is better than previous hardware lab But I still have some issues because not flexible and the pretty heavy So we're using color the color is we have one every service is had is a Docker container and We do did some customizations for each of the image that a color project provide And upload this image as a parent image to our own Docker registry and For this For this time for this version we need to ask a cloud because we already have a cloud right and You can ask cloud to get give you some compute resource give you some compute and Then after you get this compute you can define the environment in this environment you define Okay, I want which host I want to use with Nova should running on which host and you choose should run it on which host and You define this you can use in the Docker compose that provided by color project and then it will launch each launch all the Container on this Docker hoster So it is pretty flexible because you can think about this when you have a VM you do something wrong you probably your VM is screwed up you need to clean this VM out and you It's pretty hard to clean this VM. 
You probably need to get another VM. But this time, what you need to do is just stop the process, right, and start another one. It's pretty clean. Once you have done something on a Docker host and you want to do another thing, you just stop that, and each run is isolated from the others.

For our pattern we are not using the multi-host Ansible yet, because when we did this, multi-host Ansible was not supported yet. So what we did is plug the Puppet code into the image, so every time the Docker container starts up it runs Puppet. So again, it is using the same Puppet code that we deploy to production. The reason we do not just use a new image every time is that we want backward compatibility with our production environment, because in production we are still using Puppet for deployment. Another bad part of this one: it is still only partially automated.
We need to have the images uploaded first, right, and we need to have all the Docker hosts provisioned first.

For Kolla we are using the Kilo version, and as I said, we are not using the Ansible multi-host yet. To make it work we added Puppet in the Docker image, because it's pretty important for us to use the same code in production and dev. Also, all the environment variables are injected as Puppet facts, so when Puppet runs it can get the facts injected from outside. And we modified our Puppet code, because the previous Puppet code was adapted for VMs and now we have to use it in a microservice model. The Kolla project also didn't support Nova cell and the overlay networks.

There is one use case that fits perfectly with this new approach to setting up the dev and test environments. We had an upgrade earlier this year from Havana to Juno, and we decided to upgrade one service at a time. I think we did Keystone first, and then Nova and Neutron. The way we did this is we built up a full Havana environment and did the upgrade one by one. Once we upgraded Keystone, we triggered a full test on the whole testing environment, so in that environment you have a Juno Keystone and all the other services are Havana. After that testing, if it went through, we then ran the Juno Tempest test cases on this environment. If everything went through, we then upgraded Nova to Juno and again did the testing for both the Havana version and the Juno version. So doing it the container way is pretty flexible for us to replace services in the same environment.

So, lessons learned. We need to automate everything. I know this automation thing has been talked about by everybody, but for us it's important, right, because it's about the infrastructure build-up: how you build up your infrastructure in production, you need to do the same thing for your dev and testing environments. And automation needs to mean repeatable. It doesn't mean you do this automation once, you have a bunch of scripts, and then after you upgrade to Juno your scripts don't work anymore. And expect failures. It is especially flexible when you use a container as the unit for each OpenStack service, because every part can fail. Especially for your development environment, you want the flexibility to change each of the services. You also want it to be debugging friendly, because the users of this pattern are developers, so they want to do these things.

Also important for us is speed. How long it takes to set up a dev environment is important for us. For example, if you are a developer and you want to set up a new dev environment, and it takes more than, like, 30 minutes, what are you going to do, right? If you have the capability to set up a dev environment in minutes, it will be awesome, right?
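The service-by-service upgrade described earlier in the talk (Keystone first, then Nova, then Neutron, testing against both releases after each step) can be sketched as a sequence of mixed-version environments. The test suite labels are hypothetical placeholders, and the sketch only enumerates the states; it does not deploy or test anything.

```python
# Order as recalled in the talk: Keystone first, then Nova, then Neutron.
UPGRADE_ORDER = ["keystone", "nova", "neutron"]


def upgrade_steps(order, old="havana", new="juno"):
    """Yield (versions, test_suites) after each single-service upgrade."""
    versions = {service: old for service in order}
    for service in order:
        versions[service] = new
        # After every step, run both the old and the new release's tests.
        # The suite names are illustrative labels, not real job names.
        yield dict(versions), [f"{old}-tests", f"{new}-tempest"]


for versions, suites in upgrade_steps(UPGRADE_ORDER):
    print(versions, "->", suites)
```

With containers, moving from one state to the next means replacing a single service's container while the rest of the environment stays in place.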
The challenges we have: we are not using this Docker thing in our production deployment yet. That's the reason why, for the Kolla project, we still need to have Puppet running inside the image. We also need more components covered: we need the Elasticsearch (ELK) server, Graphite and Grafana, and we are using Zabbix for monitoring. And the HA setup: all these things are not done yet for our container-based setup. And we have some drift between us and the Kolla upstream code. The reason is that we have some specific things, for example the OVS part: we are using NSX to run the overlay. So we have some things specific to our environment.

We want to move to multi-host services. In fact, for our Kubernetes cluster we are doing it this way: we are using multi-host services with Ansible to do the whole cluster setup. And we need to think about how to do the orchestration, including following up with the Kolla project. For example, you have dependencies between services, right? The service dependencies should be orchestrated by your orchestration layer.

Okay, I think that's it. Questions?

Yes. As I said, for the Kubernetes cluster we are investigating using that. We have also invested something in Heat, because Heat is actually an orchestration layer for OpenStack. We are also trying to use it to orchestrate the whole environment setup, right, so Heat can call each OpenStack service, get a compute, get a new network, and start up the services.
You mean the new image? We do not have that issue, because every time we want to do an upgrade we remove the previous container and start another one, and inside that Docker container we run Puppet to set all the services up.

We are still evaluating that. Okay, thank you.