Welcome, everybody. Welcome to our talk on Open Service Broker API compliance and testing in production environments. I'm Robert, a platform engineer at anynines. And I'm Oliver, part of the same team as Robert, so we are both on the anynines data services team. On a daily basis we work with the Open Service Broker API, and we are here to share our experience in using and testing it in production environments.

A few years ago we encountered Cloud Foundry, and a few people in our company thought: hey, this is awesome, this is cool, this is a game changer, let's use it. So for a few years now we have had a public PaaS offering based on Cloud Foundry called anynines, the same name as our company. In the early days our developers realized that using buildpacks and pushing apps is really cool, but the backing services that came with Cloud Foundry back then weren't that great. So we saw an opportunity to create data services that are production ready and that we could ship to our customers.

After a while we put together a kind of mission statement: fully automating the entire lifecycle of a wide range of data services, to run on cloud-native platforms, across infrastructures, at scale. A pretty long mission statement that might sound scary, but it's pretty awesome.

With this mission statement we started with Postgres and MongoDB data services. We created them and provided those service offerings in our public PaaS. The Open Service Broker API is a rather lean interface to easily provide backing services to Cloud Foundry. For a while now it has been the open standard, and it has even been adopted by other technologies like Kubernetes and Kubernetes distributions such as OpenShift. We even had a customer that communicated with our service broker just using the
Open Service Broker API, which was pretty cool.

So the Open Service Broker API allows service providers to offer their services in a self-service fashion to application developers. If you are a developer and you are interested in, for example, a Postgres service, you just have a look at the CF marketplace, and hopefully you will find a suitable Postgres database service there. This data service is then rather easy to create: you just issue `cf create-service`, and it will automatically provision the service for you. The request gets redirected to the service broker implemented by the service provider. Provisioning in this context can mean many different things: it could mean creating a container on a shared VM, creating a database in a shared database cluster, or even creating a dedicated cluster on virtual machines just for this one service. There are a number of other endpoints as well, and once you implement all those API endpoints you are ready to integrate your service into the marketplace.

At first glance it might seem rather easy, but we learned at anynines that there are a lot of other, non-functional requirements you need to fulfill in order to be somehow enterprise ready. The service broker interface itself is really simple: there is a method to provision a service instance, a method to provision a credential set, and that's basically it. So when looking at the specification you think: oh, that's easy, I'm going to write my own service broker. Then you deploy it to production, and you learn a few hard lessons, like we did a few years ago. For example, you want your production workload to run on dedicated virtual machines, at least right now, because even though containers are coming up fast and are quite popular, when it comes to running databases it's a good idea to run them on a virtual machine: the isolation is just better, and larger enterprises in particular feel better when this kind of workload runs in a virtual machine, isolated with technology that has been proven for many years. And because you want to use virtual machines, you want on-demand provisioning. This means you don't want to allocate resources in advance; you want to allocate virtual machines whenever there is a request to provision a new service instance. When it comes to testing, that gets quite hard, because provisioning a virtual machine takes time, maybe five minutes, maybe ten, depending on the infrastructure. So when writing tests you have to be very careful about when to provision a machine and when not to.

Another requirement that we learned is important, and that makes testing and verification very hard, is that we want our services to be installable on the customer side, on premise, using BOSH or, for example, the Pivotal Ops Manager. And another thing: we want our solution to work on many infrastructures.
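To make the `cf create-service` flow above concrete: under the hood, the platform calls the broker's provision endpoint as defined by the Open Service Broker API v2 specification. Here is a rough sketch that only builds the request rather than sending it; the broker URL, instance ID, and the service and plan identifiers are placeholders, not real anynines values.

```python
import json

OSB_API_VERSION = "2.14"  # version header every broker call must carry

def provision_call(broker_url, instance_id, service_id, plan_id,
                   org_guid="org-guid", space_guid="space-guid"):
    """Build the HTTP call a platform like Cloud Foundry sends to the
    service broker when `cf create-service` is issued: a PUT against
    /v2/service_instances/:instance_id naming the service and plan."""
    return {
        "method": "PUT",
        "url": f"{broker_url}/v2/service_instances/{instance_id}",
        "headers": {
            "X-Broker-API-Version": OSB_API_VERSION,
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "service_id": service_id,
            "plan_id": plan_id,
            "organization_guid": org_guid,
            "space_guid": space_guid,
        }),
    }

# Hypothetical service and plan identifiers:
call = provision_call("https://broker.example.com", "instance-1",
                      "postgres", "small")
```

Whether the broker behind this endpoint then spins up a container, creates a database in a shared cluster, or provisions dedicated VMs is entirely the broker's decision.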
So wherever Cloud Foundry runs, it should be possible to install the data services there. As you can imagine, there are a lot of different environments where we want our data services to work and to function. As Robert already explained, we also want these services to be consumable by multiple platforms, not only Cloud Foundry; we want them to be consumed, for example, by Kubernetes. Another requirement is that we want our services to be highly available. Not only the service brokers, which actually manage the service provisioning, but also the service instances: we want a Postgres cluster where, when one node fails, there are two other nodes taking over the workload. Of course we want backups of our data services. We want a configurable backup schedule, and we want these backup functionalities to be provided to the end user by an API or by a dashboard, so the end user is able to create backups and restore them. And of course we want monitoring and capacity upgrades, which is actually a feature covered by the Open Service Broker API.
In Open Service Broker API terminology it's called a plan update. So if you monitor your service instance and you recognize, okay, I'm running out of disk, I want to add more disk to my service instance, you do a plan update.

All these things are things we learned we have to pay attention to when implementing this broker API, and which we somehow want to test. To implement them we came up with a microservice architecture. From the top view it looks like there are 10 microservices playing together to fulfill these requirements; if we zoom in, we see that there are more than 40 components involved that play together to actually make that happen.

When we look at good practice for structuring tests, we see a common pattern called the test pyramid. It says that we should have different kinds of tests. We should have unit tests, which give very fast feedback cycles; they are executed at the code level, so whenever a developer works on the code base to implement a new feature, he runs the unit test suite and gets feedback if he broke something. At the next level we have integration tests. These usually take longer than the unit tests, so it takes longer to get feedback that we broke something, but they test more, because they test how our components interact with each other. And at the top we have end-to-end tests or manual tests: tests from the user perspective, which consume the functionality exactly as the user does. In the early days, let's say three years ago when we started developing those data services, we followed that test pyramid.
We had a large set of unit tests and a fair set of integration tests, not as many as the unit tests, but we had them. Still, in production deployments we encountered issues. So we ended up testing more and more manual steps: we ended up with a test protocol, and whenever we shipped a new release we clicked through the UI and ran `cf create-service` by hand, because we had lost trust in our integration tests. There are so many moving components that we somehow still had to test from the end-user perspective, and that made it very hard to release new features and new versions of those data services. So what we did is we automated those end-to-end tests.

When it comes to integrating new data services, we have another requirement in our service framework: we want to reuse our existing components and integrate new services into the marketplace very fast. And when we, for example, add a new test to the test suite, we want this test to be available for every existing service. We have one more requirement, because we are deploying those data services in many, many environments, which are not all the same.
We want these tests to be executable by the customer, so the customer can verify for themselves whether a release works as intended. To make that possible, we came up with a BOSH release errand. We are deploying the services with BOSH anyway, so it was a natural fit to use a BOSH errand. Once you have configured BOSH and deployed the product with it, you can run this errand with one command, and it verifies whether the main features work as expected from the user perspective.

You can configure which types of services should be tested, for example that MongoDB 3.6 and MongoDB 3.4 should be tested, and also which plans should be tested. For every plan you configure, the smoke test suite will actually create a service instance and perform some actions on it. Because that takes a while, we have the possibility to parallelize it, so at the top you can configure how many tests you want to run in parallel. We also have feature flags for those tests: not every service supports every feature, so you can disable these flags for different services. And sometimes you don't want to execute all of the tests; you want faster feedback at the cost of testing less, and then you can turn off those features too. Another thing you see here is that, because we are testing from the user perspective, we are specifying a Cloud Foundry endpoint.
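To give a feel for the shape of such an errand configuration, here is a sketch in Python rather than the errand's actual YAML schema; all key names (`parallel_runs`, `features`, and so on) are illustrative, not the real property names.

```python
# All key names here are illustrative, not the real errand property schema.
config = {
    "parallel_runs": 4,                      # how many tests run in parallel
    "cf_api": "https://api.cf.example.com",  # tests talk to Cloud Foundry
    "services": [
        {"name": "mongodb-3.6", "plans": ["small", "big"]},
        {"name": "mongodb-3.4", "plans": ["small"]},
    ],
    # Feature flags: not every service supports every feature, and flags
    # can be turned off for faster (but less thorough) feedback.
    "features": {"backup_restore": True, "plan_update": False},
}

def expand_test_runs(cfg):
    """One service instance is created per configured (service, plan) pair;
    feature-flagged test cases are skipped when the flag is off."""
    enabled = [f for f, on in cfg["features"].items() if on]
    return [
        {"service": s["name"], "plan": p, "tests": ["service_access", *enabled]}
        for s in cfg["services"]
        for p in s["plans"]
    ]

runs = expand_test_runs(config)
```

With this configuration, three service instances would be created (two MongoDB 3.6 plans plus one MongoDB 3.4 plan), and the plan-update tests would be skipped everywhere because the flag is off.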
At the end, the test suite actually talks to Cloud Foundry and creates and uses services from the Cloud Foundry perspective. That is quite important, because there are some Cloud Foundry features involved in those services, for example automatically creating application security groups. By using the Cloud Foundry setup to verify that everything works, we also implicitly test whether the security groups work, and so on. It's also important because if, for example, you missed a change in the Open Service Broker API specification that has been implemented in Cloud Foundry, this test suite will tell you: okay, that version of Cloud Foundry is not compatible anymore with that service broker, so please go and fix it.

The errand then performs a couple of test cases, for example service instance creation: we create a service instance and test that it can be accessed by an application deployed to that Cloud Foundry setup. We test that bindings work and can be used, that apps can access the service instance using those bindings, that the deletion of bindings works, and that once a binding has been deleted, the credentials of that binding are not usable anymore. We test that arbitrary parameters, or custom user parameters, work, that plan updates work as expected, and that backup and restore works.

To sum up: the goal was to minimize the effort of introducing new services into the marketplace, so we had to come up with a generalized testing framework as well. We have a framework for integrating new services into the marketplace, and we came up with a framework to integrate new tests into this test framework. Our first attempt was called the Service Binding Checker.
That was a small application deployed to Cloud Foundry. It has an interface that allows you to access the database in a generic way, so it doesn't matter whether it's Postgres, RabbitMQ, or Redis: we always use the same interface and don't distinguish between service types here. The application reads credentials from VCAP_SERVICES; you probably know the VCAP_SERVICES environment variable. Whenever you bind a service to an application, Cloud Foundry exposes the credentials for that service in the VCAP_SERVICES variable. So the Service Binding Checker reads the environment variable and tries to connect to that service instance. There are some other endpoints for other actions, but that's basically it; we will talk about the API of that application later. The idea is to make use of the service through a generic API which encapsulates all service-specific implementations.

That first attempt was written in Ruby, and we run a lot of tests in parallel: we currently have seven different data service types implemented, and we test a lot of service plans for each data service. So it ended up in a massive amount of parallel test runs, and deploying this Ruby application for each test run consumed quite a bit of memory. So we rewrote it in Go, and we now call it Bindingo, which is something like "binding checker app in Go".

As I said, it has a generic interface. It allows you to test the service instance through that application using that generic interface. For example, we have a status endpoint which, in the Postgres implementation, creates a database, creates a table within the database, inserts a record, then deletes the record and drops the table. For RabbitMQ it looks a bit different, but it's the same interface: we create a RabbitMQ queue, insert a message, consume that message, and delete the queue. And it's the same for Redis: we just insert a key-value pair and delete it again. That is what happens when we call the status endpoint. You can see a more specific example at the bottom: the only thing that is specific there is the name. We say we want to test a Postgres instance, and that is the only thing that differs from all the other tests; if you bind a Redis there, we say it's Redis, and then we test the Redis instance. This abstraction, the encapsulation of everything service-specific behind this API, gives us the possibility to automate the test cases we want to run in a generic way. Everything else is generic and can be reused across different data service types.

So let's go through the test cases and see how we perform them using this Bindingo API. The first test case is to simply check the service access: whether the application can access the service. We create the service instance of the service we want to test, then we push the Bindingo application; that is actually the logic the errand executes when you run the smoke tests. We push an instance of the Bindingo application, and we wait until the service is ready and until the application is ready. Deploying the application and creating the service take some minutes, which is why this is executed in parallel. Once both are finished, we bind the service instance to the Bindingo application, restart the application, and call the status endpoint of the Bindingo application we just deployed. The Bindingo app then writes a data record to the database and returns status code 200 if everything went fine, or an error if something was wrong. We may also check whether the Cloud Foundry application security groups have been created.

Another test case we execute checks that each service binding gets its own dedicated credentials. How does this work? Again, we create a service instance, we push a Bindingo application, and we push a second Bindingo application. We wait until everything is finished, so the service is ready and both Bindingo applications are running. Then we bind both applications to the same service instance, restart both applications, and check the status of both; we expect both requests to succeed. Next step: we unbind the first application and restart it. Again we call the status endpoint of both applications, and what we expect now is that the first request fails with status code 500 and the second request succeeds. That way we ensure that once we delete a service binding, its credentials are not usable anymore, while all the other credentials can still be used. Last but not least, we unbind the second application, restart it, and expect both calls to fail.

Another interesting test case is how we use the Bindingo API to test backups and restores. We use the service instance from the previous test case; we usually do that because provisioning a service takes long.
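The dedicated-credentials check just described can be sketched against an in-memory stand-in for the broker's bind and unbind endpoints. `FakeBroker` and its method names are hypothetical, for illustration only; the real test goes through Cloud Foundry and two deployed Bindingo apps.

```python
import uuid

class FakeBroker:
    """In-memory stand-in for a broker's bind/unbind endpoints (hypothetical,
    for illustration): each bind yields a fresh credential set, and unbinding
    revokes only that set."""
    def __init__(self):
        self.valid = set()

    def bind(self):
        cred = str(uuid.uuid4())   # dedicated credentials per binding
        self.valid.add(cred)
        return cred

    def unbind(self, cred):
        self.valid.discard(cred)   # revoke just this credential set

    def status(self, cred):
        # What a Bindingo-style status endpoint would report: 200 if the
        # credentials still grant access, 500 otherwise.
        return 200 if cred in self.valid else 500

broker = FakeBroker()
app1, app2 = broker.bind(), broker.bind()       # two apps, two bindings
assert broker.status(app1) == broker.status(app2) == 200
broker.unbind(app1)                             # delete the first binding
first, second = broker.status(app1), broker.status(app2)
```

The expectation is exactly the one from the test case: the unbound app's request fails while the other binding keeps working.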
So we try to reuse instances as much as possible. Again we bind the Bindingo application to the instance, and we check that the binding works, just to be sure. Then we use the put endpoint to insert a record. The put endpoint expects a body containing a test key-value pair, and in this case, because it's a Postgres instance, the test will create a table within the database and insert that data into the table. Then we check with the exists endpoint whether the write from the previous put request was successfully inserted. Once that has been verified, we trigger a backup and wait until the backup is successfully created; in this case it only takes a few seconds, because it's a quite small database. Once the backup has been created successfully, we delete the data record from the database again. Here we are using Postgres, but it could also be Redis or MongoDB; the logic is the same. Once the record is deleted, we verify that it's really gone, so we expect the get request for that record to fail. Then we restore the backup, wait until the restore has completed successfully, and check again whether the key is back. We expect that check to succeed, because we restored the backup and expect the backup to contain the data.

Another test case is the plan update. Again we use the service instance from the previous test case. We ensure the binding is still present and still works, then we insert the data again and check that it has been inserted successfully. Then we trigger a backup; even though we are doing a plan update, we trigger a backup first and wait until it has succeeded. Then we perform the plan update and wait until it has succeeded. We check that the service instance still works and that data can be written into the instance, and we check that the data inserted before the plan update is still in the instance after the plan update. We delete the test record, ensure the record has been deleted, and then restore the backup. After that we ensure that a backup taken on the old plan can still be restored on the updated instance.

These are the basic tests executed by the smoke tests. So whenever a customer runs `bosh run-errand` to test the service installation, these are the test cases that are executed. It turned out that we can use the Bindingo application to test even more things. For example: we have deployed our data services in version one at a customer, and now we want to update the whole setup to a new version. We can use this same test logic to test that update. How that works: we deploy one version of the data service solution, so the service is deployed in version X, then we create a service instance. Again we push the Bindingo application and wait until everything is ready. Then we put a test record into the database, trigger a backup, and wait until the backup has succeeded. Then we update the management components to the new version; by management components we mean the service broker itself, and those 10 microservices that are involved in managing the service instances.
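The backup, restore, and plan-update sequences described earlier can be sketched with an in-memory stand-in for a service instance. `FakeInstance` and its methods loosely mirror the generic put/exists/delete endpoints plus backup and plan-update operations; the names are hypothetical, not the real Bindingo API.

```python
class FakeInstance:
    """In-memory stand-in for a data service instance (hypothetical; the
    method names mirror the generic put/exists/delete endpoints plus
    backup, restore, and plan-update operations)."""
    def __init__(self, plan="small"):
        self.plan, self.data, self.backups = plan, {}, []

    def put(self, key, value):
        self.data[key] = value

    def exists(self, key):
        return key in self.data

    def delete(self, key):
        self.data.pop(key, None)

    def backup(self):
        self.backups.append(dict(self.data))   # snapshot current data
        return len(self.backups) - 1           # backup id

    def restore(self, backup_id):
        self.data = dict(self.backups[backup_id])

    def update_plan(self, new_plan):
        self.plan = new_plan                   # data survives the update

inst = FakeInstance()
inst.put("test", "value")          # insert the test record
bid = inst.backup()                # trigger a backup
inst.delete("test")                # delete the record again
assert not inst.exists("test")     # really gone
inst.restore(bid)                  # restore the backup
restored = inst.exists("test")     # expected: back again
inst.update_plan("big")            # plan update
still_there = inst.exists("test")  # expected: survives the update
```

The same sequence of operations, run against a real Postgres, Redis, or MongoDB instance through the generic API, is what the smoke tests execute.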
Once these management components are updated to the new version, we update the service instances to the new version. Once the update is finished, we check that all service instances we provisioned before the update are still working, by calling the status endpoint; again, the status endpoint will insert some data into the service, or maybe insert a message if it's RabbitMQ, or something like that. We check that the data we inserted before the update is still in the database. Then we delete the data record, ensure it's deleted, and restore the backup again. So we ensure that a backup created with the old version of the data service can be restored on the new version.

Another thing this Bindingo API can be used for is load tests. When you implement those four endpoints I showed you previously, you actually get a simple load test for free. You then have that API and you can say: please run a load test on the service instance bound to that application. You can specify how long the load test should run, 1200 seconds in that example, how many insert operations should be done per second, and how large the data is that gets inserted into the service instance. The result of that request is a report: you send the request and immediately get back a report containing an ID which identifies the load test that is currently running. Using that ID you can check what the current status of the load test is and what the current statistics are.
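The load-test flow just described might be sketched like this; the runner, field names, and values are illustrative, not the real endpoint schema.

```python
class LoadTestRunner:
    """Sketch of the load-test flow: starting a run immediately returns a
    report with an id that can be polled for status and statistics. Class
    and field names are illustrative, not the real Bindingo schema."""
    def __init__(self):
        self.reports = {}

    def start(self, duration_s, inserts_per_s, payload_bytes):
        report_id = len(self.reports) + 1
        self.reports[report_id] = {
            "id": report_id,
            "duration_seconds": duration_s,
            "inserts_per_second": inserts_per_s,
            "payload_bytes": payload_bytes,
            "inserts_done": 0,
            "running": True,     # flips to False once the run finishes
        }
        return self.reports[report_id]

    def poll(self, report_id):
        return self.reports[report_id]

# Run for 1200 seconds, 50 inserts per second, 256-byte payloads:
runner = LoadTestRunner()
report = runner.start(duration_s=1200, inserts_per_s=50, payload_bytes=256)
```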
There's also a flag which indicates whether the load test is still running or whether it's finished.

Let's talk about the future plans we have for this test framework. Currently we're using the CF CLI to actually talk to Cloud Foundry and test things from the user perspective. A colleague of ours came up with a service broker CLI, and the idea of that service broker CLI is to mimic the CF CLI, so that in theory we can just replace the CLI within our tests, and that's it. We then don't test via Cloud Foundry anymore; we test the service broker API directly, which might gain us some speed when it comes to feedback cycles. Tests should run faster, because we don't have to deploy the Bindingo applications every time we run a test.

Another kind of tests we have are integration tests, or what we call BOSH release tests. Because we deploy our data services with BOSH, we have, for example, a BOSH release for Postgres, for RabbitMQ, and for every service we offer, and we have a massive test suite that runs against each BOSH release. It works like this: it provisions the BOSH release without using a broker or anything else, and it checks that, for example, the Postgres cluster works as expected, even in failure scenarios. It manipulates the iptables rules on the VMs, simulates a split-brain situation, and checks that the cluster doesn't end up in a split-brain state, and that when the split-brain situation is resolved, the cluster joins up again and repairs itself. It also checks what happens if the master dies: it crashes the master and ensures that failover happens, that the new master can accept data, and that the data is replicated to the remaining slaves. When building a new data service, building this kind of test is quite hard, and it's the major task we have to do when, for example, integrating a new data service into the marketplace. That is the kind of work we struggle with most, or invest the most work into. The idea is to use the Bindingo application to get a generic BOSH release test, where we just have to specify a few parameters: for example, in a split-brain scenario a Postgres cluster should behave like this, and a CouchDB cluster should behave like that. That's the idea with the BOSH release tests and how we can make use of the Bindingo API for testing the BOSH releases themselves in failure scenarios.

Yeah, that's pretty much it. Any questions?

In retrospect, how much sense did it make to have this one generic app with all the test cases in it, versus having the test cases in the service brokers or the services themselves and just calling them from there? Because I think what you actually want to test is very specific to the service. I mean, you only talked about data services, and that gives a smaller context, a smaller scope, but we, for example, have a lot of different services, from logging to messaging to databases; think about a Spring Cloud suite or whatever. So there's a trade-off between how generic it is and how practicable it is.
We have also applied that test suite to an ELK stack; we have an ELK stack as a service, and we already learned that it's hard to abstract the log messaging use case into that API. So there might be some limitations. But I guess for the use cases we have, or the use cases we aim for in the next few years, it can save us a lot of time. Sure, there can be some services, some kinds of software, that don't fit into the abstraction we came up with in that API.

Another point is that instead of testing this errand against Cloud Foundry, for example, we could swap Cloud Foundry out and put Kubernetes underneath. It allows us to really test against a real system, right? It's not just faking something; you can actually test against Cloud Foundry, switch it to Kubernetes, and so on. This makes that possible. And if you just use the broker tests, for example, that's fine, but it will not work in the long run, because you need to verify that it works with Cloud Foundry and with Kubernetes.

Another question; I have two questions. One was regarding your smoke tests: you provision this app, bind this app, right, create some entries in the database and so on, and later on, at the end, you unbind the first app and then you test that this first app really cannot access the instance anymore. I didn't get why you do this when you have two instances running and one of them is no longer supposed to access the database. Why? That's not clear to me. That's the first one. And the other one is: could you, at the end of this test, send the results somewhere, or how do you proceed with them? Put them somewhere with REST or something else?
Yeah, the first question was about the dedicated-credentials test, and why we deploy two apps and create two bindings. The idea is that when you bind an application to a service, this application gets a credential set to access the service. And we have the requirement, for most of our services, that when we bind the same service to a different application, we somehow want a different credential set. So we have two credential sets, one for each of the applications, and the idea is that when we delete one credential set by unbinding the application, we want this credential set to be unusable afterwards. It can happen, for example, that one application that makes use of your service has been attacked by a hacker and the credentials have been exposed to that hacker. Then you want this credential set to become unusable, so you unbind your application, and we just ensure that it's really implemented in a way that you can't make use of these credentials anymore once the binding has been removed.

Also on the first question: we could sometimes optimize, of course, so that we don't push two or three instances of this Bindingo app. Currently we are reading the credentials from the VCAP_SERVICES environment variable, but in the future we could also pass credentials via URL parameters, for example, so we can just use one Bindingo app for the different scenarios where we would normally push three or four Bindingo apps.

And the second question: we put the results of the smoke tests into the log directory of the smoke test errand, and then we stream them to a syslog endpoint, and then we can analyze them there.

I don't think we have time for other questions, but you can find us at the booth. We are around our booth, so if you have any questions or want to discuss something, just come over, and grab some free t-shirts or something like that. Thanks!