So, hello everybody. I'm Oliver. I work for anynines. Today I want to talk about the Open Service Broker API, and especially about how we at anynines build service brokers, how we test them, and how we ensure that once we deploy a service broker into a production environment, it keeps working within that environment. That means that once we deploy a new release of a service broker into, let's say, our internal systems, or even systems running on-premise on the customer's side, we have a way to verify that the service broker is still running after the deployment. So we have test suites that run before the release, but also test suites that run after the deployment.

But before we get into how we test it, I want to briefly describe how our Cloud Foundry journey at anynines started and how we came to develop data services and implement the Open Service Broker API. Our Cloud Foundry journey started roughly six years ago, when we began to build a public platform for end users who wanted to deploy their applications to a platform and make them available on the Internet. It was a shared platform, meaning different parties could deploy their applications to the same platform and share the resources, but also the costs.

If you look at open source Cloud Foundry, which is the major component of our public platform, you know that it is very good at running stateless applications. You can deploy those applications very easily, you can scale them, you can update them, and if something breaks, Cloud Foundry takes care of restarting them. So Cloud Foundry is very good at running stateless web applications. But once you have deployed cf-deployment, you see that you need more than just a runtime for your web applications. Your web application probably needs a database like Postgres or MySQL, or maybe MongoDB, or maybe a search server like Elasticsearch, or a message queue like RabbitMQ.

For that purpose, Cloud Foundry provides a marketplace. Developers who deploy their applications to Cloud Foundry can look at this marketplace and see which of those data services and databases are offered in that Cloud Foundry environment. Then they pick a data service. For example, they need a MySQL service, so they say: I need a MySQL service of that size with that number of nodes. A few minutes later they have that cluster up and running. They can bind that cluster to an application, and the application will then find the credentials for that service instance in its environment variables.

The thing is, when you set up cf-deployment, you will notice that this marketplace is empty. To get something into the marketplace, you as a platform administrator or platform provider have to build something called a service broker. The service broker is the component that takes requests from the platform and makes sure the user gets a database set up. To write a service broker and plug it into your Cloud Foundry environment, you have to fulfill a specific API specification, the Open Service Broker API specification. That specification is quite lean, so there are only a couple of methods you have to implement.
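To give you an impression of how lean it is, these are roughly the core HTTP endpoints an OSB-compliant broker has to serve, slightly simplified from the v2 specification:

    GET    /v2/catalog                                   # list offered services and plans
    PUT    /v2/service_instances/{id}                    # provision a service instance
    PATCH  /v2/service_instances/{id}                    # update it, e.g. a plan update
    DELETE /v2/service_instances/{id}                    # deprovision it
    PUT    /v2/service_instances/{id}/service_bindings/{binding_id}    # bind: create a credential set
    DELETE /v2/service_instances/{id}/service_bindings/{binding_id}    # unbind: revoke the credentials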
Like, for example, create a database, or create a service, whatever service it is; update the service to a bigger plan, so that you can provide more capacity to your end users; or create a credential set, so that an application can use that credential set to access the database. So it's a quite lean interface, maybe five methods you have to implement.

It was born in the context of the Cloud Foundry community: Cloud Foundry was the first project that came up with that interface and implemented it. But today there are other technologies that have adopted the specification. For example, if you have an Open Service Broker API compliant service broker, you can also connect it to a Kubernetes environment, or to a Kubernetes-like environment such as Red Hat OpenShift. We even have some customers who don't have a platform at all, or who don't consume the service broker from a platform at all; instead, they came up with their own customer panel and talk directly to the service broker API. And that's how it looks today: the context in which a service broker operates, and in which it has to fulfill its function, is getting more diverse, because more platforms are adopting the service broker standard. That is also what makes it quite hard to test service broker implementations.

As for what the marketplace looks like: most probably you already know it, but for those who haven't seen a Cloud Foundry marketplace yet, this is it. You type cf marketplace and you get all the services provided across all the brokers registered at that Cloud Foundry endpoint. You see there are a couple of services, and with each service there are a couple of service plans, which define how big the service instance you book will be and what topology it will have: for example, whether you get a single-node MySQL instance, or a three-node cluster with replication, so that when one node fails, the other nodes take over the workload.

Looking at the Open Service Broker API specification, you might think: oh, that looks easy, I'll go and implement a service broker now. When we started, we thought the same thing, but then we learned a few lessons. Especially our enterprise customers taught us that there is a bit more to take care of than just provisioning a database. We learned that day-two operations are very hard, and that it is also very hard to ensure they keep working when we update our service offerings.

So what are typical requirements that might not be on your mind when you start implementing the Open Service Broker API? What most of our customers demanded from us was that the services run on dedicated virtual machines. Even though containers are gaining more and more traction, enterprises still feel better and safer knowing that this kind of stateful workload runs on a dedicated virtual machine, because it still provides better isolation and better noisy-neighbor protection. The problem is that dedicated virtual machines are quite costly, which means you have to provision them on demand. You don't want to spin up 100 Postgres clusters and wait until you have sold them; you want to spin them up whenever a new user or a new developer books a Postgres cluster. So you want those virtual machines to be created on demand.
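From the developer's perspective, all of that on-demand provisioning hides behind the same few commands; the service and plan names in this example are invented, but the commands are standard CF CLI:

    cf marketplace                                        # which services and plans does this platform offer?
    cf create-service a9s-postgresql single-small my-db   # book an instance; the VMs are created on demand
    cf bind-service my-app my-db                          # create a credential set, exposed via VCAP_SERVICES
    cf restage my-app                                     # restart the app so it picks up the credentials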
This on-demand creation alone makes testing quite hard, because creating a virtual machine takes up to five minutes, and if you have a cluster, you have to create more than one virtual machine.

What is also important for us is that the service brokers run in on-premise environments, so that customers can install them in their own environments; it's not only about running on our public PaaS. These on-premise environments also play a role when testing, because we don't know the circumstances under which those brokers will run. If you know PCF, the commercial Cloud Foundry offering from Pivotal, you may also know the Pivotal Ops Manager. Our goal is to provide an installation method for PCF as well as for open source Cloud Foundry, or for BOSH directly. This, too, creates more diversity in the environments in which we want our brokers to run.

We want the broker, or rather the automation, to be infrastructure independent: it should run on OpenStack and be deployable on vSphere. It should be platform independent, meaning it should be consumable from Cloud Foundry or from Kubernetes. The whole platform should be highly available: the provisioning API, the service broker API itself, should be highly available, but so should the clusters that get created. We also want backups of our service instances; backups are crucial, and the backup logic should be tested carefully. And we want capacity updates. That is actually something covered by the Open Service Broker API; in the API terminology it is called a plan update. If you observe that your disk is running out of space, you want to upgrade your service instance so that you have more space available before it actually crashes.

One of the things we care about most is that when we want to bring a new data service into a Cloud Foundry marketplace (say we decide tomorrow that we want a CouchDB or a Kafka in our marketplace), we should be able to provide the automation for that data service very fast. So we came up with a framework that allows us to bring such automations for new data services into the Cloud Foundry marketplace quickly. But we also came up with a testing framework: we have a framework to build those automations, and we have a framework to build a test suite for those automations that reuses the existing test logic as much as possible.

Let's have a look at how we fulfilled those goals. We implemented a microservice architecture whose parts play together to provision virtual machines, install Postgres, and so on. It is actually based on BOSH. But on top of BOSH, seen from a top-level view, there are ten more microservices playing together to fulfill those requirements, and if you zoom in, you see that it's not only about these ten microservices: there are more than 40 components playing together to manage the life cycle of those databases.

When it came to this microservice architecture, and microservices were quite a new thing for us, we had to think about how to test those solutions, those service brokers. Think of the test pyramid; at the beginning we tried to comply with it. The test pyramid says you should have as much logic as possible covered by unit tests, and then some integration tests ensuring that the components play together well.
And then you have end-to-end tests, or manual tests, which test from the end-user perspective: you test your whole stack the way a user would click through your product. We learned quite fast that in a system like this, it is hard to ensure a certain quality with unit tests alone. We tried, and we ended up doing a lot of manual testing because we had lost trust in our test suite. So we focused on end-to-end tests, because in a system where so many parts play together, it seemed obvious to us that we needed more of them. That's why we came up with a framework that lets us write end-to-end tests for new service automations quite easily. And that is what this talk is actually about: I want to show you what our end-to-end test framework looks like.

The goal of this end-to-end test suite is to reuse existing test logic as much as possible. When we decide tomorrow that we want, say, a CouchDB in our Cloud Foundry marketplace, we want that CouchDB in the marketplace very fast, with the whole life cycle automated, and we also want a test suite that we can execute against the production environment and that tells us whether things are working or not. And if we don't add a new data service but a new test case (for example, we figure out that a particular scenario must be tested because it fails quite often), the idea is that this new test case becomes available to all the data service implementations.

Those test suites should also be executable by the customer. Customer environments usually differ: some run on one version of our automation, some on another. The idea is that customers can verify for themselves, for example when updating from one version to another, that everything is working and that the update will succeed. So another goal is that those test suites can be executed by the customer as a feature of the product, so that the customer can verify that it works.

If you come from the world of BOSH: this is a piece of a BOSH deployment manifest that we use to specify and configure our smoke tests, which is what we currently call these tests. You can configure which services you want to test and which plans of those services. You also see that the test suite has feature flags, so you can activate and deactivate some of the tests manually; the customer can decide which test cases to execute on their platform. You can also specify a Cloud Foundry endpoint, which means the test suite performs the test cases the same way a user would: it uses the CF CLI, actually creates a service, waits until the service is ready, and then runs the tests. One problem is that these tests take quite long. Because everything is provisioned on demand, it can take up to five minutes until a service instance is running, and if you want to test a couple of service plans, that adds up to a long run time. That's why we decided it should be possible to parallelize the test runs, and that's what you can configure at the top: I want to run five tests in parallel.
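Since I can't reproduce the slide here, this is only a hypothetical sketch of what such a manifest section could look like; the property names are made up, but the ideas (service and plan selection, feature flags, CF endpoint, parallelism) are the ones just described:

    properties:
      smoke_tests:
        parallel_runs: 5                      # how many test cases run concurrently
        cf:
          api_endpoint: https://api.cf.example.com
          username: smoke-tests
          password: ((smoke_tests_password))
        services:
          - name: a9s-postgresql
            plans: [single-small, cluster-small]
          - name: a9s-rabbitmq
            plans: [single-small]
        features:                             # flags to switch test cases on and off
          test_backup_restore: true
          test_plan_update: false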
So let's have a look at which test cases we want to run, and after that at the interface we came up with that encapsulates all the service-specific things, so that a generic test suite can use that interface to test all the service implementations.

The test cases we currently run: we test that a service can be created. We test that a service can be bound to an application and that the credentials are actually usable, meaning the application can take the credentials, connect to the database, and write some data into it; that's what "apps can access the service binding" means. We check that once an application doesn't need a service binding anymore and we delete it, the credentials are no longer usable after the binding has been deleted. For some services we check arbitrary parameters: the user can pass custom parameters to cf create-service, for example which Postgres plugins should be installed in the Postgres instance. Such tests are quite service specific, but they should be covered too. Then we test plan updates, where we ensure that after a plan update the data in the database is still there and still accessible. And we test backups and restores.

Let's have a look at how we tried to come up with a framework that makes those tests as generic as possible. The first attempt was an application we called the service binding checker. It reads the credentials from VCAP_SERVICES and then tries to use those credentials to access the service and make use of it. That application was written in Ruby, and because we had the requirement to run a lot of tests in parallel, we also had to deploy a lot of those service binding checker apps into a Cloud Foundry runtime, which turned out to be very memory consuming. That's why we decided to rewrite the application in Go. We call it Bindingoo, the binding checker app in Go. What you see here is the interface of Bindingoo: whenever we add a new service, like CouchDB or Kafka, we have to implement that interface for that particular service.

But what does that interface mean? The first endpoint says: when someone calls this endpoint, the app should report whether the service instance is functional. For example, when the app is bound to a Postgres and someone calls /status, the app creates a table in that Postgres instance, inserts a record into that table, then deletes the record and the table again, and returns HTTP status 200 if everything worked out. For RabbitMQ it looks a bit different: we create a queue, publish a message, consume that message, and delete the queue; once that worked, we return status code 200 to signal that the RabbitMQ seems to be functional. The only difference between, say, a RabbitMQ and a Postgres is that the URL specifies which kind of service it is. Apart from that, you call the same status endpoint for the different kinds of services, and the test logic on top, which relies on that interface, stays completely generic.
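Just to illustrate, and this is not the actual Bindingoo source, only a minimal sketch in Go: it assumes the lib/pq driver and a POSTGRES_URI environment variable, whereas the real app reads its credentials from VCAP_SERVICES.

    package main

    import (
    	"database/sql"
    	"net/http"
    	"os"

    	_ "github.com/lib/pq" // registers the "postgres" driver with database/sql
    )

    // statusHandler checks that the bound Postgres instance is functional:
    // create a table, insert a record, clean everything up again, answer 200.
    func statusHandler(w http.ResponseWriter, r *http.Request) {
    	db, err := sql.Open("postgres", os.Getenv("POSTGRES_URI"))
    	if err == nil {
    		defer db.Close()
    		_, err = db.Exec("CREATE TABLE IF NOT EXISTS smoke (id INT)")
    	}
    	if err == nil {
    		_, err = db.Exec("INSERT INTO smoke (id) VALUES (1)")
    	}
    	if err == nil {
    		_, err = db.Exec("DROP TABLE smoke") // removes record and table again
    	}
    	if err != nil {
    		http.Error(w, err.Error(), http.StatusInternalServerError)
    		return
    	}
    	w.WriteHeader(http.StatusOK) // the instance seems functional
    }

    func main() {
    	http.HandleFunc("/postgres/status", statusHandler)
    	http.ListenAndServe(":"+os.Getenv("PORT"), nil) // Cloud Foundry sets PORT
    }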
Let's have a look at the other three endpoints, because they are quite important when it comes to testing updates, plan updates, and backups. The next endpoint is for inserting a record into the database, or into the data service in general: a PUT with the service type and an ID. At that point we insert data into our Postgres, for example: we create the table again and insert a record, but this time we don't delete it. For that purpose we have a third endpoint that says: delete that record, with the ID we chose above when we created it, and then the record is gone. And we have a fourth method to check whether a data record created with the PUT endpoint is still in the database. For RabbitMQ it works analogously: you create a queue and publish a message into it, and whenever you want to check whether the data is still there, you check whether the message is still in the queue.

So let's see what the test cases look like with these four methods that encapsulate all the service-specific things. If you want to implement, say, a new Kafka service for your Cloud Foundry marketplace, or any platform marketplace, you implement this interface, and then you can reuse the test suite and run a whole set of test cases without any further effort.

Let's see how the test cases make use of this interface. The first test case is that an app can access a service. The generic algorithm (generic meaning it doesn't matter what kind of service it is, whether Postgres or Elasticsearch or Redis) is: the test suite creates a service instance with cf create-service. After that it pushes the Bindingoo application to Cloud Foundry. It does both in parallel, because both operations take some time. Then it waits until the service is ready and until the app is ready. It binds the app to the service instance, restarts the app, and then calls the status endpoint, which inserts some data into the database or the message queue. Once this endpoint returns status code 200, we are reasonably sure the database is okay.

Another test case is that each app bound to a service instance gets its own dedicated credentials. How that works: we create a service again. We push one app, and we push a second app. We wait until everything is ready, that is, the service instance and both apps. Then we bind both apps to the service instance, restart both apps, and check that both apps are working, meaning both credential sets are usable. Then we unbind the first app, restart it, and check both apps again; now we expect the first call to fail and the second to succeed. That way we ensure that the two applications have different sets of credentials and that once we unbind an app, its credential set is no longer usable. The last thing we do in this test case is unbind the second app, and then we expect both calls to fail.
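Here is a rough sketch of how that dedicated-credentials test case could be driven, shelling out to the CF CLI the way the smoke tests do. The service, plan, and app names and the routes are made up, and all the waiting and polling between steps is omitted:

    package smoketest

    import (
    	"fmt"
    	"net/http"
    	"os/exec"
    )

    // Illustrative routes of the two pushed Bindingoo apps.
    var app1URL = "https://bindingoo-1.apps.example.com"
    var app2URL = "https://bindingoo-2.apps.example.com"

    // cf shells out to the CF CLI, which is how the smoke tests drive Cloud Foundry.
    func cf(args ...string) error {
    	out, err := exec.Command("cf", args...).CombinedOutput()
    	if err != nil {
    		return fmt.Errorf("cf %v failed: %v\n%s", args, err, out)
    	}
    	return nil
    }

    // statusOK reports whether the app answers /postgres/status with 200,
    // i.e. whether its bound credential set is still usable.
    func statusOK(appURL string) bool {
    	resp, err := http.Get(appURL + "/postgres/status")
    	if err != nil {
    		return false
    	}
    	resp.Body.Close()
    	return resp.StatusCode == http.StatusOK
    }

    func TestDedicatedCredentials() error {
    	steps := [][]string{
    		{"create-service", "a9s-postgresql", "single-small", "pg1"},
    		{"push", "bindingoo-1"}, {"push", "bindingoo-2"},
    		{"bind-service", "bindingoo-1", "pg1"}, {"bind-service", "bindingoo-2", "pg1"},
    		{"restart", "bindingoo-1"}, {"restart", "bindingoo-2"},
    	}
    	for _, args := range steps {
    		if err := cf(args...); err != nil {
    			return err
    		}
    	}
    	if !statusOK(app1URL) || !statusOK(app2URL) { // both credential sets must work
    		return fmt.Errorf("expected both bindings to be usable")
    	}
    	if err := cf("unbind-service", "bindingoo-1", "pg1"); err != nil {
    		return err
    	}
    	if err := cf("restart", "bindingoo-1"); err != nil {
    		return err
    	}
    	if statusOK(app1URL) || !statusOK(app2URL) { // first must fail, second still work
    		return fmt.Errorf("expected only the second binding to remain usable")
    	}
    	return nil
    }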
Another interesting test case is how we test backup and restore. The generic logic for that test case: instead of creating a new instance, we reuse an instance we provisioned in the previous test case. We bind an app to the instance and check that everything is okay. Now we use the PUT endpoint to insert some test data, and again it doesn't matter whether it's a Postgres or a RabbitMQ, the algorithm is the same. We ensure that the record has been inserted successfully by checking whether it's in the database. Then we trigger a backup (we actually have a backup API) and wait until the backup has been created successfully. Then we delete the test value and ensure that it is actually deleted by looking it up and confirming it's not there anymore. Then we restore the backup, wait until the restore is finished, and check whether the value is in the database again; here we expect the value to be back, with status code 200.

It's a quite similar scenario for plan updates; we can use the same methods. The point of the plan update test is to ensure that after the plan update, the data is still in the database. So we don't create a new service instance; we reuse the existing one. Again we bind the Bindingoo app and check that everything is working and that there are some records in it. We trigger a backup and wait until it's finished. Even though it's a plan update, we trigger a backup, because we want to ensure that after the plan update we can still restore the old backup, the one created under the old plan. We update to the bigger plan and wait until the update is finished. Then we check that the database is working after the update, and we check whether the data is still in the database; we expect the data to be there even after the plan update. Then we delete the record, ensure it has been deleted, and restore the backup, the one we created with the old plan, just to ensure that restoring still works after the plan update.

Another interesting thing we can test with this simple interface is platform updates. For example, we release a new product version of our platform, which means we update the environment, and we want to make sure the update actually worked. The test algorithm for that: we have the platform deployed in version X, and that means not only the databases themselves but also the management components like the broker. We create a service instance based on that old version, push the application, and wait until everything is ready. We put in some test data, trigger a backup, and wait until the backup succeeds. Then we update the service broker to the new version, and we update the service instances to the new version. We check that the service instances are still running after the update; we insert some data, delete some data, and check that the deletion succeeded. And then we restore the backup that was created with the old version of the platform and ensure that it can still be restored.

We have some further plans for this testing framework. A colleague of mine came up with a service broker CLI, and the idea of the service broker CLI is to mimic the CF CLI. The idea is to replace the CF CLI within the test suite, so that we don't have to rely on Cloud Foundry anymore and can reuse the test logic for Kubernetes as well.
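All of these backup, plan-update, and platform-update tests share the same data-verification pattern against the Bindingoo data endpoints. Here is a rough sketch of that pattern, continuing the helpers above; the endpoint paths are illustrative, and the two backup helpers are empty stand-ins for our backup API, which I haven't shown here:

    package smoketest

    import (
    	"fmt"
    	"net/http"
    	"strings"
    )

    // put inserts a test record via PUT /postgres/{id}; the path is illustrative.
    func put(appURL, id, value string) error {
    	req, _ := http.NewRequest(http.MethodPut, appURL+"/postgres/"+id, strings.NewReader(value))
    	resp, err := http.DefaultClient.Do(req)
    	if err == nil {
    		resp.Body.Close()
    	}
    	return err
    }

    // del deletes the record again via DELETE /postgres/{id}.
    func del(appURL, id string) error {
    	req, _ := http.NewRequest(http.MethodDelete, appURL+"/postgres/"+id, nil)
    	resp, err := http.DefaultClient.Do(req)
    	if err == nil {
    		resp.Body.Close()
    	}
    	return err
    }

    // exists checks via GET /postgres/{id} whether the record is still there.
    func exists(appURL, id string) bool {
    	resp, err := http.Get(appURL + "/postgres/" + id)
    	if err != nil {
    		return false
    	}
    	resp.Body.Close()
    	return resp.StatusCode == http.StatusOK
    }

    // Empty stand-ins for the backup API, whose interface is not shown in this talk.
    func triggerBackupAndWait() {}
    func restoreBackupAndWait() {}

    // backupRestorePattern is the shared skeleton of the backup, plan-update,
    // and platform-update tests: insert, verify, backup, delete, restore, verify.
    func backupRestorePattern(appURL, id string) error {
    	if err := put(appURL, id, "test-value"); err != nil {
    		return err
    	}
    	if !exists(appURL, id) {
    		return fmt.Errorf("record was not inserted")
    	}
    	triggerBackupAndWait()
    	if err := del(appURL, id); err != nil {
    		return err
    	}
    	if exists(appURL, id) {
    		return fmt.Errorf("record was not deleted")
    	}
    	restoreBackupAndWait()
    	if !exists(appURL, id) {
    		return fmt.Errorf("record was not restored from backup")
    	}
    	return nil
    }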
I guess I'm running out of time, so that's a good point for questions.

[Audience question, inaudible, about supported infrastructures.] In theory it works with every IaaS that works with BOSH; underneath we are using BOSH as the automation tool. So that means OpenStack and vSphere. The most experience we have is with AWS and with vSphere, which is what our customers use the most, but we also have some setups on OpenStack, I believe, and on Azure and Alibaba Cloud. I'm not sure whether I got the question right: you're talking about multiple platforms and one service broker, so you have one broker and you want to deploy the service instances to different IaaSes? That is not possible out of the box, but BOSH provides a concept called multi-CPI, so it should be feasible quite easily. With the multi-CPI concept you can configure BOSH to deploy to different IaaS targets, so based on that feature it would be quite easy, I guess.

Any further questions? Okay, thank you.