Ok, so my name is Przemysław Godek, and I am here together with my colleague Bartosz Żurkowski. We work at Samsung Research Poland, and we will try to show you what we have achieved in our work. So, a short outline of our presentation. First we will look at what Trove is, what its mission is and what its capabilities are, and then we will move on to what we have built on top of it. So, first of all, Trove is database-as-a-service for OpenStack. It provides full lifecycle management for databases. It currently supports 11 database engines, both relational and non-relational, and works for single instances as well as for clusters. It provides a datastore-agnostic API for managing databases and is built on top of other OpenStack components — it integrates with Nova, Cinder, Swift, Glance and Neutron. When you look at the mission statement, you can see that provisioning and lifecycle management of databases is handled by Trove, while some responsibilities of a DBA are deliberately left out of scope. So let's go quickly through the capabilities. First of all, you can provision instances and resize their volume or flavor. You can create databases and users. You can reconfigure a database and also reset the configuration back to defaults. You can back up and restore a database, view logs, and manage access with security groups. And, most importantly for us, it is possible to provision clusters and to scale clusters horizontally. This works for single instances, for replication setups and for clusters. So what we have learned at this point is that two things are provided by Trove out of the box: automated lifecycle management and clustered databases. The next point is a single point of contact for the database. Let's look at this initial deployment of a database cluster provisioned by Trove: there is no single point of contact.
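As a rough illustration of the lifecycle operations just listed, here is a minimal Python sketch of what a datastore-agnostic management API conceptually exposes. The class and method names are purely illustrative — this is not the real Trove API, just a model of the operations described above.

```python
class DatabaseService:
    """Hypothetical sketch of a datastore-agnostic lifecycle API.

    Models the operations listed in the talk: provision, resize volume
    or flavor, create databases/users, back up. Names are illustrative.
    """

    def __init__(self, name, datastore, flavor, volume_gb):
        self.name, self.datastore = name, datastore
        self.flavor, self.volume_gb = flavor, volume_gb
        self.databases, self.users, self.backups = [], [], []
        self.status = "ACTIVE"

    def resize_volume(self, new_gb):
        # Grow (or shrink) the attached storage volume.
        self.volume_gb = new_gb

    def resize_flavor(self, new_flavor):
        # Move the instance to a bigger/smaller compute flavor.
        self.flavor = new_flavor

    def create_database(self, db_name):
        self.databases.append(db_name)

    def create_user(self, user_name):
        self.users.append(user_name)

    def backup(self, label):
        # A real service would snapshot the volume and store it in object storage.
        self.backups.append(label)
```

The point of the sketch is only that the caller never deals with engine-specific details — the same calls would apply whether the datastore is MySQL, Cassandra, or anything else.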
So applications have to be aware of the cluster topology in order to communicate with the cluster. They have to implement special policies, such as failover handling, on their own. And this burden on the application side is not really what anyone who thinks of database-as-a-service expects. So let's add a database proxy. It decouples applications from the cluster topology. But now we have another problem: the proxy itself is a single point of failure. The database proxy solutions available on the market today, such as ProxySQL, MaxScale or HAProxy, do not solve this out of the box, but fortunately there is a very simple solution. To be highly available, it is enough to deploy the proxy in an active-passive setup with keepalived and fail over between those instances. Of course, in case of a failure some in-flight transactions may be rolled back and so on, but that is a deployment detail, independent of the rest. The next thing is that it is a normal case that at some point you want to scale out your cluster, and when adding or replacing nodes, your proxy needs to be reconfigured. To take over the responsibility for reconfiguring the proxy, we have introduced a kind of service registration. Each node registers itself after startup, and these notifications are pulled by the proxy. Consul could also be used here, but in our case we have done it this way. The next part Bartek will take over. In production environments it is crucial to have continuous insight into your database performance and behavior. So we want our database metrics to be kept in a centralized monitoring service, so that we can visualize these metrics, conduct anomaly detection and root cause analysis, and react proactively to any issues that may indicate that something serious is happening and that failures may propagate. The diagram on the right presents how we approached implementing the monitoring scenario in Trove. Our deployment units, all Trove instances, are agent-oriented.
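The register-on-startup / poll-from-the-proxy pattern described above can be sketched in a few lines. This is a minimal in-memory model, not the speakers' actual implementation — a real deployment would use something like Consul for the registry and would rewrite the proxy's backend configuration and reload it on change.

```python
class ServiceRegistry:
    """Minimal in-memory stand-in for a service registry (e.g. Consul)."""

    def __init__(self):
        self._nodes = {}
        self._version = 0  # bumped on every topology change

    def register(self, node_id, address):
        # Called by each database node after it starts up.
        self._nodes[node_id] = address
        self._version += 1

    def deregister(self, node_id):
        self._nodes.pop(node_id, None)
        self._version += 1

    def snapshot(self):
        return self._version, dict(self._nodes)


class ProxyConfig:
    """Proxy side: poll the registry, rebuild backends only when topology changed."""

    def __init__(self, registry):
        self.registry = registry
        self._seen = -1
        self.backends = {}

    def poll(self):
        version, nodes = self.registry.snapshot()
        changed = version != self._seen
        if changed:
            # In reality: rewrite the proxy backend section and reload it.
            self.backends = nodes
            self._seen = version
        return changed
```

The version counter is what makes polling cheap: the proxy only reconfigures itself when the cluster topology has actually changed.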
We already employ the guest agent, which is a proxy between the control plane and the data plane — through the guest agent, the control plane is able to speak with the database service. Next to the guest agent we install the Monasca agent. The Monasca agent periodically collects metrics from the operating system, such as CPU, memory and disk usage, as well as database metrics, and then forwards these metrics to the Monasca monitoring sink in the control plane. They are pushed via Kafka. From that point we may access the measurements in Monasca and, for example, visualize the data in Grafana; Monasca also provides a convenient API for defining alarms. So, for example, we may collect metrics about disk usage and define an alarm such that if disk usage exceeds 80%, an alert will be emitted, and this alert may then be consumed by other integrated services. So, a short recap of our production case study. We already know that automated lifecycle management and clustered databases are provided by Trove out of the box. We know that a database proxy is crucial for seamless communication with the database service. And we have discussed the monitoring scenario — how to gather metrics and use them for good purposes. So the final point of this study is how to introduce autonomous properties, such as auto-scaling, auto-healing and auto-optimization, into database deployments. The concept of the autonomic control loop was initiated by IBM around 2001, and it addresses the problem of growing complexity in the field of computing systems. Our systems are growing every day; they are becoming more and more complex. So imagine that we have an environment which deploys a thousand database instances — how do we manage their lifecycle? This scale is unmanageable for human operators.
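The threshold-alarm idea just described — average a metric per host and emit an alert when it crosses a limit — can be modeled very simply. This is only a sketch of the evaluation logic, in the spirit of an alarm definition like `avg(disk.space_used_perc) > 80`; it is not Monasca code.

```python
def check_disk_alarms(measurements, threshold=80.0):
    """Return alert events for hosts whose average disk usage exceeds the threshold.

    `measurements` maps host name -> list of disk-usage-percentage samples.
    Mimics a monitoring alarm such as 'avg(disk.space_used_perc) > 80'.
    """
    alerts = []
    for host, samples in measurements.items():
        avg = sum(samples) / len(samples)
        if avg > threshold:
            alerts.append({
                "host": host,
                "metric": "disk.space_used_perc",
                "value": round(avg, 1),
                "state": "ALARM",  # consumed downstream, e.g. by an RCA service
            })
    return alerts
```

Downstream services (like the root-cause-analysis component discussed later) would subscribe to exactly these kinds of alarm events rather than to the raw metric stream.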
They need some level of intelligence built into the infrastructure to support them in crisis situations, to automate decision making and to provide feedback about how to solve problems. So we want to introduce into our deployments properties such as self-configuration, self-healing and self-optimization. The final goal is to offload human operators from the burden of administering complex systems, for example databases. IBM, in the white paper they published in 2001, compared this autonomic concept to the human body, because on a daily basis we do not have to think about and control how we breathe, our blood pressure or our immune system. All of that happens automatically, so we can focus on more important tasks, such as participating in the OpenStack Summit, for example. Very often autonomic control loops are modeled around an architecture called MAPE, which I will describe on the following slide. MAPE stands for Monitor, Analyze, Plan, Execute. These are the four phases that we need to implement in order to deliver autonomous behaviors. So we start with a managed resource. This might be any type of entity — a virtual machine, a Trove cluster, a Trove instance, a compute host, a Neutron port, anything you can imagine. Then we have sensors, which are monitoring agents in different layers of the infrastructure, and they combine metrics gathered from the managed resource with metrics gathered from these other infrastructure layers. The metrics are then analyzed and, based on static thresholds or machine learning predictions, alerts are emitted for the subsequent phases. We analyze metrics and alerts and determine the root cause of problems. The root cause is then used to determine a preventive action plan. So, for example, we may monitor CPU or memory usage and emit alerts if usage is too high. Then we may determine that the root cause is too little computational power on the VM, and the preventive action plan may be to scale up the flavor of the virtual machine.
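The four MAPE phases can be sketched as one loop iteration with pluggable stages. Everything here is a toy stand-in (the metric values, alert names and actions are hypothetical); the point is only the shape of the loop: sensors feed the analyzer, the planner maps alerts to preventive actions, and effectors apply them back to the managed resource.

```python
def mape_loop(read_metrics, analyze, plan, execute):
    """One iteration of a Monitor-Analyze-Plan-Execute loop (sketch)."""
    metrics = read_metrics()                  # Monitor: sensors
    alerts = analyze(metrics)                 # Analyze: thresholds / ML
    actions = plan(alerts)                    # Plan: root cause -> action plan
    return [execute(a) for a in actions]      # Execute: effectors


# Toy managed resource and stages (hypothetical values, for illustration).
state = {"cpu": 95, "flavor": "small"}

def read_metrics():
    return dict(state)

def analyze(metrics):
    # Static threshold, as described in the talk; could be an ML model.
    return ["cpu_high"] if metrics["cpu"] > 90 else []

def plan(alerts):
    # Root cause: too little compute power -> preventive action: bigger flavor.
    return ["resize_flavor"] if "cpu_high" in alerts else []

def execute(action):
    if action == "resize_flavor":
        state["flavor"] = "large"  # effector acting on the managed resource
    return action
```

In the architecture discussed later, these roles map roughly onto Monasca agent (monitor), Vitrage (analyze/plan) and Mistral plus OpenStack services (execute).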
The action plan must then be executed by orchestrating different infrastructure services, and this is where effectors come into play and act on the managed resource. So the loop is closed: from monitoring the managed resource with sensors, through determining the root cause of problems, to determining an action plan and actuating it on the managed resource. The good news is that over time OpenStack has grown to the point that we already have all the building blocks to implement this architecture. As sensors we may have the Monasca agent, Zabbix or Nagios. As the monitor we have Monasca. We may then use Vitrage as the root cause analysis service, which creates a virtual entity graph of your whole deployment; it also injects alerts into this entity graph and, based on defined manifests — YAML policies — it is able to take some actions. Vitrage also has an integration with Mistral, so it allows you to trigger Mistral workflows. Mistral is a workflow manager that allows you to orchestrate different OpenStack services and define these behaviors via policy files in YAML. And the effectors, of course, are the OpenStack services and Ansible. Okay, so this is how this whole production case study translates into the overall solution architecture. Starting from the left, we have applications communicating with the database cluster via the database proxy, and this communication happens through a single database endpoint, providing seamless communication. We have multiple instances of the database proxy and multiple cluster nodes for better performance, better capacity and data resiliency. Cluster nodes emit metrics to the centralized monitoring service, and the monitoring service allows for visualization and for defining alarms. Alerts are consumed by the anomaly detection service, which is responsible for determining the root cause and the action plan. The action plan is then passed to the workflow manager, which acts as the action orchestrator and actuates this plan on the managed resource, in particular the database service.
And the final building block is the service registry, which Przemek already mentioned. Any time the cluster topology changes, we want to dynamically reflect these changes in our proxy layer. So we always register the current cluster topology in the service registry, and the database proxy observes the service registry; on any change it will update, for example, its balancing sets. To give you better intuition about how these different components operate, this is an example of a Vitrage entity graph. Vitrage provides you with an entity graph consisting of vertices, each of which is a different type of OpenStack resource. Starting from the top, we have a Trove cluster, then three Trove instances — the nodes of this cluster — then a virtual machine attached to each Trove instance, along with a Cinder volume for each of them, and then all these virtual machines are connected to one compute host. So let's say that our compute host fails — and as you might notice, we have two compute hosts here, so three database clusters attached to two compute hosts. Now let's say that on one compute host something terrible happens: it fails. Vitrage may deduce that all virtual machines attached to this compute host will fail as well, and it will propagate this failure to the Trove instances and finally to the Trove cluster. So thanks to Vitrage we are always informed which entities are affected by a failure, and we may react proactively and quickly. It also allows you to determine the root cause, which in this case was the compute host failure — we just need to traverse the graph in reverse. In Vitrage we may define templates that describe behaviors in response to detected problems. So in this case we define a template with two entities: a Trove instance and a Trove alarm. We may also define relationships between them, for example that a Trove cluster consists of Trove instances and that there may be a disk space alarm on some instance.
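The failure propagation and reverse traversal just described can be sketched on a small graph. This is a toy model with hypothetical entity names, not Vitrage itself: edges point from a resource to the resources it directly supports, failure floods downstream, and root cause is found by walking back upstream.

```python
# Edges point from a resource to the resources it directly supports,
# mirroring the entity-graph layering described above (names are hypothetical).
GRAPH = {
    "host-1": ["vm-1", "vm-2", "vm-3"],
    "vm-1": ["trove-instance-1"],
    "vm-2": ["trove-instance-2"],
    "vm-3": ["trove-instance-3"],
    "trove-instance-1": ["trove-cluster"],
    "trove-instance-2": ["trove-cluster"],
    "trove-instance-3": ["trove-cluster"],
}

def propagate_failure(graph, failed):
    """Mark the failed entity and everything it supports as affected."""
    affected, stack = set(), [failed]
    while stack:
        node = stack.pop()
        if node not in affected:
            affected.add(node)
            stack.extend(graph.get(node, []))
    return affected

def root_cause(graph, alerted):
    """Traverse the graph in reverse from an alerted entity to its root cause.

    Sketch only: with multiple parents we follow one of them, which is
    enough here because all paths lead to the same failed host.
    """
    reverse = {child: parent
               for parent, children in graph.items()
               for child in children}
    node = alerted
    while node in reverse:
        node = reverse[node]
    return node
```

So a failure injected at `host-1` reaches the cluster vertex, and walking back from the cluster alarm recovers the host as the root cause — the two directions of the same graph.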
And then we define a condition: if there is a low disk alarm on some Trove instance, and this Trove instance is a member of a cluster, and there is an alarm on this cluster propagated from the instances, then we should execute a workflow in Mistral in response — we want to resize the cluster capacity. On the Mistral side we define another policy, which defines how this workflow should work. It is a list of tasks that should be executed in the correct order to adjust the cluster capacity. So we see that first we calculate the new volume size by fetching the current size and multiplying it by 2. Then we resize the cluster volume to this calculated size, and then we wait for the cluster resize to complete. Ok, so now it's time for the demo. This is the outline. It is quite a complicated demo, so I hope everything will work. First we will create a Galera cluster consisting of three nodes via Trove. Then I will present the monitoring dashboard in Grafana, visualizing all the measurements gathered from the cluster nodes. I will also show you the low disk space alarm definitions in Monasca. Then I will present the Vitrage entity graph, so we will see the vertices that represent the Trove entities. I will present the Vitrage template, once again with the low disk space alarm scenario. I will then show you the Mistral workflow responsible for scaling the cluster capacity. And I will run sysbench to populate some data into the cluster. And when the database cluster is overloaded with data, we will see Monasca alerts being emitted and the Mistral workflow triggered by Vitrage. Sorry for the poor quality in the first few minutes. So, we create a Galera database cluster consisting of three nodes. Then we wait until the virtual machines for this cluster are up. All instances, all virtual machines, are now active. And then we wait until the Trove instances are also active. I could actually skip this, but they are active now. Great. Let's check the cluster status now.
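The three workflow tasks described above — calculate the new size, trigger the resize, wait for completion — can be expressed as a short sequence. This is a Python sketch of the task ordering, not the actual Mistral YAML; the helper functions are toy stand-ins for real service calls.

```python
def scale_cluster_capacity(cluster, resize_volume, wait_until_active):
    """Sketch of the three-task workflow, executed in order."""
    new_size = cluster["volume_size"] * 2     # task 1: calculate new volume size
    resize_volume(cluster, new_size)          # task 2: resize cluster volume
    wait_until_active(cluster)                # task 3: wait for resize to complete
    return new_size


# Toy task implementations standing in for real database-service calls.
def resize_volume(cluster, size):
    cluster["volume_size"] = size
    cluster["status"] = "RESIZE"

def wait_until_active(cluster):
    # A real task would poll the cluster state until the resize finishes.
    cluster["status"] = "ACTIVE"
```

The ordering matters: the wait task is what makes the workflow safe to chain with further actions, since nothing downstream runs while the cluster is still resizing.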
I included these steps, and we wait this long, so that you could see how the Trove API works, how it looks. The task name of the cluster is now NONE, which indicates that the cluster has finished provisioning. We will once again list the Trove instances to grab the ID of an arbitrary instance, because we want to create a database for use by sysbench. We do not need to create the database and user on each instance separately, because with Galera replication this will be handled by the database engine. Now we are moving to the Grafana dashboard. As you can see, it displays resource metrics — CPU, disk and RAM usage — for each cluster node. Going back to the terminal, we will see the alarm definitions in Monasca. We define that if disk space usage on the vdb device, to which the data is being stored, exceeds 50%, then the alarm should be raised. And we can see that these alarms are not raised yet, because we haven't populated any data yet. Now, going to the Vitrage dashboard, let's see how the entity graph looks. And the fact is that it looks like a mess — at the beginning. But you will soon see that I can quickly order this graph into something beautiful. In the end it actually takes the shape of the stingray logo of Trove. This is how it looks finally. This is the same as what I presented in the intuition example. We see the cluster with the Trove instances. We have separate vertices for each virtual machine and for the Cinder volumes attached to them, and also the compute host — the virtual machines are contained in this host. There are also Neutron ports and Neutron networks. On this slide you can see what great opportunities Vitrage gives you by providing this entity graph, because you may identify issues on many layers — whether it's the networking layer or the virtualization layer at the host level. And here is the Vitrage template that we already saw. It is called trove cluster capacity autoscaling, and in this policy we define the entities in the graph and the relationships.
The name of the alarm that we saw when I demonstrated the Monasca alarm definitions — this is the same name; this is how we connect Vitrage to the Monasca alarms. Also the relationships, as I already said: a cluster contains instances, an instance may emit a low disk space alarm, and then the scenario, which triggers the Mistral workflow in response to the alarm. We now move to the workflow manager dashboard, and we will see the workflow definition that we referred to from the Vitrage template. This is what Vitrage will call in response to detected problems with cluster capacity. So we have the definition of how to calculate the new volume size, how to trigger the resizing of the cluster volume, and how to wait until this resize is complete. Moving back to the terminal, we will now start sysbench to populate data into the cluster. So now it has started inserting database records into the cluster. After a while we may notice that Monasca raised alarms on all cluster nodes. These alerts will very soon appear in the Vitrage entity graph — just wait for it. So, the first two alerts are emitted on two cluster nodes, and we now see that this alert is propagated to the cluster vertex. And when the alert is raised on the cluster vertex, Vitrage triggers the Mistral workflow. And now we can see that the Mistral workflow was triggered to scale the cluster up. We may also verify the task executions, and we now see all the tasks that were defined in the template: calculating the new volume size, resizing the cluster volume, waiting for the cluster resize. This is, in particular, the calculate-new-volume-size task; its output is hidden here, but it says that the new cluster volume size is twice as much as previously. Now, if we list the Trove clusters, we will see that this cluster switched into the resize-volume state and that each cluster instance is being resized in a rolling fashion — only one cluster node at a time. And this Monasca graph shows that the disk space usage rises to 50–51%.
And after that we see the decrease of disk space usage, because of the resize — the volume has doubled, so the usage percentage drops. And we may see that on each cluster node. So that is it for the demo. Unfortunately we didn't manage to present the full proof of concept, because this is a proof of concept. We didn't show you how we set up ProxySQL as the database proxy and how we handle seamless database communication during the resize: thanks to the proxy we may continuously push traffic to the database service, and even though the cluster is in the resize state, we may still get responses from the cluster and run queries without any interruption. We also wanted to present failover scenarios, such as killing the virtual machines on which the cluster is deployed, or killing the compute host, but there was no time for that either. We did, however, manage to prepare some automation with Heat and Murano to wrap these things up, because deploying all of this is a hard task — you have to provision the cluster, set up the Monasca alarms, set up the Vitrage templates — and we wanted to use Heat and Murano to give you this one beautiful icon of the database solution that you can click to have your solution deployed. I would like to thank some people for helping me with the demo. I would like to thank Ifat and Muhammad for helping me fix a serious bug in the Vitrage template during the summit yesterday. I would like to thank Doug Szumski for supporting me in deploying Monasca on top of Kolla — he did a great job in kolla-ansible by providing Ansible roles for deploying Monasca and Grafana, so it is now available in kolla-ansible. I would like to thank my colleagues at Samsung for helping me solve various networking and deployment issues. Thank you very much. Any questions? How much effort — in weeks, months or years — have you put into this solution? To be honest, the design took about one week and the implementation one and a half weeks. That's very quick.
And actually we didn't have prior experience with Vitrage, Mistral or Monasca, so we had to do some research, but the documentation was pretty good, and support from the community also facilitated development, so it was quite quick to achieve. Any other questions? Thank you very much, and have a nice last day of the conference.