Well hello everybody and welcome again to another OpenShift Commons briefing. This time we've brought back some folks from the crunchy world of Postgres, and Sarah Conway is going to give us a talk on containerizing Postgres for cloud natives. We've had them on in the past, but I think there have been some interesting new updates, so we're really looking forward to this talk. I'm going to let Sarah introduce herself, and we'll have live Q&A via the chat and at the end of the talk. So please put your questions into the chat, and this whole thing will be recorded and put up on our YouTube channel, along with links to the slides afterwards. So Sarah, take it away. Hi, thanks so much for having me here today, Diane. My name is Sarah Conway, I'm a software engineer at Crunchy Data. I wanted to talk to you today about two projects I'm invested in and that I assist in documenting and developing: the container suite and the Postgres operator. This will cover overviews of both projects, including the whys and hows, in addition to the latest updates on the projects and future plans. So what's the container suite? Well, it's currently a Docker-based set of containers that focuses on providing Postgres as a service, along with tools for administering and monitoring Postgres and these containers. It's also open sourced and hosted on GitHub; the link is on the last slide to access the repository. You might ask, what are the advantages to containerizing Postgres? Well, one advantage is the ability to consistently and uniformly deploy a database within seconds. It can often be extremely complex to spin up a Postgres database given its monolithic structure, so this method makes things much smoother for developers or database administrators. Dependencies are no longer a problem, infrastructure cost is reduced, performance is massively improved over virtual machines, and you can allow developers to more accurately recreate a consistent environment through development, testing, and production.
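As a rough illustration of that "database in seconds" point, a single docker run against a Crunchy Postgres image is enough to get a database up. This is a hedged sketch: the image tag and the PG_* environment variable names shown here are assumptions for illustration, so check the container suite documentation for the exact names your release expects.

```console
$ # Start a standalone Crunchy Postgres container (tag and env var
$ # names are illustrative; see the suite's docs for your version)
$ docker run -d --name=primary \
    -e PG_MODE=primary \
    -e PG_USER=testuser \
    -e PG_PASSWORD=password \
    -e PG_DATABASE=testdb \
    -p 5432:5432 \
    crunchydata/crunchy-postgres:centos7-10.1-1.7.0

$ # Once it's up, connect from the host:
$ psql -h localhost -U testuser testdb
```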
Then you also have a separation of concerns. This lets you perform a database backup and restore, or other administration tasks, in a lightweight manner by spinning up Postgres. The ability to spin up large numbers of databases with rapid deployment is in high demand, and this also lets you easily support a cloud-based model of deployment. Currently the base for the containers is Docker and Dockerfiles. When the container suite was first developed it was sort of a wild guess that Docker would be the way to go and would be where the future of containers was headed, and so everything is currently based off of that. However, as I'll be discussing a bit more later on, Kubernetes and OpenShift are moving towards CRI-O technology, and so eventually we'll be adopting that in place of, or potentially in addition to, Docker. So you can see that all the benefits to containerizing your application that you've heard about also apply to databases. Life becomes easier with lightweight applications, and it's a great time to be taking advantage of that as the technology is only continuing to improve. All the Dockerfiles are designed in such a way that they are consistent in nature and are all built with either CentOS 7 or RHEL 7. This allows you to not only be aware of security issues and address those for those specific environments, but it also allows you to know that the images will all act as expected in a consistent manner. Other operating systems as a base image will potentially be developed and supported later on, but for the time being those are the only two we support. These images can also be deployed in three primary kinds of deployment models, which are Docker, Kubernetes, and OpenShift. We do have Helm charts currently supported as well for a few of our examples, but the widest range of use cases is supported on those three specific environments.
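To make the "consistent base image" idea concrete, a Crunchy-style Dockerfile typically starts from one of those two bases. This is a minimal sketch rather than the actual Dockerfile from the repository; the package names and paths are illustrative.

```dockerfile
# Minimal sketch of a CentOS 7-based Postgres image in the style of
# the container suite; package names and paths are illustrative.
FROM centos:7

# Install the Postgres server packages (the real Dockerfiles pin a
# specific version from the PGDG repositories)
RUN yum -y install postgresql-server && yum clean all

# Run as the postgres user and keep all data on an externally
# mounted volume, as the suite does
USER 26
VOLUME /pgdata

CMD ["postgres", "-D", "/pgdata"]
```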
This design strategy also allows for easy customization for each environment, not only for the Dockerfiles themselves but on a higher level. For instance, you can modify the JSON files provided for the Kubernetes and OpenShift environments in order to best fit your needs. Both deployments allow you to deploy to a single-node or multi-node cluster, while the Docker style is a standalone Docker deployment where you can use Docker networking and Docker Swarm to provide multi-host capabilities. And then you have the issue of persistence. All the data for the Postgres database in the containers is mounted externally to the container. You get to the files for the container through a volume mount, and each of these deployments gives you a variety of ways to provision that storage, from hostPath by default, to Gluster, to dynamic storage, to Ceph, to emptyDir storage, and a bunch of others, each with their own advantages, disadvantages, and security issues. Currently we by default assume you're running an NFS setup in the examples, but you can always reach out if you have questions about setting up other kinds of storage with the container examples. And of course examples are provided for all of these environments at that link there. You can see all the various ways that you can deploy a Postgres pod and run different containers against it, such as the metrics suite or running a Crunchy Vacuum against it. We have complete walkthroughs available in that documentation for you to experiment with the different possibilities. So a few of the highlighted features here show what was kept as a main focus while the container suite was first being developed. The Crunchy Containers project was designed to first address the high demand for a streaming replication environment. It used to be an incredibly complicated process to set that up, especially in a consistent manner, so being able to run Postgres like that is important.
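Going back to the persistence point for a moment: an NFS-backed setup like the one the examples assume boils down to a PersistentVolume and a matching claim. This is a generic hedged sketch using standard Kubernetes objects, not the JSON shipped with the suite; the server address, path, and sizes are placeholders.

```yaml
# Hedged sketch of an NFS PersistentVolume and claim for a Postgres
# pod; the server, path, and capacity values are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: crunchy-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.0.100
    path: /nfsfileshare
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: crunchy-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```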
Also part of that is that people want the ability to have both synchronous and asynchronous replicas, so the ability is provided to support both. The project currently allows for a primary Postgres database that has multiple read-only replicas. Multiple locale support is built into the container suite as well, straight into the Dockerfiles, and then you have secret support in Kubernetes and OpenShift environments, so you can store credentials for your database using Kubernetes Secrets. A nice feature specifically for OpenShift is the ability to choose a random user ID for the Postgres user rather than using the default, and that's one example of using wrapper code inside of the Postgres container. And then you have flexibility in what you can do with that Postgres image, such as being able to back up or restore using pg_basebackup, pgBackRest, and/or pg_dump. And finally, with each minor or major release that comes out of the suite, we always make sure to add support for the latest Postgres version and fully test that all features work as expected. A major design goal with this project is to ensure users can fully customize their version of the project to suit their needs. For instance, the setup.sql file is mounted outside the container itself. Users can modify this and implement custom SQL statements for newly created containers. So if you have some default roles or objects you know you'll want to create every time, this massively speeds up the process. The pg_hba.conf and postgresql.conf configuration files are other examples of files that you're commonly going to want to be able to access and customize. So you can choose to either use the default files that are generated upon container creation by Postgres, or you can mount them externally to the container in the persistent volume claim to allow you access to configure those files to fit your requirements. This gives you complete control over these files.
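As an example of the kind of thing a customized setup.sql might carry, here's a hedged sketch; the role, database, and table names are made up for illustration, and the real default file lives in the repository.

```sql
-- Hedged sketch of custom statements you might add to setup.sql;
-- the role, database, and table names here are illustrative only.
CREATE ROLE appuser LOGIN PASSWORD 'password';
CREATE DATABASE appdb OWNER appuser;

\c appdb

CREATE TABLE IF NOT EXISTS audit_log (
    id         serial PRIMARY KEY,
    event      text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);
GRANT SELECT, INSERT ON audit_log TO appuser;
```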
You also have the option of tweaking specific values in these files or making small changes, such as by switching the environment variables in the initial project setup. These variables are defined in your .bashrc file and cover requirements such as the Postgres version, the version of the container suite to install, whether you're installing in a CentOS or RHEL environment, what namespace you're in, and so on and so forth. And finally, you can completely modify or edit the Dockerfile itself within the project if you want to reduce the size or add custom modules to the file. The goal is to make this suite as extensible and widely useful as possible. This design goal gives users complete flexibility in what is being developed and created in their environments and gives them a starting point to work from in configuring their clusters. All told, we have about 16 or more different containers, all serving different purposes. Not all of them are listed here, but I wanted to give just a general overview of what's offered in the suite. So you have the Crunchy Postgres container, which can be run in several different deployments, either by itself as a single primary database or with synchronous or asynchronous replicas, and it also comes with backup and restore functionality. The backup container has the specific Postgres backup tool called pg_basebackup that can be executed against the Crunchy Postgres container. You then have pgBadger, which is a tool that produces HTML reports to provide detailed Postgres log analysis. Pgpool, which provides connection pooling and lets applications access the Postgres cluster via a single connection. Crunchy Watch, which provides automated failover by watching a cluster's primary.
As soon as it's detected that the primary is no longer responding, Crunchy Watch will actually trigger a failover on a replica. And then you have Crunchy Collect, which uses the postgres_exporter and node_exporter packages to collect both Postgres and machine metrics and deliver them to the Prometheus container. The Prometheus container takes these metrics and hosts them in its data store, and from there you have Grafana, which graphs these collected metrics in a web-based dashboard. You also have Crunchy Vacuum, which performs a Postgres vacuum. This is essentially the garbage collector of Postgres, and it reclaims storage space. In addition there's Crunchy DBA, which has cron scheduler functionality for DBAs. The tool pgAdmin 4 is offered in container form in order to provide GUI management access for Postgres databases. You have the backrest-restore container, which uses the pgBackRest project; this backs up and restores databases. The upgrade container hosts the Postgres upgrade utility in order to perform major version upgrades. And finally, you have Crunchy Proxy, which is a small Postgres-aware proxy daemon. It can route SQL statements to the appropriate Postgres backend within the streaming replication cluster. And here's a simple graphic just showing how all of these containers are able to interact with each other. You can see the incoming application requests go through the pgpool load-balancing container here, which distributes the workload between the primary and replica pods. Crunchy Collect pulls data from the databases, and that data is processed into the metrics dashboard, where it's displayed through Grafana after being held in the Crunchy Prometheus data store. Then you have pgAdmin 4, which is the graphical interface, in addition to pgBadger, and both of these are pulling data directly from the Postgres container. And then you can see that there are three different options here for performing a backup.
They're all pulling directly from the primary Postgres container and writing to a volume, which is then used to actually restore that backup into a new container. So the highlights of our last few releases for the container project include adding Postgres 10.0 and 10.1 support in the last release, in addition to updating the other two supported versions, 9.5 and 9.6, to their latest point releases as well. We also updated the monitoring solution we provide by updating the packages for Grafana and Prometheus to the latest versions, in addition to implementing the postgres_exporter and node_exporter packages. Before, we were only collecting Postgres server metrics, and now we can collect both those and host operating system level metrics. We also created a new Grafana dashboard in accordance with these changes. We've added new Helm charts to the repository in version 1.7, and of course many more changes were made in both of these versions, namely focusing on updating documentation and standardizing the project. But I wanted to provide just a few highlights to give you a picture of where we've been. As for where we're going, there's potential for supporting CRI-O and other Red Hat container technologies; because both of these projects are so contingent on the Kubernetes and OpenShift projects, we end up following in their tracks and implementing new technologies in line with their changes. Other than that, we're working on continuing to stabilize project releases, including better testing processes, in addition to major updates to documentation and examples over time. We're looking at adding in new examples and use cases. Some examples of this are better pgBackRest integration and the ability to employ new Postgres security features in the suite. So we had a couple customers who approached us with a problem. They wanted to deploy over a thousand database applications and they needed a way to manage them. That solution became the operator.
The Postgres operator works directly with Kubernetes in order to provide operator capabilities for managing Postgres clusters deployed within a Kubernetes or OpenShift environment. This uses a straightforward command line interface. The project also incorporates a REST API and REST client for interfacing with the operator. First of all, you can find all of the code we're going to talk about today on GitHub as a repository. It's a completely open source project, so please feel free to create issues and make recommendations for features you'd like to see, as we are continuing to develop this project. Everything's there that you'll need, including some basic documentation, so it's pretty easy to pull this down, build it, run examples, and just experiment with it. You can install it on anything that can run a Golang binary, and it connects to the Kubernetes cluster. PGO offers an easy way for users to make use of the Postgres operator and also gives a view of everything you've created. You can just say show me my Postgres databases without actually having to interact with Kubernetes directly. An operator is really just a controller, or piece of software, and in this context the focus of the operator is on controlling deployments of Postgres components within a Kubernetes or OpenShift cluster. If you're familiar with controller patterns in general, that's what this is doing. You deploy the operator on a cluster and it starts up and begins to wait for any stimulus from external events, which in turn triggers a response. These responses that it's configured to perform are basic Postgres operations. You can also use an operator to automate things. In the world of databases there are all kinds of workflows that DBAs would do, so in this context what we can do is automate a lot of those manual tasks, and the operator itself is a place where we can build those types of automation layers.
This particular operator is built with Cobra, which is a Golang library, and it uses the Kubernetes client API for Golang. There's a link there. It's a pretty interesting open source project, client-go, that forms the basis for the operator, and it allows me to interact with the Kubernetes API with customized code. These interactions include things like updating labels on containers and creating and deleting containers. Everything it does is based upon leveraging that client API. This operator is different from most other operators in that this one has a command line interface that allows for easy interaction. From a command line perspective the operator works a lot like kubectl or the oc command, and that allows you, from your desktop, the ability to interact directly with the Kubernetes and OpenShift APIs, and whenever you do that you can get information back from the cluster. You can also create objects using that command line interface, and that's the primary means right now for the operator to understand what you want it to do and for you to cause it to do things. The operator runs as a standard deployment. You run it just like any other deployment in your environment, and after being deployed it sits out there and watches for changes on custom resource definitions that we define to manage Postgres deployments. CRDs are a customized, template-based approach for allowing users the ability to define the makeup of a Postgres cluster. Kubernetes recently switched over to these in place of third-party resources. They're essentially serving the same purpose; they're just a means to catalog and store metadata and interact with the operator through the standard Kubernetes API. It may be a primary database container, or a series of replica containers, or a series of services for those, maybe even a Postgres-based router or proxy, but all those things make up what we're calling a Postgres cluster. You can find those in a template.
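For a feel of what such a custom resource might look like, here's a hedged sketch of a Postgres cluster object. The apiVersion, kind, and spec field names are illustrative rather than the operator's exact schema, which is defined in its repository.

```yaml
# Hedged sketch of a Postgres cluster custom resource; the group,
# kind, and spec fields shown are assumptions for illustration.
apiVersion: crunchydata.com/v1
kind: PgCluster
metadata:
  name: mycluster
spec:
  clustername: mycluster
  ccpimagetag: centos7-10.1-1.7.0
  replicas: 1
  port: "5432"
  primarystorage:
    accessmode: ReadWriteMany
    size: 1Gi
```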
The operator is designed so that you can add your own set of templates to meet your particular requirements, and there's a default initial definition that over time you'll see new ones added to. There are six different objects that the operator considers to be what defines a Postgres cluster here, and they're created with just one command. In Postgres you can have a primary database alongside a series of read-only replica databases connected to it, and then you can put services out in front of those databases, and these also have related persistent volume claims, and there are lots of other things going on here. So one of the values of the operator, which treats all these components as just one cluster, boils down to simplification of Postgres clustering mechanics. Without the operator you'd have to construct and deploy all of these things and pieces by hand. This is the common question: why the operator, when it's possible to just run the containers, build templates, and deploy those, and it's been working that way for a couple years? The container suite has fleshed-out documentation, plenty of examples, and easy-to-run scripts, and if you're a developer or a good DevOps person it's fairly simple to put together something similar that suits your needs. Well, the operator is useful for users that require a large number of Postgres containers and want a means of automating their DBA tasks, or simply want an easier user interface for managing the deployment of Postgres clusters. I've listed a few reasons here where you might want to consider using the operator. It's really geared towards people who wish to automate workflows around databases. These are things that a DBA would typically want to do, such as backing up or restoring databases, reducing human errors.
You don't necessarily have to build your own set of scripts around the base-level containers to do certain things, and when you start working with a large number of database deployments that can really get unwieldy over time, given all the things that make up a functional, robust Postgres cluster. So without some sort of automation you have a lot of things you have to manually keep track of. Some people that are deploying lots of databases want the ability to implement a set of standard practices or policies around their databases. The operator gives them a means to do so, and it allows for very specific needs using SQL files. Once you have the PGO client up and running, it's really very simple to actually run and execute commands to do complex tasks. It's quite easy to use, and I'll be showing some examples of the interface later on. The operator really shines when it comes to large-scale deployments. If you have, say, hundreds of Postgres databases that you want to manage, that's the main value of the operator. It gives you the ability to collect and maintain metadata on all those clusters, and you can query based on that. It helps you navigate across large numbers of databases on OpenShift. Complex orchestration is another one. It gives you the ability to perform advanced, multi-step or multi-component tasks against a database that would otherwise be complex to do. For example, when upgrading a database there's a series of steps that a DBA would have to go through in order to manage that process. Well, we can implement those in the operator so that it makes it much more user-friendly and consistent to manage those complex orchestrations. Crunchy has built a container catalog to support deployment and administration of Postgres. These containers are the building blocks used by the operator. So the operator depends on that set of base containers to behave and work to deploy Postgres. The Crunchy container suite actually supplies five of these containers.
The Crunchy Postgres, backup, collect, metrics, and upgrade containers, to be specific, and the operator will use these as building blocks. The operator can be manipulated using a set of commands issued by the command line interface client called PGO. From the command line you can run a variety of commands. The first thing is creating a cluster deployment, which allows you to create the services, deployments, PVCs, and everything else in that one command. Likewise, you can delete the cluster or show all the running parts. pgo test simply runs a sequence of psql commands against the cluster; this ensures everything's up and working, and it will also print out the equivalent commands that were used if you want to actually manually connect to the cluster or test it. The only thing to note with the delete command is that it does not delete the PVC. This allows you to ensure that your PVC with all your Postgres data is not deleted by mistake. Additionally, this allows the workflow for backing up and restoring a database, or for an upgrade, to drop the containers but keep the PVCs around. pgo show pvc allows you to display the contents of the dedicated persistent volume claim, similar to the ls command. pgo scale lets you scale up the number of read-only replicas in the deployment from the default of zero. When you scale up the number of replicas to one, it will spin up a read-only replica which will eventually rise to the same level as the master database after a period of time, depending on how large your master database is. Right now these replicas are asynchronous, but we will later be adding support for synchronous replicas, in addition to the ability to scale down replicas. The default is set to zero because it's pretty unnecessary for testing environments or most common uses of the operator to have more than that.
pgo backup performs a full database backup with the pg_basebackup utility and stores this backup on a persistent volume claim, which you can then reference to restore. pgo restore takes backup PVC and path flags and creates a new database from that backup to create the restored cluster, and you can also specify a secret-from flag in order to copy the credentials over from that previous cluster so that the restored cluster has the proper set of credentials. pgo upgrade is the ability to perform a major or minor Postgres upgrade. This is useful if you, for example, have a database running Postgres 9.6 and you're looking to upgrade to the next point release. All you need to do is say pgo upgrade, and it proceeds to take down the container and bring it back up with a new image with the same data. The default, if you don't specify a flag, is a minor upgrade. If you specify the major upgrade flag, this causes a job to be executed which runs the Postgres upgrade utility; this spins up a new version and upgrades the old data into a new persistent volume claim, causing the containers to be dropped and recreated to use the new PVC. And then we have the create policy command, which is the way to create a SQL-based policy and give it a common name. This is useful for people with a series of SQL statements that they want to apply against the database as the postgres user. These can be security related, application related, really anything, but basically they're pieces of SQL that you'd want to name, and then you can apply those towards a series of clusters based on a selector criteria. This SQL will eventually get applied and run on the Postgres database by the postgres user by default, although you can always define a specific user inside of those SQL statements. pgo label allows you to create a label to be applied to a subset of clusters, and this allows you to more easily control and administrate your clusters.
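Pulling a few of those commands together, a typical session might look something like this. This is a hedged sketch; the flag names and output vary by operator release, so check pgo --help against your version.

```console
$ pgo create cluster mycluster          # deployment, services, and PVCs in one shot
$ pgo show cluster mycluster            # list the pods, services, and PVCs created
$ pgo test mycluster                    # run psql checks and print connect commands
$ pgo scale mycluster --replica-count=1 # add an asynchronous read-only replica
$ pgo backup mycluster                  # pg_basebackup onto a backup PVC
$ pgo upgrade mycluster                 # minor upgrade by default
$ pgo delete cluster mycluster          # removes the cluster but keeps the PVC
```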
For example, in that pgo apply my-policy command you can see we're using the selector of name=mycluster, but you could also apply it to the clusters with the research label. Finally, pgo user update passwords allows you to change the password for all users in the mycluster cluster. There are many more commands and variations of commands, including new user administration commands in the last release, and you can find explanations and examples of all of these in that GitHub link. Now for the highlights of the last few releases for the operator: the main changes in the last three releases were the addition of basic username/password authentication, including password management and the ability to define expiration dates for passwords. These credentials are currently stored in Kubernetes Secrets. We also added TLS support for the API server, which now listens on port 8443, with sample certificates and keys included. In version 2.3, modifications were also done to build on top of Kubernetes 1.8.5 dependencies, such as client-go and apimachinery. There are multiple plans in place right now, namely focusing on adding in new use cases that allow for advanced Postgres administration and automation of common DBA tasks. For example, the ability was added to delete a cluster's data files and backups in order to completely clean up a deployed cluster. This command also removes the PVCs, whereas before those PVCs would have had to be deleted individually. This data purge feature depends on a new CRD called pgtasks, which is part of what's created when the operator is first initialized. Users can cause a job to be created to remove all data files for a given cluster when the cluster is deleted; this feature is enabled using the pgo delete cluster command's delete-data flag. Load balancing is also going to become available in the next few weeks, most likely through either the pgpool or the proxy containers from the container suite.
Metrics support was also added through a command flag that adds a Crunchy Collect container to the database pod to allow for metrics collection within the operator. We're also looking to support Postgres at scale, in addition to multi-namespace and multi-cluster operations, improving usability for large-scale environments. It's highly likely that a web user interface will be developed; this would of course be optional, in addition to the command line interface already provided, but the option would be there. And finally, security will be improved with a role-based access control security model, in addition to disaster recovery features for protecting your clusters and data. As before, a link is provided on the slide for you to look more into the latest releases for the Postgres operator, and that's everything. Please check out our website to discover more about the company, and please feel free to shoot me, or Jeff McCormick, who's the product architect behind the suite and the operator, an email if you have any follow-up questions while looking through the projects. That's it. That was great, Sarah, and very timely too. There have been a lot of releases since the last time anyone from Crunchy talked, so it's really good to see the progress, and you guys have been sort of our poster child for how to containerize services, so it's been wonderful working with your team. There are a couple of questions. The first one we may have missed, but is the PGO API something you install separately, or is that a container deployment too? It is a container deployment. You can actually download that from Docker Hub under crunchydata; I believe it's called crunchy-apiserver, but all that's going to be in the install documentation, which walks you through downloading all those containers and getting it set up. And there's another question: does Crunchy provide tooling for DB population within Kubernetes runtimes?
Typical challenges are schema upgrades between releases, rollbacks of changes in case of issues in deployments, and so on? Currently, no, we do not provide that capability, but I know that's in the plans for the future. Definitely, if it's something you're interested in, I would submit that as a feature request on the GitHub repositories. Cool, yeah, that's great use of GitHub for tracking issues and making that stuff work; we do a lot of that too in the OpenShift world and it really helps. There's one question: can you talk a bit about what you recommend for backend storage and what tools you have to manage that? Currently, yeah, go ahead. Currently, like I said, we only use NFS at this time; that's what we walk you through in the instructions and how the examples are set up. We are looking at moving it towards the use of Gluster storage; we're thinking about making that the standard for the suite and including it in the installation instructions. That's mostly because we've been seeing a lot of demand for dynamic storage as of late. Other than that, we have experimented with putting it on Ceph and Google Container Engine as other storage options, but that's not tested at this time. All right, I think that's all the questions we have. I'm going to give people a couple more chances, and we ended up with quite a few people listening in, so that's good. And you've really done a great job in giving the resources for where people can get hold of more information or ask more questions, so thank you for that.
I'm giving people a few more minutes to see if there's anything else. Peter, did that answer your question about storage? That's my question to you; his voice isn't activated today, so he's just doing stuff through chat. I think you've answered everyone's questions and done an awesome job of covering a lot of information. So if people are looking at, or are using, Crunchy's containerized Postgres, please reach out to Sarah, give them your feedback, and file new issues. We'll post this presentation up on the OpenShift blog shortly, along with the slides and all the links. So thanks again, Sarah, for taking this one on and giving us this good backgrounder. We look forward to the next releases, and we love how you guys have been keeping up with the incredibly fast pace of Kubernetes and OpenShift releases as well, so thank you for all your efforts. It's definitely been a wild ride, but I'm enjoying where it's headed. Thank you so much. All right, take care.