Hello and welcome again to another OpenShift Commons briefing, today as part of our series of Operator Hours. The good folks from Crunchy Data have for a long time done great work around the Operator pattern, and today we have Jonathan Katz here with us to talk about high availability Postgres, and a whole lot more, on OpenShift. We're going to have him do an overview of the topic, hopefully a little bit of a live demo because we'll put him on the spot, and live Q&A. So wherever you are, whether you're on Twitch or YouTube or Facebook or here in BlueJeans, ask your questions in the chat and we'll relay them to him, and we'll have a conversation at the end about the future of PostgreSQL and all kinds of good things. So Jonathan, take it away, introduce yourself, and let's learn a lot more. Thank you.

Thanks, Diane. Very happy to be here. It's been a while since I've participated in an OpenShift Commons, and I'm looking forward to the point where we can all get together again in person. Real quick, a little bit about Crunchy Data: all we do is Postgres, and all we do is open source, very similar to Red Hat's model. Historically we started by focusing on how to securely deploy Postgres, which is very important in many enterprise environments, and then as containers became a big deal, and as Kubernetes and OpenShift became a big deal, we focused on how to deploy Postgres in those types of environments, which is a lot of what we're going to talk about today. Like I said, everything we do is open source; everything I'm going to demo and talk about today is open source. We believe the best Postgres solution is the upstream solution, so everything we do, we give back to the community. Feel free to ask questions throughout.

I'm very active in the Postgres community; actually, I have a slide on that. I've been using Postgres for about 15 years and active in the community for a little over 10, mainly focused on advocacy. For instance, we have the Postgres 13 major release coming out next week, and a lot of what I do is geared towards getting that release out the door. Before I joined Crunchy Data I was involved in a few different startups in New York City, and of course we used Postgres as the solution there. My background is in application development, and I think some of the things we'll talk about today will reflect that. For me it's been a journey from application developer to understanding more of the systems administration side, and bringing that background of always being the accidental administrator in the various organizations I worked for. Beyond that, I just love the open source ecosystem in general. I've been very fortunate in my career to have been able to use open source exclusively, so it's great to give back in a variety of ways.

Let me take a step back for a second. If we were in a regular conference setting, I would normally ask how many people are running Postgres, and then how many people are running Postgres in production. But I also like to talk about what it takes to run a database in an organization. It's one thing to play around with a data system on your laptop and do some cool things with it, which, arguably, is what I'm going to be doing in the demo today, but there are more considerations when you're trying to run a database system in production. It's very similar to what you might do with OpenShift, where you have multiple highly available nodes set up to make sure that if one goes down, you can move your workloads over to another until the original node heals. You need to consider this when you're running Postgres, or really any database. Databases are foundational applications: they're storing your data, so you need to make sure your data is there, it's safe, it's stored securely, and you're able to retrieve it, and retrieve it efficiently.

A lot of things can go wrong in production. Your data might become unavailable: there could be a network outage, there could be a rogue process sapping all the resources in the system; it could be any combination of things. There are many cases where I've committed a denial of service on my own production database, so there's human error as well. You want to make sure your database remains as available to your applications as possible. Other things can go wrong too. Let's say I drop the users table: that's going to cause a very bad day, so we need to be able to restore the database back to the point where the users table existed. Beyond that, you might be using databases in your development environments. You may have one production environment, but for development you want to be able to bring up databases for your team, rapidly provision them, and maybe rapidly destroy them. You may also want to create production-like data or scenarios for your developers to work through, for instance to troubleshoot an issue that's happening in your production environment, so being able to clone and copy that data is very important as well. And of course, if you have various data regulatory concerns, you need to do that in a way that is safe and secure.

You also want to anticipate problems before they happen, so you want to monitor: you want to know "I'm running out of disk, I need to provision more space, I need to resize my PVC or PV." Maybe things are not performing as quickly as they could be, and you want to troubleshoot and diagnose which of your queries are slow and try to optimize them. And of course you need to manage who is able to access your data. In production it's typically your applications, or a database administrator who might come in and troubleshoot things, and you need to make sure you're securing the connections: are they over TLS, and all the considerations around that. So there's a lot to consider, and I try to layer it: there are multiple ways to get there. You have your stock Postgres system, but you need to bring in an ecosystem around it.

Now, Postgres has a lot of these things built into the core system, and that comes from over 30 years of evolution. I like to joke, well, it's not really a joke, that I'm about the same age as Postgres. It's been around for a hot minute now, and it has adapted and changed with the times. One of the reasons Postgres has become very popular through the years is its license: much like Linux, there's no single vendor, it's a very flexible license, no one can actually own Postgres, and the community has gone to great lengths to make sure this remains the case. Because of that, it has attracted a very healthy ecosystem of people contributing to Postgres, and through this, and through real-world use cases, recent Postgres releases have brought it to feature parity with other proprietary databases.

I'll take an example from about 10 years ago: streaming replication. Streaming replication is one of the foundational pieces for creating a high availability system: you take the data changes from your primary server and copy them over, in an ordered fashion, to your replica server, in a way that's fast and efficient. That came about 10 years ago, and the basic form is essentially "as soon as I get the data changes, send them over; no guarantees they get there, but I'm going to try to send them as fast as I can." Then came synchronous replication, which is geared towards write-sensitive workloads, for instance "I really don't want to lose this transaction, I want to make sure it's copied to at least one other place." That came roughly nine years ago, and I could probably go into a whole spiel about the difference between these two replication modes, their trade-offs, etc.
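As a quick aside, the asynchronous versus synchronous behavior is controlled by one setting on the primary. Here's a minimal sketch; the standby names are hypothetical, and it assumes psql can reach the primary as a superuser:

```shell
# Asynchronous streaming replication is the default: commits return
# without waiting for any replica to confirm receipt.
psql -c "ALTER SYSTEM SET synchronous_standby_names = '';"

# Synchronous replication: a commit waits until at least one of the
# listed standbys (hypothetical names) confirms the write.
psql -c "ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (replica1, replica2)';"

# Apply the change without restarting Postgres.
psql -c "SELECT pg_reload_conf();"
```

The `FIRST 1` form waits for the highest-priority available standby; Postgres also supports a quorum form, `ANY 2 (r1, r2, r3)`, where a commit waits until any two of the three have confirmed.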
I'll save that for a later slide, or perhaps a question. As Postgres evolved, further things were added. Suddenly you had cascading replication, where you can have a replica of a replica, which lets you push things out into even more distributed environments, and I can tell you that the Postgres Operator, which we'll talk about, supports that feature in one of its architectures. Eventually we got something called quorum commit, which is popular in the distributed consensus world: basically you can say "as long as my transaction gets to at least, let's say, two replicas, then I consider it written, and if not, I'm going to hold the transaction until I know it's safely written to a certain number of replicas." We also got logical replication, and you can create some really awesome things around that, but one of the important ones is online upgrades: going from Postgres, let's say, 10 to 12, I can keep my system online and basically flip it over, reducing the amount of downtime I need. And that's just one feature; replication has been very important to ensuring Postgres can stay available. Now, to properly use it takes some work to set things up. Similar to writing an OpenShift manifest, the feature is there, but you need to fill it all in, and this is where using automation patterns like the Operator can help with that task.

The other thing is Postgres's extensibility. Beyond extensions themselves, which we're going to talk about today, where you can add on to Postgres functionality while still maintaining the open source core, there's a whole library ecosystem that makes it super easy to manage Postgres: being able to monitor it, for instance, or handling HA in a really cool way. Beyond that, there's transaction safety and data integrity: ensuring that your data gets written to disk, and being able to detect if your data is corrupted (this is the page checksum feature in Postgres). All of this has made it very trusted. When I first joined the Postgres community, the question I got most often was "what's Postgres?", and this was in the tech community; people didn't really know what it was, and it had already been around for 20 years. The question I get now is not "what's Postgres?" but "how do I do XYZ with Postgres? How do I deploy it in this manner? How can I use this feature?" It really shows the evolution: not only does Postgres have name recognition, it's being deployed, and deployed in mission-critical ways. And again, with my open source community hat on, it's just so cool to have seen that evolution. The credit really goes to the community, and to the feedback we keep getting so we can continue improving the product.

So that's my spiel on Postgres. I was joking with Diane that I could probably talk about Postgres itself for at least an hour and there would still be plenty more to say, but today the goal is to talk about how we run Postgres on OpenShift or Kubernetes in a way that satisfies all the needs of running it in production, and this is where operators come in. Operators have really changed how you can deploy stateful applications on Kubernetes. What do I mean by that?

Stateful workloads really only have one job: maintain state. Basically, if I modify something, let's say I insert a record, and it's a financial transaction, I need to make sure it gets written to disk, and I need to make sure nothing's going to mutate that record in a way that invalidates or loses the transaction. That would be pretty bad. So we want to create a system that's resilient: let's say my disk does crash out, or there are some corrupt blocks on the PVC I'm using, I'm able to restore my data and know that I didn't lose any of it. But to do that on Kubernetes, we need to apply more knowledge, and this is where operators help, because operators can encode domain-specific information into your environment so that you're deploying your stateful service in a way that matches what it does. Postgres does things one way, MySQL does things another way. From a high level they're similar, but the specifics of how they work are different, and that's where the operator can capture that information.

Additionally, modifying state can range from something very simple to something very tedious. Creating a Postgres replica is actually a bit of a tedious process; it helps to have all of that automated, and automated in a way that does it efficiently. Adding a user, on the other hand, is very simple: I can add a user to a database with one line of SQL. But let's say I bring a new DBA onto the team and I want to give them an account on all the databases. That's a little more tedious, and granted, if you've been doing DevOps practices for years, you can write the appropriate Ansible playbook to get that all set up, but then you have to think about maintainability, because you're creating something that may or may not be standard practice, and then you have to teach everyone how to use it. What operators do is create a consistent view of the world, so that adding a user is the same no matter where you install the operator. That's why this pattern is great: with a healthy operator ecosystem, suddenly we're able to create standard ways of taking actions on the various stateful services out there.

And that's the final point: allowing automated, managed workloads. Some of this starts with just being able to do the proverbial day-two operations: handling high availability, having monitoring in place, taking systematic backups, and so on. But things continue to advance, and some of this is forward-looking: having smart systems. By smart systems I mean systems that can auto-scale or auto-tune themselves, and some services can already do that. With a database system like Postgres, to do things like auto-scaling you need a lot more knowledge, because what does that actually mean? Do I need to scale vertically? Do I need to add more RAM, and how does that affect my Postgres configuration? Or do I need to load balance things more? Is my workload really read-heavy, how do I determine if it's read-heavy, and what are my scaling thresholds? We have the ability to start amassing that information and building out systems like that, but that takes some time. That said, with the way the operator pattern works, we're well on our way to getting to that point.
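To make that "consistent view of the world" idea concrete, here's a hedged sketch contrasting plain SQL with the operator's pgo client. The cluster name hippo and the user name are made up, and the flags follow the pgo 4.x style, so check the docs for your version:

```shell
# Plain Postgres: one line of SQL, but you run it per database server.
psql -c "CREATE ROLE appuser LOGIN PASSWORD 'example-only';"

# Crunchy Postgres Operator: the same action expressed once, applied
# the same way on any cluster the operator manages.
pgo create user hippo --username=appuser --password='example-only'

# List the users the operator manages for that cluster.
pgo show user hippo
```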
So that concludes my spiel on Kubernetes operators; let me tie it into a topic that's near and dear to my heart, which is the Crunchy Postgres Operator. As I said at the beginning, at Crunchy Data everything we do is open source, and the operator is no exception. It's actually been open source since March of 2017, making it one of the early examples of a stateful operator. We're currently on version 4.4, and actually we have a 4.5 beta right now that I'm going to demo a little bit from today. This is the surprise demo I was informed about, but surprise demos are all the more fun. It's Level V on OperatorHub, it's a certified operator, and you can go ahead and try it out today. Going back to that original slide, and I do like the hippo wearing the Kube cap, the idea with the operator is that we want to support all the features you need to run Postgres in production. That could be a single Postgres cluster running in production, or it could be a thousand Postgres clusters being managed by the operator. We've seen both use cases, and of course everything in between, and to us it was important to support all of that. I could read through the whole list, but instead I'm going to cover a couple of things I haven't touched on yet.

Elasticity: being able to add and remove replicas. They could be for high availability purposes, or you might need a read-only replica because your business intelligence person comes in and wants to run some read-only queries, or it could be for load balancing as well; it depends on your use case. The idea was to make it easy: I can add an additional replica just by typing "pgo scale hippo", or whatever my cluster is called.

One thing that's important to us is leveraging Kubernetes and OpenShift native objects to create a very sound system. Kubernetes has this concept of pod anti-affinity, where you can assign rules that say "I want to ensure pods of these types don't deploy to the same node." You can require it, meaning don't schedule the pod unless it lands on a different node, or you can prefer it, meaning try to schedule the pods to different nodes, but if you can't, it's okay to schedule them to the same node. The reason we went with that is to make sure there's a higher probability, or a guarantee, that your Postgres instances are scheduled to different nodes.

For disaster recovery we use an open source tool called pgBackRest, something we support very heavily. The reason is that it was designed for terabyte-scale databases: the authors saw that it was difficult to create an efficient backup solution in Postgres for very large databases, and sure enough, they solved that problem. One thing we've done with the operator is build our disaster recovery solution around it. It's great; it's something I've always looked to deploy wherever I've been employed since pgBackRest has existed, and we're very happy to help drive further use cases for it. In fact, one of the benefits of doing everything open source is that feedback from operator users helps drive features being added to pgBackRest itself. One of the big ones was being able to expire backups, that is, creating a time-based retention policy, saying "I want to keep this full backup for 21 days." That came from direct feedback from people using the Postgres Operator, and ultimately it was upstreamed into pgBackRest.

On administration, one thing that I may or may not attempt to demo is using the popular pgAdmin tool, a graphical user interface for administering your Postgres instance, and pgBouncer, which is a connection pooler. One of the known limitations in Postgres is that the number of connections can have an impact on database performance once it gets fairly large. Now, there's actually a big improvement for that coming in Postgres 14, but that's at least a year away. Until then, there is pgBouncer, a connection pooler: basically you can have several hundred connections coming into pgBouncer, which then scales down what actually goes into Postgres. I've deployed it multiple times in my career, and I was very happy that we were able to integrate it into the Postgres Operator. There are some other benefits of pgBouncer as well, such as connection state management: during a failover scenario, pgBouncer can hold connections until the failover is complete and then resume them. Last but not least, a big feature in 4.5 is full support for the open source pgMonitor, which includes lots of wonderful charts and graphs that are essential for monitoring Postgres clusters; we'll touch on that in a little bit.

So why should you use an operator? There are a lot of really good points on this slide that I'm just going to brush over, because I'd rather get into the architecture and the nitty-gritty details of how the Postgres Operator works. But: automation, standardization, ease of use. I tried to discuss those as well as I could before, but I do want to emphasize the ease of use: the idea that we have a fairly simple API, where you can manipulate things from the CRDs, or even from a CLI or UI. Our pgo command-line interface is very popular because I can just do "pgo create cluster hippo", and boom, I have a Postgres cluster. "pgo create cluster hippo --replica-count=3", and I have a high availability cluster. I've also been involved in developing a UI around the operator, and that's a long story for another day, but by creating a standard interface it's very easy to build even more robust applications around Postgres.

Scale: when I talk about scale, it's not just saying the Postgres Operator can manage a thousand Postgres clusters, or scaling your workload. Something I always look at as an engineering manager is: how do I scale our processes across the team? Can I create something repeatable and standard that everyone can interface with? That's important to consider as well. We certainly do the engineering work to ensure the operator itself can scale; for instance, our multi-namespace support. At one point we were hitting limitations on it, but thanks to one of our very smart engineers we can now easily handle northward of a hundred namespaces in multi-namespace mode. We haven't really pushed the limits of it yet, or I should say we haven't found the limits.

Security: the cool thing about running Postgres in containers is that you naturally get a certain level of security, in that you're in a sandboxed environment. That actually lets you do some cooler things, like giving people superuser within the container, and by that I mean the Postgres superuser, not root within the container. You can do that, and there's lower risk to it. I might have my own opinions on what you should be doing, but the idea is that you're operating in more of a sandboxed environment, and that has its advantages. Just having the process isolation built in certainly helps when you're running in a multi-tenant environment.

And flexibility, which is one of the first things that really attracted me to what we were doing with the operator: as long as you have a node, you can run it anywhere. It doesn't matter where your OpenShift nodes are deployed; you can run a Postgres cluster there, and that's really cool. In this regard, and I can speak for the Crunchy operator, but operators in general create a unified layer where, as long as I have something that speaks Kubernetes or OpenShift, I can deploy my workload. Same with OpenShift: as long as I have an OpenShift node somewhere, it doesn't matter what cloud provider I'm on or what hardware I'm running on, I can deploy there. One thing we do when testing the Postgres Operator is test it in all sorts of places. We run OpenShift in one environment, then OpenShift in another environment, and for the most part it just works. So that's my pitch for operators in general, let alone the Postgres Operator.

So how does it all work? I'd say just look at the diagram, but to go into it a little bit: part of the operator pattern is that you have these things called custom resource definitions, and from my application developer perspective, I consider them your database schema. Your custom resource definitions, or CRDs, basically store the definition of what you're going to deploy. So we have
one called PG cluster but PG cluster stores is essentially your definition of your Postgres cluster you know it says how many you know how many replicas do you want what ports do you want to run it on do you want a PG bouncer with it do you want to collect metrics with it you know and so on and so forth and you know from that you know PG cluster you should be able to deploy a Postgres cluster anywhere if somehow your whole Kubernetes environment gets wiped out well well if your data gets wiped out then that's different story but let's say all your deployments get wiped out well I should be able to redeploy you know a Postgres cluster that looks exactly like we had before from that definition so from there what the operator does is the operator reacts to what's been added to a customer resource definition or any changes in the customer resource definition and then applies those changes throughout your OpenShift environment we've layered a few things on top of that we have an EPI server that makes it a little bit easier to interface with the customer resource definitions as it aggregates that information together and that's actually what the pico command line client uses as well speaking of we have a command line client called excuse me pico which is I got the pico client it works across different operating systems and it makes it you know super simple to create Postgres clusters manage Postgres clusters modify them take backups you know whatever it may be and we'll demo some of that we also have a scheduler which is used to schedule backups amongst other things and this is great because you know you should always be taking backups that's another thing I always emphasize that there's one takeaway from this talk this take backups of your Postgres cluster you know you never know when you will need them and maybe to further emphasize that there was a time in my career like one time where I really had to make a point in time recovery like you know there was something 
where we had to you know get the data back to some point of time and replay it you know and you know find certain things and thankfully you know now that we had backups you know we we had our transaction history you know back to that point and we were able to you know solve the issue but please take backups it's so important so just finishing up you know this diagram the most important thing when you're running a database is your storage no matter what it is you know your storage is going to be your bottleneck George's gone much faster like SSDs were one of the greatest things that happened for you know database workloads you know the one trick with SSDs particularly back in the day was reliability but you know you know the recommendation would be you know always have you know run in like arrayed 10 mode so that all said you know the interesting thing in you know the Kubernetes no butchered ecosystem are you know the variety of storage classes available or even you know people who you know we have cases where people just use a host path storage and you know pin their you know plus clusters to a single node and you know ensure you know everything gets right into that node but you know storage selection is you know a very you know detailed involved topic the various storage classes have certainly improved you know even you know even since I started our crunchy data and you know it's interesting to see what's you know happening there our team stays on top of it to you know for our various testing and compatibility purposes but you should always consider you know what your you know what level storage you need based on your workload not everyone needs the latest and greatest and most expensive storage for their databases I mean you know am I biased like yes of course I you know I love playing with like the latest and greatest and fastest storage but you might not necessarily need that but if you do you should know how you're able to you know optimize using Postgres you 
know with your very storage layers so let's talk about high availability which you know was you know the you know the original topic of this this chat this is a really important slide because this explains how high availability works with the Postgres operator but it's also just like how cool it is you know whenever we get tickets about you know support tickets around high availability you know one thing we always try to ask is you know how's your high availability set up in your Kubernetes or your OpenShift environment because your Postgres high availability with the Postgres operator is tied to your Kubernetes high availability and this is a feature and this is why it's really cool so Kubernetes and OpenShift are backed by their own distributed consensus storage system you know you know it could be at CD and you know at CD has its own high availability high availability system built in but the reason why we leverage this is because it minimizes the amount of Postgres nodes you need to deploy for high availability so so let's take you know the raft algorithm you know the raft algorithm algorithm says you know you always need an even number sorry you always need an odd number of nodes to be able to get high availability and I believe you know as I recall the recommended number is five so running five Postgres databases particularly let's say they're multi terabyte you know that's you know that's a very large footprint even running three I mean I like to recommend three is the number that you should run but you know that's still if you have you know 10 terabytes of data that's at least 30 terabytes that you need plus backups plus you know your you know your various logs and wall and you know other things that you need to store so this will be a lot of data floating around I love for changing the you know the OpenShift back storage system you actually only need to run two Postgres instances to get high availability and that's what's really cool because it does help 
you lower your footprint while still maintaining safe, distributed-consensus high availability. What I showed here was three nodes, and these are the actual OpenShift nodes, because the other thing you need is a pgBackRest repository; that's the third component for high availability. There are two reasons we leverage it. One, of course, you need to take backups; remember, the one takeaway from this talk is please back up your database. Two, pgBackRest is actually used in the self-healing process. So let's say my primary goes down, and let's say it's down for several minutes. The replica is promoted, and we want to bring the old primary back into the fold: it's going to be a replica, but we need to catch it up to where the new primary is so it can rejoin as a healthy streaming replica. We can leverage the pgBackRest delta restore feature to efficiently copy the information into the failed instance, effectively reprovisioning it, bringing it up to speed, and tying it back to the primary. So pgBackRest serves as part of the disaster recovery system, but it plays a role in high availability as well, which is really cool, because we can leverage all the efficiencies built into it to quickly heal components within the system. The takeaway from this is that the Postgres Operator does provide high availability, but it's leveraging it from Kubernetes and OpenShift, which, like I said, is really an advantage. The other thing I should point out is that the operator itself is not providing the high availability, because that would make it a single point of failure. The Postgres instances themselves are providing the high availability:
they're communicating with Kubernetes to determine whether there needs to be a leader election, because a primary is down, or unreachable, or whatever it may be. So, disaster recovery: how does it work? I would say it works in a pretty cool way. We support a multi-repository setup, and what I mean by that is that you can push your archives and your backups to a PVC somewhere within your local OpenShift environment, or you can push them to S3 or an S3-compatible storage system like MinIO, and you can actually do both at once. That's cool because you can guarantee that your backups are being pushed to multiple places. Additionally, you're able to schedule backups as well, and the easiest way to make sure you keep taking backups is to keep them on a schedule so they keep being taken. However, you should also verify that your backups are actually being taken, and you should monitor everything else, and this is a feature I'm previewing for the upcoming operator release coming out towards the end of the month: our integration with pgMonitor.
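Before moving on to monitoring: as a rough illustration of what an S3-backed pgBackRest repository involves under the hood (the operator generates the equivalent wiring for you; the option names below come from pgBackRest's documented S3 support, while the stanza, bucket, and endpoint values are made up for illustration):

```ini
# pgbackrest.conf, illustrative only; the Postgres Operator renders the
# equivalent configuration inside the cluster pods for you.
[global]
repo1-type=s3                       ; push backups and WAL archives to S3-compatible storage
repo1-path=/pgbackrest/hippo/repo1
repo1-s3-bucket=hippo-backups       ; hypothetical bucket name
repo1-s3-endpoint=s3.amazonaws.com  ; or a MinIO endpoint for on-cluster storage
repo1-s3-region=us-east-1
repo1-retention-full=2              ; keep the two most recent full backups

[hippo]
pg1-path=/pgdata/hippo              ; data directory of the Postgres instance
```

With a repository like this in place, `pgbackrest --stanza=hippo backup` takes a backup, and the delta restore used in self-healing corresponds to `pgbackrest --stanza=hippo --delta restore`.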
So what pgMonitor provides is a series of Grafana dashboards that read from a Prometheus database, and it basically shows you the overall health of your database. It tracks a variety of things, and going through all of them is probably a full talk in itself, but it's essentially a curated list of the key metrics you need to keep an eye on to detect, or try to anticipate, issues with your system, and to give you the overall health of your system. So for instance, if you're supposed to take daily backups and a backup isn't taken, the top bar you see on this screen is going to go red. You can drill down to specific databases or specific pods within your cluster, but the idea is that you have all these key metrics there. Take another one, replication status: this helps you detect what your replication lag is. If you're using asynchronous replication, this could become an issue; if your lag is too high, you have potential data loss if there's a failover event, so you do want to keep an eye on that. The other thing is that this works with all the upstream components, so if you already have a Prometheus instance set up that you're using to aggregate your metrics, you can plug in the Grafana dashboards and pull from that Prometheus instance. One of the dashboards I want to show is actually related to getting pod-specific metrics. This was an issue we had run into for a long time: some of the key metrics when running a Postgres database are related to your actual pod utilization, and we had a hard time trying to pull these metrics out in a consistent way.
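To make that concrete: the pod-level numbers in question live in the kernel's cgroup pseudo-files inside the container. A minimal Python sketch of the idea (illustrative only; the extension discussed next exposes these values through SQL rather than a script like this):

```python
# Sketch: read a container's own memory usage from cgroup pseudo-files,
# handling both the cgroup v2 and v1 file layouts.
from pathlib import Path

# Standard kernel locations for the memory controller's usage counter.
CANDIDATES = [
    Path("/sys/fs/cgroup/memory.current"),                # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.usage_in_bytes"),  # cgroup v1
]

def parse_scalar(text: str) -> int:
    """cgroup scalar files contain a single integer value."""
    return int(text.strip())

def memory_usage_bytes():
    """Return current memory usage in bytes, or None outside a known cgroup layout."""
    for path in CANDIDATES:
        if path.exists():
            return parse_scalar(path.read_text())
    return None

if __name__ == "__main__":
    print("container memory usage (bytes):", memory_usage_bytes())
```

The same pattern extends to CPU and I/O statistics; the hard part in practice is handling both cgroup versions consistently, which is exactly the gap described above.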
One of the people at Crunchy Data, Joe Conway, a Postgres committer and major contributor, and now a container enthusiast, wrote an extension to Postgres called pgnodemx that can actually reach inside the pod and pull information out of the various cgroups. It works for cgroups version 1 and version 2, and it can pull out these, let's call them OS-style, metrics from that particular pod. So I can see, oh, this primary has this much disk activity, or its CPU is currently maxed out at 100%, do I need to raise the limit? It helps answer questions like that. You're not supposed to pick favorite features, but this is the feature I'm most excited about in the upcoming release, because, as an application developer, this is where I always started when troubleshooting my system; to me, these are the stats that made the most sense. If memory utilization was getting out of control, I knew there was probably a runaway query and I should look into it. Same thing with CPU at 100%: let me find the process that's doing that. Oh, maybe it's a really poor recursive query I ran, which was often the case; I could then go in and kill the process and then fix it. Or disk usage: oh, I'm at 80% disk usage, is there an unacknowledged replication slot somewhere causing too many WAL segments to be retained? And of course it's nice to be able to look at the charts and get all this information, but it's also good to be alerted to problems, and we've pulled together our favorite alerts that will tell you what kind of errors are going on. For instance, with high availability, if you can't heal
something, something's wrong: the cluster isn't accessible, and if it's inaccessible for some period of time, let's send a critical alert for that. The way I originally triggered this alert was by doing a very terrible thing: I removed the data directory from Postgres and made the instance totally unusable, and that was good enough to trigger the alert. I was actually not running it as a high availability instance, which is why it was impossible to heal. That said, we've added a variety of alerts: is your disk filling up, is a replica lagging too far behind, and so on and so forth. Hopefully you find these useful, and I hope you never have to respond to one of those alerts. Last but not least in terms of the walkthrough demo (then we'll try a live demo) is adding the ability to administer your database from a user interface, in this case pgAdmin. We actually created an integration where we're able to tie the Postgres user accounts to pgAdmin and keep them all in sync, so you can log into your pgAdmin instance and administer your Postgres database. We tried to check all the boxes: not only having the administration options available, but making it very easy for the people who do the daily Postgres work to interface with their database. So let's do this: let's create this high availability cluster with monitoring and connection pooling and all these wonderful things with one command, because that's really the beauty of all of it. First off, I went ahead and created a Postgres cluster, because I think we're going to have some fun. So we have a Postgres cluster; let's inspect it a bit. First off, pgo: pgo is what we call the command line client for interfacing with the Postgres Operator. I mean, you can
interface with the custom resource definitions directly, but we find the command line client very useful, and it can do a bunch of different things. For instance, I can test if my cluster is up, I can see the current disk utilization, I can introspect it, I can check on the health of my backups, take another backup (because why not), and I can scale it up. So the pgo client, as you can see, is a Swiss Army knife for handling all these different operations within your Postgres environment. While that's going on, and to show there's nothing up my sleeve, I've also created a pgBouncer which I've been connecting to, and I have a port-forward set up to it; you don't need to worry about reading the screen. I've also added some data to my Postgres database to bootstrap it, using a tool called pgbench. So what I'm going to do is start writing data to my database using pgbench, and I will say this is truly a live demo, because I'm going to do something I did not test beforehand, but I want to see if it works. How often does a live demo actually break on you? Typically I've rehearsed them, so the probability is very small; this is uncharted territory, but in this case I'm going to try to purposely break it. What I'm going to do is kill the primary node while we're running pgbench, and we're going to see what happens. So let me find an available terminal window where I can do it (I realize the font is probably really small here), and I'm going to run a delete on the primary pod. Look at that: clearly we had the server connection crash, but it reconnected, just like that. I'm going to put this into the background so it keeps running. So what's going on? Let's do a pgo test on hippo, so we can see that
our primary and pgBouncer are up. The original primary was this node, hippo-blah-blah-blah, and the new primary is hippo-acgj. So high availability worked, and even though we deleted the pod (which is something that can happen in the Kubernetes world, in the OpenShift world, where a pod gets rescheduled or deleted), it came back up and the interruption was very minimal. That's pretty cool; you can't get much better with high availability in that regard. Again, as I mentioned earlier, if you want to avoid transaction loss you need to have synchronous replication set up, which is something we do support in the Postgres Operator. Let me stop that pgbench, because those messages are getting tiring. If I wanted to create a cluster with synchronous replication, I would use the sync replication flag. Now, I mentioned the trade-off of that: the problem with synchronous replication as it stands today is that if your replica goes down (not your primary, your replica), technically your primary goes down too, because you need to guarantee that your writes are getting to the replica. With Postgres 10 and beyond this is solved with quorum commit, and, I haven't tried it myself, but if you do some of the advanced configuration with a Postgres Operator instance you can probably get quorum commit; I would caveat that your mileage may vary. Quorum commit makes it a little safer to deal with synchronous replication, in the sense that (sorry, there is still a performance penalty using it) you just need to guarantee that your writes get to whatever number of Postgres instances you specify, but it's not necessarily going to cause downtime the way a single replica going down means your primary goes
down. But there are certainly workloads that need it, where you're willing to pay the performance hit because you're guaranteeing that your writes are going to be in multiple places. I'd also say that leveraging pgBackRest the way we do helps ensure that your writes get pushed out to the backup repository as well; there's always a little bit of a delay, because you need the full write-ahead log record written before it goes out there. The other things that are cool about this, other than showing that high availability actually works: let me see if I can create a pgAdmin. It'll take a few minutes to come up, but we'll try logging directly into the database via pgAdmin, if I'm able to get all my commands correct, because again, uncharted territory. Maybe I'll cover a couple of the other cool features: you're able to tune your memory settings, or tune how much CPU and RAM you want for a particular node, and if you decide that what you originally deployed your cluster with is not sufficient, you can update it later. What else can you do? You can add tablespaces. There are a variety of reasons you might want to use a tablespace. A tablespace is an additional external volume that you're attaching to your Postgres instance, for multiple purposes: it might be a very large table, or a very large group of tables, or a very large database that you don't want to attach to your main Postgres data directory; it could be that you want to take advantage of super fast storage for a particular data set; it could be that you just have a lot of data and you need to spread it out among multiple PVCs. The use cases do vary, but you're able to add tablespaces to
operator-managed Postgres clusters, and you set the size of the PVC that you want; you can do that on a one-off basis. What other cool things? As you can see, there are a lot of flags, and I do encourage you to read the documentation. I've tried to write the documentation in a way that tells a story, that gets you to where you need to go; as always, because it's open source, patches are welcome, and people certainly do like to leave their opinions on the documentation. We also support an easy way of setting up TLS connections. I only lightly touched on this earlier: TLS, of course, is a very important part of encrypting communications and making sure people aren't eavesdropping. We try to make it as easy as possible; you just have to provide a key/cert pair and then you're good to go, and you can also force all connections to be over TLS by using the TLS-only option. So, I've hopefully stalled long enough: there should be a pgAdmin deployed. There is. Let me see if I can port-forward to it; one moment while I make sure I actually port-forwarded correctly to my computer. Yep. All right, so now I'm going to shift my screen one more time, or maybe two more times. All right, so this is pgAdmin, as I mentioned, the popular user interface for interfacing with Postgres. I'm going to create a new user, the user hippo, and voila. This is cool because it syncs with the administrative dashboard; I can very easily log into my databases and find the one I ran the pgbench tool on, mostly because I want to be able to run some queries against it, but I forget where I ran it. Here we go. All right, so here's all of our data; we can try to query against it, if I remember the syntax. I can create a table right here, insert some data, then
query it, just like that. And again, what's really nice is that I was able to run a command on the operator and have everything synchronize into the pgAdmin interface. These things are convenient, and it's about creating ways of systematically managing a whole wide variety of different Postgres workloads and needs. So I think with that I will return to the slides; we actually saw this whole thing created, and yeah, that's really all I have. I can try to go ahead and do more live demos and see if I can actually break myself. If I try to deploy the monitoring suite I will break myself, because I found something incorrect in my configuration and haven't quite figured out exactly what it is, but if I could get that working, it's actually pretty cool to see all the charts and dials going. So with that, I'm happy to take questions. Hey Jonathan, it's Mike here from Red Hat, how are you? I'm doing really well. Interesting demo; it's always good to see the inner workings of something so cool. You know, I manage relationships with lots and lots of software partners here at Red Hat, and it seems like there are database vendors coming out of the woodwork everywhere. Are they all the same? Meaning, how is Crunchy uniquely positioned out there to be better than the other ten database vendors that are all popping up everywhere? Yeah, so there are all sorts of different solutions out there for "how do I store my data," and it's certainly something I've seen through the years. Again, I'm biased: I've been using Postgres for 15 years, I've been deploying it successfully in production for 15 years, and I was running Postgres even before there was replication or any high
availability guarantees. One thing I love about it is that it's a very strong and healthy open source community, similar to the way Linux is, similar to the way Kubernetes is. And one thing I've liked at Crunchy is that we have adopted the Red Hat model: everything we do is open source, and we support open source so that we can make the upstream the best stream, so to speak. Beyond that, what I like about Crunchy is how we focus on what I said on the original slide: beyond open source, it's adapting to the modern technologies that are out there, ensuring that we continue to make Postgres work officially on OpenShift, and the focus on security. A lot of what I've found with data security through the years is that everyone wants it, but they often only employ it to ensure they're compliant with whatever regulations they're subject to, and if you're not doing things to keep your data safe and secure, you can actually run into a PR nightmare when data is breached. Data security is a whole topic in itself; we could spend several hours on it. One thing we try to focus on is not only how to mitigate the risk of a threat, but how to deal with what happens should there be a breach, ensuring that you can minimize the overall damage. So that's what I like to say about Postgres and Crunchy: we love the upstream, and that's what we want to support. Thanks. I've got
one other question, from the completely other side of the table though. Crunchy Data: where did you guys come up with it? We always hear lots of interesting stories about how different companies select their names and the hidden meanings behind them, but why Crunchy? Why Crunchy Data? Interesting, I thought you would have asked about the hippo, because I think that's the bigger urban legend. Well, that's part of it: there is a hippo, there is Crunchy, how does that work? So there are many urban legends around this. I think Crunchy Data actually came out of a meeting where someone described what our founders were working on as very "crunchy," so maybe the urban legend for that one is a little less exciting. But the hippo has many stories, and the one that I choose to believe is that hippos are fiercely protective of their watering hole, and given that Crunchy Data's roots are in the security space, having an animal that is very protective of your core asset (the watering hole, your data lake if you will) gives a strong sense of assurance that we're looking after the integrity and safety of your data. Another one I've heard is that when you see a hippo in the water you only see its eyes, but there's a whole lot going on underneath the water, and that's sort of like a principle of security: you might only see one layer, but there are a lot of other layers going on. There are at least five other urban legends I've heard around the name, but those are the two that I particularly like. Gotcha, okay. So what's next? I know we're just about out of time; how do people get in touch with you? Do they just send
an email to Jonathan, or, you know, call your house phone? Yeah, well, fortunately, as you can see (I don't know if I'm still sharing my kitchen or not), that house phone is not actually connected to an outside line. But yeah, email me at jonathan.katz at crunchydata.com, tweet at me at jkatz05, or go to the website, crunchydata.com, and see what we're doing. Plenty of people come to the actual GitHub repo, crunchydata/postgres-operator; we certainly get a slew of questions there. But that's the easiest way to see what's next or to get in touch. Okay, and there was one more question, Jonathan, I know we're over time: the CRD. Shannon was asking, can you share your CR by any chance? I wasn't sure what a CR is, but Shannon says it's what you use to create the actual pods. Yeah, so all of this is in the documentation now; we have a section just on the custom resource. The CRD is the definition itself, and in our documentation we have examples of how you can create custom resources. I could show what one looks like real quick, one that's already created. Yeah, if you have just the YAML it's fine, just for learning purposes; to spin it up, it's easier to have a sample. When you mention the documentation, is that the access.crunchydata.com documentation? It is; the GitHub one points to it. I found the GitHub one and it points to the Crunchy Data docs, but I just want a quick start: grab your CR, pop it in, and try it. Yeah, so this is an example CR; this is the cluster I was demoing from, and it is in the GitHub, which is what you're saying? Correct, well, it's in our
documentation; we have an example of how you can create a Postgres cluster using a custom resource. Okay. One of the reasons we have the command line client is that it makes things a little easier: it's much easier to type "pgo create cluster hippo" than to fill out the CR. But we do provide examples for how to do it using just the custom resource. Okay, cool, thank you. You're welcome. All right then, I think we've come to the end of our hour, and it was a bit of a tour de force; I really appreciate that. Thank you so much, Jonathan. We'll have Crunchy back again in a little bit of time; I think next month we'll have some of the folks doing some spatial GIS. This has been an awesome session, so thanks, Michael, again for arranging this, and Jonathan for sharing your kitchen with us. Yeah, happy to invite people into our kitchen, though we can only fit so many.