Hello, everybody, and welcome to this talk at KubeCon + CloudNativeCon about Postgres extensions in Kubernetes. In this talk, I would like to introduce a novel mechanism, built into an open source project, that allows dynamic loading of Postgres extensions in Kubernetes. It's a very novel approach that enables you to run virtually any extension that you may want to run in, again, Kubernetes. So, first, about myself: who am I? You can refer to the slide, but basically, I am the CEO and founder of a company called OnGres. OnGres means "on Postgres", and that's pretty much everything that we do. I've been a Postgres DBA and application developer myself for more than 20 years, and I like to focus specifically on R&D on databases, particularly Postgres. I also like to speak at Postgres conferences and all kinds of conferences. I actually have to say that this is my first talk at KubeCon; I tried four times before and was rejected, so I'm very happy to have finally made it, and very happy to be here with you. So, let's look at Postgres extensions. Let me do a 101 on Postgres extensions in case you're not familiar with them. You can think of Postgres extensions as the plugins of a browser: software bundles that you may install to enhance the browser's functionality. Postgres extensions are pretty much the same, but applied to a Postgres database. They extend the database in many ways, as we shall see. They are actually one of the most often cited best features of Postgres. When someone speaks about Postgres, very early in the discussion they will say that extensions are one of the very best things about Postgres, because they transform Postgres: they significantly augment its functionality and make it a very adaptable and rich database. What is also important about extensions is that they do not follow the yearly release cadence of Postgres. New features come to Postgres core only once a year.
So, if you develop new features and pack them as extensions, you'll be able to deliver those features to your users at any moment. That's also why they're normally developed by third parties. Extensions are also important for the ecosystem because they have prevented many forks of Postgres: if you can run all your functionality as an extension, which is often the case, then you don't need to fork Postgres. It's just something built on top of Postgres that can run on top of any existing Postgres without having to fork it. And most of them are open source. So, what are some examples of these extensions? PostGIS is one of the most advanced GIS systems available for databases, and it's just a set of extensions on top of Postgres: run a simple command and you get all of that in your environment. Or look at Citus, which transforms Postgres from a non-distributed database into a distributed database with sharding capabilities, and all of this just by installing a single extension called citus. Also from the same company, there's pg_auto_failover, an extension that adds automatic failover capabilities to Postgres without requiring any external component. Again, just an extension. There's also the great example of TimescaleDB, yet another extension that adds time-series capabilities to Postgres, including automatic partitioning of time-series data and advanced functions for querying it. Or the ZomboDB extension, which creates an index type that behind the scenes is an Elasticsearch cluster, so you can do advanced full-text search without leaving Postgres. So, what is the extent of the functionality you can build with extensions? Well, first of all, you can add any database object. Database objects can be data types or functions, including aggregate functions, operators, or even procedural languages.
Yes, procedural languages, the languages that are used in Postgres to write functions or procedures; Postgres allows you to bring your own languages, so you could ship a procedural language as an extension. Extensions can also provide what in other databases would be called storage engines, which in Postgres are table and index access methods: a part of the Postgres API that allows you, from an extension, to implement different mechanisms for storing the data in either tables or indexes. There are also hooks which you can use to implement extension functionality, but in reality, from an extension you can call any internal Postgres function; everything is allowed from this perspective. You could also hook into replication and WAL handling, which affects the durability of the database. You can create background workers, which are processes managed by the database to perform different tasks. And you can even bundle external files with the extension, like binaries or other kinds of data. So, how do you program an extension? Well, if the extension is just about creating objects in the database, you could use SQL, PL/pgSQL, or any other procedural language to create those objects. But for the most part, extensions are programmed in C and compiled into a shared library that can be loaded into the database; these are the extensions that can actually call all the methods inside the Postgres code. But in reality, anything that compiles to a shared library can be used for extensions. For example, ZomboDB, the extension I just mentioned, is programmed in Rust. You could also use C++, and any new innovation is welcome: bring your own favorite language. If it can compile to a shared library, it will run, or should run. So, how do you use extensions? Let's assume for a moment that they're already installed.
So, if they're installed, they will have some files, typically under the /usr/share/postgresql/<version>/extension directory. There you will find the control files, which are metadata about the extension, and SQL files, which are the commands required to create the objects that make up the extension, if it has any. Now, once the extension is installed, if it is a C extension that needs to load a shared library into the Postgres address space, then it probably needs to be loaded via a postgresql.conf configuration parameter called shared_preload_libraries. This is a comma-separated list of the libraries that you want to preload when the database starts. You add the extension to this parameter and then a database restart is required, because these libraries are loaded into memory at database start; we'll see why this is important later. Then you connect to any database where you want the extension's functions, data types, and so on to become available, and just run the SQL command CREATE EXTENSION extension_name. Done, that's all you need to use the extension. Last but not least, some extensions provide additional configuration parameters, and you can set them in postgresql.conf to tune and configure the extension. But how do we install an extension in the first place? Before, I said let's assume they are already installed; how do we actually install them? Well, there's no such thing as an extension store, so basically you resort to search engines, right? In reality, some extensions come packaged as deb or RPM packages. There's also a repository called PGXN, which comes with a client tool that you install and that helps download the extensions. This is a fantastic tool, but the problem is that PGXN extensions are in source code form, so you need to compile them. PGXN automates this process.
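To make that flow concrete, here is a small sketch of the usage steps, using citus as the example extension (the exact library name and tuning parameters depend on each extension's own documentation):

```sql
-- Check which extensions are present on this server's filesystem.
-- Being listed here does not mean they are loaded into any database yet.
SELECT name, default_version FROM pg_available_extensions;

-- 1. Preload the shared library; this takes effect only after a restart.
--    ALTER SYSTEM writes the setting for you, instead of editing
--    postgresql.conf by hand.
ALTER SYSTEM SET shared_preload_libraries = 'citus';

-- 2. After restarting Postgres, create the extension's objects in the
--    database where you want to use it.
CREATE EXTENSION citus;

-- 3. Optionally tune extension-provided parameters (names vary per
--    extension; citus.shard_count is one of Citus's settings).
ALTER SYSTEM SET citus.shard_count = 32;
```

Note that step 1 is only needed for extensions that ship a shared library which must live in the postmaster's address space; SQL-only extensions skip straight to CREATE EXTENSION.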
But only if you have already installed all the build tools, plus all the build dependencies, and the dependencies of the dependencies, of the extension you're going to compile, which is oftentimes a hard task for the user. So it's great, but it's not batteries included: it provides the source code and you do the rest. So this is, in general, how extensions work in Postgres. Now let's look at extensions specifically in Kubernetes, and at what problems appear when trying to bring all these extensions into Kubernetes. Just before doing that, let's have a look at the set of extensions available on cloud managed offerings. If we look at Postgres itself, Postgres includes 76 extensions (there's a small typo on the slide): eight procedural language extensions, which are procedural languages for creating functions and procedures, and 68 so-called contrib extensions. They all come with the Postgres source code package, for a total of 76. If you look at the main cloud offerings, like Amazon RDS and Aurora, Google Cloud SQL, or Azure in its different offerings, the number of extensions is between 50 and 70 in total. And if you look at the core extensions, the contrib ones plus the procedural languages, most of them are available, but definitely not all of them; in some cases, as many as 32 of those core contrib extensions are not available for use. You cannot install them. And with regard to third-party extensions, of which there are hundreds or even thousands available in the world, only a very small number are available. The best case is Amazon Aurora, where 38 are available. Now, what do you do if you want one of the extensions that is not present in these environments? You cannot load them; they need to be allowed on a one-by-one basis, because this is a managed offering, right? So you're out of luck. You either do without the extension or manage the database yourself.
Now, Kubernetes operators allow you to enjoy an experience similar to managed Postgres as a service. So maybe Kubernetes operators are the answer, because there, maybe, you have more freedom. Well, the reality is that the landscape looks pretty much the same. We looked at some of the open source Postgres operators for which we were able to count the number of extensions available. The Zalando operator, for example, brings around 90 extensions, of which around 40 are third party. That's not a bad number, but all the operators are definitely limited in terms of the extensions, and particularly the third-party extensions, that are offered. So what I would like to introduce today is a mechanism developed in an open source Postgres operator called StackGres, which in version 1.0 introduces dynamic loading of extensions, basically allowing any number of extensions to be loaded. Right now more than 100 extensions are available, but this number is dynamic; it's growing every day and will cover hundreds of extensions in the future. Now, if we wanted to do this in a naive approach, let's call it that, we could think: okay, let's pack all these hundreds of extensions into a single container. Well, that may work, but it has some problems. First of all, it may result in a gigantic container image. Extensions are normally not that large, but some of them, like PostGIS, weigh in at around 200 megabytes. In reality, you're going to end up with a multi-gigabyte container image. That's not the end of the world, but it's certainly not ideal. But also, from a security perspective, you're loading hundreds of different projects, which represent a huge attack surface, and that is not ideal either.
But last but not least, when you want to introduce a new extension, or a new extension version to fix a security problem in one of the extensions that are part of this container image, you need to ship a new image. And that means restarting the pods, which means restarting the database, which is quite a disruptive operation for a database. It's something you don't want to do when running a database: frequently restarting the container just because you want to add a new extension, or a new version of an extension that appeared as a result of a security problem with the previous version. So what is the mechanism that this open source project, StackGres, has introduced to allow dynamic loading of Postgres extensions? Let me describe it first. First of all, no extension is built into the container image. Then we could think: okay, we could load the extensions into the ephemeral storage of the container itself. The image is immutable, but once the container is running there's the container's ephemeral storage, and maybe we could load the extensions there. That would work; however, it has some problems. If, for example, the node dies or the container is restarted, this storage is lost. You would need to re-download and re-process the extensions all the time. That's also doable, but it's not ideal. Instead, what we do is download and install the extensions onto the PV, the persistent volume. That is typically external or distributed storage, so it will survive node failures or pod restarts. Once you have installed an extension, it will remain there; it has basically the same lifecycle as your database data, the PGDATA directory in Postgres. We also use this functionality to relocate the Postgres binaries onto the persistent volume, alongside the extensions. This is useful not only for the extension process, but also for major version upgrades.
And then the most important and significant piece introduced by this functionality is what we call the cluster controller. The cluster controller is a Kubernetes controller, or operator, more precisely a controller, that runs as a sidecar within the pod. This is because this sidecar requires access to the file system: it's going to download the extensions, unpack them, and install them onto the PV, but it operates as a small controller on its own, and I'll explain its functionality. Actually, I don't know whether this model, having a local or hierarchical controller that also participates in part of the reconciliation cycle, is something common to other projects or new with this project. As far as we know, it's quite unique in the Postgres world, but maybe it's a pattern that could be useful for other projects. The extensions themselves are provided in compiled form, from an external repository. From the architectural perspective, we see here that there are two operators, or two controllers: the main StackGres operator, which is the usual Postgres operator you can think of, on the top left corner of the diagram. When the user instantiates a CR of type SGCluster with a spec field that basically specifies that you want to install some extensions (we're also going to see how this works), the operator performs some actions and instantiates the pods and so forth. Within the pod, this local controller, the cluster controller, reads the information in the CR and works out which extensions it needs to download. It connects to this external extension repository running on the internet, downloads the extensions, unpacks them, verifies digital signatures, creates symlinks, and in general makes them available to Postgres, and they are all stored on the persistent volume.
Finally, there is an optional component, a cache proxy, that allows you to cache extensions locally, so that if you need to download the extensions for many, many clusters, you can cache them close by. It's also provided as part of the open source stack. So let's see how it really works with a real code example. This is the minimal CR, a minimal SGCluster, that you need in StackGres to create a cluster with two pods, on the latest Postgres version, with the storage shown for each node. This will also include connection pooling, an Envoy proxy for the Postgres traffic, high availability, and a few other bells and whistles. Okay, so this is a basic cluster CR that you may want to create. Let's say the cluster is already created, and let's add an extension to it. To do so, we just need to add the two lines highlighted here: extensions, and the name of the extension. We could also add a specific version for the extension, but this is optional; if omitted, StackGres will resolve it. Now, how do we know which extensions are available, their names, or their versions? Well, there are three ways you can do this. You can go to the StackGres web console, which allows you to search for extensions, provides descriptions, and shows their available versions. You can also check the documentation. Or, if you want to go the hard way, you can use jq to parse the repository metadata. So let's go back to the extension. When we update the CR and do a kubectl apply, adding this extension named postgis, what happens is that a mutating webhook retrieves the extension metadata from the repository and adds it to the toInstallPostgresExtensions section, which is basically additional metadata about the extension, including the internal build number, the Postgres major version with which it's compatible, the detailed PostGIS version (in this case 3.0.1), the publisher, and the repository from which to get the extension.
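Put together, the CR just described would look roughly like this. This is a sketch reconstructed from the talk rather than copied from the StackGres documentation, so treat the exact field names and values as illustrative:

```yaml
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: kubecon
spec:
  instances: 2
  postgres:
    version: 'latest'
    # The two lines that request dynamic extension installation:
    extensions:
      - name: postgis
        # version: '3.0.1'   # optional; resolved by StackGres if omitted
  pods:
    persistentVolume:
      size: '5Gi'
```

Applying this with kubectl apply is what triggers the mutating webhook that resolves the extension metadata.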
If the extension were not found, a warning would be issued. Then there's a validating webhook that checks that the requested extensions, which have been expressed declaratively (you want this postgis extension to be present), are also present in the resolved toInstallPostgresExtensions metadata. If there were no match, that is, if postgis were not in both the top section and the section below, then the validating webhook would reject the change, because the extension would not be available. Now, if this process goes through, the cluster controller comes into play. The cluster controller reads this CR and understands that there's an intent to install the postgis extension. This cluster controller runs its own reconciliation cycle, watching these SGClusters, and uses a status field called podStatuses, as you see here in the screenshot, which records, for every pod, whether the requested extensions have been installed. This podStatuses is an array that, for every pod, carries an installedPostgresExtensions list, where you need to find the same metadata that appears in toInstallPostgresExtensions; it lists the extensions actually installed, and again, they need to be installed for every pod. The controller then updates the status, bringing feedback to the user about which extensions have been installed successfully. Also, following standard Kubernetes patterns, it creates a set of conditions in this same part of the status field. One of those represents the status of the operation, whether the installation of the extensions worked or not, and a second one, whether the pod requires a restart. Some extensions, as I mentioned, load dynamic libraries, and under certain conditions they may require a database restart. This condition is also signaled by the controller as part of the status field. So let's see a demo.
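Schematically, the resolved spec and the status fields just described would look something like the following. The field names come from the talk's narration, and the condition type names and placeholder values are purely illustrative; the real CRD may differ in detail:

```yaml
spec:
  postgres:
    extensions:
      - name: postgis
        version: '3.0.1'           # filled in by the mutating webhook
  toInstallPostgresExtensions:      # resolved metadata from the repository
    - name: postgis
      version: '3.0.1'
      postgresVersion: '13'         # plus publisher, build number,
                                    # and repository (omitted here)
status:
  conditions:
    - type: Failed                  # illustrative type name: did the
      status: 'False'               # installation operation fail?
    - type: PendingRestart          # illustrative type name: does the
      status: 'True'                # pod need a restart?
  podStatuses:
    - name: kubecon-0
      installedPostgresExtensions:
        - name: postgis
          version: '3.0.1'
```

The validating webhook's job is exactly the cross-check between spec.postgres.extensions and toInstallPostgresExtensions; the cluster controller's job is making podStatuses converge to match them.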
Here we have a cluster already running, called kubecon. Sorry, it's not very original. So, it should be an SGCluster; yes, thank you, called kubecon. There we go. It has some pods running. Let's check that we can connect to Postgres and see what extensions are available. We'll just do an exec on kubecon-0, container postgres-util, because we're going to use the command line. This gets us connected to Postgres. Sorry, my bad. All right, we should be connected to Postgres now, and there's a view here in Postgres called pg_available_extensions that lists the available extensions, not the loaded ones, just the ones on the file system. Not my day; available, sorry. The available extensions on the file system. And as you'll see, there's no PostGIS. Actually, there's a very small number of extensions here; normally you would find these plus some more extensions in a normal installation. These are just the extensions that are used internally; they are brought in by default, but they're also installed by the same mechanism. So let's look at the YAML file. It looks exactly like in the presentation. And let's add the extension that we want to install. Under the postgres section, we add extensions, and we just say name: postgis. Quite simple. Let's do a kubectl apply like this, and the extension should be promptly installed. We can check this by inspecting, for example, the SGCluster kubecon. So let's do, again, sgcluster kubecon. We will see here that in the request section we just said that we want postgis, and the mutating webhook added the version here. But then there's this toInstallPostgresExtensions section where the extensions are listed. We can see here that postgis has been resolved to version 3.0.1, with this publisher, compiled for this Postgres major version, this build number, this repository, et cetera, et cetera.
As you see, the other extensions are also listed here, because they are downloaded using the same mechanism. If we look at the status, we'll see that the condition says it didn't fail, which means the extensions were successfully installed, and that it may require a restart at some point. And last but not least, if we look at the pod statuses, we will see that on pod kubecon-0 the PostGIS version has indeed been installed, as well as on pod kubecon-1. So we have PostGIS here; it looks like everything worked well. If I connect again to the database and run the same query, now I see PostGIS. It has indeed been dynamically loaded into the container, onto the PV, and made available to Postgres. So now I can just say CREATE EXTENSION postgis, oh my God, postgis, and I have this extension available. I can, for example, list the objects brought in by PostGIS, and there are a lot of them, right? So this is how it looks. If we look into the web console, which I also have handy here, and go to the configuration tab, we will see the list of extensions that have been installed in the cluster. You can also check from the web console which extensions are configured, and if, for example, you want to add new extensions, you can just edit the cluster, go to the extensions section, and pick any of the available extensions to install. Now let's finish this demo by looking inside the container to see how this looks on disk. Let's connect to the cluster controller, which is the sidecar controller that is running, and run a shell there to see where these extensions have been downloaded. The PV is mounted on /var/lib/postgresql. If we go to /var/lib/postgresql and look around, we see the data for the database, the binaries for Postgres, which have been relocated, and the extensions.
So let's look into the extensions directory. They are split by Postgres minor version and then build number, and here we have the downloaded extensions. We see the PostGIS that we requested, with the tar package, which contains all the code of the extension, its digital signature, and a file indicating whether it has been installed or not. These extensions are then unpacked into a directory under the usr folder. If we go under usr, we can look at share, then the postgresql 13 major version extension directory, and here we see PostGIS with all the control and SQL files that are created when you install the extension. These happen to be symbolic links to the real directory from which the Postgres binaries run, so they are accessible to Postgres. Even though it looks a bit complicated, the good thing, and the good news, is that all this process is absolutely transparent to you. So, in summary, what is this dynamic loading mechanism for extensions built by the StackGres open source project? It is a mechanism that, as I mentioned before, allows dynamically loading, unloading, and upgrading extensions; they're not built into the container image. This is very good because the container image becomes extension-less, which has two main advantages. One is increased security, because the exposed attack surface is much smaller: there are no dynamic libraries and potential source code of other extensions built into the container. It also decreases the container size. This also enables a declarative approach to loading and unloading Postgres extensions: via the SGCluster CRD, or via the web console, which internally piggybacks on the same system, you are able to express your intent, which extensions you want to have.
And actually, in a future version, we'll add a mechanism for an administrator to restrict which extensions may be used on the SGCluster CRD, so you could limit, if you want to, which extensions can or cannot be downloaded, and uninstall them, obviously. And the main novelty that has been introduced, definitely in the Postgres world (I'm not sure whether a similar approach has been taken by other operators unrelated to Postgres), is the use of what we call the cluster controller, a pod-local controller: a small controller running as a sidecar, with access to the file system in the pod, that watches certain fields of the SGCluster CR, looking for extensions the user wants installed, then downloads them from the external extension repository, brings them to the local file system, unpacks them, verifies the digital signatures, and installs them, making them available to Postgres. This is a pattern that we believe is quite useful, potentially also for other projects. We're also looking for feedback, so if you have any ideas or want to share some thoughts on this, feel free to do so. In the end, what this allows you to do is run Postgres on Kubernetes with a practically unlimited number of extensions, without having to download new images. As of today it allows you to run more than 100 extensions, with a growing number in the future. So if you have faced restrictions running on a cloud managed environment, now you can get the same experience by running StackGres, a Postgres operator, in Kubernetes, with any extensions that you may need. So that's it, I'm going to open for questions. Thank you for listening so far. If, after this talk, you still have questions once I'm out of reach, please join the Slack and/or Discord StackGres community channels, so you can continue the conversation with us there. So, open for questions, and thank you very much for listening.