We continue with the day of Postgres today with Hari, on unleashing the power of Postgres with multi-master replication. All right. Good afternoon. The sessions right after lunch are the hardest to attend without falling asleep, so I'll keep this session as interactive as possible, folks. If you have any questions, you don't have to wait till the end — just ask, and the session becomes a lot more interactive. Thank you. My name is Hari Kiran. I'm a customer success advocate at pgEdge, a startup working in the Postgres ecosystem, adding multi-master features for the whole Postgres open source community. I'm going to scratch the surface of how you transform your single-master, single-primary database into a multi-master one while still staying within the boundaries of the Postgres open source ecosystem. That's the main goal, the main motto we have at pgEdge. We have another product, pgEdge Cloud, which is more of a SaaS/DBaaS offering. Now, if you look at the problems with a single master — the current Postgres core lets us scale with a number of read replicas, and we can scale using logical replicas — the issues we have found through extensive customer outreach and extensive market research come down to three pivotal problems. Our application users are spread across the globe: I'm from India, and I've seen folks here from Australia, from Europe, from the US. But the application servers and the database servers are still not global, meaning you can set up a cloud instance on AWS or one of the other cloud providers — Google or Azure — but you will end up with one primary instance. We see that as one of the problems, and we looked at what options we have for resolving it. The other point is that databases are hard to take global. It involves cost, operational overhead, people, commissioning and provisioning, and IT infrastructure overheads. Even if you look at the very foundation, the building blocks of setting up a database instance: on your left-hand side is the primary and replica, the conventional architecture we set up day in and day out with Ansible or Terraform — even a simple Bash script can set up a primary and replica, and we have all done it a thousand and one times. On the right-hand side, the subtle difference we would like to introduce is the multi-master. If you look at the compute at the bottom of your screen, you've got multiple database servers, all of them acting as primaries, which means you will be able to write to each and every one of those nodes. That's where unidirectional replication transforms and advances into multi-master, or bidirectional, replication. Bidirectional replication isn't something new for the Postgres ecosystem — it has existed since 9.4. We had a patched version of 9.4 that allowed us to install the BDR 0.1.0 extension, which made the database, to a level, bidirectional-capable.
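pgEdge builds this on pglogical/Spock rather than on core logical replication, but as a rough, hedged illustration of what "bidirectional" means: with stock PostgreSQL 16 (wal_level = logical, the same table t already created on both nodes, and hypothetical hostnames node-a and node-b) you can wire two nodes to publish to each other. Conflict handling, sequence management, and the rest of the operational hard parts are exactly what BDR/Spock add on top of this.

    -- on node A
    CREATE PUBLICATION pub_a FOR TABLE t;

    -- on node B
    CREATE PUBLICATION pub_b FOR TABLE t;

    -- on node A: subscribe to B; origin = none (new in PostgreSQL 16) skips rows
    -- that were themselves replicated, which prevents changes looping back forever
    CREATE SUBSCRIPTION sub_from_b
      CONNECTION 'host=node-b dbname=app user=replicator'
      PUBLICATION pub_b
      WITH (origin = none, copy_data = false);

    -- on node B: the mirror image, subscribing to A
    CREATE SUBSCRIPTION sub_from_a
      CONNECTION 'host=node-a dbname=app user=replicator'
      PUBLICATION pub_a
      WITH (origin = none, copy_data = false);

With that in place, both nodes accept writes on t — which is also where primary-key conflicts start to appear, the problem the rest of the talk is about.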
Now, if you look at advanced users or enterprises, what are the key areas they are focusing on? Availability is one of the most important asks, one of the important KRAs. We are not even talking about two nines of availability anymore: three nines and four nines are what a lot of enterprises vouch for and are looking for from their database platforms. And from a global usage standpoint — like I mentioned, we've got users across the globe and applications across the globe — how do we make sure the database side of the story is also global? Multi-master is one of the ways. I'm not saying it is the only way; I'm saying one of the ways is to go with a multi-master database ecosystem, and that is what a lot of distributed database engineering folks and distributed database companies are looking at. These are two instances we mention here, but the lay of the land is far bigger than that. These are the two studies I often quote when I'm talking to customers or to engineering folks: if availability and latency are important factors for you, take a look at these two cases, where the companies involved could have safeguarded their IT infrastructure and the investment made in setting up the database — making sure the platform is resilient, fault tolerant, and so on, the overall IT acumen that should be part and parcel of any database ecosystem. In both of them, latency plus lack of availability meant losing customers, and nobody wants to lose customers; we want to gain customers, acquire customers, and be in a win-win situation. So the service has to be 100% available — or at least 99.9% — while making sure the applications have minimum-to-zero latency. There is a concept called "closer to the user": Cloudflare, Vercel, Fastly — and Varnish Software also has a product in this CDN ecosystem. Every one of these companies is going down the route of bringing the compute, the data, the entire user experience closer to the user. One of the solutions, like I said, is the pgEdge Platform. It is pure Postgres, and an open source version is available — I'll get to the GitHub and the URLs — but at a very high level, the pgEdge Platform is optimized for the network edge, which again means closer to the user, incorporating some of the core fundamentals of edge computing into your database platform. And we support multi-availability-zone, multi-region, and multi-cloud deployments. As of now, the pgEdge Platform supports all three major CSPs — AWS, Azure, and Google Cloud — plus hybrid cloud deployments.
I've been a big advocate of cloud concentration compliance. You may have a relationship with one particular CSP, and the point I'm trying to make is that you are risking cloud concentration: if that cloud has issues in that specific region, you are jeopardizing the customer experience and a lot of other things. Now, some of the key features of the pgEdge Platform. Again, this is built on foundations we have had for a decade. Simon Riggs, one of the early architects of logical replication — rest in peace; he passed away in an accident a few weeks ago — was one of the architects of the logical replication we are using now, and of the BDR bidirectional replication open source platforms we have had. The entire pgEdge Platform is based on pglogical, which is one of the oldest extensions, and a lot of Postgres engineers know about it. The pgEdge Platform does asynchronous database replication, I would say. Whenever we talk about multi-master, we are talking about data being inserted into, written to, and read from multiple master instances, which means a lot of guardrails need to be implemented. One of them is conflict resolution and conflict avoidance. In a very basic sense: if node one and node two both try to insert the same primary key value, you end up in a conflict situation. To make sure conflicts don't arise, the pgEdge Platform has a couple of guardrails in place. One of them is CRDTs — conflict-free replicated data types. I'm not going to go in depth on every point, but I will scratch the surface. Another important point is latency-based routing, which is about finding the nearest node. Say a connection request is coming from Thailand and another from Australia: you would have servers closer to each region, and you identify, in an intelligent fashion, the nearest available node and send those connection requests to it. That is latency-based, automatic routing. If you were to incorporate this logic into your application yourself, you would have to change your code and the whole application behavior, which involves a lot of development work. To avoid that, we made latency-based routing — automating the routing of connections to the nearest node — available with pgEdge out of the box. One other important thing is the anti-chaos engine, ACE — and as the name suggests, it stands for something superior. ACE works from a validation standpoint: node 1, node 2, and node 3 should all be speaking the same language from an LSN standpoint — from an SCN standpoint, for the Oracle folks who know about SCNs — making sure all three nodes have the latest and greatest LSNs and consistent data. I'm going to talk about the CAP theorem and which two of its components we have incorporated in pgEdge. And there is data residency support as well.
If I have to quote an example of data residency, there's no better example than GDPR, which requires holding the data within its own region. You get data residency support out of the box: say you've got three regions and one of them is in Europe — all the data processed within Europe stays within Europe. With selective replication on the node in Europe, you don't have to transmit any GDPR-covered data to the other multi-master nodes. That again is an out-of-the-box feature, part and parcel of pgEdge. And I keep saying open source model and standard Postgres: if you look at this diagram, it's clearly evident what is the standard Postgres engine and what is part of the pgEdge offering. The top two blocks, the pgEdge CLI and Spock, are the only two blocks contributed as part of the pgEdge Platform. All the other components, including the standard Postgres engine and some of the extensions, are standard. Postgres is famous for its extensibility, meaning if a feature is not available in Postgres core, you can go look for an extension you can install on your Postgres to perform the task that isn't available in core and the contrib modules. Spock is the multi-active replication extension with the built-in conflict resolution in place. You can download the source code and look at how Spock is built — its building blocks, and the architecture principles we have carried over from the erstwhile BDR systems, BDR 1 and BDR 2, and from pglogical 1 and 2, which are still open source extensions that are available. The categorization here is crystal clear as to whether there are any third-party applications or other components you need to have up and running to set up the pgEdge Platform: you need your standard Postgres engine and the bunch of extensions that pgEdge supports. Some of the key differentiators I touched upon: it is standard, 100% Postgres — if you have a Postgres database server running 14, 15, or the latest and greatest 16, you can download the Spock extension from GitHub and start setting up a multi-master. There's plenty of documentation we have made available on pgedge.com. And this is another important point: open source — for goodness' sake, we are talking FOSS, fully open source software — is part of the fabric and fundamentals of pgEdge itself. We wanted to be as open as possible; that's why the Spock extension and the nodectl components are already available for download, with the source code on GitHub. One other important thing is making the user experience of installing pgEdge plus Postgres as easy as possible, and all of that is available — there's a small sketch below of what the very first step looks like.
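If you want to poke at the extension itself on a stock Postgres build, that first step looks like any other extension. A minimal sketch, assuming the Spock packages are installed and (as with pglogical) the library has been added to shared_preload_libraries; the actual node and subscription setup is then done through Spock's management functions and the pgEdge CLI, whose exact invocations vary by version, so they are not reproduced here:

    -- confirm the extension is visible to this server
    SELECT name, default_version
    FROM pg_available_extensions
    WHERE name = 'spock';

    -- then, in the target database
    CREATE EXTENSION spock;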
And this is the CAP theorem I was referring to: pgEdge is eventually consistent, an AP system — availability and partition tolerance. We are not chasing strict consistency, and that's one of the reasons we say asynchronous rather than synchronous replication. With synchronous replication we would be constraining ourselves on write latency: in my opinion, you are introducing latency on writes, so even though you have multi-master in place and multi-region support, it still hampers your write performance. That is one of the reasons we consciously took the asynchronous route. We have spoken enough about pgEdge — so how do we take a look at some of the deployment options? The easiest of all is Docker. You just need to go to github.com/pgedge, look at the Docker side of the code base, download it, and you will be up and running on a Docker setup within a few minutes. I have done that multiple times, and the whole user experience we want to bring to the developer community is to make it as easy as possible for you to set it up and start working with it. This is the GitHub URL — the slides will be made available by the FOSS organizers at some point — where you can download some of the examples, and feel free to contribute back to the repository if you have any thoughts or find any issues. And this is my last slide. Any new Postgres technology that comes in — without customers, we are all doing nothing. So this is one customer use case we have, from a financial organization I won't name. Their situation had been setting up numerous database servers and keeping a large database team managing and maintaining them. Now they have shifted to the pgEdge Platform: they're able to do database distribution successfully, lower their IT operational cost, and at the same time the Postgres engineers who worked on the erstwhile Postgres instances don't need any hand-holding — it is as good as working on a plain Postgres database. The added components are Spock and the pgEdge platform utilities like nodectl. The benefits the customer has seen are a highly available — ultra highly available — system, with latency numbers as low as possible. Thank you so much. I'm open for any questions. Okay, do we have any questions? We have. The question is about the multi-master replication and conflict resolution — can I elaborate more on it? If I have to put it in very, very simple terms, last commit wins is the algorithm we are using: whatever is the last commit wins the conflict. And more often — one of the lead architects we have as part and parcel of the team, Jan Wieck, was the author of Slony.
He always pushes for conflict avoidance instead of conflict resolution — making sure a conflict never arises within the database and within this Spock extension we have built. So rather than relying on conflict resolution, we push towards conflict avoidance; in case a conflict does arise, there are guardrails to make sure it is resolved. Okay — the long-term issue with multi-master is deadlock? Yes, and deadlocks can happen at any time and we cannot do anything, because the nodes are independent. Right. What do you mean by conflict avoidance — do you regulate the transactions between the two machines? There is something called snowflake sequences that we have introduced. That is, again, in a public repo that is available, and Jan is going to be talking about it at PGConf.dev. That is the conflict avoidance we have introduced as part and parcel of Spock. Okay. Thank you. Cheers. Any other questions? I have one: when you say multi-master, do you write to all the masters, or do you write to only one primary? We write to all the master instances that are qualified and set up as masters. If you look at the whole database setup, you'll have, say, three regions and three nodes, and each can have its own logical replicas and physical replicas. You qualify those three nodes in three regions as masters, and yes, we write to all the masters and they eventually become consistent. Okay. Any other questions? All right. Thank you very much. Thank you all. Okay, I guess we're ready for the next talk. Gilles from HexaCluster will take us through PostgreSQL security assessment. Hi, everybody. The purpose of this talk is PostgreSQL security. First of all, I want to remind you that I have a very bad memory, and as a result, of the roughly 80 security checks to perform on a PostgreSQL cluster, most of the time I am missing some of them. So this talk is about how to automate these tests to avoid missing some checks. My point of view is that the community needs such a tool to be able to enforce the security of PostgreSQL clusters out in the world, because in my DBA life I see a lot of PostgreSQL clusters with very little security. So we need such a tool. Some words about me: I'm the author of pgBadger, pgFormatter, pgCluu, and a lot of extensions and other tools for Postgres. I'm also the CTO of HexaCluster, a company specialized in database migration to Postgres and PostgreSQL support too. It's the company that sponsors all my developments and all the tools you can find on my GitHub repository and some other GitHub repositories — thanks to them. So the question is how to secure your PostgreSQL cluster, and what to check exactly. This is the main question. The problem is knowing all the checks that have to be performed, and there are several of them, as I said. Fortunately, there is a document created by the Center for Internet Security, CIS, which lists almost all the security checks that have to be performed on a PostgreSQL cluster. It is available online for free; you just have to register and you will have the document.
There are also some recommendations from DISA, which for the most part are redundant with the first document and much more difficult to read, but there is a lot of information there too. About data security checks, there are a lot of articles produced by the European community under the GDPR law, with many recommendations to protect user data privacy. I will not talk about this here, because what I will present to you is very new and I have not had time to work on that part yet, but it's on the way. There are different areas where we have to check security, and the first one is the operating system. Operating systems raise a lot of security questions, but I will not talk about the Linux parts — securing your Linux box is the sysadmin's job. Here, these are just the checks that must be done by DBAs. Ensure that your PostgreSQL packages come from a known source; I especially recommend using the PostgreSQL community repositories, particularly for maintaining your PostgreSQL cluster and upgrading your minor versions. Checking whether PostgreSQL starts at boot can be valuable, and especially verify how you are following the minor versions of your PostgreSQL cluster. It is really important to follow the minor releases, and it is quite common to see PostgreSQL clusters several minor versions behind the current one. It must be checked regularly that you have the latest one, especially for security updates and bug fixes. You also have to be sure that no other Unix user on the host is able to use PostgreSQL's privileges to escalate privileges on the PostgreSQL cluster — this is very important. At the filesystem level there are also several things that must be checked. Some of them are already checked by PostgreSQL itself when it starts, especially the permissions on the PGDATA directory, but some are not. In particular, I recommend checking whether data checksums are enabled on the PGDATA. This is not part of the CIS recommendations, but I think it must be done to be able to detect silent corruption on your PostgreSQL cluster. Also check that things live on different partitions, to avoid filling a disk and causing a denial of service. Most of the checks here are not part of the CIS benchmark, as I said — I have added around 25 additional checks based on my experience and on what we see in PostgreSQL clusters. Of course, the PostgreSQL configuration must be checked. This is very important, especially the runtime parameters that can be set by the backend. It is possible to cause a denial of service with them — for example, you can disable the use of the indexes on the PG catalog when a backend starts, and if you disable that kind of index you can be sure your clients will have a lot of performance issues. So it is important to check that no one has changed these parameters at some point, et cetera. All parameters that can be set by the backend at runtime can be checked automatically; all the others — postmaster, sighup, user, superuser — must be checked by the DBA as well. The objective of the tool here is to present all these parameters to the DBA for review.
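As a hedged illustration of that last point, the parameters a backend can override at connection time are visible in pg_settings, so a quick automatic check might look like this (the real checks in the tool are of course more complete):

    -- parameters that can be overridden when a backend starts
    SELECT name, setting, boot_val, source
    FROM pg_settings
    WHERE context IN ('backend', 'superuser-backend');

    -- two of the settings discussed above
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('ignore_system_indexes', 'data_checksums');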
Logs are one of the most important parts of securing your PostgreSQL cluster: you have to be able to keep a trace of everything that happened on your server, or at least be able to know where a problem came from — whether it came from outside or was a local connection, et cetera. With the wrong configuration it is easy not to log everything that would be interesting for the DBA or, even worse, to have nothing in the logs at all; with a misconfiguration it is really easy to end up unable to find anything in your logs, so it is very important to check that. Especially trace connections and disconnections — it is very important to know at which time someone connected and could have done something on your server. Ensure that you have a log_line_prefix in your postgresql.conf that is compatible with pgBadger; it is an important tool, and since it is a recommendation of the CIS, I was pleased to add it to this tool. Ensure that the log directory is outside the PGDATA: nowadays, with SSD disks, we see that most of the time the log directory stays inside the PGDATA, and this is really something not to do, because the logs can fill your PGDATA too. You might also want to enable auditing on your PostgreSQL cluster with the pgAudit extension, which will let you trace exactly what a user has been doing, especially all the DDL run on your PostgreSQL server — this is something you have to consider. User authentication must be enforced; this is where most attacks start. Ensure you have password complexity checks enabled. If you use external authentication like LDAP or GSSAPI, you may already have some enforcement of password complexity; but if you create the user inside the PostgreSQL instance, you have no way to enforce password complexity except by using an extension. I would recommend credcheck, an extension that does a lot of things to enforce password complexity, and it also gives you some more advanced features to control how users connect. Authentication timeout must also be verified, and authentication delay — it is important to limit a user's ability to repeat tons of connection attempts on your server; you have to stop that at some point. With credcheck, for example, you can ban a user after, let's say, three failed connection attempts. This is something interesting. Of course, you all know the trust method in pg_hba.conf, so the tool also needs to check your pg_hba.conf to see whether it is well configured and not too permissive. I have seen installations where, honestly, the person did not know exactly how the netmask and the IP address mask work, so everything was an open bar — a tool that can check that and verify there is no mistake is a good thing. This level can be critical for data security too, especially excessive data privileges: superuser, CREATEROLE, CREATEDB, and the function privileges, meaning the SECURITY DEFINER clause. Encryption, at some level, is needed — I think the PGDATA needs to be encrypted, but it really depends on your security policies; still, it is something you may want to check. SSL/TLS must be enabled. You can also check, if you want, FIPS enforcement — OpenSSL cryptography enforcement; I give the link if you want to enable it — and the tool also checks whether you have the pgcrypto or pgsodium encryption library.
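A couple of the checks just described can be approximated with plain SQL — a minimal sketch, far less thorough than the real checks, and meant only to show where the information lives:

    -- pg_hba.conf entries using the dangerous trust method (PostgreSQL 10+)
    SELECT line_number, type, database, user_name, address, auth_method
    FROM pg_hba_file_rules
    WHERE auth_method = 'trust';

    -- SSL, password hashing, and connection logging at a glance
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('ssl', 'password_encryption',
                   'log_connections', 'log_disconnections',
                   'log_line_prefix', 'log_directory');

    -- roles still carrying weak md5 password hashes (needs superuser)
    SELECT rolname FROM pg_authid WHERE rolpassword LIKE 'md5%';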
So let me introduce PGDSAT, a tool that aims to perform all the automatic security checks on a PostgreSQL cluster and to collect all the information that the DBA needs to check manually. You know I love Perl, so this is a Perl script, with no dependencies — it doesn't bother you with installing tons of dependencies; you just put it on your PostgreSQL cluster and it will work. It performs all the security checks by calling system commands and psql commands, so you don't need anything else on your cluster. It is multi-language: there is a translation in French and one in English, depending on where you come from. There are two kinds of report, HTML, which I recommend, and text; text is perhaps better if you want to diff several runs. There are around 80 security checks. This is a tool that must be run as the postgres user. If you have set a .pgpass file or all your PG environment variables, you can just run the command and send the output to a file. You can also set the usual psql parameters to connect to the instance, choose the format, and get the output you want. This is the HTML output: first in the report you have the summary of all the checks that have been done, with a mark showing whether each is set correctly — green is okay; red means there is a problem, it is not well configured — and if you see an empty square, this is a check that must be done manually. All of the roughly 80 tests are presented like that, and after that you can click on any check to see the detailed report. For each check there is a detailed report explaining what the check did and whether it passed successfully, or what problem was found. There are warnings, and there are critical findings for some parts — maybe that classification should be reviewed, because this is the first draft of the application. You can see full examples of this report on the internet; I have put them on my website. This is a fresh program, so there is a to-do list. I have planned to work on most of it within the year — I don't know if I will be able to handle all this work in a year, but I will try. I also have an idea for a check to detect misuse of object identifiers in your stored procedures, and I want to develop an extension to act as a firewall to prevent SQL injection in your PL/pgSQL code. Contributing is important. As I said, this is a new tool, so I need a lot of feedback, I need feature requests, and I need more checks. We discussed yesterday night with Julian and found some new checks that have been applied to the release, but there are others in the to-do list. Don't hesitate to report bugs, request features, and give feedback. My objective is to develop a community tool to be able to enforce security on PostgreSQL clusters, so if everyone uses it and contributes to it, we can have something common for all of us. That is the objective. Thank you. We have time to take questions. Any questions? Oh, sorry — okay. That's very interesting, because it is really needed by business people to understand whether their system is secure or not. But you only mention GDPR; it is possible that we also need to comply with PCI DSS and ISO 27001. After we create scripts like that, how can we make it a standard, so people can accept it as standard compliance — is that possible?
I will try to send all the new checks I have produced to CIS, if they want to include them in another version of the document, and I think this is a good start for a standard on security — also helping CIS to list all the security problems that should be checked in a PostgreSQL cluster. Do we have any other questions? Or should we make PGDSAT a standard? I have one if no one is asking — I always have questions. What was the motive for writing this tool in Perl? Why Perl? In short, because I spoke Perl before French, you know — it is my native language. That is the reason. Also, everyone is doing Python or Rust or whatever, but don't forget Perl. And this is almost all calls to system commands, and Perl is the best tool for calling system commands. Yes, but Perl has its own drawbacks and difficulties with libraries. Still, I was able to develop this in a few days in Perl; if I had used another language, it would have taken me months. All right, thank you very much. We need a coffee break, I guess. After everyone wakes up, we continue with the next part. Hi, everybody — disaster recovery, fault tolerance, and how to solve them with PostgreSQL. Thanks. First of all, thank you very much, everyone, for giving me this opportunity to speak on this topic; I really thank the organizing committee for this. So today my talk is on hybrid disaster recovery, fault tolerance, and how you can solve that with PostgreSQL solutions. I know that many of the speakers before me talked about some of these topics, and I will try to differentiate where I can. My focus is mostly on understanding HA, DR, and fault tolerance — a lot of customers and users blur the differences between them — and on how you can address each with PostgreSQL solutions; we'll talk about different kinds of solutions for that, and maybe a final thought about how you choose one or the other. So who the hell am I? I'm Ajit Gadge, a Field CTO working with EnterpriseDB. I'm based out of Singapore, I've been in the industry for more than two decades, and with PostgreSQL for more than 13, 14 years now. I speak at many of these conferences, especially in this part of the world. So let's talk about HA, DR, and fault-tolerant systems and their characteristics. Many users put everything in the same bucket, but there are differences, in my opinion. These are some of the situations where high availability, disaster recovery, and fault tolerance come into play, especially in the PostgreSQL case — or really for many database users. At a very high level, when you talk about HA, you talk about a single data center or single region — you don't talk about cross-region. That is an HA kind of system. You expect something like 99.5% availability — I'm going to talk about RTO and RPO — maybe some kind of failover within your RTO, and you expect zero or near-zero data loss. The moment you talk about disaster recovery, you are talking about cross-region, not a single region, or at least cross-data-center. So how can you avoid these kinds of problems?
Especially when there is a really big outage — a flood or an earthquake, where unfortunately the data center or the electricity is down — you need a disaster recovery system, and you can build that kind of system using different PostgreSQL solutions. Of course, you then expect a somewhat higher RTO — the recovery time — because the whole region is down, so recovery takes more time, and you also expect some amount of data loss, again because the whole region is down. So the solutions differ between these two kinds of systems. But there are applications that really cannot take downtime at all, and then you come to a fault-tolerant kind of system, which is a mix of high availability and disaster recovery. Then why isn't everybody building fault tolerance? There are different mechanisms for configuring it, and of course you need more investment and more servers to configure these kinds of systems. So let's talk about the CAP theorem. I'm sure some of you know it, but just to repeat: C stands for consistency — you either get the right data or no data, so you make sure that only consistent data is read from and written to the Postgres server, never wrong data. A is availability: every request you receive should be answered for the user, even in the case of a fault or a network partition. And P is partition tolerance — the network fails or is partitioned, and the question is how you configure the system for that situation. In my opinion, you can only have one of the other two: if you have a network partition, you get either consistency or availability. That's the tradeoff — you cannot have everything in a single basket when a network partition happens. And this will definitely impact your performance as well: the moment you try to get a more consistent system, it requires synchronous replication — I'm going to talk about that — and that impacts your performance, so you need to choose wisely based on what your business and your system need. Based on that, you can configure things accordingly, because we do have different kinds of solutions. Now, this diagram shows what RPO and RTO are — recovery point objective and recovery time objective. RPO is typically measured in terms of data loss: how much data loss you are okay with when the system fails. Of course nobody wants data loss, but you need to trade off whether you are okay with a few seconds or a few minutes of data loss — that is the RPO you need to decide. RTO is how much time you are okay with not being able to continue your business — five minutes, ten minutes. And again, it is not a single instant; it is typically measured over a whole year or a whole month, based on your availability target. So you need to choose your RPO and RTO, and as I mentioned, it impacts performance depending on how you configure things. In the industry, it is typically measured as a percentage — two nines, three nines, four nines — and based on that combination of availability, you can measure this.
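To make those percentages concrete (this is my arithmetic, not a slide from the talk): the downtime budget is simply (1 - availability) x period, so over a 365-day year,

    99%      -> about 87.6 hours  (~3.65 days)
    99.9%    -> about 8.76 hours
    99.99%   -> about 52.6 minutes
    99.999%  -> about 5.26 minutes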
Now, how does that work out in practice? If you say your system is available 99% over the whole year, that means you can afford roughly 3.65 days — about 87.6 hours — of downtime, including your maintenance, whether you're doing patching, vacuuming, or upgrades; and this narrows down, for the higher-end systems like finance, to about five minutes of downtime in the whole year. You can measure it per month as well, depending on how you configure it. It is not a single point in time — it covers the whole year, because over the year you may do maintenance, patching, upgrades, and all those things. So you need to design your RPO, RTO, and the corresponding solution based on your requirements. That was the understanding of RPO and RTO, of the CAP theorem, and of the different situations in which you may need an HA or DR kind of solution. Let me check my time as well. Now let's talk about the solutions Postgres has — how Postgres can help you avoid or resolve these problems. At a very high level there are two major pieces: one is how you replicate the data, and the other is how you do automatic failover, if you need automatic failover. I'll talk first about the replication mechanisms. The basics — many of the earlier speakers talked about this — come down to three things. One is physical replication between two servers, which in Postgres we typically call streaming replication. The second is logical replication. These are the two mechanisms in Postgres for replicating data. There is a third one: instead of replicating data you can share the disk, which requires a third-party solution such as a Red Hat cluster, but I'm not going to talk about that today; I'll talk about the replication mechanisms. Streaming replication, as some of you know, streams the WAL bit by bit from the primary site to the standby site. You can configure it based on your needs: if you need an RPO of absolutely zero, you can configure a synchronous replica, and within sync you have multiple modes — on, off, remote_write, remote_apply — depending again on your RPO and on where your system might fail (a small sketch of these settings follows below). And then you have asynchronous replication, which is the default: where absolutely zero data loss is not strictly required, you can use an async replica, and it gives you better performance compared to a sync replica. These are very easy ways to replicate from the primary server to a standby server in Postgres; you can do cascading as well if you need it. Most users use streaming replication, including for high availability. The primary site takes reads and writes, while the standby can be used for reads as well if required. Before moving to the second method: there are some limitations of physical replication compared with logical replication. Physical replication typically means reads and writes go to a single server, unlike a multi-master kind of solution. Failover with physical replication also takes time, because the failure has to be detected and the promotion applied — typically 30 to 60 seconds.
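Going back to the synchronous settings mentioned a moment ago, a minimal sketch (the standby names here are hypothetical): synchronous_standby_names chooses which replicas count towards the guarantee, and synchronous_commit chooses how far a commit must travel before it returns, from off (fastest, largest potential loss) up to remote_apply (safest, highest write latency).

    -- on the primary: wait for acknowledgement from one of the named standbys
    ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1, standby2)';
    -- off | local | remote_write | on | remote_apply
    ALTER SYSTEM SET synchronous_commit = 'remote_apply';
    SELECT pg_reload_conf();

    -- watch replication state and lag from the primary
    SELECT application_name, state, sync_state,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
    FROM pg_stat_replication;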
Another limitation is that everything you do as maintenance on the primary — indexing, vacuuming — goes to the standby server as well, so you see the effects of those operations on the standby too. And physical replication does no filtering of data: if you need to filter the data, you can't — it is everything or nothing. Those are some of the limitations compared to logical replication, which I'll get to. One more limitation is that it requires the same binaries on both the primary and the standby. I just have five minutes, so — logical replication. I think one of the earlier speakers talked about it in detail. Compared with physical replication, logical replication avoids some of those limitations, because you can actually filter some of the data. It is based on a pub/sub model — publication and subscriber. I'm not going into detail; I'm just scratching the surface here. Now let's talk about cluster management solutions. The data is replicating — how do you do automatic failover on top of that? These are some of the solutions typically used; I'm not saying they are all of them. Patroni is a great tool that can be used for automatic failover, based on Raft consensus through its distributed store — I'll come to that. repmgr is another solution. And EFM, Enterprise Failover Manager, is the EDB solution. These are really good tools; Patroni and repmgr you can use as well. One architecture using Patroni: Patroni relies on consensus, so it requires etcd, a small key-value store, which you can run on another VM — or on the same server, depending — but ideally on separate VMs. Then you need a proxy to route connections, and you need at least two database servers for the high availability purpose. It is a bit complex, but it is a really, really good solution, on VMs, on physical hardware, and even on Kubernetes. By comparison, the EDB Failover Manager is virtual-IP based and a bit simpler, but it has its own limitations: it doesn't handle every failure case, and it has some limitations on automatic failback; it is based on JGroups. I won't go on much longer because of time, but one thing about the limitations of logical replication: it doesn't replicate DDL — I think that is under consideration as an improvement to logical replication. It doesn't do automatic failover by default; it requires some kind of proxy. And if a node fails, it doesn't rejoin automatically. That is where distributed Postgres comes into the picture — PGD, again from EDB. It gives you data sovereignty across geographies if you need cross-geo replication, it does conflict resolution and conflict detection, and you can avoid conflicts as well. It has automatic failover. At a very high level, this is the kind of architecture you see with PGD: you have one region and another region, and you can fail over automatically both within a region and across regions.
All of this is possible with PGD, which has two parts: one is BDR, the bi-directional replication, and on top of that there is a proxy, PGD Proxy, which can divert your workload when a failure happens. I will skip this slide, but let me talk about which one to choose. There are multiple options, and in my opinion you should think about three or four things. One is location preference: do you really want HA within a single region, or DR across regions? Based on that, you can choose the solution you want. What kind of RPO and RTO do you require — do you absolutely need zero, or is near zero okay? Do you want recovery to come from a backup server — sorry to bring up backups, but you can do recurring backups — or do you want recovery from the failover system? And then performance. Performance is really, really important, because the moment you go synchronous because you want RPO zero, you should expect an impact on performance. So based on the different requirements, you can configure different solutions using the different Postgres features. I think my time is over. Thank you very much. This is my email ID and GitHub — there are some nice pgvector scripts there too; you can follow me on GitHub. Thank you. Any questions? Okay, we have time for questions. Question: when we talk about multi-master or binary replication or anything else, I see the same problem each time — a problem of performance. On different nodes you have different data in shared memory, in caches, in buffers; also, vacuum runs at different times. So sometimes a logical replica works much worse than the master. How do you tackle this problem of differing performance? Yes — if you're comparing multi-master, so basically logical replication, with physical replication: I don't have the link or slide right now, but EnterpriseDB has a public performance benchmark comparing physical replication and logical replication. The moment you start increasing your workload and the number of users, the performance of logical replication — I'm talking specifically about the BDR case — is almost the same or a bit higher compared to physical replication. I can send you the link so you can compare the numbers. Now, that is the case when you configure async replication, not multi-master with sync replication. BDR does have sync replication as well, with group commit or different kinds of commit latency, depending on how you want to configure the sync replica within BDR. But if you compare async logical replication or BDR with async physical replication, the moment you increase the number of users, the performance is the same or a bit higher with logical. I'll send you the link. Any other questions? We have time for one more question, maybe two — you have credit if you want to use it; you can ask a question, no problem. Okay, thank you very much. Okay, now we have the next talk.
Today we have had a lot on security and compliance, and we continue in that theme as we move from high availability into encryption. We have Julyanto from Equnix, who will take us on the journey of encryption in 11DB Postgres. Thank you very much. Good afternoon, everybody. It's already late, so maybe you're quite tired, but don't be, because I will bring you a quite heavy discussion today. I will be accompanied by my assistant, a human AI — her name is Irene, so: AI. I introduce her because she may be learning something. Okay, maybe first of all we should pay respect to Mr. Simon Riggs. He was a big contributor to PostgreSQL — maybe we should take five seconds. He was a major contributor, the one behind pgAudit and so many other things. Thank you. I also want to say Selamat Hari Raya to any of you who may be celebrating tomorrow — Selamat Hari Raya Lebaran. Anyone from Indonesia? So, my brother. Okay, next. This is me, for those who don't know me yet: I run an IT solutions company in Indonesia and Southeast Asia. We have also run events in Bali and in Singapore in the past — PGDay, PGConf, et cetera. I think that's enough about me. This is a memory from 2019 when we were in Bali at the PGConf; some of those people are here right now — raise your hand. Okay. So PostgreSQL has been derived into 11DB Postgres, and this is our product. Basically it is proprietary, but we follow the versioning of Postgres. If you want to test 11DB Postgres with the encryption, you can go to 11db.asia and register. 11DB Postgres is already implemented at many clients in Indonesia — you can see the logos here, mostly corporations. Part of what we do is help these corporations avoid using Oracle and use Postgres instead. This is our milestone slide; maybe I'll skip it. So maybe the question is: why do we need encryption in an RDBMS? I have seen many students here since yesterday, so I have to start from the beginning. Why do we need encryption in an RDBMS? First, business needs secure transactions. Transactions should be secured by an RDBMS — not NoSQL or anything like that — because business needs atomicity, consistency, isolation, and durability, and the transactions also need to be protected by something like encryption. Then, multiple parties are involved in data management: from the time the data is designed there is business development, the architect, the data analyst, the administrator, and so on. With so many parties involved, if we don't control privilege escalation, we may face problems — maybe a data breach. We also need to comply with GDPR, PCI DSS, and ISO 27001 for data management. Because there are so many compliance regulations nowadays, I think we need to move forward to something better — in this case, encryption. And also to minimize the potential for fault, since a fault is usually caused internally. If you have a question, feel free to ask, so we can elaborate. Okay. Based on the analysis we conducted of the regulations — in this case GDPR — we found some articles where this is indeed required. First, unauthorized access: we have to create something like a better authorization method.
Then we see encryption, and that only necessary personal data should be processed, and so on. Based on these GDPR articles, the conclusion is that encryption is required to create better security in data management. We also checked ISO 27001; there are articles there as well regarding encryption, cryptography, access control, and prevention. And in Indonesia we have a law derived from GDPR; we call it UU PDP — undang-undang perlindungan data pribadi. It also mentions the same kinds of things about access and security levels. Okay, so this is enough for us to move forward on why we should have better encryption and better security in data management. So what is the challenge? First, when we talk about encryption, we should understand that in today's world there are already quantum computers doing real work — maybe only very simple calculations, but it is possible. Some of the encryption we use right now is, as we call it, not quantum safe, so we have to use encryption that is quantum safe; there is guidance from NIST, I think, about which encryption should be used in business so that it is quantum safe. That is the first challenge. Then there is compliance with end-to-end security: to achieve it, I think we should use HSMs, hardware security modules. But it should also be easy to use — what we call transparent or seamless — so we need an encryption method that is quite transparent and easy to use. It should support multi-factor authentication and authorization. Of course, backup and restore should be there. And there should be no performance degradation. These are the points we analyzed before we created this method. Then we created column-based encryption in Postgres. It actually uses a data type, and because it is column-based it has to have an index so it can be searched fast enough — because if we encrypt a column and then want to search inside the encrypted column, we would have to scan and decrypt everything, which is not flexible and not efficient. For the students who may not really understand data states: there are three states of data. First is data at rest — the data is on the storage. Then data in transit — when the data is transmitted over the network. And data in use — when the data is in memory. So if we talk about encryption, which state should we encrypt? Usually we prefer at least data at rest and data in transit; data in use, maybe yes, maybe no. Data in transit is very easy because we already have TLS, SSL, and VPNs, things like that. Data at rest is another challenge, but some technology can be used, like storage-based encryption. If you use something like storage-based encryption, it is very easy, because the database doesn't even know whether it is encrypted or not — it happens in the file system or the operating system. We can also do encryption at the application level, where the application itself encrypts the data before putting it in the database; then the burden is on the application. This can also be done, but it raises another problem: how do we manage the key? (There is a small sketch of this approach below.)
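To illustrate that application/SQL-side approach with stock tools — this is plain pgcrypto, not 11DB's encrypted data type, and the names are purely illustrative — the caller has to carry the key into every statement, which is exactly the key-management problem just described, and the ciphertext cannot be usefully indexed for searching:

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    CREATE TABLE customers (
        id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        username varchar(64),
        mother   bytea      -- ciphertext, opaque to the planner
    );

    -- the key travels with every write...
    INSERT INTO customers (username, mother)
    VALUES ('ananta', pgp_sym_encrypt('maiden name', 'app-managed-key'));

    -- ...and with every read; filtering on the decrypted value forces a full
    -- scan plus per-row decryption, so no useful index is possible
    SELECT username
    FROM customers
    WHERE pgp_sym_decrypt(mother, 'app-managed-key') LIKE 'mai%';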
The third approach is encryption done by the database itself. We can say it is transparent to the application, because the application doesn't know whether the data is encrypted or not, and it also doesn't know where the key used to encrypt and decrypt is, because that is handled by the database. So we chose this approach: the encryption is done by the database. Based on the picture, you can see this is Postgres, and this is the agent — we created an agent for Postgres. The agent's function is to facilitate the database: for example, the DBA accesses the agent and the agent modifies settings in the database, and the agent itself communicates with the HSM. Does anyone here know about, or has anyone seen, an HSM? Raise your hand. Okay, thank you. We also created a mobile application we call Charaka. Charaka is Sanskrit; it means messenger — is that correct? Yes. So we use the mobile phone for authentication, because with the encryption we can encrypt the data inside using the data type, but how do we ensure the right user can access it? So we created another type of authentication, and by using Charaka on the mobile app we can do something like an offline OTP. Then, at the protection level: an application should connect first to the auth service to get a token — a session — and this session will be used to connect to the database. If the application's connection string is breached by some attacker, the string, which we usually put in the code, cannot be used by the attacker to connect to the database, since connecting requires communication with the auth service. That is another kind of protection we developed as part of Equnix Seamless Encryption. So the application connects to the auth service — we will show you the demo later — and after it gets the connection string it can access the database. And when the database is restarted, it cannot be restarted by itself without consent from the owner: the database owner has to give consent on whether the database is able to be restarted, because restarting the database is another security concern — you could use that moment to take a key or something else from the database. This picture is the complete explanation of the implementation of encryption in 11DB; I will tell you more about the index and everything else later, but at least you can see the topology first. What we call end-to-end encryption is this: the key for encryption and decryption lives on the HSM, and the database has to make a request through 11DB Postgres before it can get the key from the HSM, so nobody — neither the database nor the agent — has persistent access to the key. Up to this step, any questions? Or I'll just move on. So this is the demo, explaining how we implement the encryption in the database. I will start and stop while I explain the steps. Start. The first time, one should register to the system — you can use 11db.asia to register — and with this registration we collect what we need to create the user's profile. Then, when the user registers and downloads the package, inside the package there is what we call a random seed, used to create the synchronized OTP. Maybe I need to go a bit faster. At first we only support an online HSM, but later you can use your own HSM, or you can use a TPM — a trusted platform module, a chip soldered onto the board. A TPM can act as an HSM, but with far fewer functions.
So that part is done, and then we link the applications. We download the app, and because it is not yet published on Google Play, sorry, you have to download it and install it manually. Registration is from the email: we enter it in Charaka to link the mobile phone into the system. We build Debian and RPM packages, so you can install with deb or rpm. This is the installation; after installing, the database is there, the agent is also there, and we also install a PAM module. Because the installation puts this PAM module on your OS, your OS becomes more secure: there is another kind of authentication, and you can see with sudo ls -l that before there was no challenge, and now it asks for a challenge, which we answer with the mobile application. We put this number into the mobile app, get the response from the app, and type it back on the host. So this step enables 2FA for sudo using an offline OTP. We call it offline because the host and the mobile are not connected; the mobile is of course connected to the internet, but there is no connection between the host and the mobile. That is why we call it 2FA with offline OTP. This is also the main reason for having 2FA: to prevent Postgres from starting up without consent. Here we start Postgres, but it does not finish starting up; it waits for an OTP, and we open the page on localhost and use the same mechanism. Then the database starts, so that is the reason we enable 2FA.

Now the demo showing the encryption itself. First we create database users and grant the SL roles, SL1, SL2, SL3 and so on; these are the roles we use for levels of access, of privilege. We connect to the system as the owner, and the owner creates two more users: Ananta, who gets SL4, and Budi, who gets SL2, so Ananta has a higher privilege level than Budi. Then there is a normal grant of all privileges on schema public to Ananta and Budi. Next we log in as Ananta and create a table: the first column is the id, the second column is the username as varchar, and the third column, mother, uses ESSEC, which is our encrypted data type. We insert three rows, then we create an index; this is a special index built for this data type, ESSEC, and with this index we can search, for example with LIKE, which otherwise would not be possible on an encrypted column. Then we log in as Budi. Remember that Budi only has level 2. At first Budi is not even granted permission on the table, and that is plain Postgres, not us, so we grant all on the users table to Budi. Now Budi can access the table, but Budi still cannot see the plaintext, because Budi does not have SL4 and Ananta created the data at level 4. Budi can select, but sees only ciphertext, because he does not have the same rights as Ananta. Finally we grant SL4 to Budi, and then Budi can see the plaintext. That is the real demo; roughly, the steps look like the sketch below. You can see the result.
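A rough sketch of those demo steps, under the big caveat that the ESSEC type, the SL* security-level roles, and the special index belong to their product, so the exact names and syntax here are assumed and the sample values are invented:

    -- Users and security levels (role names as in the demo; syntax assumed).
    CREATE ROLE ananta LOGIN;          -- higher-privilege user, gets SL4
    CREATE ROLE budi   LOGIN;          -- lower-privilege user, gets SL2
    GRANT sl4 TO ananta;
    GRANT sl2 TO budi;
    GRANT ALL PRIVILEGES ON SCHEMA public TO ananta, budi;

    -- As ananta: the encrypted column uses the ESSEC data type.
    CREATE TABLE users (
        id       serial PRIMARY KEY,
        username varchar(64),
        mother   essec
    );

    INSERT INTO users (username, mother)
    VALUES ('ananta', 'Dewi'), ('budi', 'Sari'), ('citra', 'Rina');

    -- The product's special index for ESSEC makes LIKE searches possible
    -- without decrypting every row (the index syntax here is a placeholder).
    CREATE INDEX users_mother_idx ON users (mother);
    SELECT * FROM users WHERE mother LIKE 'De%';

    -- Plain Postgres grants are still required for budi to query the table...
    GRANT ALL ON users TO budi;
    -- ...but budi (SL2) only sees ciphertext until the higher level is granted.
    GRANT sl4 TO budi;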
Any questions perhaps? Any questions?

I have a simple question. In database systems we have checksums to control consistency; checksums reduce performance, but they are needed, strongly needed. Can we get something beneficial from introducing encryption into the core of Postgres? Remember that it can also reduce performance, so if we implemented it, would we gain something from that?

Yeah, as far as I know, the Postgres checksums are not used that much any more because they are not really good, maybe Julian can explain that better than me. Why? Because the computation runs on every row, and if the encryption also ran on every row, I think the same thing would happen. But we are trying to make it faster, because we use hardware-accelerated encryption: the processor has AES instructions, we use AES-256 and the hardware encryption from AMD or Intel. Also, we do not decrypt on every select. The table may have millions of rows, but when you search, only the index is used, and we created a mechanism and algorithm for the index so that using it does not reveal the clear text, because the index only contains codes, something like that. Only after the rows have been narrowed down by the filter do we go to the page, take the encrypted data from the page, and decrypt it. So we do not decrypt every time. For checksums I think we cannot use this, because in the core it would end up the same. What do you mean by core? From the server level, on the server level, not only a data type? In the core, I don't know, I have no idea, because by using a data type we are already done. And I think for the business side, what the business needs is not all data being encrypted; they need the very confidential pieces, or maybe they just need anonymity, like Julian presented yesterday, to prevent people from linking one person's data to another's. But for stronger protection, to avoid the data being stolen when somebody comes in through a back door, since it is encrypted with AES-256 I think it is not easy to get the plain text.

Thank you very much, we're out of time. We'll take a break and then come back; it's 4, we have more sessions after 4, after the coffee break.

Hello everyone. In this talk I'm going to talk about the possibilities of GIS and the market opportunities that are growing around it. It will not be a fully technical talk; it is about understanding the use cases, what people are building, and how Postgres, with the extension called PostGIS, can be used to empower that and build GIS applications. That is why the title is harnessing the power of Postgres for GIS. Before that, let me introduce myself. As he mentioned, you can call me Avi. I'm a senior software engineer at EagleView, working in the GIS domain; by profession I'm a full-stack developer and an AWS Community Builder, which is a selection made by AWS. Along with that I co-organize developer meetups in Bangalore,
and I do a lot of community contribution: I volunteer and speak at multiple community events. Here are my blogs; I keep writing on Internet Kutta, it's been almost 13 years now that I've been blogging, about all the learnings, experiences, and failures I've gone through over those years, and there is my portfolio.

OK, so let's get into it. I have laid out the flow of this GIS talk like this: we start with GIS itself, understand the trend, then the data types, then PostGIS with its functions and use cases. Some of the technical parts I might skip due to the time restriction, but we will see how the time goes.

So, this is a very well-known American geographer and cartographer, and what he said is that everything is related to everything else. It means that to understand GIS, nothing is complex: where you are standing here becomes your location, and if a person is standing there, that becomes his location, so there is uniqueness. Everything is related to everything else; that is what he is saying about GIS. We don't require any kind of primary key to identify a particular piece of land or a particular object, because the geo-coordinate itself becomes the primary key for that object. That's why he says everything is related to everything else, but near things are more related than distant things. That is the statement he gave about GIS.

Let's first understand the facts about GIS. I think most of you know the full form of GIS: geographic information system. It's a special tool that helps us understand and analyze spatial data, data that always carries a geo-coordinate along with it. A particular feature might be a single point, or it can be a multi-point; we'll talk in the next slide about what kinds of data types come up. To give generalized statements very close to real life: which houses are closer to the park, what is the fastest route to deliver a pizza, what is the best place to plant a tree. Every one of us already uses GIS in our daily routine: we cannot live without our pizza delivery, we cannot live without Grab here in Vietnam or Uber in India; everyone is using GIS, we all live with Google Maps. That's how it works. So that is the fact about GIS: it is nothing complex, and not some new topic that suddenly appeared in the industry.

Before I go to the next slide: this is a market forecast that runs up to 2025. A lot of forecasting has been done by analyst teams, and it says the market is going to grow, to more than $14.5 billion by 2023, and a lot of opportunities are going to come out of it. Most countries' governments are investing a lot of money into the GIS domain, because they want to connect all the data they have to GIS, for example property investigations and cropping; everything they want to do by linking it to the GIS side.
That is where this market is growing very fast, and it is going to create a lot more opportunity in the future. There is a disclaimer here: I'm not a market expert, I'm just a learner, so whatever I found in the analyst PDFs I just grabbed.

So here are a few facts about what is currently happening across the world. The title says GIS has always been a trend, and I will tell you why: GIS has existed since the Earth came into being, and GIS will be here as long as the Earth is here; it will not go anywhere. A lot of technologies come and go, we had desktop computers, then mobile, right now AI is going on everywhere, so the technology keeps changing, but GIS will be there every time. That's why I call GIS the evergreen trend. People are doing things like this: in India there is a state called Maharashtra that is building property mapping using GIS, so that all the property taxes can be paid and people can see, in a friendly way, how the tax is calculated. Each property will have a unique property ID linked to a geo-coordinate; currently that system is not there in some places, so people are starting to use it, and it can be very transparent for the people who are paying the tax. It is a good initiative by the Maharashtra government. A lot of things are happening across the globe, and you can see from news reports the trends that are happening in GIS. In America, the US is already using GIS heavily; at EagleView we sell a lot of things in the US market, where there is a lot of solar roofing, insurance claims happen through GIS, and there is change detection: if somebody is encroaching on someone else's land, how can I detect that through AI/ML with GIS data? Multiple products are built using GIS, and a lot of countries are spending around this. Another recent fact: the Indian government started using geospatial data more and is investing more in it, with a geo summit recently; that is the Twitter snippet there.

Now let's build a little more understanding of the GIS types. GIS includes three primary data types; there are more, but the primary ones are point, line, and polygon. A point: I am standing here, that is my point. A line: if I want to measure something from here to where that person is standing, that becomes a line. If I go beyond that, with multiple points, more than two, maybe a triangle, maybe a square, it becomes a polygon. Most of the time a piece of land will be a polygon, a road will be a line object, and your meter or your water pipe will be a point. Those are examples of the data types that come up in GIS; in PostGIS they look roughly like the small sketch below.
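A quick illustration of the three primary types written as WKT and parsed with PostGIS (the coordinates here are made up, not from the talk):

    -- Point, line, and polygon as WKT; 4326 is the WGS84 lat/long system.
    SELECT ST_GeomFromText('POINT(77.59 12.97)', 4326)                   AS a_point,
           ST_GeomFromText('LINESTRING(77.59 12.97, 77.61 12.98)', 4326) AS a_line,
           ST_GeomFromText('POLYGON((77.59 12.97, 77.61 12.97,
                                     77.61 12.99, 77.59 12.97))', 4326)  AS a_polygon;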
So in what format does the data generally get stored? There are standard formats set by the geospatial standards bodies, and Esri is one of the top companies that defines a lot of this. There are formats like GeoJSON and KML; GeoJSON is a JSON file format where we can define the GIS data as JSON, and I will show you a use case of how it can look. There is the KML format, the shapefile format, and many more, but two formats are the most heavily used: GeoJSON and the shapefile. GeoJSON will generally be stored in the Postgres database, while the shapefile is a file format; in general, if you want to transfer any geospatial data, you can hand the shapefile over to somebody just by transferring the files, and that data will contain the map and the points and whatever else is there.

So, as I was saying about GeoJSON: this is the format. A GeoJSON file has a FeatureCollection, so it says the type of the data is FeatureCollection, and each feature defines what kind of geometry it is: whether it is a Point, a LineString, or a Polygon. As I mentioned earlier, the point has only one coordinate, the line has two coordinates like this and this, and the polygon has multiple, based on the points it covers.

This is the shapefile format. Whenever a shapefile is handed from one person to another, it will be in this form, and four files are mandatory: the .dbf file, which holds the attribute values; the .prj file, which holds the projection that places the data on the map; the .shp file, which holds the actual geometry data; and the .shx file, the index that links the geometry records to the attributes, which is the next extension of it. So when somebody is sharing shapefile data, these four files have to be there; if any of the files is missing you will not have the full data, and you will have to ask again for that file. So make sure that whenever you are dealing with shapefile or geospatial data, the four files are there, even when you convert from GeoJSON to shapefile or to the KML format; that is how it works, the four files should be there after conversion. You can also have multiple shapefiles and club them together into one shapefile.

Now the main part: why PostGIS? As this picture says, the fastest thing on earth, it goes like a cheetah, and it is true: compared with other databases, for GIS support it is very fast, because it supports spatial indexing (GiST), and that index mechanism makes the queries run faster than other databases for geospatial data; there are a lot of other databases already available for this. PostGIS is actually an extension of the PostgreSQL database management system which stores the spatial data; it consists of data types, indexes, and functions. I will keep the history part short because of the limited time.

This is one of the important slides here. As I mentioned in my earlier talk: earth is your primary key, spatial data is a universal key by itself. When you are dealing with geospatial data, the geo-coordinate itself is the primary key; you don't need to define an extra primary key, because once you have the geometry column, the coordinate effectively identifies that object. That is another amazing part of geospatial data: you don't have to worry about which column will be the primary key. And these are the functions that PostGIS has built in, which other databases mostly don't; a small sketch of a spatial table and a couple of these functions follows below.
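Table and column names in this sketch are invented, and the GeoJSON coordinates are made up, but ST_GeomFromGeoJSON, ST_Area, and ST_DWithin are the real PostGIS built-ins being described:

    -- Hypothetical table: one row per land parcel, geometry stored as WGS84.
    CREATE EXTENSION IF NOT EXISTS postgis;

    CREATE TABLE parcels (
        id       serial PRIMARY KEY,
        name     text,
        boundary geometry(Polygon, 4326)
    );

    -- The spatial (GiST) index is what makes spatial queries fast.
    CREATE INDEX parcels_boundary_gix ON parcels USING gist (boundary);

    -- Loading a feature that arrived as GeoJSON.
    INSERT INTO parcels (name, boundary)
    VALUES ('sample parcel',
            ST_SetSRID(ST_GeomFromGeoJSON(
              '{"type":"Polygon","coordinates":
                [[[77.59,12.97],[77.61,12.97],[77.61,12.99],[77.59,12.97]]]}'), 4326));

    -- Area of each parcel; casting to geography gives square metres.
    SELECT name, ST_Area(boundary::geography) AS area_m2 FROM parcels;

    -- "Which hospitals are within 2 km of my house?" -- assumes a hospitals
    -- table with a geometry column named location.
    -- SELECT h.name
    -- FROM   hospitals h
    -- WHERE  ST_DWithin(h.location::geography,
    --                   ST_SetSRID(ST_MakePoint(77.60, 12.98), 4326)::geography,
    --                   2000);

The commented-out query is the "nearest hospitals" pattern the speaker walks through next.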
One of the good parts, as some of the earlier sessions mentioned, is that Postgres has one of the most active communities, creating a lot of extensions and a lot of different support, and these lists keep growing all the time. These are the common ones, so I will just walk through a few small functions. ST_Area: if you have the geometry stored, you can directly run a query with ST_Area on the column where the geo values are stored in the database, and it will automatically give you the area. That is how it works; you don't need to write any separate function for that, the built-in function is already there. The same goes for ST_Within: for example, if you want to find out which hospitals are within my area or zone, you can just write this query and you will get them; with a within query I can take my house's geo-coordinate and the coordinates of the area, and it will give a list of all the hospitals available within my area. It works the same way as Grab does when it pulls up the nearest drivers available to your location. There are some more functions, as I mentioned: ST_Intersects, ST_Covers, ST_CoveredBy, ST_Equals, ST_Overlaps, ST_Touches, ST_Union, and many more like that. And this is how the PostGIS tables are created: simple, like MySQL or any other database; the only thing is that we have to define the geometry or geography type and which kind of shape you are going to store, points, lines, or polygons.

Now the real world of GIS; I will quickly walk through where exactly it is being used. GIS is used in both the hardware and software industries, not only software, and across different market segments: agriculture, construction, transportation, utilities, mining, oil and gas. Let's look at how healthcare uses it. I think during the corona time a lot of people developed applications using geospatial data to find out where my vaccine is available, where the nearest hospital or center is, or where the density of COVID patients is highest; all of that can be identified with GIS data, and nowadays pharmaceutical companies are utilizing it this way too. Agriculture, I think, is well known: cropping, water management for the crops, everything can be done with it, and a lot of people are building products on that. So that's about my talk. You can scan this code; I have listed all the resources from the research behind this particular talk in a GitHub repo, all the links are available there, so you can scan this QR code. Thank you.

Thank you for flying through the GIS talk; there's a lot over here. We have time for questions. Any questions from the audience? Yes. GIS is a very specific subject, so since nobody has a question, I always have a question: what is the standard GIS coordinate system that you use, or does Postgres come with it when you actually use PostGIS, the international standard? There are different ones; there is the CRS standard, you use CRS84 and the projection definitions. The other question, can I ask another question: is there a data type that supports latitude and longitude, or do you just use integers when you store the geolocation? Is there a special data type that supports this in Postgres?
Under GIS you have lines, polygons, and points, anything can be there, correct, but when you define a table, the data type is geometry; there are two data types there, geometry and geography. Alright, that's the answer. Alright, thank you very much.

You may call me Ivan Kosh. I'm from Tantor Labs, where we develop a database that is based on Postgres. In this talk I am going to tell you about one of the possible implementations of autonomous transactions. An autonomous transaction is a transaction that may be committed even if the main transaction is rolled back. They are not in the SQL standard, but many databases provide them with their own syntax. A common use case is to declare a block of SQL statements with some attribute; in this example we declare the function with PRAGMA AUTONOMOUS_TRANSACTION, which means this function will be executed in a different session. Let's analyze the example. In the main transaction we execute BEGIN, which starts the transaction; then we insert the value 0 into the table; then we call the function, which inserts the value 1 into the table; and then we execute ROLLBACK. So our main transaction is rolled back, and when we select from the table we see that the value 1 is in the table and the value 0 is not: the value 1 was stored although the main transaction was rolled back. (There is a small sketch of this example a bit further down.) The common use cases are logging, auditing, integration with some external systems, and debugging. Many databases have them, so when we convert code from another database to Postgres we may want such a construct. Also, commands that cannot run inside a transaction block, such as VACUUM and several others, could be executed from autonomous functions.

A bit of glossary: the main session is the session through which the user interacts with the server; the main transaction is the transaction in this session; an autonomous session is an independent session that is created from the main session; and an autonomous transaction, AT, runs in an autonomous function, a function that is executed in the autonomous session. Properties of autonomous transactions: they are independent transactions that do not depend on the state of the main transaction, even with the serializable isolation level; autonomous transactions can themselves launch other autonomous transactions; and they are a higher-performance alternative to extensions like dblink and pg_background.

Here are examples of the autonomous transaction syntax in different databases; the three below are Postgres-like databases, and they have two styles of syntax. Maybe other databases have it too, but I don't know about them. We skip some slides because we are low on time; these are examples of how they are used in different databases. So we don't have autonomous transactions in Postgres itself, but we have extensions that offer similar functionality, like dblink and pg_background, and we can also use PL/Python for such behavior. They all have some disadvantages, in performance and in security: for example, pg_background uses background workers, and for each function execution a background worker is created and destroyed, while dblink has the security problem that passwords end up in plain text.
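To make the semantics concrete, here is a rough sketch of that first example; the table and function names are made up, and the PRAGMA follows the Oracle-style convention the talk refers to, so treat this as illustrative rather than the exact syntax of any particular patch:

    -- Requires an autonomous-transaction implementation; stock PL/pgSQL
    -- does not accept the PRAGMA.
    CREATE TABLE t (v int);

    CREATE FUNCTION f() RETURNS void AS $$
    DECLARE
        PRAGMA AUTONOMOUS_TRANSACTION;   -- body runs in a separate autonomous session
    BEGIN
        INSERT INTO t VALUES (1);
    END;
    $$ LANGUAGE plpgsql;

    BEGIN;                     -- the main transaction
    INSERT INTO t VALUES (0);
    SELECT f();                -- the autonomous session commits this independently
    ROLLBACK;                  -- the main transaction is rolled back

    SELECT * FROM t;           -- returns only 1: the autonomous insert survived,
                               -- the 0 from the rolled-back main transaction did not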
Now, skipping ahead, a bit of history. When you try to implement autonomous transactions, many questions arise, such as security, implementation, semantics, replication, and many other things that need to be solved, and that, I think, is why autonomous transactions are not in Postgres at the current time. There were discussions on the hackers list from 2008 till 2011; the concept and semantics of autonomous transactions were discussed, but no patches were introduced. In 2014, Rajiv Rastagi, an engineer from Huawei, introduced the first patch with an implementation: he used a data structure for storing the autonomous context and returning to the main context, so there is only one process that implements both the main transaction and the autonomous transactions. In 2016 Peter Eisentraut also introduced a patch; by that time Postgres already had background workers, so Peter used them to implement autonomous transactions. I will skip this a bit.

Based on Peter's patch we implemented our own patch, which shares some internal characteristics. We don't have PL/Python support at the current time, but we have PL/pgSQL support using PRAGMA AUTONOMOUS_TRANSACTION. It is implemented using background workers and has synchronous execution; for communication we use Postgres client-server protocol messages sent through dynamic shared memory; nested calls of autonomous transactions are possible without limit; we have a pool of autonomous sessions; more statements are supported than in Peter's patch; and there are many tests. Here is the link. Let's see an example: here is an anonymous block, here is the pragma autonomous transaction, and inside the block we have transaction management commands such as START TRANSACTION and ROLLBACK; by default, in functions we cannot use transaction commands (a rough sketch appears a bit further down).

The advantages of the patch: we have a pool of sessions, so there is no need to create and destroy a background worker every time we execute an autonomous function; many statements are supported; there is almost no need to refactor the current Postgres code; we use the generally accepted PRAGMA AUTONOMOUS_TRANSACTION; and it works. Some disadvantages: it uses background workers, so we somehow need to separate the background workers used for autonomous transactions from the other uses, because if we run many autonomous transactions the background workers will be exhausted. So the next step is a common pool of autonomous sessions for all backends, and for security a role system would also be good to have. Also, seven years have passed since Peter's patch: do we need autonomous transactions in core Postgres, and how should we implement them more generically? Maybe we don't need them in Postgres at all, and some different mechanism could provide the needed functionality.

Now about the internals of the implementation. A dynamic shared memory segment is allocated first; then a shared memory table of contents is created, where the table of contents is a dictionary whose values are links to the addresses of the other entities in the rest of the segment, and then the entities themselves are allocated. We have four entities: the user data, which contains some authentication information; two shared memory message queues, through which the communication between the main session and the autonomous session happens; and the settings of the main session that are used in the autonomous session. Once we have such a segment, we launch a background worker and pass the handler function to it. The background worker is created and begins to execute the handler; the handler reads all this data, initializes the structures for the autonomous session, then sends a message to the main session that it is ready to execute commands, and the infinite loop begins. Let's see it on an example. We have a function with PRAGMA AUTONOMOUS_TRANSACTION that will be executed in an autonomous session, so an autonomous session is created, and we have the main session. We have a command, an INSERT INTO the table with some values; it is parsed by the main session, the insert command is extracted, and this command is sent to the autonomous session. It is executed there, and then a reply comes back that it is complete, and the main session continues its execution.
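Going back to the anonymous-block example from a couple of slides earlier, it would look roughly like this; the values and the table name are invented, and the ability to use transaction control here comes from the patch, not from stock PL/pgSQL:

    -- Sketch only: with the patch, transaction management commands are allowed
    -- inside an autonomous block, which plain functions would reject.
    DO $$
    DECLARE
        PRAGMA AUTONOMOUS_TRANSACTION;
    BEGIN
        START TRANSACTION;
        INSERT INTO t VALUES (10);
        COMMIT;                    -- this insert is kept

        START TRANSACTION;
        INSERT INTO t VALUES (20);
        ROLLBACK;                  -- only this second insert is discarded
    END;
    $$;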
So, the pool of autonomous sessions: it is created lazily, and we destroy sessions by timeout, because we need to free the memory they occupy; it is not a beautiful solution, but a simple one. At the current time, after execution the autonomous session is returned to the pool. In truth, one session is usually sufficient, since we have synchronous execution between the main and autonomous sessions, but because some statements are not supported, sometimes we need more sessions.

Let's see some examples. We have a table with an integrity constraint: all our values should be greater than 500. We have an autonomous function with two inserts, and they are executed as two transactions: first we insert 700, then 100, which violates the constraint, so that transaction is rolled back and the table contains only 700. In the second example the inserts are inside a transaction block, so they are executed in one autonomous transaction; again the second value violates the constraint, so the whole transaction is rolled back and our table stays empty. It is also possible to execute non-transactional commands like VACUUM and various other system commands. Exception handling is possible too: here we have two raises, one notice and one exception; the exception is caught from the other function, so when we execute this function we get one notice plus a notice about the exception that was caught.

At the current time it is possible to use transaction management commands in procedures, but they have some properties that are, let's say, not very beautiful. Let's analyze the procedure: first an insert, then a COMMIT command, then another insert, then a ROLLBACK command. After the first COMMIT, the transaction in our main session is completed and a second transaction is started automatically, which is then rolled back, so only the first inserted value remains in the table. But it is impossible to call a procedure that manages transactions from a plain function: here we have a function that calls our previous procedure, and we get an error. It is also impossible to execute non-transactional commands in a procedure: we call VACUUM here and it cannot be executed, we get an error. Procedures also don't return values, which is a disadvantage because we can't use them in some operations, and procedures and functions are called with different syntax, so we would have to spend time rewriting code.

Now some benchmarks. We compare the dblink implementation, the pg_background implementation, our pragma autonomous transaction with a pool of workers, and "pragma Peter", meaning the approach without a pool of workers. As you can see, with more connections we get more TPS using the pool, so the pool is a cool thing. Here is the code for the benchmark; it is not that interesting for now. Our further plans: we will try to implement the remaining statements in PL/pgSQL, add PL/Python support, a common pool for all backends, and asynchronous autonomous transactions, and of course we will try to resolve all the questions from the reviewers and merge to master, but maybe that will be in the next session. That's all.

We have time for questions. Any questions on the autonomous transactions? Thank you very much. Next.

For the fourth part, now I'll talk a bit about my company first. Biffalign operates as a cloud provider supplying internet services: cloud servers, managed databases, Kubernetes, and so on. In the course of operating and solving problems for the company, we found that we have a lot of backends that use Postgres, and for that kind of use we need a stable setup so that we can serve many users at the same time.
The problem we have to deal with is that we need high availability for Postgres: because we operate as a cloud provider, we need Postgres as the database we supply to users. Oh, it's a bit messy, I'm nervous. So, the solutions to this problem: when there are too many backends, we end up needing an HA setup, and solutions like Keepalived to handle it. If we build the Postgres HA solution ourselves, there are a lot of problems to deal with: it is very complicated, the cost is high, and it may not cope with the limits we have; with Keepalived there is also a gap between the database system and the failover solution, it is hard to get right, it costs a lot of money, and it wastes time. In the process of looking for solutions we found Patroni, which is a template for building an HA Postgres solution. It is open source, and it can run on bare metal, in VMs, or in Docker. A few things about Patroni: Patroni monitors the database next to each Postgres node, and when, for example, the primary database server is damaged, it can switch over to a healthy one. It also has a web API, so you can drive Patroni over HTTP and operate it remotely. In addition it can use the watchdog devices available on Linux to avoid split brain when there are two nodes.

So what does Patroni look like in a system, in my case? Patroni runs alongside the server and is connected to etcd; the nodes report their state into etcd, and through it each node is checked to see whether it is reachable and which one should be used as the primary. This is the architecture of Patroni: Patroni manages the PostgreSQL components through etcd, and the application connects directly to PostgreSQL. That is the architecture of the application. With that architecture, how do we use it? In our setup Patroni is used to supply Postgres as a managed product; all of the PostgreSQL features are still available, and the database is managed almost entirely by Patroni rather than by hand, while in the cloud Patroni runs on our side.

So how can we connect to Patroni? All of Patroni's features are exposed through its API, so you can talk to Patroni from Python, Java, and so on. For client connections we can use a connection attribute for choosing the target node; there are values such as "any" and "prefer standby". With "any", any server in the cluster can accept the connection; with "prefer standby", if a node is broken the client fails over to another node and the connection continues on a standby.
That is, the client will connect only to a standby node in that cluster and not to the primary, which takes load off the database system. Now I have a demo using Python for connecting through Patroni. Here I have some simple Python code; I use psycopg with a connection that lists the hosts of the Patroni cluster, so I can fill in all the hosts, and when connecting we use the session attribute to choose which kind of node we want in that cluster. When we turn off a database node, or a database actually breaks, the connection is switched automatically and it continues connecting to the database I am working with. That is the story I wanted to tell you; I will stop here, that's it.

And you finish very early. Any questions? We don't have a translation problem, so let me ask: I care about Patroni. Compared with Keepalived, it has more features; you just mentioned Keepalived, so what more does it offer compared with Keepalived? The first thing I would like to tell you about: when we think about the combination of Pgpool, HAProxy, and Keepalived, the database traffic goes through that middle layer, it forms an extra hop, whereas here the client connects directly to one node. That has two effects: one is that we don't have to set up HAProxy and Keepalived at all, and compared with that stack it does not add to the growth and the slowdown of the system. If we look at the advantages: when we use Patroni the client connects directly to the server, we don't need that extra layer, because the client can find the right node without it, and when we connect we can list many hosts without adding Keepalived.

Let me ask more about master and standby. When the master is damaged, Patroni will promote a standby to become the master? Exactly; typically there will be three nodes, that's the basic setup. When a node is running it acts like a standby; when the master is damaged, a standby is automatically promoted, the client reconnects, and it knows where it has to connect to. The cluster should have three nodes; two nodes can be active, but normally we shouldn't run with just two nodes, and one node we certainly shouldn't use. Let me ask you: when the master is damaged, how long does it take for Patroni to promote the new master? When Patroni is running it has an interval for checking the nodes in the cluster; it can be 15 seconds, 30 seconds, or 1 minute, depending on how the node is configured. When a node misses its interval, failover happens automatically, and the time to switch to the new master depends on those intervals between the nodes. The switchover itself is fast, it takes about 5 seconds, because it is a hot standby; after that, once everything is ready, a new configuration continues to be used for the connections.

Thank you. This concludes our day; thanks to all the speakers and thanks for joining. See you tomorrow. Thank you.