Let's get this started then. Can everybody hear me okay? Awesome. All right, how's it going, everybody? Good afternoon. Where's my screen, there we go. My name is John Viani, I'm a network security software engineer at DoubleVerify, and today I'm pleased to present to you my presentation on pub/sub with Postgres.

What I'm going to take you through today is, first off, what pub/sub is, the theory behind it, just what I'm talking about. Then we'll talk about what this has to do with Postgres, how Postgres can help you with this. And then I'm going to walk you through a sample application, a little experiment that I've used this for. Here we go, the description.

So what is pub/sub? Pub/sub is short for publish/subscribe, which is a messaging pattern. It basically describes a way for two processes, or two servers, or threads, or whatever, to communicate with each other. It's a form of asynchronous communication, very similar to how a message queue works. In fact, it's usually bundled in, it's a package deal with most of the big-name message queues; RabbitMQ, for example, provides this functionality.

It's basically a two-agent pattern. You have a publisher, which generates messages, so it generates events and sends them off to some queue. And then we have subscribers that are notified of relevant messages, so the messages are popped out of the queue and sent to whoever was listening.

The reason we do this is that it decouples publishers from subscribers. It's not like a REST endpoint or something where you have to keep polling; whenever the data is available, if you care, you'll be notified. So, decoupled. What does that mean?
Actually, it's a loose coupling: publishers and subscribers don't communicate with each other directly. They don't necessarily know about each other, and they don't really care about each other. All they care about is the data being sent between them.

So a publisher will generate a message on a channel. For example, say I want to know when a new user is created, and that's all I care about. The publisher will generate a message saying "new user john created," it enters the queue, and anyone, any subsystem or process, that cares about new users will then be notified about the new information.

It's not a fan-out thing where all subscribers are told about all information. A subscriber subscribes to a channel, or any one or more channels; a channel is like a topic, some form of information that they care about, and that's all they get. So a publisher can be publishing on hundreds of channels, hundreds of different kinds of information, and a subscriber will only get the subset that they care about.

This is how it looks with pretty colors. Here you see publishers broadcasting different kinds of information, it enters these queues, and then a subscriber who is subscribed to each individual queue will be notified.

So how does Postgres do this? Postgres actually implements something similar, it implements this right out of the box, and that was my expression when I found that out. The thing with Postgres, though, is that it's a little bit different, because in a true pub/sub implementation, the message queue is almost like a third-party service.
So, you know, you'd have one server talking to, say, RabbitMQ, which another server then consumes from. In Postgres, Postgres is both the publisher and the queuing service.

Postgres supplies this with two commands: LISTEN and NOTIFY. NOTIFY is the publishing side, and it looks like this: you say NOTIFY and then the channel name, like NOTIFY users, and you can optionally supply a payload. So you'd say either NOTIFY users, which just broadcasts an event, or NOTIFY users plus a payload, which could be a user ID or a username or something like that.

Postgres also supplies a function called pg_notify, which is a little bit more programmatic. pg_notify accepts two text parameters, so you can do things like string concatenation in there to get something a little bit more dynamic.

NOTIFY, as defined by the very detailed Postgres documentation, is an inter-process communication mechanism for all the different processes that are accessing the same database. The channels that Postgres can publish to are assigned by the programmer.
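In psql, the publishing forms just described look something like this; the channel name and payloads here are made up for illustration:

```sql
-- broadcast an event on the "users" channel, with no payload
NOTIFY users;

-- the same, but with a payload string
NOTIFY users, 'id=42';

-- pg_notify takes the channel and payload as two text parameters,
-- so both can be built dynamically, e.g. with concatenation
SELECT pg_notify('users', 'new user ' || 'john');
```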
So there's not really a standard or anything that assigns them; you pick. The common practice, though, is that you usually name the channel after the table containing the information that you care about. But you can name it whatever you want.

When NOTIFY is called, it generates an event. An event is basically just a string with the channel name, the publishing server process's PID, and the payload string, which, if you didn't supply one, is just the empty string.

You can run a NOTIFY inside a trigger or a transaction, and you can call it as many times as you want inside a transaction, but the notifications won't actually go out until the transaction has been committed. And I guess that makes sense, because if you have a transaction that rolls back, you don't want it to have already put out events; you can't take them back once they've been sent.

If you do multiple NOTIFYs in a single transaction, Postgres is actually pretty smart about it.
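Here's a sketch of that commit-gating behavior, assuming some other session has already run LISTEN users:

```sql
BEGIN;
NOTIFY users, 'id=42';  -- queued inside the transaction, nothing delivered yet
ROLLBACK;               -- the notification is discarded with the transaction

BEGIN;
NOTIFY users, 'id=42';
NOTIFY users, 'id=42';  -- same channel and payload again in one transaction
COMMIT;                 -- on commit, listeners receive a single notification
```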
If it sees that you're notifying the same channel with the same payload however many times in a single transaction, it can intelligently roll them into one, so there will be just one notification. But if they have different payloads, then each notification is guaranteed to be sent, and Postgres guarantees the order of messages: the order in which they're generated within a transaction, and then the order in which the transactions commit, is the delivery order.

Here's what it looks like in the console. I subscribe with LISTEN, which we'll get to in a second, get excited. Then I call NOTIFY, which just broadcasts an event with no payload, and you can see that Postgres has received the notification. In the second one, I notify the same channel again, but this time with a payload, and you can see what the payload is and that it has been received. Then the second little block on the bottom there shows pg_notify and how that works: I call pg_notify with a concatenation of 'fo' and 'o' to make the channel foo, and a payload of 'pay' concatenated with 'load' to make payload, and you can see the output where Postgres received the notification.

The flip side of NOTIFY is LISTEN. LISTEN is the subscriber part of this pattern: it registers a Postgres session as a listener on a channel. You can call LISTEN over and over and over again from the same session on the same channel, and Postgres will only register it once; every subsequent call just does nothing. Sorry, I get so excited about this, my mind goes blank.

Similar to how a NOTIFY won't actually happen until the notifying transaction is complete, the listener won't actually consume a notification it has received until it's done with its own transaction. So if it's in the middle of something, the message just sits in a queue
waiting for the listener to be ready to consume it, which can cause some problems that we'll get to in a second.

Whenever you're done listening, whenever you want to unsubscribe, you unregister with UNLISTEN. All registrations are also automatically cleaned up when a session ends, so when you disconnect, either by accident or on purpose, you're removed from the subscriber list. And a session can both listen and notify on the same channel, as we saw in my little demo code, where I'm doing LISTEN and NOTIFY at the same time. There it is again.

A few seconds ago I mentioned that there's a queue that messages go into when they haven't been consumed. Postgres puts all of the messages that have been sent but not yet fully delivered into a message queue, where they sit until all of the subscribers have had a chance to consume them. This queue is 8 gigabytes in size by default.

Like I said, if a listener enters a transaction, the message sits in that queue, and if you have a really long-running transaction, or a transaction that just doesn't end, Postgres can't clean out those messages. That can be a problem if you have really high traffic and that message queue starts to fill up. At 50% capacity, Postgres starts throwing warnings into the log with the PID of the offending listener, which gives you the opportunity to go in and kill that transaction to allow Postgres to start cleaning up. It's really important to pay attention to those logs, because when the queue fills up, NOTIFYs will fail and nobody will be notified anymore.

So, possible applications of this: you could have something like I mentioned before, new users. Every time a new row is created in a users table, a trigger can run which launches off a notification, and some other
application can consume it and then spin off a welcome email or something. That is super useful. Something a little bit more fun that I've been playing with, which we're going to touch on, is cache management: you can use an event to trigger a cache invalidation or a cache update, something like that.

So now we're going to see some code. I don't know who's feeling what; I'm feeling like the guy on the right, and maybe you're all the guy on the left. Let me explain the problem a little bit. We were working on a data store that used Postgres as the storage back-end. The data store basically contained a log table. It was supposed to be a generic object store, kind of like, I don't want to say the bad word, Mongo, but kind of like that, where it'll accept any kind of data and be able to restructure it when you need it.

So we ended up storing everything in this normalized log table, but then we had to reconstruct the objects at a later date. It had to be super high performance, and it was distributed. That means we were going to have replicated instances of this data store all connecting to a single Postgres back-end. If RDS supported this it would have been RDS or something, though I don't think RDS supports pg_notify.

The ultimate thing was that we had all these objects with basically dynamic columns, and we had to reconstruct them at runtime. That was the point of this problem. So what I did was build a cache to store what I would use as column names when we reconstructed a regular table in a regular format.

What I used is Go, I'm super into it and I highly recommend it, and Postgres.
I'm super into that too, and I guess you all are as well. The library connecting them is Go's pq library, which is the go-to, the de facto library for Postgres.

This was a really complicated problem, but I'm going to, not dumb it down, but simplify it to two tables. I have this fields table, which describes all the columns that I have for my objects. Here I have a UUID, an object type ID (because we can have different kinds of objects being described), the title of the field, so first name, last name, and then a value type. We turned everything into strings, and when we would render it into a full object, it would be returned back to its native type.

This is what the table looks like: a whole bunch of UUIDs, then titles like first name, last name, age, favorite ice cream, whatever you want, and then value types: zero for strings, one for ints. And then here's the log table that actually accepts the values for those fields: the foreign key of a field ID, the value, and then when this row was created. This is what that looks like: there's me, John Viani, 27, favorite ice cream vanilla. Fun fact, if anyone wants to get me ice cream.

So basically, the thing I cared about was that every time we created a new field, it would be a new column. I wanted to cache that on the server side, in the application, so I could really quickly construct queries for reporting. What we do is create a trigger on new rows in the fields table, and in that trigger we run a stored procedure.
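A sketch of what that trigger and stored procedure might look like; the table and column names here (fields, id, object_type_id, title, value_type) are my reconstruction from the slides, so treat them as assumptions:

```sql
-- hypothetical names; concatenates the new row into a CSV payload
CREATE OR REPLACE FUNCTION notify_new_field() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('new_fields',
    NEW.id::text || ',' || NEW.object_type_id || ',' ||
    NEW.title    || ',' || NEW.value_type);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER fields_notify_new
  AFTER INSERT ON fields
  FOR EACH ROW EXECUTE PROCEDURE notify_new_field();
```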
The stored procedure basically takes the new row, concatenates it into a comma-separated value, and then calls pg_notify on the new_fields channel with the new field as the payload. Something else you could do here, which I thought was a little overkill, is use Postgres's super awesome JSON capabilities: you could JSON-stringify the row and just send a big JSON object. That's a cool alternative, but it was a little bit overkill for this.

So now we're going to jump out of Postgres and into Go. In Go, this is how you subscribe. I have my driver object, where I connect to the Postgres database. I have a reportProblem function, which is just the error handler. Then on line 26 there, you see I create my listener object: I connect to Postgres with a 10-second timeout and the reportProblem callback, then I subscribe using listener.Listen on the new_fields channel, and I panic if something goes wrong.

What this does, basically, is create a dedicated socket. I have my regular Postgres connection, which you use to do regular queries and stuff, and then when you create a new listener, it creates a second socket dedicated just to listening. So we have two connections to Postgres here, which makes life a lot easier when it comes to that message-queuing thing, because there are no transactions running on the second session, so the queue doesn't really fill up.

Okay, that's that. Then this is where we asynchronously listen on that exclusive listener socket. Basically, I have an infinite for loop where I block listening on that socket. While the channel is empty, this just does nothing, and as soon as a notification comes in, this case
statement runs. I figure out what channel was in the notification; in this case, all I care about is the new_fields channel, and then I take that payload string and reconstruct it back into a field object for my application. This is a worker, so it would be running in a separate thread, a goroutine, or whatever. And then once I've constructed the object, I add it to my cache.

Like I said, there are possible issues with using this technique. The biggest one is that queue: the queue filling up is a really big deal, because unless you're monitoring the logs, if that queue fills up you're in really big trouble; all notifications start to fail once that queue is in trouble. Another issue is timing. Since messages are only delivered between transactions, notifications are only sent when the notifier's transaction is done and only consumed when a listening transaction is done, if you have a lot of long-running transactions going, you can lose this real-time aspect, because the messages kind of wait for each party to be ready. That can cause a lot of confusion if you don't know that's how it works.

So, in summary: Postgres does this out of the box. For the application
I just showed, or for sending an email or something, there's no need for a secondary piece of middleware between Postgres and whatever you're doing, because Postgres supplies this publishing and subscribing, these asynchronous notifications, right out of the box. Which is really awesome, and it's really simple, which is actually why this presentation is so short: Postgres makes it too easy. So the only thing I can leave you with is: watch out for really long transactions, and give this a shot, because I think it's a really cool functionality and it really opens up a lot of doors for a whole bunch of stuff. I recommend it, and if you want to talk about it, I'm all for it. Once again, my name is John Viani, I work for DoubleVerify, we are hiring, and thank you for listening.

So, does anybody have any questions? Questions, comments, concerns?

[Inaudible audience question.] I don't think that's configurable. I know the size of the message queue is configurable; I don't know if that warning threshold is. I'm sure it is. I didn't change it, because I figure if it's triggering at all, you have a problem. I'd have to research it.

[Inaudible audience question.] You mean the second, listening socket? No, it shouldn't, because it should just stay open; it keeps itself alive on its own. There shouldn't be any issues with it timing out just because there are no notifications, as far as I know; I left mine running for like a week. The server connects directly to the database.
Yeah, so there are no issues there.

[Inaudible audience question.] That's something I'm really fuzzy on. I looked at how the C library and pq both deal with this, and it looks like messages are just lost, especially if you disconnect for too long. Messages are just dropped, because they're either cleaned out or the session doesn't reconnect as the same session, it reconnects as a new session. So yeah, I wouldn't count on the message delivery; the delivery guarantees are not super great. But it works really well. I didn't have any of those issues, man, you guys have a lot of weird stuff going on in your lives. Maybe I'm just super lucky with this, but I didn't have any of those issues; when I ran it, it just runs.

[Inaudible audience question.] Wait, you mean for the whole little program I just showed? It's kind of secret stuff, but basically it was a data store. We were using it so that different services would be able to dump their objects into this data store, and we'd be able to reconstruct objects at snapshots in time, or across different domains. So it was just a data store, an object store.

[Inaudible audience question.] It's just text, it's just a string.

[Inaudible audience question.] Probably. I try to keep everything short and sweet, because when I was doing this it had to be really fast, so I only wanted a couple of characters: basically UUIDs, and then maybe a title just for debugging. I never went above 50 characters.

Wait, what? Yes, yes, it goes to all subscribers to a single channel; anyone who's registered as a listener.
They all get notified at roughly the same time. And yeah, if nobody's listening, it enters the queue and then it's just dropped and cleaned out, because the queue only holds messages as long as there are listeners who haven't consumed them. If there are no listeners to consume it, it just falls right out of the queue.

[Inaudible audience question.] Wait, you mean my Go background worker? No, I haven't done that. I left all the concurrent behavior to Go; I just let Postgres figure out the rest.

[Inaudible audience question.] I think it's only on the master. I mean, I had issues with replication all the way around, so I'm pretty sure it's only the master.

[Inaudible audience question.] I don't think so. I'm pretty sure it has to be a defined channel. Would that make pg_notify block? I don't think so, because it doesn't actually do anything until after the transaction has been committed. The actual notification is sent asynchronously, so it doesn't notify anyone until whatever transaction it was in is done. But if the transaction holds things up, then the notify doesn't actually get sent.

Anybody else? Cool. Well, thanks a lot.