Okay, so I'm actually not the first speaker; there was another speaker before me who talked about Puppet and Chef, and I think this is a little bit related. This is about the way we're developing software at Flow. The title is Achieving a State of Flow: Continuous Integration and Practical Event Sourcing with PostgreSQL, because obviously this is a PostgreSQL conference. Flow, in psychology, is this idea of being in the zone. So how do you achieve being in the zone as a developer? We don't have any DevOps guys; every one of our engineers is expected to do DevOps on their own, so that they know how to do things. And as developers, we are inherently lazy. So how do we automate a bunch of things that, quite frankly, take a lot of our time? That's roughly what this talk is going to be about.

A little bit of an introduction. My name is Paolo Lim. I was previously an engineer at Gilt Groupe, the flash sale site, and more recently I was an engineer at Blue Apron, the company that delivers recipes and meals to your door. The founding CTO of Gilt Groupe called me one day and said, "I'm working on this new thing, are you in?" That turned out to be Flow Commerce, and I was one of the founding engineers there. What we do is try to make global e-commerce as simple as domestic, everywhere. Just a brief introduction to the cross-border e-commerce space: it's a really, really complicated space. You need to deal with taxes, duties, shippers, customs; basically a whole bunch of things that sometimes are not even online globally. So that's what we're doing. And the joke is, for all of you international teams in e-commerce, Flow is really your workflow without the work.

So enough of that and on to my agenda for today. What I'm going to discuss is roughly how we do development over at Flow, and that involves creating a new database and managing schema updates to it, streamlining our local development setup, and event sourcing.

A little background on our technology at Flow. We're a microservice architecture, mostly based on Scala and Play right now. We also have Node.js for a lot of our front-end apps. A lot of the back-end APIs have their own databases, mostly Postgres, and that gets pretty complicated when you have to manage all of that in your own local environment. Say you're developing for one part of the architecture: you don't really care about having all of these other things on your laptop. I'll take you through some examples of this in a little bit.

Event sourcing, this particular piece, basically means: how do we make sure we capture all of the application state as a sequence of events? When something happens to one of our models through one of the APIs, how can we retrace every state it has been in, from creation up until deletion, without peppering the database with unnecessary fields? I'll take you through that in a little bit.

I think I might have covered this already, but just as a refresher: it's a microservice architecture, Scala/Play, Node.js. We use GitHub for source control. A lot of the database instances right now are Postgres, deployed to AWS RDS. We use Docker and Docker Hub; all of our services are in Docker.
We use the automated builds in Docker Hub; those are what build our images, and we take it from there.

What I really want to do right now is a little bit of a walkthrough. This way it will be a little more interactive, and you'll get more of a sense of what we're doing over here at Flow. Okay. So over here I have my command prompt. Actually, first: how many of you are familiar with the way Docker works, or have worked with Docker before? Okay, so a pretty good number. Right now I have no containers running in my environment, set up on a basic macOS. One of the reasons we were really bullish on Docker was that the way the services run on my laptop here is virtually the same as the way they run in production. That way it's more predictable: when you encounter errors in production, you can easily recreate them over here.

One of the cool things also is that we were trying to get our non-engineers to be able to run the website on their own laptops. So one of the things we did was create this development tool called Workstation. It basically sets up your environment with the necessary Docker containers, including all of the dependencies, and actually executes a bunch of health checks. We looked at Docker Compose too, but at the time we couldn't get it to do exactly what we wanted, so we built a really lightweight tool around it.

This one takes a little while; I tried it over the Wi-Fi here and it took about a minute or two. What it's doing is starting up the website, essentially, on my laptop. I do workstation up www, which is essentially the website. It goes into our registry, which is a separate service, and sees that www is actually dependent on all of these other services: it requires a user service, it requires a splash page service, and those services in turn depend on the databases that back them. So the splash page is backed by a splash page Postgres, and users are backed by a user Postgres. And now it's up and healthy.

When I do a docker ps right now, I see a bunch of different images running in my VirtualBox virtual machine. And if I go back to Chrome and go to vm:6050 (vm is an alias for the IP address of the VirtualBox VM), I'm able to run the site locally. So I enter q123@hello.com: "tell me when Flow launches." And the cool thing I can do over here is actually psql into the container, on port 6049 I think it was, into the splash page DB. When I go over here, I should see the thing that I just created over there. Let me make this a little smaller so that you can actually see it.

This is what we feel is a pretty cool feature. As a developer, if you're working on one of these smaller things, like our token service or our user service, and you just want everything else running on the latest versions that are deployed to production, you can just do workstation up and it's all there. And now what I want to do is deconstruct a little bit of what each of these pieces actually does.
Let's start with the Postgres one. All of our Postgres repos are named after the app: the name of the app, dash, postgres. We follow a bunch of conventions over at Flow; how we name things, service discovery, everything is by convention.

So what happens when I want to create a new repository? There's nothing here right now. And being lazy developers, what we did was create a script to initialize everything for you. Say I want to create a new Postgres project: we have this Go script that basically creates everything for you right here, and that includes this Dockerfile, which all of our Postgres repositories have. Here we're starting with a base image that we defined, a postgresql base image. There's some stuff it does over there, some of the things we do to actually start up the schema, and I'll take you through in a minute what it actually does. And then it starts the Postgres container.

What we wanted, by having this managed repository for all of our schemas, was to actually see the changed versions as they occur. That makes it pretty easy to know, okay, we started out earlier today with this first script; it was initialized with the Go script. And it basically contains a lot of the utility functions that we use for setup in a bunch of different places.

So over here I can do a docker build of that. Oops, let me open a Docker window, okay. I do a docker build, and it basically runs through the commands defined in the Dockerfile. If we go through this one by one: it starts out by importing the base image, creating a bunch of folders that we actually need, then it starts the database server, and after that it goes into the scripts directory and tries to see which scripts haven't been run yet. That determines what the container actually looks like at that point. So up until this point we have a Docker container that doesn't really do anything; it just has a bunch of utility functions in it. But that's the initial setup.

So, one of the things we do: say we want to add a table to the database. I have something over here already, a sample.sql. You start out with one of these scripts; it adds a samples table, with a basic id, a name, and some other random field, foo. And then I call setup on that table, which sets up a bunch of things, utility functions related to event sourcing and to the partitioning that I'll talk about a little later. (I'll show a sketch of that script below.) So what I do over here is add the script: sem-add sample.sql. What you'll notice now is that it created a timestamped script which hasn't been run yet. And when I do another docker build, it actually goes through it all over again. Now, you can see some of the output was in red; that's coming from Postgres, but it's basically saying, okay, I was able to build your database. Successfully built, and that was great.
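To make that concrete, here's roughly what a script like sample.sql contains. This is a sketch: the timestamped filename is the kind that sem-add generates, and setup_data_auditing is a placeholder name for the helper in our utility schema, not necessarily the real one.

```sql
-- scripts/20160420-143000.sql (sem-add generates the timestamped name)
CREATE TABLE samples (
  id   TEXT PRIMARY KEY,
  name TEXT NOT NULL,
  foo  TEXT
);

-- Wires up the event-sourcing triggers and monthly journal partitions
-- for this table. Placeholder name for our real utility function.
SELECT audit.setup_data_auditing('public', 'samples');
```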
So now I just copy one of these, because I don't necessarily want to type everything in again. I want to run this image, mapping it to port 9999. Now it's running in the background, and what I can do is psql -h vm -p 9999 into the demo DB. That takes me to the shell, and I can see that it actually created my table. There's nothing in it right now; it's just empty. So that's a little bit about this portion: we create a database, we manage all of the schema updates to it, and we basically get set up with that.

Let me take you through a few slides before I get to event sourcing. First, what we usually end up doing at the start is creating a database in RDS. I don't know how many of you have tried to manually configure stuff on AWS, but it's a big pain in the ass, and we wanted to standardize our database setup across all of our developers. So we have this dev-rds script that, given a few simple parameters, goes out and creates the RDS database.

Then all of that schema management stuff that I showed you earlier uses this tool called Schema Evolution Manager; I have links to the GitHub repositories for all of these things in one of the later slides. The idea behind Schema Evolution Manager is to manage the creation of the scripts that update the schema in the Postgres database, and to give you a standard way to allow team members to contribute. Each of these SEM (Schema Evolution Manager) repositories lives internally in GitHub, and they're code reviewed. So you don't get into this situation where someone did something to a schema file and then, when you deploy it, something happens and brings down the site; over here you can go specifically into the script that introduced the change, and you see exactly what broke. It's a way for developers to be able to contribute to the schema safely.

Then there's workstation up. To review: we ended up using Docker, which makes it predictable to run across different environments, across laptops. Even our business owners, our CEO and our VP of customer insight, can run this thing on their laptops and it works just like it would in production. It also makes the setup of our CI a little easier; I'll take you through that in a minute. workstation up goes through and figures out all of the dependencies, all of the services that need to run to show the website, and starts them. You bring it up and it just works, which we thought was pretty nice. As a developer, you don't have to go download all of the repositories, start them up, and figure out what's wrong with each one of them; that's a total mess. This just makes it simpler. And as you saw in the demo earlier, it just works. One thing to point out: I don't actually need psql running on my laptop; that was just for demonstration purposes, to show that I can log onto a database and it just works.

Any questions at this point before I go on to the next topic? [Audience question.] Okay, so one of the things that we found out about AWS and RDS, for the benefit of everyone else in the room: if you're running Postgres on RDS, they have a list of pre-approved extensions, and it's a very limited set. So if you actually want to run something outside of it, and there's one of those extensions I'll actually talk about in a minute,
you basically have to create a bunch of these tools to get around it, because the extension you're using may not actually be available in RDS.

[Audience question: does the data live in the container?] Yes, right now it's living in the container, yes. Got it, yeah, that's a good question. This is just the development environment, and you're correct that when you shut down the container, all of your data is lost; boom, it's gone. But what we've found is that it doesn't really hamper us much, because we're a new startup: we don't have to pull in production-level data to get things working in development just yet. One thing you could do, though, is have a data-only container and have your Postgres container point to that as the source of its data; then you could control and tag the versions of that data-only container. That's one of the things we could do in the future. And if it's really important, like seed data, then you might want to actually commit it in the Schema Evolution Manager scripts as well. I think it's roughly like that, but we really haven't gone into it that much. The idea is just to have a container where all your data is based; when you start up a container you can mount data volumes on it, and it can read volumes from another container that's already running.

Okay, so that was a little bit about the setup. Now on to event sourcing. This is one of the things we actually tried to solve for early on when we started Flow: how are we able to tell everything that happened to every row in every database? It's driven primarily out of our experience at Gilt Groupe, where I dealt a lot with inventory management. When you deal with something as important as inventory, and the auditors later come to your door asking, hey, what happened to this unit of inventory, what you end up having to go through is a big audit log. And for inventory there's one way you do that log; then for something else, when a user calls in asking what happened to my order, or I signed up for this email and nothing happened, there was no consistent way to record all of this auditing and journaling across the board. That's what we wanted to standardize at Flow.

There are a couple of ways we could approach it. One of them was soft deletes. This is something we actually did before, but we found it kind of clunky: the table had a deleted_at timestamp field, and since rows were immutable except for deleted_at, every time you wanted to update a row you set its deleted_at field and then inserted a new row. Obviously this is not really the nicest thing around, because you're still thinking, okay, I need to do this step and then this other step, and before you know it, once you have multiple developers, they're not exactly on the same page about how to do these things. And how can you enforce that across a hundred different databases later on? It's just hard. So we decided not to go with that.
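Here's a minimal sketch of that soft-delete pattern, with a hypothetical users table, just to show why we found it clunky:

```sql
-- The soft-delete pattern we moved away from: rows are immutable
-- except for deleted_at, and a NULL deleted_at marks the live row.
CREATE TABLE users (
  id         TEXT NOT NULL,
  email      TEXT NOT NULL,
  deleted_at TIMESTAMPTZ
);

-- An "update" is really two statements, and nothing enforces that
-- every developer remembers to pair them up:
UPDATE users
   SET deleted_at = now()
 WHERE id = 'usr-123' AND deleted_at IS NULL;

INSERT INTO users (id, email) VALUES ('usr-123', 'new@example.com');
```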
Another idea was to create a ledger of change events: you capture all of the inserts, the updates, maybe deletes later on, and then you can sort of recreate things. Okay, at the start I had this insert, here's the series of events that followed, and here's my final state right now. You could do that too.

What we ended up doing was to create more of a journal of states. This one is mostly trigger based: any time we insert, update, or delete a row, we record that in a journal table. This makes it a little easier to work with. We don't pollute the fields; over here my table contains just the data I actually need, and all of the journaling and ledgering lives in a separate journal schema, where all of the changes get recorded. (I'll put up a sketch of those triggers in a minute.) Let me take you through another walkthrough so it gets to be a little clearer.

Where we left off, I had added this new script; it sets up the basic samples table and audit data. So what I'm going to do now is commit this sample schema and git push origin master. That's in git now. And I go back to my trusty dev tool, where I tag the repository; I've had eleven versions of these already, just from preparing for this talk. What this does is kick off what we've set up in Docker Hub: automated builds. Docker Hub detects whenever I publish a new tag and creates two images, one tagged latest and one with just the tag number. I'll get to the reason we want latest a little later, but right now it's building both. We do this across all of our Docker containers.

While this builds, let me take us on a little tangent about that latest tag. We run all of our CI on Travis, and I set up this demo app, a Scala Play app. Just like anything else running in CI, it needs a database back end to actually run any of its tests. What you can see here is that we've set it up to pull the Docker container from the Docker repository, and that makes it much easier to say, okay, I want to run my tests in this environment. The developer shouldn't really have to be concerned with, do I have a schema up and running, is it up to date? It just happens in the background as we tag versions of the schema; it will always come up here, and they don't have to worry about it anymore. So that's where we actually use the latest tag.

So let's see if it's built now. I get the success over here for 0.11. Let me start up that image, the demo postgresql one. Now that the image is in Docker Hub, I use my workstation script to pull it down again. So if you're a developer working on the demo app, and someone else is working on, say, this postgresql schema, you don't really have to worry about it; you can just rely on the script to pull down the image. And if I psql -h vm -p 6189 into the demo DB, all of the stuff is over here now.
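Before I drive the API, here's a minimal hand-rolled sketch of what that trigger-based journal of states looks like for the samples table. Our setup utility generates the equivalent of this for you; the table and function names here are illustrative, not the utility's actual ones.

```sql
CREATE SCHEMA IF NOT EXISTS journal;

-- One journal row per state a samples row has ever been in.
CREATE TABLE journal.samples (
  journal_id        BIGSERIAL PRIMARY KEY,
  journal_operation TEXT NOT NULL,                   -- INSERT / UPDATE / DELETE
  journal_timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
  LIKE public.samples                                -- copies id, name, foo
);

CREATE OR REPLACE FUNCTION journal.samples_journal() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    -- Record the final state of the row before it disappears.
    INSERT INTO journal.samples (journal_operation, id, name, foo)
    VALUES (TG_OP, OLD.id, OLD.name, OLD.foo);
    RETURN OLD;
  ELSE
    INSERT INTO journal.samples (journal_operation, id, name, foo)
    VALUES (TG_OP, NEW.id, NEW.name, NEW.foo);
    RETURN NEW;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER samples_journal
  AFTER INSERT OR UPDATE OR DELETE ON public.samples
  FOR EACH ROW EXECUTE PROCEDURE journal.samples_journal();
```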
So now I'm ready to actually try using the sample service that I wrote. I'm using Postman, which basically lets me write REST queries against our APIs. All of our services expose a REST API; that's fairly standard across the board. My local service is not running yet, so let's do that. Say I'm running this demo service and doing active development on it, so I run it locally, and now I can run all of these health checks. Let me see; okay, let me change the font over here so that it's a little larger. Okay: status is healthy.

Now I want to create a sample. I post a sample, hello and bar, and I get back what I just created. We have these GET endpoints: get the hello sample, get this one. And then there's this endpoint over here: with all of our APIs, we expose a versions endpoint that gives you all of the versions that have ever been created. It's paginated, and we can filter it; I think it's ids equals hello. Oops, ids, hello. This shows me all of the versions for that particular ID.

Let's say I want to do an update on it. I don't know if you can see this in the back, but I'm changing the name to boo and the foo field to hello, and let's just send that over. I receive a 200, it's okay. And when I come back over here, I'm able to see all of the changes as they occurred: the first event was an insert, and the second event over there was the update. You can actually see this as it happens in the database, too. My samples table holds just the updated data over here, and when I do a select from journal.samples, I see the actual events that occurred up until that point, ending in the current state of what I'm doing. And when you're running a startup that involves a lot of duties, taxes, inventory, orders, that sort of thing, you have to talk to all sorts of different clients, and a lot of them want to know what happened to certain things along the way. This just captures everything we're doing up to that point.

And at some point, let's say you want to delete that. I send a delete message over here, and note that when I try to get everything, nothing is found; there's a little thing over there saying 404, I can't see that thing. This one returns that I don't have any sample models, and over here I don't have anything in the database, so that's good. But when I go back to my versions endpoint, and I think this is pretty cool, you can actually see what happened. It's not in the table anymore, but I can find it and say, okay, what happened to the sample where the ID was hello? Well, it was deleted over there. And if I go into the database, I see it here, and there's a user ID associated with it, which came from the authentication header on the delete call. So everything is audited; everything can be seen up to that point about how it got to that state, even though it's no longer in my samples table.

Now, one thing you might say is, well, once you've been running for a while, this journal table could get quite big, with all of the updates and inserts and all that. So what we did was partition every journal table, by month right now. Every month there's a child table in our journal for that table, created automatically by that audit setup call from when we created the schema earlier.
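Under the hood this is classic inheritance-based partitioning, which is what the pg_partman fork automates for us. Here's a rough sketch with illustrative names, continuing the journal.samples example from before:

```sql
-- One child table per month, with a CHECK constraint so the planner
-- can exclude partitions that can't match a query.
CREATE TABLE journal.samples_p2016_04 (
  CHECK (journal_timestamp >= '2016-04-01'
     AND journal_timestamp <  '2016-05-01')
) INHERITS (journal.samples);

-- A trigger on the parent routes each journal row to the right child.
CREATE OR REPLACE FUNCTION journal.samples_partition() RETURNS trigger AS $$
BEGIN
  IF NEW.journal_timestamp >= '2016-04-01'
     AND NEW.journal_timestamp < '2016-05-01' THEN
    INSERT INTO journal.samples_p2016_04 VALUES (NEW.*);
  END IF;   -- pg_partman generates one branch per month and pre-creates
            -- upcoming partitions so inserts never fall off the end
  RETURN NULL;  -- cancel the insert into the parent itself
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER samples_partition
  BEFORE INSERT ON journal.samples
  FOR EACH ROW EXECUTE PROCEDURE journal.samples_partition();

-- And retention later becomes one cheap DDL statement per month,
-- instead of a DELETE scan over a giant table:
DROP TABLE IF EXISTS journal.samples_p2015_10;
```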
So, yeah, it's April now, and if I go into the April partition, I will see all of the data that I just created over there; but if I go into the one from last month, it's not there. What this enables us to do is set a policy where we keep data for, say, the past three or six months. Admittedly, we haven't defined what that policy is yet, but it allows you to set one and say, okay, after six months I'm going to purge the older partitions, or send them off to a backup, and then we have this really lightweight schema going forward, month by month.

What we found very useful here was a consistent way to journal every insert into, and every change to, any of our tables across all of our microservices. It's generally something that a lot of us, I feel, take for granted at Flow; it's just there, doing its work, useful when it needs to be. You don't have to worry about much additional overhead aside from doing that audit setup step, and everything just sort of takes care of itself after that. So that's the event journal of states.

Okay. And here are the links to all of the stuff that I actually demoed. The partitioning piece we're using is really a forked version of a tool from Keith Fiske, who works over at OmniTI, one of the sponsors of this event. The journaling piece is something that we worked on at Gilt and that we're actively contributing to at Flow; you'll be able to pull down our journaling utility a little later, once I create the pull request for some of the changes over there. Schema Evolution Manager was written by our CTO; that's the one that allows multiple users to contribute to a database project, a Postgres project, in a systematic way, so that you have a way to track all of the changes that went into the schema. And there's a sample repository, the splash page postgresql repo that backs the website I showed over here, so you can have a look at it if you want.

So I think that's it. Here's my email; you can reach me at paolo at flow.io, and I'm on LinkedIn as well. Did anyone have any questions?

[Audience question about Sqitch.] No, we hadn't heard about Sqitch, but thank you, I'll probably look into that one. The schema management tool we ended up using was built at a time when, at our previous company Gilt, we were moving from a Ruby on Rails monolith to a microservices architecture. I don't know exactly when Sqitch came out, but around 2010, 2011 we had gotten used to the Ruby on Rails way of doing code-based migrations. I'm not sure how many of you are familiar with it, but it's basically the same concept: you write a bunch of versioned scripts, and the gem looks at whether it has run each script before; if not, it goes ahead and runs it. That's what we tried to reproduce by writing our own schema management tool, and to be agnostic about whether the Postgres database backs Rails or Scala. I think some other groups tried other languages too, but as long as it was a Postgres repository it was fairly easy to use, so it gained pretty wide adoption within the organization at the time. So it was pretty good.
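For anyone unfamiliar with that migrations concept, the core idea fits in a few lines of SQL. This is a hypothetical minimal version to show the mechanism, not SEM's or Rails' actual internal schema:

```sql
-- Code-based migrations (Rails, Sqitch, SEM) all share one idea:
-- a bookkeeping table records which scripts have been applied, and
-- the tool runs only the ones it hasn't seen yet.
CREATE TABLE IF NOT EXISTS applied_scripts (
  filename   TEXT PRIMARY KEY,
  applied_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Before running scripts/20160420-143000.sql, the tool checks:
SELECT 1 FROM applied_scripts WHERE filename = '20160420-143000.sql';

-- ...and records it after a successful run:
INSERT INTO applied_scripts (filename) VALUES ('20160420-143000.sql');
```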
Right. Okay, awesome. [Audience question about how the user ID gets captured.] Yeah, that's a good question, let me take you through that one. Hopefully this doesn't get too small; let me run this again. I don't know if you can see these headers over here, so what I'm going to do is generate this as curl, and you can see it in the command line here. What this does is send an authorization token over here, and that token maps to a user in our database. If I send a token that doesn't actually exist, it should return a 401, unauthorized: it was able to figure out that the token was invalid, that it doesn't map to a user in our database. For the earlier one, it maps to a user in the database and we get that user ID. And if you notice over here, what we do is set the updated_by_user_id field in the samples table. What the journaling essentially does is see this row, get a copy of it, and put it into the journal table. It just copies the row and puts it directly there; it doesn't do anything else aside from that. Right, yes, it just does that. Yes, that's correct.

So I have a little less than five minutes left. Does anyone else have any other questions?

[Audience question about another tool.] No, we haven't. Like Sqitch, it's one of those things we just hadn't heard about yet. Some of these things came about at a time when there weren't a whole lot of tools around yet, so we essentially built our own open source tools.

[Audience question about timestamps.] I think that's a good question. Right now we haven't run into any issues with timestamps, but it would be interesting to hear more about the issues you ran into before and what actually got screwed up. Got it. For one of the things you mentioned: when we select from journal.samples, we do have this journal_id field over here, and that one is just a sequence of integers, so it gives you the order in which the events were recorded. I'm not sure if that helps, but I think it's close to what you were saying.
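To make that last answer concrete, this is the kind of replay query the journal enables, using the illustrative journal.samples sketch from earlier (the real tables also carry the updated_by_user_id column that the API sets from the authorization header):

```sql
-- Replay everything that ever happened to the sample with id 'hello'.
-- journal_id is a plain integer sequence, so ordering by it gives the
-- exact order the states were recorded in, independent of clock skew.
SELECT journal_id,
       journal_operation,   -- INSERT / UPDATE / DELETE
       journal_timestamp,
       id, name, foo
  FROM journal.samples
 WHERE id = 'hello'
 ORDER BY journal_id;
```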