Yes, I'm Valentine Gogichashvili, and I'm a long-term speaker at these conferences. This time I decided to talk more about how one can use Postgres in your development infrastructure, and not so much about how you do things inside Postgres directly.

So what is Zalando? A very short introduction, because we are in the States and probably not many of you know Zalando. Zalando is a very big retail store; we sell shoes and fashion items, and we are growing quite rapidly. We have many tech hubs across Europe. When I started at Zalando five and a half years ago, the technology department was only 50 people. Now it's 1,100 people, so you can imagine that this rapid growth of the technology organization put a lot of pressure on us to improve our processes and make things work nicely.

So how did we start? We started as the good old three-tier application: the application, the back end and the database. It was a MySQL database; we used PHP and Magento, actually, for prototyping. But five and a half years ago we were growing at a rate of more than 100 percent per month, and at the moment when I came there to help the team with Postgres, we were the biggest Magento users in the world. We were patching it like hell; it didn't work anymore, so we couldn't scale it.

So we decided to do the reboot project. The reboot project was a very interesting endeavor: we rewrote everything from scratch in Java, using Postgres as our storage infrastructure, and we took a relatively radical approach. Our CTO back then had seen a lot of problems in other projects using Java and Hibernate, so he prohibited using Hibernate, and also transaction managers. That was quite a nice pressure on us as technologists, so we went with Postgres, using stored procedures very heavily, and we did it very nicely.
I think this is the typical way microservices worked at Zalando back then: every service had its own Postgres database, and some databases were sharded. What we achieved by using stored procedures is that stored procedures give you the possibility to reduce the transactional scope. As a developer, you cannot simply open a transaction and then close it half an hour later when you are finished sending files to the FTP server. So you really have to think differently when you work with stored procedures, and it helps you to reduce the transactional scope and to think differently about your business logic. That's what we achieved with this approach.

Using stored procedures also led to very clean data states in the databases, because the stored procedures contained more or less what I call data logic. It's less logic than the business logic of your application, but more logic than just a simple foreign key constraint. And processing is very close to the data, so that means you can process a lot of data directly in the database.

To support our Java developers we also implemented SProcWrapper, which helped us to map complex types easily and helped us to do sharding. It also converted our Postgres databases into more or less RPC servers: from the point of view of a Java developer, you are talking to the database as if you were calling functions on an RPC server, or more or less through another interface. I have already talked about SProcWrapper in previous talks, many times, so if you are interested in it, just come to me after the talk, because it is not the main topic of this one.

We also introduced database schema versioning.
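To give a flavor of the RPC-server idea, here is a minimal sketch in Python rather than the Java SProcWrapper we actually used; all names here are hypothetical. Each stored-procedure call becomes a plain function call, and each call runs as its own short statement, which is exactly what keeps the transactional scope small.

```python
class SprocProxy:
    """Expose stored procedures as plain functions, RPC style.

    Each call generates `SELECT * FROM <schema>.<name>(%s, ...)` and is
    meant to run in its own short transaction, so a developer can never
    keep a transaction open across slow work like an FTP upload.
    """

    def __init__(self, schema, execute):
        self.schema = schema
        self.execute = execute  # callable that runs one statement in one tx

    def __getattr__(self, name):
        def call(*args):
            placeholders = ", ".join(["%s"] * len(args))
            sql = f"SELECT * FROM {self.schema}.{name}({placeholders})"
            return self.execute(sql, args)
        return call


# A fake executor so the sketch is self-contained; in reality this would
# open a connection, run the statement, and commit immediately.
issued = []

def fake_execute(sql, args):
    issued.append((sql, args))
    return "ok"

db = SprocProxy("order_api", fake_execute)
db.create_order(42, "EUR")
print(issued[0][0])
# SELECT * FROM order_api.create_order(%s, %s)
```

The point of the sketch is only the shape of the interface: the application never writes SQL against tables, it calls named database functions, so all data logic stays next to the data.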
I described this schema versioning in my previous talks as well, and it improved our ability to really move fast with schema changes. For example, our technology team is now doing more than 100 schema changes per week in our database service, and they really don't have an issue with changing the schema: adding new columns, improving stored procedures and so on.

But as with every fairy tale, it came to an end. We are still growing, and we wanted to change our organization a little bit more. Though we implemented everything ourselves, and the architecture that we built works very efficiently until now, we actually had something like this: an architecture that was forcing us into a very rigid technology stack, and we started losing the patience of our developers. They want to be cool; they want to use other interesting things.

So the new era came to our organization last year. We call it radical agility. What radical agility means is that every team gets as much autonomy as they can get, they get purpose from the company, and they have to achieve mastery in what they do. This works quite well so far. But the problem with autonomy is that teams can choose any technology stack they want; even our Postgres team is now more or less fighting to convince others that Postgres is still cool. They're doing quite a good job, and it's not so difficult to show how good Postgres is, but still, the persistence layer should be chosen by the teams themselves, and they can work in AWS. Before that move we were all concentrated in our own data centers; going to AWS was unspeakable for a big company in Germany, so we are breaking taboos there as well. But of course, as a publicly traded company we have to be audit compliant, in most cases, of course in every case, and audit compliance brings us to very interesting things that we had to work on a lot to enable teams to work in AWS.
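Coming back to those 100 schema changes per week: one approach that makes such a pace safe, and which we used for our stored-procedure APIs, is to deploy each application release into its own versioned API schema. This is a sketch of that idea under assumed, illustrative names; it only builds the SQL statements, it does not talk to a real database.

```python
# Sketch of versioned API schemas: each application release gets its own
# schema, e.g. order_api_r15, so old and new code can run side by side
# during a rollout. Names and helpers here are illustrative, not our
# actual tooling.

def api_schema(base, release):
    return f"{base}_r{release}"

def deployment_statements(base, release, sproc_ddl):
    """Statements a deployment would run: create the release schema and
    install this release's stored procedures into it."""
    schema = api_schema(base, release)
    stmts = [f"CREATE SCHEMA IF NOT EXISTS {schema}"]
    stmts += [ddl.replace("{schema}", schema) for ddl in sproc_ddl]
    return stmts

def session_setup(base, release):
    # The application pins its search_path to the schema matching its release.
    return f"SET search_path TO {api_schema(base, release)}, public"

stmts = deployment_statements(
    "order_api", 15,
    ["CREATE FUNCTION {schema}.create_order(amount numeric) RETURNS bigint "
     "LANGUAGE sql AS $$ SELECT 1::bigint $$"])
print(session_setup("order_api", 15))
# SET search_path TO order_api_r15, public
```

Because an old application version keeps calling procedures in its own schema, a schema change never breaks running code, which is what lets teams change schemas at will.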
So, is autonomy anarchy? No, autonomy is not anarchy; in autonomy we even have a constitution for the teams, where their rights and possibilities are defined.

What we did to support them in AWS: we created STUPS, an infrastructure that enables us to comply with the audit requirements from the PCI authorities and from governmental authorities. STUPS is not so simple; I myself have problems understanding it, and STUPS is definitely not the topic of this talk. I simply wanted to mention it because all the tools that we are building are based on this infrastructure. So if you have questions about it, please ask afterwards.

Also, one of the big decisions made to support autonomous teams was that autonomous teams will be building microservices. The applications running in the form of microservices will communicate via REST APIs, and the databases are hidden behind the walls of AWS VPCs. What do we do with the databases then? We are not managing central databases as we did before; the database team is consulting the autonomous teams. And Spilo is an appliance that we built, using Patroni as the high-availability system, that can be used to run Postgres. I will mention Spilo and Patroni a little bit more later.

But the problem with microservices is that classical ETL processes are not really possible. I think I'm missing some slides... no. So I will try to describe why ETL processes are not so easy in the world of microservices. If your organization is running towards microservices, please think twice about this decision.

So, the classical world: everybody is really very cool and everything works nicely.
So the developers work with the databases; I even have junior developers there, you see. They write the applications, they come to the DBAs for consultancy, and our business intelligence people and data scientists get data from the databases that we manage. Our developers sometimes didn't even know that the data was transported to the back ends of our BI systems, analyzed, and reports generated. This is the classical way of doing data warehousing and business intelligence: you have the ETL process, you extract data from the database and you produce the reports. With more teams and bigger services, so to say, we still had the classical ETL process and a huge database, and everything was fine.

But the classical ETL process has some disadvantages, as you understand as well, because it's mostly a very hand-driven process: you mostly have to write your ETL process by hand. The advantage, of course, is that you can put everything you prepared by hand into a well-structured SQL database that can be used by your analysts.

But what about microservices? Microservices lead to fragmentation of your data, separating ever smaller domain areas into different systems, and there is no direct access to these databases for the data scientists or business intelligence people. So our world is kind of going down.
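To make the fragmentation problem concrete, here is a small illustrative sketch; all the service names and data are hypothetical. Three microservices each own one slice of the customer data, and an analyst can only correlate them once the slices are collected into one place and merged on the shared key, the customer number.

```python
# Each microservice owns only its slice of the data; nobody can join
# across them directly, because the databases sit in separate VPCs.
customer_service = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
payment_service = {1: {"iban": "DE89..."}}
address_service = {1: {"city": "Berlin"}, 2: {"city": "Dortmund"}}

def merge_on_customer_number(*slices):
    """Collect per-service slices into one merged entity per customer,
    which is only possible after the data has been brought together."""
    merged = {}
    for service_data in slices:
        for customer_number, fields in service_data.items():
            merged.setdefault(customer_number, {}).update(fields)
    return merged

entities = merge_on_customer_number(customer_service, payment_service, address_service)
print(entities[1])
# {'name': 'Alice', 'iban': 'DE89...', 'city': 'Berlin'}
```

The merge itself is trivial; the hard part, and the subject of the rest of the talk, is getting the slices out of the isolated service databases in the first place.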
But we have the solution; Postgres has the solution for that. What we are working on now, still in the prototyping stage, is a system that will extract the changes flowing from the teams and make them available for stream processing, for archiving in a data lake, and for materialization into some kind of automated DWH process.

One of the possibilities is that the business logic itself writes data into your collection system, bypassing the database. But this approach has a huge problem, and this problem is not visible from the beginning if you haven't thought about it enough, so to say. First of all, it's very error-prone: if you don't have a uniform way to push the changes that you make to your entities, you will simply make a lot of mistakes. Another big problem is the double-write problem: to do a double-write into two systems and have a consistent state of your data, both in your storage and in the bigger storage of the business intelligence world, you have to do a two-phase commit, and doing a two-phase commit is usually very inefficient and, in the end, very difficult to implement.

Another possibility is to extract the changes directly from the database. This approach has advantages: you cannot miss anything. You write to Postgres, Postgres commits, and then it generates the WAL replication stream that contains all the changes to your data. And no additional work is needed on the business logic side: your application writes to the database as if it were just the database, and all the changes that happen to your data are automatically extracted. Of course, I am lying a little bit about the simplicity of this approach, because you still have to map your entities somehow, so this arrow here could be quite complex. But still, this is one central, not so much centrally managed as centrally developed, system where bugs can be tracked and fixed for your whole infrastructure, and not in every application and every piece of business logic that you would otherwise have to look at.

So how to implement it? Actually, I want to thank Simon Riggs very much for envisioning and pushing the ideas of WAL-based logical replication in Postgres. When I was talking to him several years ago and he was pushing this, I didn't really understand the whole importance of this feature that came to Postgres, and this feature is really immense.

So we already have pglogical from 2ndQuadrant, which allows you to extract data from the replication stream. There is Bottled Water by Confluent, which extracts data, serializes it into Avro and pushes it to Kafka. We also did some work on Bottled Water: we forked and patched it to extract data as JSON for our purposes. And we have an add-on system that we use for parallel snapshotting, so you can extract the whole existing database into your stream and then go on fetching data from the logical replication stream.

How does it work? You have the streaming plugin, so you can use either of the two plugins, or hopefully more plugins that are coming in the future. You do a snapshot of your data into the queue when you're starting your process, and then you use the stream processing application to go on sending data into the queue.

Another project of ours, and I'm really scared to say this, is kind of a rebuild of the enterprise message bus: a small broker on top of Kafka that enables us to monitor better and to keep track of the structures that we are pushing into Kafka. But this is also out of the scope of this talk, so you are more than welcome to ask me questions about it, and I will be really very interested to discuss these topics with anybody who is actually working with messaging buses and
event sourcing and all these things. These are very interesting topics, and I will be really very glad to discuss them.

So, try it yourself. We worked very hard on adding the possibility of decoding the logical replication stream, that is, fetching the logical replication stream using psycopg2. Unfortunately the pull request is not merged yet, but everything is done, so we're just waiting for the committer to merge it. Then you will be able to experiment with the data streams that are flowing from the logical replication slots: rewrite them, push them into Kafka, send them to the moon, whatever you would love to do with them. And, as I already said, I wanted an initial snapshot export tool. The projects that we are doing are all open source; these tools that I mentioned, and other tools from previous years, are available in our GitHub account.

I'm somehow very fast. Usually I need 35 minutes for this, so okay, maybe I was talking too fast. But now you have the possibility to ask questions. Yes?

Mm-hmm. I didn't understand the question. Ah, you mean STUPS, the STUPS infrastructure? So what the STUPS infrastructure is doing: what you usually need in AWS is the audit log of what you do. From the point of view of auditors, when something is running in your AWS account, you always have to be able to say: okay, this application is running, what is the version of the code in your version control system that is running, and who made the change, who is responsible for this application.
So that means, from the auditing perspective, at least in Europe, the requirement is that you always have to be able to show that any merge into the production code was done using the four-eyes principle: somebody should review it, it should be documented who reviewed it, and then it gets rolled out. And when you roll it out, you have to be able to trace from the version of the application that runs back to the commit that led to the emergence of this application. These are the requirements, and when you start to think about how to implement them, it's not so easy, because you really have to track all the services that you have, you have to track the versions, you have to somehow register the versions, and so on. Yes?

So, there are systems that manage the security side: we use OAuth for authentication for all the REST services that we have. Also, when you are rolling out your application, we have Senza, which uses CloudFormation to roll out the changes. We use Docker as well for packaging: your application is packaged in a Docker image and rolled out on a specially prepared Linux image, so to say, that runs on the EC2 instances, and this Linux image includes underneath the integration with logging services and with our other systems that store the audit logs. The whole system is open source, you can actually replicate it, but you will definitely need to read the documentation a lot to understand how this thing works, and it makes sense only for big organizations. I don't think that for smaller organizations it makes sense to add this additional complexity just to work this way.

Yes? Yes, so this is the project that the Saiki team is doing, the BI team that is here. The automated entity modelization engine: what we do is that we try to have a registry of all the entities that are flowing into our Kafka system, and use this registry of events, or registry of types so to say, to materialize them from JSON into tables in our DWH system, and then bring the entities that are spread across different services into one materialized super entity. For example, if you have customer information, customer payment information, customer address information, customer risk scoring information, all these topics can actually be merged into one super entity, because we know that all these entities use the same primary key, the customer number. So you can merge them into one big structure that will be much easier to use for the analytics teams than just having many different entities. We're now experimenting with that, and I can show you some prototypes if you're interested.

Yes? The question is: why do we want to do analytics separately from the machines? So, why not do analytics directly on the databases themselves?
This is actually very close to the previous question. The problem is that with microservices keeping their databases separate, no single database has the full picture, so you cannot do analytics on the data; you cannot actually correlate data between different systems if you don't have access to them. And this is more or less the reason why we're trying to collect the data from the different flows and the different applications.

Yes? Yes, the question, or the claim, is that people who write microservices usually are not using relational databases. Quite often, yes. I don't know if it's a property of microservices; I think it just happens that people who happen to write code during the era of microservices don't like relational databases. So what we do: we talk to people and explain to them the problems that they will see with non-relational databases. And we have several teams that went to Cassandra and then went back, because the modeling is so complex, and actually you need Cassandra only if your application has to scale really, really very much, beyond what fits into the Postgres world. So we talk to people, we try to understand what their real use cases are, and we also train people to understand better how Postgres works. At Zalando we have a whole set of trainings for our developers: how Postgres works, what SQL is, and what the advantages are of having one master, rather than a classical NoSQL database.

Yes? The question is how much logic is extracted away from the database into the business logic. I think there is a tendency of developers to bring the logic into their zone of comfort: they tend to put as much business logic as possible into the environment where they feel most comfortable, and those environments are usually not PL/pgSQL stored procedures. So we have a very strange situation: the older people, who know very well how PL/pgSQL works because they have worked with it, tend to keep business logic and data logic there, and the people who are new and haven't had all this experience tend to push business logic into the application. I personally think there should be some kind of golden mean. As I said, I talk about data logic, and I ask people to put the logic that is related to the consistency of data into stored procedures or into SQL, and to do the calculations in Java.

I think we didn't roll out PLV8 on our classic, our old, databases. But now the Spilo appliance that the database guys are developing includes all these stored procedure languages, and they're also experimenting with including CitusDB, for example, in the appliance.

Yes? Yes, the question is how the infrastructure people are distributed between the teams. It's a complicated question. The thing is that we are coming from an organization where infrastructure teams were centralized.
We had a platform team, so to say, of which the database team was also a part, that was doing all these infrastructure-related things. If somebody wanted to install a new database, they came to the database team; if somebody wanted to install a new Redis system, they came to the database team; if they wanted to bootstrap new Apache or Tomcat machines, they went to the system team. Now the database team and the system team are more consultancy teams. They still take care of the old vintage applications, but we support the DevOps idea: we want people to learn to manage their services themselves, and AWS helps a lot, because in AWS you don't need to think much about how to bring a Redis system up; you just bring it up. But this is a good question: what is the balance there? I think it's quite difficult not to have a centralized team that maintains the know-how, and we want to keep these people with knowledge, who are working as consultants for the other teams.

Any other questions? I really feel stupid that it was so short. I actually have another 40 slides, because I thought there would be questions in one direction, with me showing more and answering along the way, but the questions are going in the other direction.

Yes? The question is if one can compare microservices and the old way of doing it. I think microservices lead to a process of rethinking what the borders of your domain model are and what the borders of your application are.
I think it's a good direction, but one has to consider the problems that you have with it. Having a monolith is not inherently bad. I personally have completely different ideas about microservices: I personally think that microservices are just another step towards something better. Lambda architectures are rising, and I think the development of architectures will still go towards something that is more data- and analysis-driven, where you have an execution plan for the whole application, and the execution planner will distribute your code across different machines and so on. But these are my crazy ideas. Yet this actually happens already with the lambda architectures: if you look at AWS Lambda, all the other big cloud providers are working in that direction too; it's more or less happening already. You can write your application, the whole application, in different languages in the form of small lambda snippets, and then they work together. What is still missing is the execution planner that will understand and measure the flows between these functions and then make decisions about where to put them and how to run them. But as I said, this is more something I would love to discuss with people who are interested in it.

Yes? Yes, I think the Citus guys are now going in that direction: automatic scheduling, automatically understanding what you need and how to distribute things. I think in the relatively near future we will have something like that. Right now we never think about how the processor distributes instructions between the cores; it just happens without our knowledge. The same should happen at the bigger scale, for applications.

So we still have a lot of time for questions. I hope it was not too short or too boring.
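As a small follow-up to the logical decoding part of the talk, here is a minimal sketch of consuming such a change stream. It parses the textual output format of Postgres's built-in test_decoding plugin; in a real pipeline these lines would be fetched from a replication slot, for example with the psycopg2 support mentioned earlier, and here they are simply hard-coded.

```python
import re

# Example lines in the shape produced by the test_decoding output plugin;
# in a real pipeline they would arrive from a logical replication slot.
STREAM = [
    "BEGIN 568",
    "table public.customer: INSERT: id[integer]:1 name[text]:'Alice'",
    "table public.customer: UPDATE: id[integer]:1 name[text]:'Alicia'",
    "COMMIT 568",
]

CHANGE_RE = re.compile(r"^table (\S+): (INSERT|UPDATE|DELETE): (.*)$")
COLUMN_RE = re.compile(r"(\w+)\[[^\]]+\]:('[^']*'|\S+)")

def parse(lines):
    """Turn test_decoding text into simple change dicts, ready to be
    serialized to JSON and pushed to a queue such as Kafka."""
    changes = []
    for line in lines:
        m = CHANGE_RE.match(line)
        if not m:  # skip BEGIN/COMMIT markers
            continue
        table, op, cols = m.groups()
        fields = {name: value.strip("'") for name, value in COLUMN_RE.findall(cols)}
        changes.append({"table": table, "op": op, "fields": fields})
    return changes

changes = parse(STREAM)
print(changes[1])
# {'table': 'public.customer', 'op': 'UPDATE', 'fields': {'id': '1', 'name': 'Alicia'}}
```

This is of course the simple half of the job; the entity mapping discussed in the talk, turning such row-level changes into business entities, sits on top of a parser like this.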