The biggest problem with backups is consistency of the database; that is the main problem. mongodump, the standard backup utility for MongoDB, dumps everything sequentially, document after document. And the most terrible thing is that it can take a while: if you have plenty of data, it can be 10 minutes, 20 minutes, and if your application changes a document at the beginning of the collection and at the end of the collection while your backup is in the middle, you will get a backup with the old document at the beginning and the changed document at the end. So you don't have consistency by default in a MongoDB backup.

But we have a kind of workaround: the --oplog option for mongodump. It is pretty simple: that option stores all write operations to your collections in a separate file. And that's great. After the dump you have a backup and you have the oplog. During restore, the mongorestore tool restores the backup and after that replays those operations against your database, and you have consistency, but consistency at the end of the backup, not like in other databases where you have consistency at the beginning of the backup (there is a sketch of this below).

That's great, and it is a solution if you have only one shard, only one replica set. But what if you have multiple shards? That becomes more interesting. What is sharding at all? It is when you split your data between different shards, so each shard has its own unique part of the data and serves it with its own set of hardware. So you have multiple shards with unique data. But the most interesting thing is that different shards have different load and different disk speeds. So unfortunately you cannot finish the backup on multiple shards simultaneously, at the same transaction: shard A finishes mongodump at one transaction, shard B finishes its own backup earlier, so in the end you cannot make a consistent backup. Even if you are doing snapshots, you cannot finish the backup at the same transaction on all shards.

So that's why we created the tool Percona Backup for MongoDB. What is the idea of Percona Backup for MongoDB? We run mongodump on all shards, but keep streaming the oplog until the point in time when all shards have finished their backups. After that we stop at the same transaction, at the same millisecond, on all shards. During restore we replay that oplog, and we have a consistent backup at the end point, the point when the backup finished on all shards.

When do we need sharding at all? In general, we need sharding when one machine cannot handle your whole database, tens of terabytes for example. It is not feasible to handle such a big database on one hardware machine, so in that case you use sharding. That's why we decided to create an agent: a backup agent which runs next to each MongoDB process. It is not like the mongodump tool, which you can run remotely, and it is not a tool which connects to all shards and dumps data from them to one point, to one machine; that is impossible when you have a huge database. So we decided to have a PBM (Percona Backup for MongoDB) agent on each machine, and it connects directly to the local MongoDB. You don't need to open any ports, you don't need to care about authentication, TLS, encryption, or anything else. The agent just connects locally to your MongoDB, and all communication between agents goes through MongoDB itself: we create a system collection inside MongoDB and communicate through it.
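As a rough sketch of the single-replica-set workaround described above (the --oplog and --oplogReplay flags are the standard mongodump/mongorestore options; the host name and backup path are placeholders):

```sh
# Dump one replica set and capture the write operations that happen during the dump;
# --oplog stores them in oplog.bson next to the dump.
mongodump --host rs0/mongo1:27017 --oplog --out /backups/rs0

# Restore the dump, then replay the captured operations so the data is consistent
# as of the moment the dump finished, not the moment it started.
mongorestore --host rs0/mongo1:27017 --oplogReplay /backups/rs0
```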
Also we have a CLI tool, which is simple: it also connects to MongoDB and sends commands to the agents through MongoDB. So how can you start? That's pretty simple (the steps are sketched below). You need to create a special user which has all the needed permissions in MongoDB, like dumping your collections, dumping system collections, and creating system collections. After that you run the PBM agent locally, next to each MongoDB process, including the config servers, because the config servers should also be backed up. After that you configure the remote storage and you run pbm backup. Very simple.

What's the current state of the project? First of all, it is production ready: we made it GA after a year of development, and it works fine. We support MongoDB 3.6, 4.0, and 4.2. And I am proud to say that we support, for example, 3.6, because that is hard: 3.6 doesn't have transactions in MongoDB, so that was also a consideration in the design, the architecture, and other things.

We support two kinds of storage. The first kind is, of course, anything that speaks the S3 protocol: the Minio project, an open-source tool which implements the S3 protocol, AWS S3, and Google Cloud Storage, which also supports the S3 protocol. The second kind of target is locally mounted network-attached storage or a storage area network, something like NFS: you can mount it into some directory and do backups into that directory. PBM also selects the appropriate database instance: for example, if you have hidden instances or secondaries, it will take the backup from a secondary or hidden instance rather than from the primary. And of course we support compression for the oplog, which is important because it is just a list of operations, and compression for the dump itself.

What is our roadmap? The biggest goal for this tool, which is of course 100% open source, is point-in-time recovery. What is point-in-time recovery? It is when you can restore to any point in time, for example between backups, or between the last backup and the current moment of time. That sounds like magic, but in fact it is pretty simple: we just stream the oplog continuously, always, nothing else. But it is a great possibility if you made a wrong operation on your database, like deleting a whole collection: you can roll back to exactly before that moment of time.

Also, we are going to add some UX improvements. We want the ability to delete backups from the CLI tool, whether they are on S3 or on locally mounted storage. It is not a big deal, but currently you need to clean up your backups manually; with that possibility it will not be needed, you can just send a command to the CLI tool.

We also care about user experience at scale. If you have plenty of instances, for example some of our customers have around 200 shards, and you back up such a huge cluster, you must have centralized logs, because you cannot log into each instance and check the logs; that is a terrible user experience. So we want centralized logs available from the CLI tool. Also, the CLI tool right now is non-blocking: it just sends the backup command and does not wait for anything, and after that you can check the status. In the future we want to have a blocking mode and maybe some kind of progress bar.

What is next? We want to support cancelling backups; we don't have such functionality right now, so if you run a backup, you cannot stop it. And the next thing is very interesting: Percona has its own fork of MongoDB, and in the Percona fork we have functionality which allows you to do physical backups, not only logical backups. A logical backup is when you dump a collection as it is.
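A minimal sketch of the getting-started steps described above, assuming an S3-compatible bucket; the connection URI, bucket, and credentials are placeholders, and the exact configuration keys should be checked against the PBM documentation:

```sh
# Start an agent next to every mongod, including the config server replica set members,
# connecting as the PBM user that was created with the needed permissions.
export PBM_MONGODB_URI="mongodb://pbmuser:secret@localhost:27017/?authSource=admin"  # placeholder
pbm-agent &

# Point PBM at the remote storage (S3-compatible; region, bucket, and keys are placeholders).
cat > pbm_config.yaml <<EOF
storage:
  type: s3
  s3:
    region: us-east-1
    bucket: pbm-backups
    credentials:
      access-key-id: REPLACE_ME
      secret-access-key: REPLACE_ME
EOF
pbm config --file pbm_config.yaml

# Run a backup; the CLI does not block, so check the result afterwards.
pbm backup
pbm list
```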
But a physical backup is like a snapshot, a consistent snapshot. That is the most important thing in MongoDB, where consistency is eventual. So we want to use that functionality in the tool and support both logical backups and physical backups for MongoDB. And that's all. We are hiring. Also, please attend Percona Live if you care about databases, and I am ready for any questions.

First of all, if you have a Kubernetes cluster on Azure, maybe you need to use some MongoDB operator, honestly: either MongoDB Inc.'s operator or the Percona one. In that case, both of them support backups out of the box, so you don't care how they are built in. But anyway, on Azure you need to spin up the Minio project, which is open source. Minio implements the S3 protocol in the front and uses Azure Blob Storage as the backend for storing the data. So you need Minio.

Of course, it depends on your load. Sorry, that's... For large databases, it is mostly about the speed of two things: the speed of your disks and the speed of your network. Yes, it can take... I mean, it is just the speed of mongodump and your network; we are doing backups through mongodump. Yeah, so it does not have any performance penalty bigger than that. Yeah.

Right now, no, but it is on our roadmap, right after point-in-time recovery. So yes, we want to support incremental backups. Yeah? Good question. Nice.

Yeah, so the question was: does PBM support encrypted backups? At this point in time we do not support encrypted backups, but if the data was encrypted in the database, we can still get that data. But sorry, the answer is: it depends. If you use S3 as a destination, you can transfer your data over TLS, so it is encrypted in transit, and the S3 protocol allows you to set a header saying to encrypt the data with such a key, so the S3 provider, like Minio or AWS, can encrypt your data when it stores it to disk. So it is encrypted at rest, it is encrypted in transit, but it is not encrypted by the tool itself. Yeah, thank you a lot.
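For the Azure question above, a rough sketch of how the Minio front could be set up at the time of this talk; the gateway mode and environment variables follow Minio's documentation from that era, and the account name and key are placeholders:

```sh
# Expose Azure Blob Storage through an S3-compatible endpoint that PBM can point at.
export MINIO_ACCESS_KEY="my-azure-storage-account"      # placeholder: Azure storage account name
export MINIO_SECRET_KEY="my-azure-storage-account-key"  # placeholder: Azure storage account key
minio gateway azure
# PBM's S3 storage configuration can then use the Minio endpoint (port 9000 by default).
```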