So I guess we're about ready to get started. Well, good morning, and thanks for coming to the session. My name is Alexis. I'm a software engineer working on the Google Cloud Platform, specifically on the Cloud SQL team. We have two relational managed services, and we recently announced Postgres.

Briefly, what we're going to go over today is the Cloud SQL architecture, and part of that is persistent disk, which is the storage layer. Next up, we have data integrity and protection in persistent disk. Then we're going to go over the performance of the Cloud SQL architecture. And at the end, we're going to wrap up and see how the architecture benefits Postgres and users in general.

So in short, Cloud SQL is a fully managed Postgres and also MySQL database service, and it's accessible from just about any application anywhere, because it is in the cloud. One thing about Cloud SQL is that it's affordable: no-commitment pricing for applications of any size, and you can scale the instance size up and down. As I mentioned, we're supporting two database types right now. Just as a show of hands, how many of you, or your companies, are using a managed service right now? OK, it's about half.

As I mentioned, we recently added Postgres to our offerings, and we announced that at GCP Next in San Francisco. It was a great event that brought together around 11,000 attendees. GCP Next is a platform for Google to announce different products, changes, and launches, and it's a great way to bring together the community that uses those services.

Although everyone here is likely very familiar with Postgres as users, contributors, or both, running Postgres as a managed service may not be so familiar. Supporting Postgres has become a necessity as more and more businesses rely on it as a data store, which is one of the contributing factors for why we added it as a managed service. At a very high level, for those who may be new to Postgres, some of the noteworthy points are that it has an open source community with over 15 years of active development, strong standards compliance, and, probably one of its most powerful features, extensibility through extensions.

For those of you not familiar with Google Cloud, it's a public cloud, and there are multiple storage options to choose from, including relational, non-relational, object, warehouse, and also in-memory. As you can see, Postgres fits into the relational category.

One thing about a fully managed database service is that it's very easy to set up, maintain, and manage. That's a benefit for those who are looking for something turnkey, as opposed to running something on-premises. Why does running a database yourself require a lot of effort? At the very low level, you have to worry about things such as power, the rack, and server maintenance if there are component failures. Additional concerns are that you have to back up the database, patch the database, patch the OS, and, of course, install it all to begin with.
Bringing that together, a managed database service is beneficial because it makes scaling easier, since the managed service takes care of that, and it also adds in components such as high availability, which may not always be straightforward to set up the first time you're setting up a database. In addition, monitoring is provided. I'm sure many of you have gone through the troublesome process of setting up monitoring and then worrying about not only keeping the database running but also keeping the monitoring running. A managed service makes it a lot easier by providing all of these components.

In terms of the instance sizes we offer on the Google Cloud Platform for Cloud SQL, you can go up to 32 cores, and at GCP Next we announced the ability to go up to 64 cores, which we plan to add soon for Postgres. You can add up to around 208 GB of RAM, which is quite a bit. On the other side, if you want development instances or to test out a project, there are also very inexpensive instances that allow you to get up and running quickly.

One of the things I'm going to cover later is the disk part of having a database. For storage, we offer solutions from 10 GB to 10 TB, both on the equivalent of magnetic disks and also on SSDs. When you go to larger disk sizes, you get higher IOPS, and we can see up to 25,000 IOPS on the largest disk sizes (there's a small sketch of this scaling below). One very strong point of our storage is that it automatically increases in size, so you don't have to worry about running out of disk space if there's a large influx of data overnight.

Also, as I mentioned earlier, we have backups, and we provide them through snapshots. Oftentimes with a backup you worry about its consistency, how quickly it can be performed, and whether or not it interferes with the database itself; that's all taken care of by snapshots.

There are also maintenance windows. We released the beta about three weeks ago, and in that feature set point-in-time recovery is not offered; however, it's on the roadmap, and we plan to offer it before GA. Part of having a managed service is that occasionally we do need to patch the database or apply an OS patch, so we have maintenance windows, which you can specify so that they don't interfere with your business needs. And as everyone probably knows, extensions are a huge part of the benefit of Postgres, so we have PostGIS as well as a few others, and we are actively adding more as time goes on. That covers the point-in-time recovery question I was just asked. We're also going to be providing HA for Postgres; currently, in beta, we don't have HA, but the data is on persistent disk. There is also a roadmap to add read replicas and additional connectivity.

Every user of Postgres has certain requirements that may differ from others, and different businesses have different requirements again. On the next slide there's a list of those requirements. By a show of hands, I'll go through each one; let me know which ones are most important to you. First up, we have HA, high availability. If I can just see the hands: yep, there's one. You can pick many. Next, we have read replicas. How about App Engine connectivity? That's a Google Cloud service. OK, not so much. And then Cloud Functions connectivity; that's also for Google Cloud users. Point-in-time recovery. And third-party extension support, meaning extensions outside of contrib.
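To make the disk size-to-performance relationship above concrete, here is a minimal Go sketch of how provisioned IOPS might scale with volume size up to a cap. The 30-IOPS-per-GB factor is an assumption for illustration only, and the 25,000 cap is the figure from the talk; check the current persistent disk documentation for the real limits.

```go
// Illustrative only: persistent disk performance scales with volume size,
// up to a per-volume cap. The per-GB factor below is an assumption for
// this sketch, not a published limit; the cap is the number from the talk.
package main

import "fmt"

const (
	ssdIOPSPerGB = 30    // assumed scaling factor, for illustration
	ssdIOPSCap   = 25000 // cap mentioned in the talk
)

func provisionedIOPS(sizeGB int) int {
	iops := sizeGB * ssdIOPSPerGB
	if iops > ssdIOPSCap {
		return ssdIOPSCap
	}
	return iops
}

func main() {
	for _, size := range []int{100, 500, 1000} {
		fmt.Printf("%4d GB -> %d IOPS\n", size, provisionedIOPS(size))
	}
}
```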
OK, so the way that Cloud SQL looks in terms of running Postgres is that the OS image is a Google Cloud image, and that's running within a GCE VM, that is, a Google Compute Engine VM. Within that, there are the Postgres processes and also Cloud SQL agent processes. Those include monitoring and services such as restarting the database in case it crashes (sketched below), and also logging. These are managed by a control plane. The control plane takes care of managing the metadata for your cluster or your instance; configuration management and keeping everything in sync is oftentimes one of the most difficult parts of having a cluster. So with beta availability of Cloud SQL for Postgres, including the PostGIS extension, Cloud SQL now takes care of a lot of the mundane tasks that you would otherwise be responsible for, which is one of the major benefits of a managed service.

We also have partners that we're working with to further integrate usage of Postgres. Some of these names may look familiar; they cover everything from ETL to analytics.

Now, probably on to the interesting part: what is persistent disk? Persistent disk is the storage layer that Postgres sits on top of. It's Google proprietary technology that we're making available through this offering and also in general through GCE, Google Compute Engine. Alongside it, Cloud Storage is our object storage, and persistent disk, or PD, is the block storage. PD allows EXT4, or another file system, to be run on top of it, transparently to the file system.

A little more detail on persistent disk: PD is attached to the VM. You have the VM, which uses its root volume from local storage, and the persistent disk is actually network-connected storage. There may be some misgivings about network-connected storage, but later in the slides we'll explain why it is actually very performant.

PD is a log-structured volume: data is written into logs, the logs are then merged, and a map of the logs is kept. That map is also stored in the underlying layer for persistent disk, which is able to keep track of it and also recover in case there is a failure of a particular machine. I say a particular machine because PD is actually spread out over many hundreds, if not thousands, of machines, so when any single machine fails, it doesn't actually impact PD.

Now a little more detail on the LSV. All writes are appended to a log file, and as I mentioned, there's a map which keeps track of the metadata. The greatest challenge with this at scale is the number of bytes over the number of machines. Cloud SQL uses PD much like any other service running on GCP would use it: Cloud SQL runs as an application in the guest OS, the hypervisor running the guest exposes PD as a SCSI device using virtual SCSI, and the PD block data ultimately resides in Colossus. Colossus is a Google proprietary technology that, I guess, really became public around 2010.
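As an aside on the agent processes mentioned above: the talk says the agents restart the database if it crashes. Here is a minimal sketch of that supervision pattern. It is purely illustrative, not Cloud SQL's actual agent code, and the postgres command line is an assumption.

```go
// A minimal sketch of the supervision pattern described in the talk: an
// agent starts the database process and restarts it whenever it exits.
// Illustrative only; not Cloud SQL's actual agent.
package main

import (
	"log"
	"os/exec"
	"time"
)

func supervise(name string, args ...string) {
	for {
		cmd := exec.Command(name, args...)
		log.Printf("starting %s", name)
		if err := cmd.Run(); err != nil {
			log.Printf("%s exited: %v", name, err)
		}
		// Back off briefly, then restart. A real agent would also report
		// the crash to the control plane for monitoring and fencing.
		time.Sleep(5 * time.Second)
	}
}

func main() {
	// Hypothetical data directory path, for illustration.
	supervise("postgres", "-D", "/var/lib/postgresql/data")
}
```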
So, a sector. A sector is the smallest atomic unit the LSV can read or write. Its size shall be an exact power of two; it's a configurable parameter defined during creation of a device. The default sector is 512 bytes. Since sectors are smaller than blocks, a write to a sector range not aligned on a block boundary requires a read-modify-write sequence, and consequently is slower than an aligned write. A block is the configurable unit of physical data access. The block size is set at the moment of volume creation and is fixed for the lifetime of the volume; it must be an exact power of two and a multiple of the sector size. The default block size is 4 kilobytes.

The question was whether what I'm describing is Postgres. No, what I'm describing is actually PD; we don't modify Postgres. This is the underlying storage layer for Postgres. For example, if you were to run Postgres on-premises, maybe you have a RAID 10 and you pick your SSDs, or whichever disks you want, and that's your storage layer; maybe you have a RAID card, battery-backed or not, or maybe it's just a single SSD. All of those are different storage layers. What Google has is persistent disk, which is network-based storage that attaches to the VM, and Postgres is just a process running on the VM that is able to use the remote storage, or PD. And no, this is not related to the Postgres block size.

A benefit of this design is the ability to restore the state of the system by scanning the metadata of all stored blocks. That's an important design concept, since metadata files are updated infrequently. The metadata files are, as I mentioned, the ones that keep track of the block files. So if you were to, for example, provision a 5 TB disk, those blocks would be in the metadata from the beginning. And yes, a write is basically an append to the log file.

Question: does that mean you are favoring writes, by turning them into sequential writes, and then basically turning every read, no matter whether it's logically contiguous, into random I/O? The writes are sequential, correct. But the reads are not actually random I/O; they are effectively sequential as well. There's another slide where we go over how the reads are optimized. It's pretty interesting.

A block map is necessary for issuing reads, copying snapshot data, and compacting data log files. There are two classes of block maps: intermediate checkpoints, which describe the state of a live device at some moment in time, and block maps that provide the location of all blocks at the moment the respective snapshot was taken.

So, for your question, here is the life of a read. Assume the user wants to read 24 kilobytes of data. Using the block map, the device manager builds a list of triplets: the file index, the file offset, and the logical block index. There are a few bullet points here where you can see, for example, block 1 stored in file 35 at offset 12; you have some number of blocks stored in different files at different offsets. The really interesting part is that, since there's a metadata map, PD knows where all the blocks are and is able to send requests in parallel to obtain them. You can think of these triplets as coordinates: the coordinates of the 24 kilobytes that you want to read. The list is split out by coordinates, so you can send requests to, say, three different log files in the underlying storage layer all at the same time. And since all the requests to read those individual blocks are sent out in parallel, it actually achieves much higher performance than trying to read sequentially off of a single disk.
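Here is a small Go sketch of that life of a read: the block map yields (file index, file offset, logical block index) triplets, and the block fetches are issued in parallel. The names, the placeholder fetch, and the example coordinates are invented for illustration; only the triplet structure and the parallel fan-out come from the talk.

```go
// A sketch of the "life of a read": the block map turns a logical read into
// (file, offset, block) triplets, and the fetches for the individual blocks
// are issued in parallel, then assembled in logical order.
package main

import (
	"fmt"
	"sync"
)

const blockSize = 4096 // the default block size from the talk

// triplet locates one logical block inside a data log file.
type triplet struct {
	fileIndex  int // which log file holds the block
	fileOffset int // offset of the block within that file
	blockIndex int // logical block index within the volume
}

// fetchBlock stands in for the network request that reads one block from
// the underlying storage; here it just returns a placeholder payload.
func fetchBlock(t triplet) []byte {
	return make([]byte, blockSize)
}

// read fetches all blocks concurrently and copies each into its place.
func read(triplets []triplet) []byte {
	buf := make([]byte, len(triplets)*blockSize)
	var wg sync.WaitGroup
	for i, t := range triplets {
		wg.Add(1)
		go func(i int, t triplet) {
			defer wg.Done()
			copy(buf[i*blockSize:], fetchBlock(t))
		}(i, t)
	}
	wg.Wait()
	return buf
}

func main() {
	// A 24 KB read is six 4 KB blocks, e.g. block 1 in file 35 at offset 12.
	triplets := []triplet{
		{35, 12, 1}, {35, 13, 2}, {17, 4, 3},
		{17, 5, 4}, {42, 0, 5}, {42, 1, 6},
	}
	fmt.Printf("read %d bytes\n", len(read(triplets)))
}
```

The fan-out is the point: the read completes when the slowest of a few parallel block fetches returns, rather than being bounded by one disk's sequential throughput.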
One other thing about PD is that all data is encrypted on the wire and also at rest, by default, so you don't have to enable that. Read data is decrypted and its integrity is verified (there's another slide on how the data uses CRCs for verification), and then it's copied into the appropriate location in the buffer for the user data. If data integrity verification fails, copies of that same log file are read: if you have X copies and one of them is bad, the others still remain. Any questions about reads? Okay.

So, on to data integrity and protection, which is probably the most important thing: you want your data to be non-corrupt and in a good state. Something you of course want to avoid is your database crashing, and although there are design considerations within Postgres to avoid corruption, things do not always go as expected. As I mentioned, a CRC, a cyclic redundancy check, is basically a short checksum of the block computed while it's written.

PD volumes are highly virtualized: any given logical volume is split into many small pieces, which are stored across hundreds or thousands of physical devices. In the process we add redundancy, so that volumes can survive various failure scenarios. We also encrypt the storage on the wire and at rest for increased security. Lastly, we do checksums, to make sure that when you read data, you can be certain the data you get back is what was originally written. In this way, PD is designed to avoid silent corruption.

Snapshots are a critical feature of PD volumes. A snapshot is a copy of a PD volume, stored in Google Cloud Storage (GCS), that can later be restored to constitute a new volume identical to the original at the time of the snapshot. There are two key features of snapshots. First, snapshots are efficiently made and stored, minimizing cost and time to snap. If you've ever tried to back up a 5 TB disk, it's not always the fastest thing, but snapshots avoid those limitations: our differential snapshot feature only saves the differences between the previous snapshot and the current one. The second key feature is inherited from Google Cloud Storage, which is globally accessible. Because Google Cloud Storage provides a namespace that is accessible from all worldwide zones, as soon as a snapshot is taken it's available in any zone for restore, which means you can snapshot your instance in US Central and then access it in Europe West. There's no need to replicate the snapshot from one zone to another.

We also encourage snapshots to be taken regularly, so that if there is an issue it's very easy to restore, for example when a user accidentally drops a table and needs to recover from that. Something interesting with snapshots is that you can move them from one VM to another. So if, for example, you have a production instance and you want the team to test in staging, or to run different integration tests on it, you can quickly move a snapshot into a different environment, onto a different VM. Also, with PD there is volume resizing, which you can do online, and there's also auto-grow.
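On the checksum point above, here is a minimal sketch of the verify-then-fall-back read. CRC-32 and the replica layout are assumptions for illustration; the talk only says that blocks carry checksums and that another copy of the log file is read when verification fails.

```go
// A minimal sketch of verify-then-fall-back reads: each block carries a
// checksum written alongside it, reads verify the checksum, and on mismatch
// the same block is fetched from another replica. CRC-32 and the in-memory
// replica layout are illustrative assumptions, not PD internals.
package main

import (
	"errors"
	"fmt"
	"hash/crc32"
)

type storedBlock struct {
	data []byte
	crc  uint32
}

// readVerified tries each replica of a block until one passes its CRC check.
func readVerified(replicas []storedBlock) ([]byte, error) {
	for _, b := range replicas {
		if crc32.ChecksumIEEE(b.data) == b.crc {
			return b.data, nil // integrity verified
		}
		// Corrupt copy: fall through and try the next replica.
	}
	return nil, errors.New("all replicas failed CRC verification")
}

func main() {
	good := []byte("block payload")
	corrupt := append([]byte{}, good...)
	corrupt[0] ^= 0xFF // simulate a silent corruption

	crc := crc32.ChecksumIEEE(good)
	replicas := []storedBlock{{corrupt, crc}, {good, crc}}

	data, err := readVerified(replicas)
	fmt.Println(string(data), err)
}
```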
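And on the differential snapshots just described, here is a toy model of a snapshot chain, where each snapshot stores only the blocks changed since its parent and a restore overlays the chain from the base forward. The structures are invented for illustration; the chain-of-deltas behavior is what the next slide describes.

```go
// A toy model of differential snapshots: each snapshot stores only the
// blocks that changed since its parent, and a restore walks the chain from
// the base forward, overlaying changes. Deleting a snapshot in the middle
// would fold its blocks into its child, which is what makes the chain
// behave like a linked list that allows a delete in the middle.
package main

import "fmt"

type snapshot struct {
	parent *snapshot
	blocks map[int]string // block index -> contents changed in this snapshot
}

// restore materializes the full volume as of snapshot s.
func restore(s *snapshot) map[int]string {
	if s == nil {
		return map[int]string{}
	}
	vol := restore(s.parent) // state as of the parent
	for i, b := range s.blocks {
		vol[i] = b // overlay this snapshot's deltas
	}
	return vol
}

func main() {
	base := &snapshot{blocks: map[int]string{0: "a0", 1: "b0", 2: "c0"}}
	mid := &snapshot{parent: base, blocks: map[int]string{1: "b1"}}
	tip := &snapshot{parent: mid, blocks: map[int]string{2: "c2"}}

	fmt.Println(restore(tip)) // map[0:a0 1:b1 2:c2]
}
```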
Today, all snapshots are global: as I mentioned, they can be accessed from different regions without having to wait for the data to transfer over, because of how PD is replicated. This slide shows how the delta snapshots work: basically a linked list, allowing a delete in the middle, since one snapshot is based on another one. If you're familiar with Docker, PD does something similar to what Docker does with an AUFS volume, where it lets you layer on the changes.

Now we're going to go over some performance metrics that can be expected from PD. We're constantly improving PD, so these metrics may change; check the docs for even faster performance. With PD SSD we can achieve, well, actually this one's already out of date: it's at 25,000 IOPS now, but the slide says 15,000. The read throughput shown is 240 megabytes per second, and I think this number is also low, so we should probably update this slide. For PD standard, which comes at a lower price point than SSD, there's also significant performance. The way PD is able to achieve this performance is through the size of the disk and the increased number of nodes in the underlying storage system, which is Colossus.

In an environment with a shared local disk, there's a known practice of spinning up many VMs, or very large VMs, to try to avoid the noisy-neighbor issue. I'm not sure how much cloud experience everyone has, but has anyone gone through that, where some VMs perform at different rates than others? As mentioned previously, a new device may be forked from a snapshot without having to test around to find consistency. Compared with the noisy-neighbor situation, having consistent performance over your fleet of database clusters or machines offers a lot of reliability, which you can't always get in an environment that uses shared local disk. Some virtualization has lazy allocation for local volumes, which causes unavoidable fragmentation; PD avoids that issue with its many layers of virtualization. Also, with auto-extending volumes, running out of disk space is not a likely occurrence. There's a 10 TB Cloud SQL volume limit right now.

One great aspect of PD is that it supports live migration. I'm sure you've all received an email at 2 o'clock in the morning saying that one of your instances has died or been terminated; live migration avoids that. Some unexpected wins from live migration: if there's a flapping network card, the VM migrates; if there's an inconsistent power supply issue, the VM also migrates. Here's a visualization of how live migration works, which is actually very impressive, because it's able to keep the memory state as it migrates, and that's a big challenge.

So how does this all make sense for Cloud SQL? Live migration means avoiding downtime from physical machine failures, making them transparent to the customer. Auto-extending volumes mean that space can be purchased when needed and doesn't require close monitoring or instance size upgrades; some cloud providers have instance sizes with a fixed amount of storage, but with PD you're able to grow that storage as needed, or keep it smaller when not needed. Performance also increases with size. Data integrity comes from PD's extensive testing and use within Google. And snapshots can be easily mounted in any region. So I guess I'll take some time for questions.
Question: how does the brain work, how long does it take to figure out a failure, and how many seconds until the service is back up and running? Okay, so the question was about our HA implementation: how replication is done, how an instance is detected as being down, that is, fencing, and also the time to fail over. In terms of fencing, there are agents that live on the VM. You have not only the database process but also agents for monitoring, for logging, and, in the case of not having HA, for restarting the instance if it fails. That's how the fencing is done, and there's a metadata storage layer and also the control plane. Let me actually go back to the slide here so you can look at it: there's the database process, also the agents, and those communicate out to the control plane, where you have the metadata server, and that is where the fencing happens. In terms of the time to fail over, it depends in some respects on what type of failure it is, a soft failure or a hard failure, but live migration also mitigates a lot of those types of failures, preventing the need to fail over to begin with. Any other questions?

Right, so Colossus holds the log-structured volume, which is basically the append-only storage, and it's a globally distributed system. It's a system that Google has been using since around 2010 to store data across its services. PD is virtualized on top of Colossus, which allows a block device, and an EXT4 volume on top of it, to access its data through requests that go to Colossus. Colossus is a distributed system that is able to handle failures in zones or regions. It is a distributed file system, but it's more than just a file system, because it has a lot of checks for consistency and also optimizations such as the parallel read requests for a single file.

The next question was whether or not backups are compatible with Postgres. The way the backups are done is that they're snapshots, and a snapshot covers the entire volume. There's another question about access to the data directory. To answer your question: if you wanted to have your data in a different location, say from your production machine to somewhere else, you would just take a snapshot and then mount it on that other VM, so you wouldn't have to wait for a backup to be taken and then transfer it around. If you did want to pull your data off of Cloud SQL, there are export commands, so you could export it that way. As for direct access to the data directory: no, because it is a managed service.

Kubernetes? Oh, Gennady. Okay, so I guess the question was about Google's Gennady. You know, I'm not actually familiar with that, so we can probably sync up afterwards and you might be able to get your question answered.

Well, there is the Cloud SQL proxy, which is a Go-based application, and with it you don't actually have to whitelist your IP. It's an application that runs on your app server, and you can connect through a socket to the Cloud SQL proxy, which is open-sourced on GitHub if you want to take a look at it (see the sketch after the talk). So that may be one alternative to whitelisting. Any other questions? All right, well, I guess that's it. Thanks for coming to the talk.
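To make the proxy answer concrete, here is a minimal sketch of connecting to Postgres through the Cloud SQL proxy's local Unix socket from Go, using the lib/pq driver's socket-directory form of the host parameter. It assumes the proxy is already running; the instance connection name, user, and database below are hypothetical placeholders.

```go
// A minimal sketch of connecting through the Cloud SQL proxy from Go. It
// assumes the proxy is already running and listening on a Unix socket under
// /cloudsql/; the instance connection name, user, and database are
// hypothetical placeholders.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver; registers itself with database/sql
)

func main() {
	// lib/pq treats a host beginning with "/" as a Unix socket directory.
	dsn := "host=/cloudsql/my-project:us-central1:my-instance " +
		"user=postgres dbname=postgres sslmode=disable"
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var version string
	if err := db.QueryRow("SELECT version()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	fmt.Println(version)
}
```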