So, I'm going to talk about exporting Ceph data, RGW data, to the outside world. It's about the sync modules which we introduced in the RADOS Gateway. This is a brief talk, it's only 15 minutes, so I'll just be talking about RGW, then briefly about multi-site, then what sync modules are, and then the basic sync modules that are implemented right now.

Okay, I assume everybody here already knows what Ceph is, because this is the storage devroom and it's pretty much the end of the day, so I would assume people have attended enough talks about Ceph. And I guess everyone knows what RGW already is. It is an object storage service on top of a Ceph cluster, and it exposes a RESTful S3- and Swift-compatible API. It basically translates your HTTP REST calls into librados calls. So you have the S3 and Swift APIs being exposed, and objects are not exactly mapped one-to-one onto RADOS objects: on top of the normal RADOS objects, you have an implementation of users, ACLs, buckets and things like that. There's already a large ecosystem of Swift and S3 client tooling that can be leveraged, so you don't need to invent a third API; these APIs are good enough, and these are the APIs that we expose. A lot of S3-like features are supported: you have multipart uploads, object versioning, torrents, object lifecycle, and support for encryption, compression and static websites. From Luminous, you have support for metadata search with Elasticsearch.

From the Jewel release onwards, we have supported this concept of multi-site, which is geographic redundancy across Ceph clusters for object storage, and that is the basis for all of the sync modules and exporting data to the outside world. So, basically, multi-site allows for geographical redundancy with async data replication. It's not completely async; the data is always asynchronously synced, and I'll come to that. You have the concepts of zones, zone groups and realms. A zone would be one or more RGWs with their group of pools; usually a Ceph cluster would be a zone. You always sync data within a zone group, which is a collection of zones, and there's always one zone elected as the master zone, which is the source of truth for metadata. Metadata, that is, is always synchronous to the master; it always goes through the master. That is the point I was making when I said async: metadata is written synchronously to the master but asynchronously propagated to the secondary RGWs, and data is always asynchronously propagated to the other RGWs.

The basic premise we build this on top of is the fact that data changes are very frequent in a cluster, whereas metadata changes are not so frequent, and metadata changes tend to have a cluster-wide effect. When you create a user or a container or something like that, it's usually something that has to be replicated across to every other Ceph cluster, whereas object changes are something you keep local to a Ceph cluster and transmit asynchronously. So, yeah, data is CP in the local cluster and AP in a remote cluster, if you talk in terms of CAP consistency. Under the hood, what is basically implemented is three kinds of logs: you have logs for data, metadata and admin operations.
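To make the multi-site pieces concrete, here is a minimal sketch of how a realm, zone group and master zone are typically created with radosgw-admin, following the upstream multi-site documentation; the names (gold, us, us-east), the endpoint, and the system user's keys are placeholders, not anything from the talk:

```
# Create a realm, the top-level container for the multi-site configuration
# (the name "gold" is just an example).
radosgw-admin realm create --rgw-realm=gold --default

# Create the master zone group; data only syncs between zones
# inside one zone group.
radosgw-admin zonegroup create --rgw-zonegroup=us \
    --endpoints=http://rgw1:80 --master --default

# Create the master zone; all metadata writes go through this zone.
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east \
    --endpoints=http://rgw1:80 --master --default \
    --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY

# Commit the period so the new configuration is propagated.
radosgw-admin period update --commit
```

A secondary zone on another cluster is created the same way, minus --master, after pulling the realm from the master zone's endpoint.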
And then, every time there's a metadata change, a notification is sent to the remote zone, which then goes and fetches the objects or synchronizes the metadata log entry. So we basically have the ability to notify the remote zone when there is a data change, and this is the concept that we built the sync modules on top of.

Sync modules themselves are built on top of the multi-site framework. Since we have the ability to notify a remote zone that there are metadata changes and data changes in the cluster, we can leverage that: instead of writing to a Ceph cluster itself, a remote site can relay this data across to something like Elasticsearch, or even relay it across to a different cloud, or emit notifications, things like that. The default RGW multi-site replication is itself defined as a sync module, the default sync module which you run when you create an RGW. Currently, all sync modules are written in C++ and require in-tree builds. And you almost always create a sync module in its own zone, which allows you to separate out a Ceph cluster just for processing metadata, for example, and design your applications like that. If you want to get started writing a sync module, there is a sync module in-tree called the log sync module, which does nothing other than log the actions, and that is a great starting point if you want to start writing one of your own. That is the basic idea behind the sync modules, and now I'll just go across the various sync modules we have in RGW right now. Any questions so far? Apparently not.

The first in-tree sync module is Elasticsearch, which basically sends the metadata of objects to Elasticsearch. We already get a notification when an object is created, so this object metadata is sent to Elasticsearch. We also expose an end-user API on RGW so that end users can make Elasticsearch queries, which are forwarded to Elasticsearch, and the response is forwarded back to the user. And as a cluster administrator, you can have more interesting queries on the cluster itself, like finding out how many uploads happen every hour, or which users upload what sorts of files, which is not usually possible if you just use the admin API. Here's a basic diagram of how it looks: you have a secondary RGW zone, which would be the Elasticsearch zone. And this is the standard diagram for any sync module: you always have a secondary zone that is the zone for the sync module, and it just forwards to whatever the sync module's target is; in this case, Elasticsearch.

The cloud sync module basically allows for multi-cloud redundancy of a sort. You can back up your RGW data to a different cloud which supports an S3-like API, like Amazon S3 itself or any other cloud that exposes an S3 API. Technically you could even send data across to another RGW without having to wire it up in a multi-site sort of scenario. And you have a configurable mapping of how a user and a bucket in this cluster map to the remote S3 or cloud cluster. This has use cases in compliance scenarios and the like, where you want data backed up to a different provider.
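Since both of these modules are configured like any other zone, just with a different tier type, a hedged sketch may help; the zone names, the Elasticsearch endpoint and the cloud credentials below are all placeholders, with the tier-config keys taken from the upstream docs for the two modules:

```
# Elasticsearch: turn a secondary zone into a metadata-search zone.
radosgw-admin zone modify --rgw-zone=us-east-es --tier-type=elasticsearch \
    --tier-config=endpoint=http://es-host:9200,num_shards=10,num_replicas=1
radosgw-admin period update --commit

# End users query through RGW itself on the Elasticsearch zone's endpoint,
# i.e. GET /{bucket}?query={expression}; the request must be signed like
# any other S3 request (auth headers elided here).
curl -G 'http://rgw-es-zone:8002/mybucket' --data-urlencode 'query=name==foo'

# Cloud sync: same pattern, different tier type; target_path controls how
# local users and buckets map onto the remote S3 namespace.
radosgw-admin zone modify --rgw-zone=us-east-cloud --tier-type=cloud \
    --tier-config='connection.endpoint=https://s3.example.com,connection.access_key=ACCESSKEY,connection.secret=SECRETKEY,target_path=rgwx-backup/${owner}/${bucket}'
radosgw-admin period update --commit
```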
This is a new feature in Nautilus, the archive sync module. RGW already supports a feature called object versioning, which is an S3 feature that basically allows every object to have different versions: every object write will not actually overwrite the object, but create a different version, so you can roll back and move between versions. The archive sync module allows, for example, if you have three or four different sites and every site has its own set of versioned objects, you could do away with versioning in the local sites and just have one site which archives everything. There is no delete in the archive site itself; it just archives every version of every object that comes to it. So you can avoid versioning at the local cluster and just do versioning at a remote cluster. That's one of the use cases for archive, and then there are always use cases where you want archival of that sort.

And this is another new sync module introduced in Nautilus, the pubsub sync module, where you can basically subscribe to notification events on a bucket. Right now the basic notifications we support are object create and object delete, and of course, if you're dealing with versioned objects, the delete marker. You can forward these to an HTTP or AMQP server. This is actually in progress, the PRs are in progress, so it should make it in time for at least a point release of Nautilus. That is the pubsub sync module; for more details there was a talk at KubeCon in Seattle which covers this sync module, and a rough configuration sketch for the archive and pubsub zones follows after the Q&A below.

And that's mostly my talk. These are the links for the upstream Ceph community, and then, questions.

[Audience question] Is it possible to use the archive module with some more extensions, like a tier, so that you can have archived data that is cold, but when it's needed at the edge it gets proxied back from the archive?

So the question was: with the archive module, do you support something like tiering, so that you can have cold data in a separate tier and archive in a separate tier, or something like that. At least in the initial implementation, no; tiering support is just coming in Nautilus. Tiering plus archive could be possible even in a native way, but it still has to be tested out. The way tiering works is that you would lifecycle an object to a different tier, and then what you would have to do is configure archiving in such a way that the particular pool which is being tiered is the one that is being archived. That way you can sort of achieve this in a first implementation, but you might still have to figure out the references that might be left around there. Yeah, does that answer your question? Any other questions? All right, thanks.
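Here is the configuration sketch referenced above for the last two modules; the zone names, ports and topic names are placeholders, and the pubsub REST paths follow the Nautilus pubsub documentation, so treat this as a sketch rather than a verified recipe:

```
# Archive: a zone whose tier type is "archive" keeps every version of
# every object and never deletes, so versioning can stay off locally.
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-archive \
    --endpoints=http://rgw-archive:80 --tier-type=archive
radosgw-admin period update --commit

# PubSub: topics, subscriptions and bucket notifications are managed over
# REST against the pubsub zone (auth elided).
curl -X PUT 'http://rgw-pubsub-zone:8003/topics/mytopic'
curl -X PUT 'http://rgw-pubsub-zone:8003/subscriptions/mysub?topic=mytopic'
curl -X PUT 'http://rgw-pubsub-zone:8003/notifications/bucket/mybucket?topics=mytopic'

# Pull the queued events (object create/delete) for a subscription.
curl 'http://rgw-pubsub-zone:8003/subscriptions/mysub?events'
```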