Hi everyone, I'm Prashant and I've been working on a project that integrates OpenStack Swift with GlusterFS. OpenStack is a suite of tools used in cloud deployments, and Swift is one of the components of OpenStack. So let's get into what OpenStack Swift is and what it can offer to object store users.

When I say an object, what comprises an object? An object has data, which could be, say, a text file or binary data or anything. It can have metadata associated with it, for example the owner of that object, or the MIME type (say audio or video), or the content length. And the identifier is the name of the object. That is how you access and store objects in a Swift cluster.

How many of you have heard of Amazon S3 or used it? For those of you who didn't know, Amazon S3 is used by Dropbox, Pinterest, Tumblr and many other services. So even if you have not used it directly, you might have used Amazon S3 indirectly. Swift is an alternative to Amazon S3, and OpenStack Swift is used by Wikipedia: if you go to Wikipedia and see some images, the thumbnails of those images are stored in OpenStack Swift. There are also companies such as Rackspace and SwiftStack that provide services similar to Amazon S3 based on OpenStack Swift. So Swift is already in production.

Unlike other OpenStack components such as Nova, Cinder, Glance and the rest, OpenStack Swift is fairly independent of the other components. In other words, you can deploy OpenStack Swift as a standalone product, having nothing to do with other OpenStack components. It's suited to storing unstructured data, for example images, video files, or even books. And it scales horizontally, which means if you need more storage you just add more nodes or add more devices.

But Swift is for a very specific use case, which is an object store. So you cannot have file system hierarchies in Swift; in other words, you cannot mount it and edit objects as files. That is plain Swift; there is a workaround for that, which we'll see later. And what you need to remember about Swift is that it has this predefined hierarchy: a cluster can have many accounts, each account can have many containers, and containers have objects. In Amazon S3 terms, these containers are buckets.

So let's see what Swift has to offer. It's truly distributed in nature: there is no central server, no masters or slaves or anything; all nodes are treated equally. And it can scale to petabytes, and there are such deployments; for example, there is a cancer research institute that has a deployment of about six petabytes. It's highly available, and it has a very modular structure, similar to the translator interface in GlusterFS that Lala mentioned. If you want to add a feature to Swift, you just add a middleware, also called a WSGI filter; you can extend Swift by doing that. The S3 API support is implemented in that way: if you have existing applications that access Amazon S3, they can be ported to access Swift with very minimal or absolutely no code changes.

Also, you can set account quotas based on the number of objects and also on the size in bytes. So you can say that this account is allowed to store one million objects, or objects that sum up to one terabyte. And you have authentication filters. Having a modular structure allows you to have different kinds of filters.
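To make the object anatomy concrete, here is a minimal sketch of storing an object (data, metadata, identifier) over Swift's HTTP API using Python's requests library. The endpoint, account, token and file name are made up for illustration; only the /v1/account/container/object URL layout and the Content-Type / X-Object-Meta-* header conventions are Swift's.

```python
import requests

# Hypothetical proxy endpoint and token; the URL layout
# (/v1/<account>/<container>/<object>) follows Swift's convention.
url = 'http://proxy.example.com:8080/v1/AUTH_test/photos/cat.jpg'
headers = {
    'X-Auth-Token': 'AUTH_tk_example',   # obtained from the auth step
    'Content-Type': 'image/jpeg',        # MIME-type metadata
    'X-Object-Meta-Owner': 'prashant',   # custom metadata on the object
}
with open('cat.jpg', 'rb') as f:         # the object's data
    resp = requests.put(url, headers=headers, data=f)
print(resp.status_code)                  # 201 on success
```

Here "cat.jpg" after the container is the identifier, the request body is the data, and the headers carry the metadata.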
So based on your use case, you can have filters that store username, password and account information in plain text files, filters that store it in databases, and filters that allow Active Directory or LDAP integration, so that if you have existing users in an organization, they can access Swift using their existing credentials.

And there is a feature called container-to-container sync. Let's say you have one data center here in Bangalore and another in a different location: you can keep containers in sync; you know what sync means. And there is object versioning: if you upload more versions of an object, a GET will fetch the latest version, but there are APIs available to fetch the previous versions also. Also, you can set an expiry date on an object: when you upload an object, you can say delete this object after one hour, or one week, or something like that.

And you have support for global clusters. This is analogous to the geo-replication feature in GlusterFS: you can replicate all your stuff across data centers, and there is a provision to use a separate replication network for replication. You also get read/write affinity. What that means is, let's say you have a global cluster, one region in Bangalore and another in Paris: clients that are closer to Bangalore would have their reads and writes served from the closest location.

And the last feature, which is the most recent and the most important one, is storage policies. What that gives you is flexibility in terms of where your data gets stored. Until this feature was implemented, all your data would be stored with a rigid replica count: if the replica count for the entire cluster was three, then all objects would be replicated thrice. With storage policies, you can set one storage policy per container. You can say that all the objects that go into this container are stored as three copies, and all the objects that go into another container are stored as two copies, or one.

One use case I can think of is thumbnail images. Let's say Facebook used OpenStack Swift and used thumbnails for profile pictures: you could store the thumbnails in containers with replica one and the actual images in containers with replica three, because if you lose the thumbnails, they can be regenerated. That is the kind of flexibility that storage policies allow. And you can also store objects on different backends: for example, objects can go to regular XFS, or the client can choose to store objects on GlusterFS. So the power is given to the client with this storage policy feature.

Any questions so far? Yes? So, there is no caching for objects as such; that is not provided by Swift. That caching has to be provided by the storage layer. But metadata caching is provided built-in.

So this is how clients would access a Swift cluster. You have storage nodes, which are not accessible to the outside world; they are on a private network. And you have proxy nodes, which are publicly reachable, so clients talk to proxy nodes. A proxy node, if you see it in one sense, can be a single point of failure: if a proxy node goes down, there's no way for the client to access the data that resides in the storage cluster.
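As a rough sketch of the expiry and storage-policy features just described: the endpoint, token, and the policy name "replica-1" are assumptions for illustration; X-Storage-Policy and X-Delete-After are the headers Swift uses for these two features.

```python
import requests

BASE = 'http://proxy.example.com:8080/v1/AUTH_test'  # hypothetical endpoint
AUTH = {'X-Auth-Token': 'AUTH_tk_example'}           # hypothetical token

# Create a container pinned to a named storage policy; the name
# 'replica-1' is whatever the operator configured (assumed here).
requests.put(BASE + '/thumbnails',
             headers=dict(AUTH, **{'X-Storage-Policy': 'replica-1'}))

# Upload an object that Swift will auto-delete after one hour (3600 s).
requests.put(BASE + '/thumbnails/avatar.png',
             headers=dict(AUTH, **{'X-Delete-After': '3600'}),
             data=b'<png bytes>')
```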
So usually in production environments there are multiple instances of proxy nodes, for load balancing and also for availability.

Swift mainly has four components: the account server, container server, object server, and proxy server. The proxy server is the one that talks to clients and replies back to clients. The account server maintains a database which contains information about accounts. The container server maintains a database which contains information about containers. And the object server stores objects as files on a plain XFS partition. Storage nodes usually have the object, container and account servers running, and proxy nodes have the proxy server running.

This is another way to look at it. When a client requests the proxy server to either fetch an object or store an object, the proxy server looks into a static data structure called the ring; we'll come back to it later. It then fetches information appropriately from the account, object or container server. For example, if the client wants to know what objects reside within a container, it does a GET on the container, and the container server reads the container database and returns the listing back to the proxy server, which returns it to the client.

And about the modular structure that I discussed earlier: each of these servers has a pipeline of WSGI middleware, and a request goes from the client through that pipeline to the server, in that order. This is what provides modularity. For example, the tempauth module provides authentication, and the account quotas module provides quotas based on accounts. So let's say you get a use case where you want to turn your entire cluster read-only, maybe during an upgrade. You could write your own middleware, in less than 100 lines of code, put it somewhere here in the pipeline, and turn your entire cluster read-only. That is the kind of flexibility that the modular structure provides. Any questions?

So let's look at what the Swift API looks like. All communication with Swift by clients is over HTTP, as a REST-style API. You can see the URL structure there: account slash container slash object. For example, if you want to fetch an object from Swift, this is how your request would look. GET is a fetch; similarly, the other HTTP verbs work: PUT, POST, DELETE. If you have an authentication mechanism turned on, the client will initially use a username and password to get a token, and all subsequent requests to either fetch objects or put objects carry that token as a header.

This here would be the proxy server if I have one instance; if I have multiple instances behind a load balancer, it would be the IP of the load balancer. This is the account; v1 is the API version; then the container; and that entire remaining part is the object. One thing to remember: the slashes after the container, that whole a/b/c... part, is one single entity, and the slashes do not mean anything. Any questions?

Now let's move on to a really important data structure in Swift that decides how data is placed and how many copies of the data are placed. It's called the ring. The ring is a complex data structure which decides how many copies of the data there are and where the data goes. Just like GlusterFS, the ring uses MD5 for hashing. The entire MD5 hash space is divided into partitions. These are logical partitions and have nothing to do with the partitions on a disk.
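Here is what that token flow looks like in practice, sketched with Python requests against a hypothetical tempauth-style setup (the host, the user "test:tester" and the key "testing" are illustrative, not anything from this cluster):

```python
import requests

# Step 1: exchange credentials for a token (tempauth-style v1.0 auth).
resp = requests.get('http://proxy.example.com:8080/auth/v1.0',
                    headers={'X-Auth-User': 'test:tester',
                             'X-Auth-Key': 'testing'})
token = resp.headers['X-Auth-Token']
storage_url = resp.headers['X-Storage-Url']  # e.g. http://.../v1/AUTH_test

# Step 2: all subsequent requests carry the token; GET fetches an object.
# Note that the object name 'a/b/c.txt' is one single entity; the
# slashes inside it mean nothing to Swift.
obj = requests.get(storage_url + '/c1/a/b/c.txt',
                   headers={'X-Auth-Token': token})
print(obj.status_code, len(obj.content))
```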
And some number of partitions are assigned to each particular device, which is a disk. The URL, which is account slash container slash object, is hashed with MD5. The first few bits of the hash are taken, and that is looked up on this ring structure. It points you to a partition, and you have a table which maps partitions to devices. Based on that mapping, you get to know where this object goes.

These ring data structures are managed externally and are a very important part of Swift. External tools are used to create these ring data structures, and the ring files are pushed onto every node. So the intelligence of data placement resides in these ring files. When you add more nodes, the structure is rebuilt and pushed to all the other nodes.

In GlusterFS, the hashing is very rigid: when a new device is added, more data is moved in GlusterFS compared to Swift. Because we divide the hash space here into partitions and assign a certain number of partitions to each device, the amount of data that is moved during a rebalance operation is less than in GlusterFS. We need to have these ring files on each of the Swift nodes, and the Swift nodes read these ring files to locate the actual physical location of an object. So whenever you add a disk or a node to the Swift cluster, you need to regenerate these files and push them to every node. There are management tools that allow you to do this.

And when you generate these ring files, what are the tunables that you get to change? When you add devices, you get this option called the device weight. Let's say you have a 2-terabyte hard drive and a 10-terabyte hard drive; such a thing exists. You can assign a greater weight to the 10-terabyte hard drive. What that does is assign a larger number of logical partitions to that device in the ring data structure, so it's more likely that more objects go to the bigger device.

Also, you can place devices and nodes into logical zones, and Swift has an algorithm that places objects as uniquely as possible. Let's say you have 5 hard drives and 3 zones: you spread the drives across the zones (so one zone would have 2 drives), and you set the replica count to 3. Swift would place the copies of each object in 3 different zones. How you decide which drive goes into which zone is totally up to you, based on how, and on what basis, parts of your cluster are likely to go down. For example, if you have a data center with 3 buildings, the tendency is to mark each building as a zone. Or if you have one data center, you can make each rack a zone, based on how likely the network or the power is to go down.

And this is how the MD5 hashing is carried out. You have the path, account slash container slash object. That is hashed, with a per-cluster prefix and suffix added to prevent MD5 hash collision attacks; that is, to prevent a client from guessing what the hash would be, so it has to remain secret, and you can configure it. That MD5 sum is mapped onto the ring data structure, and this table maps it onto a partition number. So let's say a GET on an object is sent by the client: the proxy server looks into the ring data structure, and from this table it gets where the copies actually reside.
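A minimal sketch of that hash-to-partition step, mirroring how Swift's ring does it (salted MD5, then the top part-power bits of the digest); the prefix/suffix values and the part power here are placeholders:

```python
import hashlib
from struct import unpack_from

PART_POWER = 10                  # 2**10 = 1024 partitions (placeholder)
HASH_PREFIX = b'secret-prefix'   # per-cluster secret (placeholder value)
HASH_SUFFIX = b'secret-suffix'   # per-cluster secret (placeholder value)

def partition_for(account, container, obj):
    # Salt the path, hash it, and keep the top PART_POWER bits of the
    # first four bytes of the MD5 digest; that index is the partition.
    path = ('/%s/%s/%s' % (account, container, obj)).encode()
    digest = hashlib.md5(HASH_PREFIX + path + HASH_SUFFIX).digest()
    return unpack_from('>I', digest)[0] >> (32 - PART_POWER)

print(partition_for('AUTH_test', 'c1', 'a/b/c.txt'))  # e.g. 312
```

The partition number is then looked up in the partition-to-devices table to find the disks holding the copies.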
And based on the partition number, you know which node and which device the actual file resides on. Any other questions?

So this is the script that generates the initial ring files. Each object server has its own ring files, each container server has its own ring files, and each account server has its own ring files. You can see that r1 is region one and z1 is zone one, followed by the IP of the node and the port on which one instance of the object server is running. The 10 there is the partition power (so 2 to the power 10 partitions), the 3 is the replica count, and the 1 is the minimum part hours. And the 1 here is the drive weight. So let's say sdb4 there is a larger drive compared to the other three drives: you can assign it a larger weight so that more partitions are assigned to that device.

And here, the replica count is three but there are four devices, so Swift can choose any three among the four devices. So let's say a write is coming from a client and one of the devices goes down: Swift can choose the other device as a handoff node to complete that write until the failed device comes back up.

So this is how ring files are generated. Once these commands are done, you get those files and you need to push them to each node. And here, this part is where you get to define storage policies. For example, this is the second ring file for the object server, and this is a different instance running on a different port. This version of the object server has code that can talk to GlusterFS.

So this is how you generate your ring files. Any questions here? Sorry? Yes, each of these devices sits on a file system underneath. By default, OpenStack Swift talks to XFS. With the storage policy feature and some code: OpenStack Swift has this pluggable disk file class, there's this class called DiskFile. So if you have your own storage system, such as GlusterFS or CephFS, you can override that class to provide your own implementation. For example, there is a write API: you can override the write API to provide your own implementation for your own file system. Swift allows you to plug any DiskFile implementation in as a storage policy there.

I'll do a demo later. Here the policy number is one, so this is a different ring file, and the policy would have a friendly name. When a client creates a container, it can say: use this storage policy. And all the objects put into that container use these ring files to decide where the data goes. This is a very recent feature; it's not merged into master yet, but it should be merged this week. Any other questions?

In one of the other talks, at the OpenStack Summit, there was a comparison of Swift and GlusterFS. This data could be somewhat outdated because the lines of code have since reached some 9,000. As you can see, Swift is a relatively young project and it's based on Python. So, some statistics. But one thing that you need to keep in mind is that Swift has the very specific use case of object storage; it does not provide a file system interface on its own.

So I have a Swift cluster set up, which is a four-node cluster. I have four instances of the account server running, four instances of the container server, and one proxy server. And this instance of the object server is the one which talks to our Gluster backend. So let's create an account.
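To illustrate the pluggable disk file idea, here is a toy sketch of two backends behind one simplified write interface. This is a hypothetical, stripped-down interface for illustration only; it is not Swift's actual DiskFile API.

```python
import os

class XFSDiskFile:
    """Default-style backend: objects live under hashed paths on a
    plain XFS partition (path scheme simplified for illustration)."""
    def __init__(self, device_root, partition, name_hash):
        self.path = os.path.join(device_root, 'objects',
                                 str(partition), name_hash + '.data')

    def write(self, data):
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        with open(self.path, 'wb') as f:
            f.write(data)

class GlusterDiskFile:
    """Gluster-style backend: account/container/object map directly to
    directories and a file on a mounted GlusterFS volume."""
    def __init__(self, mountpoint, account, container, obj):
        self.path = os.path.join(mountpoint, account, container, obj)

    def write(self, data):
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        with open(self.path, 'wb') as f:
            f.write(data)
```

The object server just calls write() on whichever class the container's storage policy selects; that is the whole trick.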
So as I said, an account is a database. First one: I have the proxy server running on port 8080, and here is the account. If you can see here, there are three databases created. These are SQLite databases, and all three databases reside on different nodes. Here I have emulated different nodes on a single machine; there are four nodes, and you can see nodes one, two, and four. So Swift has placed those account databases into different zones. This is how Swift places data.

Now I'll create a container within that. If you remember the syntax there, it's account slash container slash object. So I've created a container, and now we have three copies of the container database. And I'll create an object called o1 and put some data into it. And now you can see three copies of the object. These are on plain XFS partitions on different nodes. So if you open these files over a file system interface, you would see this data of "hello", but you won't get a human-friendly notation of the path.

Sorry? Yes, the content, for example here "hello", would be stored in these files. And it need not be just text data; you can upload any file. Sorry? Yes, where the copies land is based on the number of drives and nodes that you have, and we have views like this for all the other servers too. But the point here is that no one accesses these files over a file system interface; everyone accesses them over Swift only.

Sorry? Yes, there is metadata caching. For example, a container listing that lists objects, that is cached; but the files as such are not cached.

Sorry? The comparison there was of code line counts, not of performance. So, like I said, Swift can only talk REST, and Ceph has a REST interface that supports the Swift API, but that layer is integrated with their object store, so their object store is not analogous to Swift's object store. We have not done any performance comparison with Ceph.

Yes, those databases are SQLite databases. Sorry? Yes, we have account servers and container servers; the account servers contain the list of containers in an account, stored in those databases. Yes, only metadata is stored in the databases. And the metadata of the actual objects is stored in xattrs on the file system: the actual object here, that file, has some metadata stored in the xattrs, the extended attributes of that file, for example the content length and the container it belongs to.

So now let's see how it integrates with GlusterFS. I have defined an additional storage policy here, named GlusterFS, with policy number one; you remember that one from the ring files earlier. So this is a different policy, and I have a GlusterFS volume mounted over here at this path. Now I create a container called c2. The client is doing all of this, and it adds a header that says: use this storage policy. So another container is created. Then I put an object into that container. Remember that the c2 container is marked with the GlusterFS storage policy, and you can see this hierarchy created here. This is the naming convention that the GlusterFS object server implementation uses: the account is a directory.
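As a rough sketch of that last point, this is how you could peek at the metadata Swift keeps in extended attributes on one of those .data files. The path is a hypothetical stand-in for this demo's layout; Swift stores the pickled metadata under the 'user.swift.metadata' xattr key (shown here for the simple case where it fits in a single xattr).

```python
import os
import pickle

# Hypothetical on-disk path of one replica from the demo; the real path
# ends in <partition>/<suffix>/<md5-hash>/<timestamp>.data.
path = '/srv/node2/sdb2/objects/312/abc/<md5-hash>/1404723400.12345.data'

# Swift pickles the object's metadata into this extended attribute.
raw = os.getxattr(path, 'user.swift.metadata')
meta = pickle.loads(raw)
print(meta.get('Content-Length'), meta.get('Content-Type'))
```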
The container would be a directory within it, and o1 would be the file. So the advantage here is that apart from Swift's REST API, you get to access those objects as regular files, over a FUSE mount or other file system interfaces.

Another thing that storage policies offer: let's say you have a third policy which says SSD. You could build a simple middleware, maybe under 50 lines of code, that places objects on SSD based on the content type. That is the kind of flexibility that storage policies have. So from a product perspective, you can have different policies, named, let's say, gold, silver and bronze. Gold could provide four copies on SSD, something like that, and bronze could provide two copies, or one copy stored on erasure-coded storage. That is how you can leverage storage policies. Any other questions?

I'll just show you the pipeline once more. This is the conf file of a proxy server; we have the pipeline here. So I can add an authentication module somewhere in between, and that would take care of authentication. You can write your own custom middleware there that would turn the entire cluster read-only, or maybe certain accounts read-only. And each of these filters can have configuration options. For example, here in the tempauth filter you can add users and define roles: this one is an admin, this one a normal user, things like that. So Swift has a very pluggable architecture. Any other questions?

Sorry? So in Wikipedia, I don't know the details of how they have deployed it, but they serve each image on Wikipedia as an object. For example, Facebook has an object store called Haystack; it's not open source yet. What they do is have a huge XFS partition of 100 terabytes, and on it one single file of 100 GB, and all the images that are shown on Facebook go into Haystack. That 100 GB file is append-only storage: as an image comes in, they append it. So they built their own implementation based on their use case, because they couldn't scale otherwise. Also, in a regular file system each file has its own metadata, such as UID and GID, and that was of no relevance to them. So they had their own store. It depends on the use case.

See, what Swift is built for is that the client just does a PUT and a GET, and the client doesn't care where it is stored or how it is stored. Sorry? Is it simpler for clients? The API is simple enough; this is all the API is. You just put or get objects here. And this logical separation, account slash container, is purely logical: there is no one-to-one mapping from this path to the actual file path, unless you use the GlusterFS backend. So as a user of Amazon S3 or Swift, you don't really care where or how it is stored on disk; you just get this API. That's all.
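Going back to the read-only middleware idea from the pipeline discussion, here is a minimal sketch of what such a WSGI filter could look like. The class and factory names are made up; the WSGI plus paste-deploy factory pattern is how Swift middleware is wired into the pipeline.

```python
# A minimal sketch of the "turn the cluster read-only" middleware idea.
# Names here (ReadOnly, read_only) are illustrative, not from Swift.

class ReadOnly:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Reject anything that would modify data; pass reads through.
        if environ['REQUEST_METHOD'] not in ('GET', 'HEAD'):
            start_response('405 Method Not Allowed',
                           [('Content-Type', 'text/plain')])
            return [b'Cluster is read-only\n']
        return self.app(environ, start_response)

def filter_factory(global_conf, **local_conf):
    def factory(app):
        return ReadOnly(app)
    return factory
```

You would then add this filter to the pipeline line in the proxy server's conf, just like tempauth or account quotas.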
Sorry? So here's the thing: the object server has no database where it has to look up the location of the actual object. The path here is derived from the hash of the object name, account slash container slash object. So it just has to take that hash, compute this path, and fetch; it has no lookup, no database read or anything. Just compute the path and fetch: it's one operation. What is lost is the small amount of additional metadata, for example UID, GID or something like that. But except for that, it's very efficient, because it's just one operation to fetch.

Sorry? There is no flaw as such in the hashing. GlusterFS also uses hashing, but things there need additional care on file renames: in GlusterFS, when you rename a file, the hash would change, so there you need a link pointing to the actual location. But here the rename operation is not allowed, so there is no renaming.

That could be a problem in one scenario, which is purely imaginary: where all your ring files on all nodes get corrupted. As long as at least one node still has a correct ring file, you can copy the proper ring files from it to the other nodes, and they would correct themselves. So the ring file has all the intelligence to place data, and how you build the ring files, that freedom is up to you. Any other questions? Thank you.