OK. So we are going to present StackSync, a Dropbox-like personal cloud for OpenStack Swift. So first of all, to present ourselves, who are these crazy guys creating their own Dropbox? OK, I'm Pedro Garcia, I'm a professor at a Spanish university, Universitat Rovira i Virgili. And well, I'm also leading a research group on distributed systems. And I am the coordinator of a European research project called CloudSpaces. The title is Open Service Platform for the Next Generation of Personal Clouds. It's a three-year project. We are now just in the middle, with six partners: three industrial, three academic. We have EPFL in Switzerland, we have Eurecom in France, and Universitat Rovira i Virgili in Spain. And then we have three industrial partners. We have Tissat, that is an infrastructure provider in Spain. We have eyeOS, that is a web desktop solution provider in Spain. And we have NEC, that needs no presentation, as a global partner. So the two major challenges of this project are to create an open personal cloud, overcoming two main limitations, that is privacy and interoperability, which I will explain later on. So StackSync, this Dropbox for OpenStack, has been created in the context of this research project. First of all, let's define what a personal cloud is, because it's not a common term. So we have a definition with three main services that each personal cloud should provide, which are storage, synchronization, and sharing. So for us, a personal cloud should provide reliable, redundant, scalable storage, that is, cloud storage. It should offer file synchronization from heterogeneous devices, like desktop clients or mobile clients. And last, it should provide sharing capabilities, enabling users to share information with other users. The most well-known personal cloud, that all of you know, is Dropbox. They are using cloud storage on Amazon S3, and they are offering synchronization from different devices and sharing capabilities. 
So our main motivations in this project are three. First of all, we would like to create an open source personal cloud. We consider that in the market there are a lot of proprietary solutions, but we don't find a good open source one. So StackSync is, of course, using OpenStack, that is the open source cloud, and OpenStack Swift, that is the object storage. Another important problem that we find in existing personal clouds is that users lack control of their information. And we really want users to retake control of their data. And for that you need an open source project, so you can control your information. And then we are also going to provide some interesting privacy features, like client-side encryption, or privacy-aware data sharing mechanisms in our infrastructure. And last, the other problem is vendor lock-in. So most systems are closed. You can share with users in your system, but not with others. In our project, we bet on open APIs and interoperability, so you can really exchange information and share information with other providers or other StackSync installations. So I will now let my core developers explain to you the architecture of the system and the performance of StackSync. So Cristian, please. Thank you, Pedro. OK, hi, everyone. My name is Cristian Cotes, as Pedro said, and I am one of the StackSync developers. I'll start by explaining the big picture of our architecture. But to make it clear, I will explain it in three different stages. So in the first one, we created an architecture where, like Dropbox, we decouple the data flow from the metadata flow. As you can see here in the picture, there are three main blocks in the architecture, which are the desktop client, the StackSync server, and OpenStack Swift. And as you can see, the desktop client communicates directly with OpenStack Swift in order to upload data, and directly with the StackSync server in order to upload the metadata of the files. 
In a second stage, we added RabbitMQ, which is a well-known message-oriented middleware, to allow push notifications to the clients in order to stay in sync. And we also developed an elastic sync protocol that we will explain in more detail later. So in the current stage, we developed an API as a Swift proxy module. So besides the desktop client, which communicates with OpenStack Swift directly and with RabbitMQ in order to upload metadata, we've got the mobile clients, which communicate directly with the StackSync API in order to interact with the architecture. OK, now that you've got the big picture more or less clear in your mind, I will explain what happens when a file is going to be synchronized. So I'll start with the lifecycle, and later Adrian and I will explain each block of the architecture in more detail. First of all, let's suppose that a user creates a new file in our synchronized folder. So the client will receive a notification from the operating system: hey, you've got a new file here, you've got to do something. The first thing that we do is obtain metadata from this file, for example, the filename and file size. The second step is to upload the data to OpenStack Swift directly. Once this is done, we communicate with the StackSync server in order to give it the metadata of the files. And if everything is correct, the metadata is correct, the version of the file is correct, we have to communicate to the other clients that a new file has been created by another user. So we just write the metadata in the queue to send it to the other clients, client 2 and client 3. And when they process the metadata, they will download the file from OpenStack Swift. So this is the lifecycle that we've got when a new file is created. Now I would like to explain what happens in the desktop client. So the desktop client, when it receives a notification that there's a new file, has to do some processing. 
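The lifecycle just described (upload the data first, commit the metadata second, then notify the other clients) can be sketched roughly as follows. This is a toy illustration, not the real StackSync protocol: the class and function names, and the dict-based stand-ins for Swift and the queue, are all hypothetical.

```python
class MetadataServer:
    """Toy stand-in for the sync server: a commit is accepted
    only if the committed version is newer than what is stored."""
    def __init__(self):
        self.files = {}  # filename -> latest committed version

    def commit(self, meta):
        current = self.files.get(meta["filename"], 0)
        if meta["version"] > current:
            self.files[meta["filename"]] = meta["version"]
            return True
        return False


def sync_new_file(path, data, swift, server, queue):
    """Follow the order described in the talk: data, then metadata, then notify."""
    meta = {"filename": path, "size": len(data), "version": 1}
    swift[path] = data            # 1-2. upload the data to object storage first
    if server.commit(meta):       # 3. then commit the metadata to the sync server
        queue.append(meta)        # 4. on success, push metadata to other clients
        return True
    return False
```

The key design point this mirrors is that metadata is only announced after the data is safely in object storage, so no client ever learns about a file it cannot yet download.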
So the main tasks of the desktop client are, first of all, to watch the file system. We have to be always watching the file system to detect user actions, for example, creating, modifying, or removing a file. And the second one: if the user creates a file, we have to do the process that I told you about before. This process is: first split the file into small chunks, compress them, and optionally encrypt them. Once this process is done, we upload all the chunks to OpenStack Swift. Some of the features that the desktop client has are file versioning, file sharing between users, and OS integration. By OS integration, we mean overlay icons and notifications. As you can see in this picture, we've got a synchronized folder, and the green dots are telling us that, OK, everything is synchronized, everything is OK. And this bubble here is telling us that some file has been updated. After that, Adrian, the other StackSync developer, will explain to you in more detail what's happening in the StackSync server and the rest of the architecture. Thank you. Hi, everybody. My name is Adrian. I am the other developer of StackSync. And I'm going to explain a little bit about the StackSync server and the rest of the components of our architecture. What we do with the StackSync server is create a virtual file system on top of the data objects in OpenStack Swift. So the StackSync server offers many operations to the clients, and here we will show three of them that we consider to be the most important. The first one is the get account. The clients call this action in order to get information about the user. And this includes, for example, the information about Swift: the username on Swift, the storage URL, and so on, in order for them to communicate directly with Swift to upload and download data objects. The second one is the get changes. We use this action in order for clients to keep synchronized. 
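The split-and-compress step the client performs before uploading can be sketched like this (the chunk size is an assumption; the real client's chunking and optional encryption details may differ):

```python
import zlib

CHUNK_SIZE = 512 * 1024  # assumed default; the real client may use another size


def make_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a file into fixed-size chunks and compress each one,
    as the desktop client does before uploading chunks to Swift."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        raw = data[offset:offset + chunk_size]
        chunks.append(zlib.compress(raw))
    return chunks


def reassemble(chunks):
    """Inverse operation: decompress and concatenate the chunks."""
    return b"".join(zlib.decompress(c) for c in chunks)
```

Chunking is what later makes resumable transfers possible: each chunk is an independent data object in Swift, so an interrupted upload only needs to send the chunks that are still missing.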
So when clients call this action, they get the current state of the remote repository. And they retrieve this metadata information and check that it is consistent with their local repositories. If there is any update, what they will do next is communicate with OpenStack Swift and download the chunks that they need in order to create the new file and keep the system consistent. And the third one, and the most common operation in our system, is the commit. This commit is called every time a client creates, modifies, or deletes a file. It sends the commit to the server, the server processes the commit, checks that everything is correct and so on, and then makes it persistent in the database. On the right, we have an example of the JSON representation of the metadata. We can see the typical parameters of the file. So as many clients can modify a file at the same time, the StackSync server may face some conflicting scenarios. In this case, for example, user1 and user2 modify the same file at the same time, and then they commit the changes to the StackSync server. So what we do here is pretty simple. The first version to be processed by the StackSync server is considered to be the winning version. So in this case, let's say user1 was the one that was processed first. So he gets an acknowledgment: OK, everything is correct, you are synchronized, you are OK. And then user2 receives a conflict message saying, OK, we have a problem here. So what user2 will do next is get the metadata from the StackSync server and get the new file, the file that won, the user1 file. And with his conflicted copy, he will create another file, treat it as a new file, and upload it again to our server. So the workspace is a very important concept in our architecture. It represents the relation between files and users. And every time a new user is created, a new workspace is assigned to this user. 
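The first-commit-wins rule Adrian described can be sketched as follows; this is a simplified illustration with hypothetical names, not the actual StackSync commit handler.

```python
class SyncServer:
    """Toy commit handler: the first commit against a given base version
    wins; later commits against the same base version get a conflict."""
    def __init__(self):
        self.versions = {}  # filename -> committed version number

    def commit(self, filename, base_version):
        current = self.versions.get(filename, 0)
        if base_version == current:
            self.versions[filename] = current + 1
            return "ACK"
        return "CONFLICT"


def resolve_conflict(server, filename, user):
    """The losing client keeps its copy under a new 'conflicted' name
    and commits it as a brand-new file, as described in the talk."""
    new_name = f"{filename} (conflicted copy of {user})"
    server.commit(new_name, 0)
    return new_name
```

Note that nothing is lost: the winner's version becomes the canonical file, and the loser's changes survive as a separate, newly committed file.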
By default, every user has one workspace, and in this workspace he will store all his files. So now let's say that user1 and user2 are going to share a folder. Every folder that is shared is reflected in a new workspace. So this shared workspace will hold the files that are shared by user1 and user2. And as you may have noticed, workspaces have some similarity to Swift containers. So yes, workspaces are mapped onto Swift containers. And they both have the same permissions, logical on the workspace and physical on the container. We have different encryption settings for workspaces. We can store files in plain. We can set server-side or client-side encryption. So let's say that we are going to store a file in plain. Then the file won't get encrypted: it will travel plain, and it will be stored plain. The other scenario is server-side: let's say we upload the file using a secure channel, like HTTPS, for example. Then it gets to the cloud provider, and the cloud provider, with his key, will encrypt the file. But he will be able to access this file later on, as he is the owner of the key. And in the last setting, we have client-side. In this case, clients encrypt the files before uploading them to the cloud, so that only they know the key and are able to encrypt and decrypt the files. And not even the cloud provider is able to see what's going on. So we have some important concepts of the architecture that map directly to OpenStack Swift. The first one is that every StackSync installation is reflected in a tenant, so that every user in a StackSync installation goes to one storage URL in Swift. The next one is the user: a user in StackSync is a user in Swift, no problem with that. The third one is the logical workspace: a logical workspace is a physical container. Then we need an administrator in StackSync that will be a user with admin rights on the StackSync tenant. And we need this because we have to create users on that tenant. 
We need to delete them. We need to create containers. And we need to set permissions on these containers in order to let the right people in. And finally, there is the file: a logical file, the metadata, is reflected in small pieces, as Cristian said, that are called chunks. We store these chunks as data objects in Swift. Let's say a user, user1, wants to share a folder with user2. So what he will do first is create a share proposal, he will send it to the StackSync server, and then the StackSync server will notify the other side, user2. After that, user2 accepts or denies the proposal. And the server, using the administrator, the user with rights on this tenant, will create a container and set up the ACLs in order to let the right people in, using the X-Container-Read and X-Container-Write tags. So we use RabbitMQ as a message broker. You all know RabbitMQ. And we use it in order to allow communication between clients and servers. So RabbitMQ provides us with some benefits, and some of them are listed here. For example, it allows us to get push notifications, so that clients are not constantly sending requests, polling the server: every time there is a change, a message is sent directly to the client. It also allows us to balance the load, to share out the load between the different server instances that we may have. And we have another benefit, which is sending multicast messages, so that by sending only one message, we can reach all interested clients. In this picture, we can see it better. On the right, we have three clients. Each client has its own queue. And we have on the top the server queue, and clients will always send messages to that queue. And the StackSync server will consume messages from that queue sequentially. But as our system is stateless, we can add more instances of our StackSync server, so that it can adapt to the demand and cope with the load. 
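The queueing setup just described can be sketched as a toy fanout exchange: one message published for a workspace reaches every client queue bound to it. This is an in-memory illustration of the pattern, not real RabbitMQ code; all names are hypothetical.

```python
class Exchange:
    """Toy fanout exchange: each workspace has bindings to client queues,
    and one published message is multicast to all of them."""
    def __init__(self):
        self.bindings = {}  # workspace -> list of bound client queues

    def bind(self, workspace, queue):
        """Subscribe a client's queue to a workspace's notifications."""
        self.bindings.setdefault(workspace, []).append(queue)

    def publish(self, workspace, message):
        """Deliver one message to every queue bound to the workspace."""
        for queue in self.bindings.get(workspace, []):
            queue.append(message)
```

With real RabbitMQ the same shape would use a fanout (or topic) exchange per workspace, which is what gives the push and multicast benefits listed above without clients polling the server.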
And we can also see here that user three is not sharing anything. He has his own workspace, and that's it. But client one and client two are sharing a folder, and they are interested in receiving messages from workspace one. So every message that is sent to workspace one gets multicast to the queues of client one and client two. So now we have seen how desktop clients communicate, how they synchronize, and so on. And now we are going to see how mobile and web clients are able to access the information in StackSync. So what we've done is create an API. This API lives in the proxy pipeline, alongside Keystone, logging, caching, and any other middleware that we may have in our pipeline. And it communicates with the StackSync server in order to get metadata and authorization. One thing that is very important to point out is that our API only activates when a specific header is set, which is the X-StackSync-API header. Otherwise, it won't interfere whatsoever with a normal request to Swift. And before that API, we have another module, which is an authenticator, that is an OAuth implementation. What it does is get the OAuth parameters, like the access token, the timestamp, the signature, and so on, and it communicates with the StackSync server. It gets the user information, the user ID, and some other important information, and sets it in the WSGI environment, so that later on our API is able to get this user and retrieve files knowing that the user is correct. So here we have a figure explaining what I said before. On the bottom, we have OpenStack Swift with the pipeline and the proxy. And on the top, we have our StackSync server with the metadata database, and the client on the left. So let's say that the client is going to make a request to our API. He wants to get a specific file, and he sets our StackSync API header. So the request goes through the pipeline. 
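The header-activated behaviour described above can be sketched as a minimal WSGI middleware. This is an illustration of the pattern, not the real StackSync module, and the exact header name is an assumption based on how the talk describes it.

```python
def stacksync_api_filter(app):
    """Minimal WSGI middleware sketch: handle the request only when the
    activation header is present; otherwise pass it through untouched to
    the rest of the Swift proxy pipeline."""
    def middleware(environ, start_response):
        # WSGI exposes the 'X-Stacksync-Api' header as HTTP_X_STACKSYNC_API.
        if environ.get("HTTP_X_STACKSYNC_API"):
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"handled by the StackSync API"]
        # No header: do not interfere with a normal Swift request.
        return app(environ, start_response)
    return middleware
```

The authenticator module works the same way: a filter earlier in the pipeline validates the OAuth parameters and stashes the resolved user in `environ`, so the API filter downstream can trust it.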
And when it gets to our authentication plugin, the plugin communicates with the server and gets the correct user. It's authenticated. And then it goes to the API. And knowing that the user is correct, the API will get the metadata, and the API will check that the user has permissions on that file. After that, the API will get the data objects and return the file to the mobile client. So now that we have seen the architecture and we have an overview, Cristian will explain to you a little bit more about the performance, and some tests that we have done. Well, now you've got a clear picture of our architecture. But where is StackSync among the rest of the personal clouds? To answer this question, we did some benchmarks of our platform and our clients. And well, this is one of the first graphs that I want to show you. This is the traffic overhead that we captured from the personal clouds that we tested, executing just a trace: creating files, modifying files, and deleting files. As you can see, we are the blue ones: we are transferring about 570 megabytes, which is not bad, as you can see. And surprisingly, Dropbox exhibits the highest traffic overhead. The next thing we wanted to compare is push versus pull. To create this benchmark, we decided to compare StackSync with ownCloud, which is another open source personal cloud, one that is using the WebDAV protocol in order to stay in sync. So as you can see, we captured the kilobytes per minute of metadata during the test. And as you can see here, ownCloud is getting a high bandwidth consumption, unlike StackSync, where we are getting about 20 kilobytes per minute, approximately. The last one that I want to show you is the server elasticity. To create this benchmark, we used a trace. And as you can see, the gray part is the number of instances, and the red line is the number of requests that we have. 
And as you can see, when the number of requests goes up, the number of instances increases. And when the number of requests decreases, the number of instances also decreases. So that's all. I hope that the question is answered. So now I'll leave you with Pedro to explain the last part of the presentation. OK, so they have finished explaining the architecture and the performance of the system, and I will now explain how to use, or when to use, StackSync. So first of all, I really want to stress and outline that we are not presenting some early release or alpha release with a prototype. I mean, this is a stable release with thousands of lines of code, with extensive testing, bug fixing, and usage by real users. So it can really be used out of the box right now. And we are going to continue on this. So it is a stable product after two years of development. We didn't want to come here with a prototype. OK, so where can you use StackSync? There are three main deployment scenarios: private clouds, hybrid clouds, and public clouds. In the first one, you have your OpenStack Swift installation, you install the StackSync server, and then you control your data, and your users will be able to access your information, synchronize from different devices, and share information. The second one is a hybrid one: you prefer to have the metadata server, the StackSync server, in your own organization, but delegate the data handling on Swift to a public provider. You can do that. And the third one could be that you are a public infrastructure provider. You want to offer this to your clients. You are already an expert in OpenStack. Then install the StackSync server, and now you will have a personal cloud that you can offer to your clients. So there are current deployments now that are being used with real clients. We have one private cluster in our university that has been tested in the last months by students and professors. This would be the example of a private cluster. 
Then we have Tissat, that is one of the partners of the project, that has a Tier IV data center in Spain. They have begun to offer StackSync in the past months. They are signing up clients, and they are beginning to offer it to public organizations, because the security of the tool is very interesting for people with sensitive information. And finally, we have RedIRIS, that is the Spanish university network. They are beginning to offer StackSync as a service to the different universities and encourage them to adopt StackSync. So there is some discussion now, because some want to use ownCloud, others StackSync, and there is a discussion there in the network. But RedIRIS is betting on the StackSync solution. So OK, we really are open to the community. This is an open source project that you can modify, you can use, you can do whatever you want with. We will give you all the support for it. We already have a community of people and companies interested in it. We now have clients available for Windows, for Linux, for Android. We have some web clients. There is a lot of documentation available at stacksync.org: how to deploy on top of Swift, and about the architecture, which is really modular and extensible. Everything, if you want to modify chunking, storage, deduplication, encryption, everything is modular and easy to change. And it's easy to deploy on Swift. So here I don't have to explain why you should use object storage and OpenStack Swift. But in many universities and places, we have to evangelize about that. Because they say, no, I install ownCloud, it's a single web server, and you have the system running. And then I have to say, OK, but with object storage you have redundant storage, and it's scalable, you can cope with demand. It's different. So I don't have to explain this to you. You know what OpenStack Swift is. And if you already know how to install Swift, that is the harder part of the installation; installing StackSync afterwards is easier. 
So my conclusion here is: OK, StackSync is a ready-to-use personal cloud for OpenStack Swift. You can really use it now. And we are going to continue working on it for the next two years. Apart from the people that want to contribute, what is next? We are going to incorporate more advanced privacy features. Right now we are offering client-side encryption for desktop clients, not for mobile clients. In the next months, we will offer that in mobile clients as well. And we will offer interesting privacy-aware sharing mechanisms to share information in a secure way in groups. On interoperability, we are going to offer APIs to share information between different StackSync installations. For example, the university network in Spain was very interested in different universities creating a federation and sharing information between them. And because we have NEC on board, NEC will push these APIs, so proprietary systems can also communicate by implementing these APIs, which are open, simple, and based on OAuth. Well, finally, we are going to provide more clients. We are already working on an iOS client and a Mac desktop client, so they will be available in the next months. And as I said, it is open source. It is available at stacksync.org and on GitHub, all the code. We have open benchmarks: the tests that Cristian showed, we did not invent these benchmarks. We used ones from researchers in Holland, we ran them, and we have the public traces of how we validated the system. We are really open to collaboration, so please use the system and contribute to it. And thank you for your attention. Questions? I noticed that you guys are using either Postgres or MySQL for your metadata database. We are using Postgres. Yeah, right now we are using Postgres, but we have a data access object, so we can, in fact, use whatever relational database is available, for example, MySQL or Postgres or whatever. 
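The data-access-object layer mentioned in this answer can be sketched like this: the server codes against one small interface, and swapping the database means writing another implementation of it. The interface and class names here are hypothetical, not the real StackSync DAO.

```python
from abc import ABC, abstractmethod


class MetadataDAO(ABC):
    """Hypothetical data-access interface: the sync server talks to this,
    never to a concrete database, so backends are interchangeable."""

    @abstractmethod
    def put(self, file_id, meta): ...

    @abstractmethod
    def get(self, file_id): ...


class InMemoryDAO(MetadataDAO):
    """Stand-in backend; a PostgreSQL, MySQL, or key-value adapter would
    implement the same two methods."""

    def __init__(self):
        self._rows = {}

    def put(self, file_id, meta):
        self._rows[file_id] = meta

    def get(self, file_id):
        return self._rows.get(file_id)
```

This is the mechanism behind the answer: the choice of Postgres is an implementation detail behind the interface, not something the rest of the server depends on.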
And it's also extensible, so anybody can go and make an adapter for a key-value store, for example, or whatever comes to mind. Yeah, I was curious how many foreign keys and that kind of SQL database feature you have tied into it. A key-value store is obviously a good idea for that, so thank you. One little thing more: Dropbox is also using MySQL. In the beginning, we used a key-value store, but then you have a problem with consistency, so it's much easier to go for a relational one. And Dropbox is also going for MySQL, so it's just like that. Thank you. On your mobile API slide, you guys had a website, but then throughout the rest, you didn't show much of a website. Do you have a website, kind of like ownCloud has, where users can access it from a mobile browser? Yeah, we have a website. Well, what we did was make a branch of the ownCloud website, and we connected it with our system. But in parallel, we are working on a standalone website created by us. For the meanwhile, we have the adaptation of the ownCloud one. Yeah, and I would like to add that besides this ownCloud version, we have another version in Python offering the whole web interface, like Dropbox. But also, we have one partner in the consortium, eyeOS, with a web desktop. It's a virtual web desktop, and they already published a connection with StackSync, and it's like their own Google Docs: they have Office on the web and everything there, and they are offering that also as open source. So you can also use eyeOS as the web client, because they are using the APIs. So yeah, there are different web layers. I think I have a question in a similar direction, concerning the problem of managing the data without having a client that has enough space for the whole repository. So the first question would be: is it possible to synchronize certain folders selectively, differently on each client? 
And the second one: can I maybe manage things on the metadata level without downloading all of the files first? For the first question: right now we don't have this feature. We don't do selective synchronization, so we synchronize the whole repository. But it's on our roadmap, and yeah, we have already thought about that. It's a feature that we want to add. For the second one... Well, for the second one, I thought that you asked whether it was possible to download metadata without data, is that it? The question is: let's say I want to reorganize my data, but on my client I don't have space for all of that data. Let's say I have 20 gigabytes of pictures, and I want to reshuffle them. One way, I think, would be to go via a web page, because then I can do it in a browser and don't have to download anything. But is there another possibility? What else is possible? Well, right now, as the clients synchronize, they have to download all the content. So you have to download everything. But what you can do is go to the website, and then you have the list of your files and folders. You can rename them, you can move them from one folder to another, and so on. So I think if you have a lot of storage, a lot of files, the best way is to go through the website or the API. You can, for example, go through the mobile clients, and you can also move a file into a folder and so on. Okay. On mobile networks, you have the problem of the connection to the cloud being unreliable. I noticed that in your API, you chunk the data, you encrypt it, you send it. What happens if you've sent part of a file, or you've received part of a file, you've sent the metadata, and then your connection drops? Are you able to resume? And are you able to ensure that other clients aren't downloading an incomplete file because they see the metadata? How do you recover? 
Well, in this case, what we've always done is: first, upload the data, and once all the data is uploaded to OpenStack Swift, available for all the clients, we then upload the other part, which is the metadata. And once that metadata is updated, all the clients receive it and then start the download part from Swift. That's great, but what if during a large upload or a large download, great, the metadata's not there yet, but you're interrupted? Does the API have some sort of resume quality? Well, I will explain from the desktop point of view. If you're trying to upload a file and something happens, the file remains unsynced. So the client keeps trying to upload the rest of the file, and once the file is uploaded, it will say, OK, this file is synchronized, and then I have to upload the metadata. So we don't allow this strange situation: only once the client is really sure that all the data is ready does it move to the second step. And the good thing about this architecture, compared with other systems: for example, ownCloud stores the entire file, so if it's a large file, you have problems if you lose the connection. In our case, working with chunks, when you resume the connection, you are not losing anything, in either uploading or downloading. So it's different, yeah. You don't start from the beginning, you start from where you left off. Yeah, of course, you can start from the chunks that were already there, and upload the remaining ones, yeah. Because in ownCloud you've got to restart, right? Entirely, yeah, you have to put everything in again, yeah. You were talking about the authentication with Keystone, with the users, and users in Swift, and I was kind of curious how that works with an existing OpenStack installation, where the users are typically application developers and not necessarily users of the applications themselves. 
So what we do, in order for the desktop client to authenticate, is use Keystone, OK, we use the Keystone users and passwords and so on. But for the APIs, we use our implementation of OAuth. So in the end, you have to log in to Keystone, and every time you want to access the data objects, you have to provide your credentials, your Keystone token. You are right: in our case, we map users of the system to users in Swift, and workspaces to containers. So this is one design decision. We had another candidate model of bringing all users, in a logical way, into one single container, but for the moment, we decided on this mapping because it's simpler: for example, quotas in Swift can be applied as quotas on the container of the user, and all the rest. So are you looking at possible alternatives down the road, so that you... Yeah, a logical division, yeah. Okay, good. Thank you. Well, I think that's all. Yeah, I think we are finishing now. Okay, so thank you very much. Thank you.