Okay, good morning, everyone. It's great to see so many people here that I don't know yet; that's actually a good sign for me. My name is Christian Schwede, I'm a principal software engineer at Red Hat, and today I want to talk to you about building web applications using OpenStack Swift. About 15 years ago I was asked by a photographer, or rather a small consulting firm, to make their pictures publicly available on their website for their customers. Back then that was quite complicated: you had to store all the photos somewhere, you had to build that storage system on your own, and you had to split the photos across multiple servers. It was a huge effort. Happily, today it's much simpler. Object storage made this much simpler, and that's what I want to talk to you about today. So let's get started with a basic introduction to object storage itself: what it is, and how it differs from traditional storage like block or file storage. Object storage is basically a very simple way to talk to your storage system at the application level. That's the difference: you really talk to your storage system at the application level, using a very simple REST API. That means you have an HTTP URL for each of your objects, each of your data sets, and you can access the data using that URL. You also normally have a very flat namespace. In a traditional file system you have a directory tree with a lot of nested folders, for example. That's not the case for an object storage.
An object storage normally contains only containers, or buckets, whatever you call them, and each container is a collection of anywhere from a few to very many objects. But object storage makes it possible to scale the whole system massively. If you take a traditional file system, for example, it gets really complicated once you want to go beyond a single node: you need a distributed file system in that case, and that makes things very complicated. With object storage, you as an application developer don't need to think about scaling up your application, because the URL stays the same and you don't need to care where the data is read from. Also, most object storages store each object multiple times; three copies is the most common setting in the traditional and well-known public systems. There's a caveat you need to take into account if you're using an object storage system: it's eventually consistent. What does that mean? If you store an object and later list the container where the object was stored, the container listing might not be updated yet. That happens especially if there are failures within your system. Object storages are traditionally built to work around these failures, so as a normal user you don't notice when a node is down, a disk fails, or something similar. Object storage also lets you store metadata along with your objects directly in the system, which is different from a traditional file system. For example, if you have a large video file and you want to store information like who recorded the video, what the content of the video is, and so on, you can store that directly in metadata assigned to that single object. Videos and other unstructured data are actually the best use case for object storage.
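Metadata like this travels as HTTP headers. As a minimal sketch (the helper name `metadata_headers` and the example keys are illustrative, not from the talk), this turns a plain dictionary into the `X-Object-Meta-*` headers that Swift stores with an object when they are sent on a PUT or POST:

```python
def metadata_headers(meta):
    """Map {"recorded by": "Alice"} to {"X-Object-Meta-Recorded-By": "Alice"}.

    Swift keeps every X-Object-Meta-<Name> header sent with a PUT or POST
    as user metadata of the object and returns it on HEAD and GET.
    """
    headers = {}
    for key, value in meta.items():
        # Header names cannot contain spaces, so "recorded by" becomes "Recorded-By".
        name = "-".join(word.capitalize() for word in key.replace("_", " ").split())
        headers[f"X-Object-Meta-{name}"] = str(value)
    return headers
```

For example, `metadata_headers({"recorded by": "Alice", "content": "keynote"})` yields `X-Object-Meta-Recorded-By` and `X-Object-Meta-Content` headers; a later HEAD request on the object returns them again.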
So don't use object storage if you need database-like behavior. It's best used for really unstructured data and large data sets, for example video files, image files, backups, and other large file sets. So as a developer or a company you might be interested in using object storage, and you start looking into object storage solutions. Most likely you start investigating public clouds, but that's not for everyone. Your legal or corporate requirements might make it impossible to use a public cloud, for example if you're storing health care data, financial data, or other data that isn't intended for the public. It's very likely that you have to store that data within your company and be its owner. A public cloud might also be too expensive, and that's not only the monthly cost of the storage. What really makes the bill bigger, most of the time, is the bandwidth used when you read data: if you have large video files, for example, and users download them, you pay for every download and not only for the storage. And of course you might miss some features in a public cloud. If you deploy your own private cloud within your company, you control the features you offer your customers, your users, and your developers. So you decide to use a private cloud, and now, especially with storage systems, there's one more thing to consider.
It becomes important to avoid vendor lock-in. Has anybody here transferred or migrated petabytes of data from one storage system to another? No? Okay, good for you. I did that in the past, and it's a really tedious process; it creates nightmares for developers, operators, and everybody else. If you have a traditional storage system and need to upgrade it or switch to a different vendor or storage system, you need to take into account that the migration takes a lot of time, and during that time you have two storage systems to support. It's not done within a weekend: if you're migrating petabytes of data, you're normally talking about weeks, if not months or more. That hopefully brings you to OpenStack Swift and other open source storage systems. With OpenStack Swift you have full control over the whole stack of your object storage system. You choose which operating system and which Swift version you run, and even if you later decide to use a different operating system from a different vendor, you don't need to migrate the data already stored in your cluster, because the data on the disks stays where it is. You just replace the operating system or the Swift layer, and the data on disk can be reused afterwards. That also makes upgrades very nice, because you can upgrade your cluster in production: take a few nodes out of the cluster, upgrade them, and bring them back online later. And of course you have full control over all the settings of your Swift cluster, for example where your data is stored. If you need geo-replication, storing one copy of your data in Tokyo and another copy in Paris, for example, you can do that with Swift.
Swift has been proven in production for several years. It was originally started by Rackspace (I see at least one Rackspace guy here in the audience), and I think these folks really showed that it's possible to run a storage cluster hundreds of petabytes in size in production for years. So let's have a short look at the architecture behind Swift. As an application developer, most of the time you talk to the proxy server, which is basically the entry point to the object storage system. If you want to store an object, you send a PUT request: you have an account name, a container name, and finally an object name under which you want to store the data. The proxy server takes your object and, in a configuration like this one with three copies of every object, creates three copies on different disks and different back-end storage servers. So even if one or maybe two of these servers or disks fail, your object is still readable. As you can see in this diagram, you could take out a few nodes, two in this example, upgrade them, and bring them back up, and during the whole time users still have access to the objects already stored. It's also important that if a disk breaks and you need to replace it, there are daemons running on the storage nodes, for example the object replicators, that ensure every object is stored three times in different locations inside the cluster. If that's not the case, the replicator creates the missing copy from the existing copies. Sometimes you also get read errors on your disks that aren't detected by the disk or the operating system itself, but are detected by Swift processes that run in the background all the time. One of them is called the auditor: it compares the data it reads from an object with a previously stored checksum, and if there's a mismatch, the faulty copy is quarantined and replaced with a copy from the other replicas, which are hopefully not corrupted. If you're running a larger cluster, you see this mechanism detecting read errors all the time.
Most of the functionality that's interesting for you as a developer runs on the proxy server, in the form of middlewares you can configure. The most interesting part is probably authentication. Some authentication middlewares are shipped with Swift, but it's also very easy to write your own if you want to hook Swift up to your existing corporate environment, for example. There are middlewares specifically targeted at web application developers: temporary URLs and form post. We'll come back to those in a few minutes. You have support for quotas, on both the account and the container level. You have object versioning: if you store an object and later store an updated version under the same name, Swift keeps both copies and you can later retrieve whichever version you want. There are also some third-party middlewares, and the most prominent one is probably swift3, which makes it possible to use existing applications built for the S3 API together with Swift. That means if you already have an application that talks to S3, it's very likely you can hook it up to Swift with no or only minor modifications. You don't need to reinvent the wheel and learn a whole new API, which makes it easier to migrate to Swift. Then there is some functionality that isn't implemented as middleware but is tied more closely into the core code of Swift, for example expiring object support: you can set special metadata on an object that says "you're only valid for, say, one week", and after that week Swift denies access to the object and deletes it in the background.
Important for application developers, especially if you're using JavaScript, is cross-origin resource sharing (CORS). It makes it possible to run JavaScript served from one domain while your Swift cluster runs on a different domain, and still retrieve data from it. We'll come back to that later. To talk to Swift you use the REST API, and most of the time that requires an authentication token. The token is sent as an HTTP header, and it normally comes from your authentication system, for example Keystone if you use the one shipped with OpenStack. Then you can access the data on the cluster, provided the user is allowed to. Only a few basic operations are needed: GET requests to list or retrieve objects, HEAD requests to read metadata, POST requests to store new metadata, and PUT requests to upload a new object (with its metadata). Of course, you can also delete objects with a DELETE request and copy existing objects to a new name. There are two ways to upload data to Swift. The most common one is probably a PUT request; if you use a command line client, you're always using PUT. It normally requires an authentication token, though, which isn't very useful if you want to build a web application, so there's a second way to do it.
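The token-based requests described above map directly onto HTTP methods. The following is a minimal sketch using only Python's standard library (`urllib.request`), assuming you already have a storage URL (which includes the account, e.g. `.../v1/AUTH_test`) and a token from your auth system; the container `photos`, the object names, and the helper names are hypothetical. It only builds the requests; passing one to `urllib.request.urlopen()` would send it.

```python
from typing import Dict, Optional
from urllib.parse import quote
from urllib.request import Request


def object_url(storage_url: str, container: str, obj: str = "") -> str:
    """Join a storage URL (which already includes the account) with a container and object."""
    url = f"{storage_url.rstrip('/')}/{quote(container, safe='')}"
    if obj:
        # Object names may contain "/" (pseudo-directories), so slashes stay unescaped.
        url += "/" + quote(obj, safe="/")
    return url


def swift_request(method: str, url: str, token: str, data: Optional[bytes] = None,
                  headers: Optional[Dict[str, str]] = None) -> Request:
    """Build (but do not send) a request carrying the auth token as an HTTP header."""
    all_headers = {"X-Auth-Token": token}
    all_headers.update(headers or {})
    return Request(url, data=data, headers=all_headers, method=method)


def example_requests(storage_url: str, token: str) -> Dict[str, Request]:
    """One request per basic operation mentioned in the talk (names are illustrative)."""
    container = object_url(storage_url, "photos")
    obj = object_url(storage_url, "photos", "2015/cat.jpg")
    return {
        "list": swift_request("GET", container + "?format=json", token),
        "read_metadata": swift_request("HEAD", obj, token),
        "set_metadata": swift_request("POST", obj, token,
                                      headers={"X-Object-Meta-Photographer": "Alice"}),
        "upload": swift_request("PUT", obj, token, data=b"placeholder image bytes"),
        "delete": swift_request("DELETE", obj, token),
        # COPY is a Swift-specific method; Destination is "<container>/<object>".
        "copy": swift_request("COPY", obj, token,
                              headers={"Destination": "photos/2015/cat-copy.jpg"}),
    }
```

In practice a client library saves you this plumbing; the point here is only that each operation is one HTTP request with a token header.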
The second way uses a temporary URL: a specially crafted URL with a signature appended to it. Swift reads the signature and checks it against a previously stored secret key to allow or deny access to the object. But PUT requests are only possible from a browser if you're using scripts, for example JavaScript. If you want to build an application that doesn't use JavaScript, which is still a requirement at some companies, you need an HTML form with a POST request, and there's a middleware for that: form post. The caveat here is that you don't know the final object name in advance. You tell Swift: here comes an HTML form request for this container, maybe with a prefix. But neither Swift nor your application knows the final object name yet, because the browser sends the object name together with the request, and for security reasons you can't modify it in the browser. So in the end you need to list the container to retrieve the final object name. There's no other way to know it, and if your application holds a reference to the object, it needs the name. So it's probably a good idea to do this asynchronously.
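For the form post middleware, the application signs the upload parameters and renders them as hidden form fields. The sketch below follows the HMAC-SHA1 scheme described in Swift's formpost documentation (path, redirect, max_file_size, max_file_count and expires joined by newlines, signed with the account's temp URL key); the key, paths, and limits are illustrative assumptions. Newer Swift releases also accept other digests and may restrict SHA-1, so check your cluster's configuration.

```python
import hmac
import time
from hashlib import sha1
from typing import Dict, Optional


def formpost_fields(path: str, redirect: str, max_file_size: int, max_file_count: int,
                    key: str, ttl: int = 3600, now: Optional[float] = None,
                    digest=sha1) -> Dict[str, str]:
    """Hidden fields for an HTML form that uploads straight to Swift.

    `path` is the container plus an object prefix, e.g. /v1/AUTH_test/uploads/abc123/;
    Swift appends the file name sent by the browser to that prefix, which is why
    the application does not know the final object name in advance.
    """
    expires = int((time.time() if now is None else now) + ttl)
    hmac_body = f"{path}\n{redirect}\n{max_file_size}\n{max_file_count}\n{expires}"
    signature = hmac.new(key.encode(), hmac_body.encode(), digest).hexdigest()
    return {
        "redirect": redirect,
        "max_file_size": str(max_file_size),
        "max_file_count": str(max_file_count),
        "expires": str(expires),
        "signature": signature,
    }
```

The form's `action` would then be the full URL of `path` on the Swift proxy, with `method="post"` and `enctype="multipart/form-data"`, and each of these values as an `<input type="hidden">` next to the file input.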
After such a form post upload, you later do a listing on the container and retrieve the object name. One nice thing is that Swift lets you avoid sending uploads through the application itself. If you do an upload, upload directly to Swift; don't upload to your application and have the application forward the data to Swift, because that really lowers the scalability of your application. If you can upload data directly to Swift, your application is much more lightweight, because it only needs to handle metadata and small requests, and all the big requests go directly from the browser to Swift. Okay, so let's get started. We wanted to talk about web applications and not only about Swift in the back end. The simplest way to get started with Swift in a development environment is a setup called Swift All In One. There is documentation on the OpenStack website, and there are also a number of automated scripts, for example using Vagrant, that make it even simpler. I have a few links at the end of this talk; with them you can fire up a VM with a complete, running Swift development environment in a few minutes, which is a really great way to start. By default, Swift All In One environments use the username test:tester and the password testing, so if you use one of those, all the examples I'll show you should work fine. So first, let's talk about client-side applications with Angular and Swift. Angular is a JavaScript framework that makes it very easy to write client-side applications. The application itself runs inside your web browser, so there's normally no need for a web server, at least not in the beginning. You can store the whole application on Swift itself, for example.
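To check that a Swift All In One setup like the one mentioned above works, you can request a token with the default credentials. This is a minimal standard-library sketch, assuming the setup uses TempAuth's `/auth/v1.0` endpoint on a proxy at `http://127.0.0.1:8080` (a common default; adjust for your VM). The response carries the token and the storage URL in the `X-Auth-Token` and `X-Storage-Url` headers.

```python
from typing import Tuple
from urllib.request import Request, urlopen


def saio_auth_request(proxy: str = "http://127.0.0.1:8080", user: str = "test:tester",
                      key: str = "testing") -> Request:
    """TempAuth v1.0 request using the Swift All In One default credentials."""
    return Request(f"{proxy.rstrip('/')}/auth/v1.0",
                   headers={"X-Auth-User": user, "X-Auth-Key": key})


def authenticate(req: Request) -> Tuple[str, str]:
    """Send the request and return (storage_url, token); needs a running cluster."""
    with urlopen(req) as resp:
        return resp.headers["X-Storage-Url"], resp.headers["X-Auth-Token"]
```

Calling `authenticate(saio_auth_request())` against a running VM would return the pair; with python-swiftclient, `swiftclient.client.get_auth(auth_url, user, key)` does the same job.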
Such an application runs directly off Swift, served by Swift from a publicly readable container. As I said, there's no application server needed in that case. Swift happily returns JSON, for example for a container listing, and that's directly usable by Angular; you don't need to convert anything, which simplifies the whole process even more. What I want to show you during this talk is how to make container listings a bit more powerful. There's a middleware in Swift called staticweb that you can configure. If you remember the Apache directory listings that were invented probably 15 or 20 years ago, it's very similar: you get a basic listing of the container content, with each object downloadable. That's actually a great way to exchange data or large data sets with customers or clients, for example if you're a media company and want to ship really large video files to your customers. I want to add a few more features to this: sorting, showing metadata (because you have access to metadata within Swift itself), and range requests. A range request in HTTP means you have a large object, but you tell Swift, or your web browser tells Swift, to download only part of that object. With media files that makes a lot of sense, because large media files very often contain smaller embedded previews. For example, a picture from a still camera recorded in a so-called raw format usually contains a small embedded JPEG that you can read, and I want to show you that. One thing we need to talk about is cross-origin resource sharing.
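In Swift, both the public container and CORS come down to container headers set with a POST on the container. A minimal sketch, assuming you want anonymous reads plus listings (the `.r:*,.rlistings` ACL) and scripts from the listed origins; the origins are placeholders, and the optional staticweb header only has an effect if that middleware is enabled in your proxy pipeline:

```python
from typing import Dict, Iterable


def public_web_container_headers(allowed_origins: Iterable[str],
                                 web_listings: bool = False) -> Dict[str, str]:
    """Headers to POST to a container so that browser apps can read it.

    X-Container-Read ".r:*,.rlistings" allows anonymous reads and listings;
    the CORS header lists, space-separated, the origins whose scripts may call Swift.
    """
    headers = {
        "X-Container-Read": ".r:*,.rlistings",
        "X-Container-Meta-Access-Control-Allow-Origin": " ".join(allowed_origins),
    }
    if web_listings:
        # Only meaningful if the staticweb middleware is enabled on the proxy.
        headers["X-Container-Meta-Web-Listings"] = "true"
    return headers
```

With python-swiftclient these headers could be sent via `swiftclient.client.post_container(storage_url, token, "photos", headers)`; the container name here is hypothetical.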
As I said earlier, scripts running in your browser are normally not allowed to access data from a different domain, and that's where cross-origin resource sharing comes into play. The simplest workaround, if you just want to get started with Swift and Angular, is to upload your application to a public container on Swift itself, because then it runs on the same domain. But that might not be the best approach in the long run; you'll probably serve your application code from one server and run Swift on a different server and domain. Happily, it's quite easy: you just set special metadata on the container, and then your application can retrieve the container data. So here's some HTML code, the full HTML for the first example. It's not much. You can already see some attributes that aren't plain HTML5 but are Angular directives. You start with ng-app, which basically tells Angular which application runs there. You include two files, one for Angular itself and one with the JavaScript for your application. And then, as you can see in line 8, there's an ng-repeat: you have a listing of objects, and for each object in that listing you print a table row in your HTML. The corresponding JavaScript code is also the full example in this case.
You have an Angular controller with a URL and headers, and what you do in line 4 is an HTTP GET. If you do an HTTP GET on a public Swift container, you retrieve a JSON listing of all the objects stored in the container. If that was successful, you take the response data and assign it to a variable, in this case objects, which is the variable used in the HTML code before, and it gets iterated over. If it fails, you just log the content to the JavaScript console in this case. So let's have a look at what happens in reality if I open that in my browser. Let me make it a little bigger. Yeah, that should be fine. This is a public container served directly by Swift, and it serves the first example's HTML file directly, which lists the content of my Swift container. That's not very spectacular, I agree. So let's make it a little more useful. I talked about metadata before: each of the objects you see here has metadata attached to it. There's some default metadata in Swift and other metadata you can set yourself. So let's open the second example. There are some new things here. First, the whole content becomes sortable: if I want to know which object is the biggest, I just click on size, Angular does the sorting, and I can find the biggest or the smallest object. There's also a link at the end of each line, and if I click on this one, the Angular application does a HEAD request on this object and retrieves the metadata stored with it. There's some metadata that I set in this case; it's always prefixed with X-Object-Meta. There's one in the first line.
There's X-Object-Meta-Preview-Length, and there's one in this line here, X-Object-Meta-Preview-Start. As I told you earlier, some media files have embedded previews, and I make use of that in the next example. What I'm doing now is adding a little more CSS to make it prettier and using the embedded preview metadata. The only things I changed are that I used the very familiar Bootstrap CSS framework to make it prettier for you, and I added a preview setting. As I said, we have the preview length in the second line and the preview start, which is an offset within the whole object. As you can see, the whole object is around 8 megabytes in size, and if I just click here... that works nicely. It only retrieves a few hundred kilobytes, the embedded preview in this case, and that's much more useful than a normal static listing, right? If you're a media company, you could of course extend this a lot more, but this application really runs using only Swift; no other server is needed, and I think it's a good way to start with Swift. Any questions so far? Everybody's wondering what's happening here? Okay. If you want to go a little further, you probably want to add a database or similar things, because right now we're limited.
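The demo itself is Angular, but the logic behind its three steps is small enough to sketch in Python: sort the JSON container listing, pull the `X-Object-Meta-*` values out of a HEAD response, and turn the preview offset and length into an HTTP `Range` header. The names `X-Object-Meta-Preview-Start` and `X-Object-Meta-Preview-Length` are the demo's own convention (assumed here from the talk), not something Swift defines, and the helper names are hypothetical.

```python
import json
from typing import Dict, List


def sort_listing(listing_json: str, key: str = "bytes", descending: bool = True) -> List[dict]:
    """Sort a Swift JSON container listing, e.g. biggest object first.

    Each entry carries fields such as name, bytes, hash, last_modified and content_type.
    """
    return sorted(json.loads(listing_json), key=lambda entry: entry[key], reverse=descending)


def user_metadata(headers: Dict[str, str]) -> Dict[str, str]:
    """Extract X-Object-Meta-* headers from a HEAD response as {"preview-start": ...}."""
    prefix = "x-object-meta-"
    return {name.lower()[len(prefix):]: value
            for name, value in headers.items() if name.lower().startswith(prefix)}


def preview_range(meta: Dict[str, str]) -> str:
    """Range header for an embedded preview described by offset and length metadata."""
    start = int(meta["preview-start"])
    length = int(meta["preview-length"])
    if start < 0 or length <= 0:
        raise ValueError("invalid preview offset or length")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={start}-{start + length - 1}"
```

A GET on the object with the resulting `Range` header should be answered with `206 Partial Content` and only the preview bytes, which is why the browser downloads a few hundred kilobytes instead of the whole raw file.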
Yeah, there's a question. Okay, so the question was to see the actual request that is sent. Let's do that in Chrome, because it has nice built-in developer tools. Is that big enough? If I just do the object listing, you see some smaller requests at the bottom, basically the ones in the last two lines. And if I do the preview request, let's see where it is... that one, the second line from the bottom. As you can see at the end, the size of this preview is about 580 kilobytes in total, while the original object was 8.2 megabytes. The headers? There are no special headers sent with this request; it's just a basic GET request. Oh, you want to see the content length and content range that I sent along with the request. Okay, I can show you directly what's happening in the source code. Sorry? Yes, but I just want to show you directly what I'm doing in the source code. So the question was which headers I sent along with the request, and how I got to the request. What I'm doing here is an HTTP HEAD request first, and then, one second, I retrieve the headers that I previously stored together with the object. That's the first task. The second one is the actual loading of the image, which happens here in the first 15 lines or so, and in line 2 I send the range that I want to retrieve from this object: I give it a start and an end point, and then the browser retrieves it. All right, we don't have that much time, so I'll switch over to Django now. If there are more questions, ask me afterwards. There's one question. Yes, so the comment was that Swift itself can be used as a web server.
That's true, as long as you have static content or content that is executed on the browser side. In that case you can use Swift itself to store and serve content directly, without another web server. Sometimes, though, you need to post-process the data you uploaded, or do some pre-processing, and then you normally need to do it on the server side. So you build a server-side application. One very popular framework, since we do a lot of Python here in OpenStack, is Django. Django has been around for quite a few years, it's a very popular project, and I really like it, so that's what I'm talking about. If you're new to Django, there's a very good tutorial on the Django project website that gives you a basic introduction to building a web application with Django. For the examples here I'm using two of the middlewares shipped with Swift, temp URL and form post, as I said earlier. With these you can download and upload data directly between the browser and OpenStack Swift without involving the application. The application only creates the signed URL and returns it to the client, and the client uses it, so there's no need to route all of the data through the application itself. As a simple example, I have a file sharing application. Maybe your manager comes to you tomorrow and says: we have a nice Swift cluster in our data center, and I want a way to share temporary data with our customers or clients. It should have a nice URL, and it shouldn't live with a public cloud provider; it should live on our own Swift instance. In Django, you normally use views for that.
If you access a URL in your browser, a specific view is executed, and this example uses three different views. I have one for uploading data: it basically creates a signature and a form that is shown in the browser. I have a second view that is executed after the upload: when the upload is done, Swift can redirect the user to a different URL, and this finalize view updates the entry in the application's database. And of course, I want a download view. So what does it look like? Just a short remark: if you're developing in Python, it's probably best to start with python-swiftclient, because it makes things much easier. It's actually very simple. The first operation is to get an authentication token and the storage URL, and with that storage URL and token you can make all your requests to Swift without dealing with the REST API yourself. So you don't need to reinvent the wheel for reading metadata, listing containers and objects, storing new metadata, and so on, and those calls are used in this example. Next is the temporary URL key. If you want to create a signed URL, you need a temporary URL key that is stored within Swift itself, normally at the account level; with more recent Swift versions you can also set it per container. This helper function basically tries to retrieve that key, and if no key is available, it uses swiftclient to create a random one and does a POST request on the account to store the new key. The temp URL key and the storage URL are then used, for example, in the download view. So here's the whole download view. The second line is a query for a Django model object, which is stored in your database.
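The key helper and the signing step of the download view described above can be sketched as follows. This is a minimal version, not the talk's actual code: `set_account_metadata` is a hypothetical callback (with python-swiftclient it could be `lambda h: swiftclient.client.post_account(storage_url, token, h)`), and the signature follows the HMAC-SHA1 scheme from Swift's tempurl documentation (method, expiry timestamp and path joined by newlines). python-swiftclient also ships `generate_temp_url` in `swiftclient.utils` if you'd rather not sign by hand.

```python
import hmac
import secrets
import time
from hashlib import sha1
from typing import Callable, Dict, Optional
from urllib.parse import urlencode, urlsplit

TEMP_URL_KEY_HEADER = "X-Account-Meta-Temp-URL-Key"


def ensure_temp_url_key(account_headers: Dict[str, str],
                        set_account_metadata: Callable[[Dict[str, str]], None]) -> str:
    """Return the account's temp URL key, creating and storing a random one if missing."""
    existing = {name.lower(): value for name, value in account_headers.items()}
    key = existing.get(TEMP_URL_KEY_HEADER.lower())
    if key:
        return key
    key = secrets.token_hex(32)
    set_account_metadata({TEMP_URL_KEY_HEADER: key})  # i.e. a POST on the account
    return key


def temp_url(object_url: str, key: str, method: str = "GET", ttl: int = 60,
             now: Optional[float] = None) -> str:
    """Sign object_url so a browser can use it without a token until it expires.

    Assumes a plain ASCII object name that needs no percent-encoding.
    """
    expires = int((time.time() if now is None else now) + ttl)
    path = urlsplit(object_url).path  # e.g. /v1/AUTH_test/files/talk.pdf
    hmac_body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    query = urlencode({"temp_url_sig": sig, "temp_url_expires": expires})
    return f"{object_url}?{query}"
```

The 60-second default mirrors the talk's download view; because the expiry is an absolute timestamp checked by Swift, the clocks of the application server and the cluster must agree (hence NTP).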
In the download view, you look up the entry, for example with a primary key of, let's say, 1, in a table in your database, and if it's found, you know the container name and the object name. Then you set an expiration time, in this case only 60 seconds. That should be plenty of time for the request itself, but it requires that the clocks on the server computing the signature and on the Swift cluster are more or less identical. If they differ by more than 60 seconds, it won't work, so make sure you're using NTP everywhere. Finally, you compute a signature and append it to a specially crafted URL. In the end I have a URL that includes the account, container, and object name, with a few parameters appended, and this URL is valid for the time you set as the expiration. The upload part looks very similar. I won't go into the Python details for the upload part, because it's basically also just creating a signature. More interesting is maybe the HTML form, a form for a form post request. You see some hidden fields here. The most prominent one is probably the signature field, which holds a previously computed signature. Then you have a file upload field, and you also have a redirect field, which is populated with the next URL to go to after the upload has finished. So after the upload is done, we go to the redirect URL, and the finalize view lists the container you uploaded the data to; if it finds the new object, it stores an entry for it in the database. Do you see any problems here? Clay should see a problem.
Oh, no? So I told you earlier about eventual consistency. If some part of Swift is down or overloaded at the moment you upload an object, the listing of the objects in that container might still not be updated a few seconds later. So this view, which is executed directly after the upload, might not find the object yet, because the listing hasn't been updated. It's probably a very good idea to do that in an asynchronous process later on. For example, you could trigger an action here saying: this upload should be finished now, please check it, and if you don't find anything yet, check again later, 60 seconds or five minutes later or whenever. And you need to be aware that this can take a long time. The default time for a valid upload request in Swift is 24 hours, so if a client starts uploading a new object and only sends, say, one byte per second, it will take a long time until it's done. You need to take that amount of time into account for the asynchronous listing: if the client is very slow and you look at the listing five minutes from now, the object might still not be there. Okay, so let's have a look at what happens in reality with Django. One second, I need to start Django. Django has a built-in development server, and when I start it, it listens on localhost. The simple example I showed you on the slides presents a basic form for uploading data. So I select my talk slides and do an upload, and that was very quick.
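The asynchronous check described above, together with the metadata check recommended at the end of the talk, can be sketched as two plain functions. `list_names` is a hypothetical callable returning object names (with python-swiftclient it could wrap `get_container(storage_url, token, container, prefix=prefix)`), and the retry counts and prefix scheme are assumptions. In a Django app you would run this from a background worker or task queue rather than inside the redirect view, and reschedule if nothing shows up, since an upload may legitimately take hours. Swift stores an `X-Delete-After` value as `X-Delete-At`, so checking the latter in a HEAD response covers both.

```python
import time
from typing import Callable, Dict, FrozenSet, List


def wait_for_upload(list_names: Callable[[], List[str]], prefix: str,
                    attempts: int = 5, delay: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll a (possibly stale) container listing until objects under prefix appear.

    Returns the matching names, or [] if none showed up within the attempts.
    """
    for attempt in range(attempts):
        found = [name for name in list_names() if name.startswith(prefix)]
        if found:
            return found
        if attempt < attempts - 1:
            sleep(delay)
    return []


def suspicious_metadata(headers: Dict[str, str],
                        allowed_meta: FrozenSet[str] = frozenset()) -> List[str]:
    """Names of headers on an uploaded object that the application did not ask for."""
    found = []
    for name in headers:
        lname = name.lower()
        if lname == "x-delete-at":
            # Would make Swift expire the object behind the application's back.
            found.append(lname)
        elif lname.startswith("x-object-meta-") and lname[len("x-object-meta-"):] not in allowed_meta:
            found.append(lname)
    return sorted(found)
```

Passing `sleep` in as a parameter keeps the polling loop testable and lets a task queue replace blocking sleeps with delayed retries.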
The file is only one or two megabytes, and the Swift cluster is running locally. What happened in between is that the upload was sent directly to Swift. When the upload finished, Swift redirected the user to a new URL: there's "finalize" in it and then a more or less random prefix, because I prefixed the object name. This finalize view then created the database entry inside the Django application. If I click that link, the download of my summit talk PDF starts, and I get the proper file name, not just "6" or whatever. The Django application redirects the request directly to Swift using the signed URL that I created before, and Swift lets you save the object. So you get nice URLs, just a "d" and a "6" at the end, which should be much easier to read for your customers, your clients, or your manager. And that's the basic example. I need to come to an end, I think, so a few notes from my side. Don't store millions of objects in a single container. That's probably a bad idea if you're not using SSDs in your Swift cluster, because requests, for example storing new objects, might get slower. If you're in charge of the application itself, it should be fine to distribute the data across multiple containers. Don't mimic renames. Swift offers copy and delete operations, but don't use them to mimic rename behavior, because you're actually moving data inside the Swift cluster.
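One simple way to spread objects over several containers, assuming the application controls the naming, is to hash each object name into a fixed number of containers. The `files-NN` naming and the 16 shards below are arbitrary choices for illustration, not a Swift convention; storing the chosen container name in your database next to the object works just as well.

```python
import hashlib


def container_for(object_name: str, base: str = "files", shards: int = 16) -> str:
    """Pick one of `shards` containers for an object by hashing its name.

    The mapping is deterministic, so the application can recompute it on read,
    but changing `shards` later would map existing names to different containers.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).hexdigest()
    return f"{base}-{int(digest, 16) % shards:02d}"
```

The trade-off is that listing "all files" now means listing every shard container, which is why this fits applications that look objects up by name rather than by browsing.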
Moving data like that is fine for one object, but it becomes a problem if you do it for a million objects in a short time. Keep eventual consistency in mind, as I said earlier: container listings might not be updated yet. And if you use temp URL and form post, check the metadata stored with your object. A malicious user might intercept the request and store their own metadata along with it. There's special metadata, for example X-Delete-At, which is used for expiring objects. If someone sets that to one day, Swift will delete the object after one day, but you still have a reference in your application database, because the database isn't aware of it. So after the upload, check the object's metadata to make sure users don't fool you. Normal users just won't do that, but bad ones might. That said, I have a few references. The first one is a small repository with the examples from this talk that you might want to look into; just ping me if you have any questions about it. The others are interesting slides and documentation from Swift and other projects. And if you want to look into developing your own middlewares, there was a talk about that maybe one and a half years ago; hopefully that's useful for you too, so have a look at it. And with that, I'm done. Thank you very much for attending this talk.