server-side related. We use, of course, Objective-C and Java for mobile applications, but everything that relies on a server is in Python. I've been a member of the TurboGears2 web framework development team for the last four years. If you don't know it, it's one of the oldest web frameworks, together with Django. I have contributed to various Python web libraries, like the MongoDB Object Document Mapper Ming, which is used at SourceForge.net for everything related to MongoDB. I have been the Beaker maintainer since this year, and I also worked on ToscaWidgets and FormEncode, which are libraries related to validation and forms for the web. So most of my work in the past years has been related to the web world.

What I'm going to talk to you about is a project that really happened at our company, and it started as a plain proof of technology. The customer came and said: hey, I want to try my idea, see if it can work, if people can use it, and that it's not a huge mess. So we started with a really simple code base, which then became the final product, what the customer actually launched. As usual, it always goes like this: the customer comes with just an idea, a test, and then it becomes the real Frankenstein.

The core part of this product was that it saved a lot of files, mostly images in this case. As it was just a proof of concept and we were really short on budget (it had to be done in something like two days), we decided not to rely on cloud storage, because that would have meant more time to bring in a library to store the files and more money to actually pay for the storage itself. So we just went for storing files on disk and letting nginx serve them. The most simple solution, and for a proof of concept it was good enough.

The issue is that the customer had a technical guy on his side, and this guy was in charge of deciding how to deploy the solution: which servers, which infrastructure, and so on. And here the real problem started, because the customer gave us the final decision on where the software was going to run just three days before the go-live. So we didn't know where the software would run until three days before the public launch. And as they were obviously short on budget, because at the beginning it was just a proof of technology, they decided not to rent a real server. This was actually my face when they told me, because they went for the worst possible solution in this case: a free plan on Heroku, and Heroku doesn't support storing files on disk. Well, you can store files on disk; they will just disappear whenever the application restarts. So we actually couldn't deploy the software on that platform: we stored a lot of files, we stored them on disk, and we knew that whenever the application was restarted, the files would just disappear. That was a huge problem right before the launch; remember, we had like three days before the go-live of the whole thing.

So we decided to rewrite from scratch everything we had: everything related to storing files, generating them, making them available, serving them. Everything we had was plain (we just saved the files on disk and relied on nginx to serve them), and we had to switch it all to another solution that could work on Heroku.
In this case we decided to go with GridFS, which is the file storage of MongoDB, if any of you know what it is. The application already relied on MongoDB for the database, and MongoDB has support for storing files in MongoDB itself. It's actually really good support, because it scales with MongoDB, and it's pretty fast at serving files, since it's just a key-value storage: you put your file there and MongoDB will serve it, usually really fast, because it will serve it from memory if the file fits in memory.

The issue is that what we did was just a huge hack. We didn't have time; maybe we could have found the time to write it properly, but as we were in total panic, we just looked for the fastest way to make everything work. So we monkey-patched all the classes that were going to save data and replaced them with something that saved to GridFS. And then we monkey-patched our WSGI server so that whenever a specific path was requested, it fetched the data from GridFS and sent it back. It was a huge mess, and it went online with practically no testing, because we finished it the day before. We tried it on our testing environment, but we didn't try it on the real-world deployment; we didn't have time to try it on another Heroku application, for example. And so we went online with just that.

After we went online (and thank God everything worked, we didn't have any major failure, because what we did was actually pretty simple), we came together and agreed that we needed a better solution. It was obvious to everyone in the team that this kind of thing should not happen again. We knew that the customer had changed his mind, and we knew that we had done the best possible thing with the budget, time and knowledge we had at the time; but still we had an issue, still we made the wrong choice. So we wanted a solution that could work independently of budget constraints and of the customer's changes of requirements and ideas. And we decided that this solution should be a tool that our developers could just rely on, without caring about how and where their files go. Everything related to storing files should be moved to the deployment phase, to the configuration phase, and not to the coding phase.

That's why we created DEPOT: to make our life easier when storing files, to be able to just say, hey DEPOT, store this file, I don't care where you're going to store it, I just want you to give it back to me when I need to serve it to the client. And we wanted it not only to be easy, but of course fast enough for most web application use cases.

And here starts the interesting part, because I began to think about how best to design a framework that would live in a web application environment and deal with storing files. There are a few things I learned by working on TurboGears2 for a few years. TurboGears has been around since 2007, if I'm not wrong, so it evolved a lot and we saw a lot of changes. We started with a template engine named Kid, then we moved forward to Genshi. Now Genshi is not supported anymore, so we are moving forward to Kajiki. And of course, every one of our users needs to be able to keep their applications running. For example, some of our users didn't like Kajiki and Genshi and Kid and used Jinja2, some used Mako, and so on.
And we needed to be able to support all of them and let the users work with all of them. So what I learned is that web applications, at least in the part about developing them, are much like a little kid: they have a lot of issues, they want things exactly the way they want them, and they might change their mind every five seconds. Whenever you're working with developers in the web world, things move really fast, so your infrastructure might change at any time. You might start small, then have 10,000 users the next day and need to scale and change everything in your infrastructure. Or you start with a specific technology: you decide to go with storing files on disk, and the next day you need to switch to MongoDB for storing files, because you need to scale or because your developers just don't like the previous idea anymore. Or maybe the library you are using has died, like in the case of Kid when we switched to Genshi. So everything you do for the web world needs to be far more open to change, even at runtime, while in production, because the web environment changes pretty often, for various reasons. Not all of them are good; sometimes things change just because it's cool to switch to asynchronous technologies or the like. But whatever: your users want to be able to change what they are working on.

The third point is that automated testing is something which is actually done for real in most web applications, because it's easy to simulate the environment: it's easy to perform a request and check the response. So most web applications want to provide automated tests and test suites. Whenever you write a framework for the web world, it should make it really easy to, well, "monkey-patch" is the wrong term, but to drive the framework in a way that makes writing tests easy: to simulate the production application without needing the whole production infrastructure.

I'll give you an example. SQLAlchemy is really good, and one of the reasons is that it's able to work on SQLite. When you write tests, you don't need to set up a whole MySQL or Postgres environment just to run the test suite on your computer: you can go with SQLite, or even with SQLite in memory, which doesn't need to store your database at all. When we had to choose a MongoDB support library for TurboGears2 (whenever you start a new TurboGears project, you can choose to go with SQL databases or with MongoDB), we went with Ming, because Ming has a feature called the MongoDB in-memory implementation, which makes it possible to write unit tests without needing MongoDB at all: it simulates the whole MongoDB server in memory, so you can create records, check them and so on without even starting MongoDB. And DEPOT should be able to do the same thing: I want to be able to save files without actually starting the file storage itself, and without actually uploading them to S3 if I'm going to use Amazon Web Services.
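That is exactly what DEPOT's in-memory backend is for. As a minimal sketch of how this looks (assuming the depot.io.memory.MemoryFileStorage backend, which DEPOT ships for this purpose), a test can exercise file storage without touching disk, MongoDB or S3:

```python
from depot.manager import DepotManager

# Configure the in-memory backend: nothing is written to disk, MongoDB
# or S3, so the test suite needs no running infrastructure at all.
DepotManager.configure('default', {
    'depot.backend': 'depot.io.memory.MemoryFileStorage'
})

depot = DepotManager.get()  # no name given, so we get the default storage

# Store and retrieve a file exactly as production code would.
fileid = depot.create(b'fake avatar bytes',
                      filename='avatar.png', content_type='image/png')
assert depot.get(fileid).read() == b'fake avatar bytes'
```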
And the last thing I learned is that making things really simple and easy to use wins over providing a huge amount of features. Providing a huge amount of features requires a really big investment in keeping them together, moving them forward, keeping them in shape and so on. And usually you are not able to cover all the use cases of all the features, because maybe you are going to use just 20% of the features yourself, but there will be one of your users who relies on the other 80%. So just focus on the really important features and let your users write extensions on top of them. If the foundation is solid, people will start relying on it to write their own extensions. This is one of the reasons why, for example, DEPOT doesn't have a file system structure: it doesn't have directories, it doesn't have the concept of collections of files. You just store a file. You want a directory, you want a hierarchy? Write it yourself: it's not hard to store the file and keep a pointer to it somewhere that gives you the hierarchy and so on. And in fact there is a guy who wrote DepotFS, an extension for DEPOT that provides file-system-like support, and it does so even over backends like GridFS that do not provide a file system at all, where you can only save a file and there is no way to say "I want a group of files".

So the first thing we focused on was allowing for infrastructure changes, because that was our first problem: we had faced it, so we knew pretty well what we needed to check and what we needed to do. We decided to do three things. First, allow configuring multiple storage engines: whenever you use DEPOT, you can say, hey, I want to save something here, something there, and something else over there; I want three different storages because I want to use local files and also GridFS and also Amazon S3. Second, we wanted to be able to switch storage engines at runtime, with a graceful restart of course; it's not that you can switch the configuration without restarting the web server, unless you properly write some checks yourself. And DEPOT has to keep working on the previously uploaded files: you can say, from now on upload files to GridFS, but everything I uploaded to disk should continue to work, and DEPOT will do that. Third, we wanted to be able to rely on multiple storages concurrently: not only can you have GridFS, S3 and whatever else configured, you can also use them in your application at the same time.

And this happened for real. One of our users came and said: hey, DEPOT is really cool, but I want to store my avatars here, the items uploaded to my social network there, and whatever is a temporary file for my own use should go on disk; how can I use three different storage engines at the same time? This was like the second question we ever got about DEPOT, so being able to use multiple storages concurrently has been a real need from our users.

So whenever you upload a file, if you do not specify anything, the file goes to the default storage; if you specify something, you can drive the file to a specific storage. Storages are identified by a name. So a given storage right now can be on GridFS; but if you configure a new storage with the same name which is on S3, your old files continue to be served from GridFS, and whatever you upload from then on will be served from S3, because DEPOT knows that the old files are on GridFS and the new files are on S3, while you are still just using the storage named "avatars", in the case of user images. And then, of course, you can use multiple storages at the same time at runtime.
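Here is a minimal sketch of what that looks like through DEPOT's DepotManager, which I'll come back to in a moment. The backend names and option keys follow the DEPOT documentation; the storage names and paths are made up for the example:

```python
from depot.manager import DepotManager

# The default storage: plain local disk.
DepotManager.configure('default', {
    'depot.backend': 'depot.io.local.LocalFileStorage',
    'depot.storage_path': '/var/app/files'
})

# A second, named storage on GridFS, usable concurrently with the first.
DepotManager.configure('avatars', {
    'depot.backend': 'depot.io.gridfs.GridFSStorage',
    'depot.mongouri': 'mongodb://localhost/db'
})

depot = DepotManager.get()             # no name: the default storage
avatars = DepotManager.get('avatars')  # a name: that specific storage

# A file is identified by the id you get back, paired with the storage
# name, so old uploads keep working even if the backend behind the
# "avatars" name is later switched to S3.
fileid = avatars.create(open('face.png', 'rb'))
stored = avatars.get(fileid)
print(stored.filename, stored.content_type)
```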
And that's made possible because DEPOT, as I told you, has no concept of a file hierarchy. It identifies files by an ID, and the ID is paired with the storage name: every file is uniquely identified by an ID plus the storage name. So as long as the storage has the same name and the file has the same ID, DEPOT will be able to look up that file even if the underlying storage changed.

The other part we wanted was a really easy way to use everything. So we provide something called the DepotManager, which is in charge of actually doing all the configuration, so that DEPOT can work on practically any web framework. We were not bound, for example, to using INI files, which is what we use in TurboGears for configuration: you can use YAML or whatever you want for storing the configuration, or you can even write the configuration in Python itself, because the DepotManager is the one in charge of keeping the real configuration, and it is able to load it from various sources, from dictionaries, from whatever. It keeps track of what you currently have active and configured, so whenever you need something, you go to the DepotManager and say: hey, DepotManager, give me this storage. I don't care where it is, how it's configured and how it works; just give it to me and I will save the file there. And if you don't ask for any specific storage, it will give you the default one.

The example in the DEPOT documentation is the most simple case, and it makes the same moves as the sketch above, just with a single storage: we configure a storage, get the storage itself, and store a file on it. The configuration is made through a dictionary, and we configure a default storage, named "default", which uses the GridFS backend plus some additional options related to the backend itself, in this case the MongoDB URL. Then we get the storage; since we don't name any specific storage, we actually get the default one. And then we just create the file. Whenever we create a file on a storage, we get back the file ID, and we can look the file up again through the .get method of the storage. So you see that the interface is pretty similar to a dictionary: you create something, you get it back by key. Nothing more, nothing less. This is the core foundation of DEPOT.

On top of the core foundation there are more advanced, more complex things: we focused on providing a solid foundation on which we could actually implement more advanced features. One of these features is the support for database layers, like, in this case, SQLAlchemy. Say you want to store a file which is somehow related to your model, like a user and his avatar, and you want to store the avatar inside the user. You just declare a column of type UploadedFileField, and you can specify the upload type; in this case it's an image with a thumbnail, so whenever you upload the image, it will get a thumbnail too. Then, whenever you save your document or user, you just assign the file to the photo attribute, and DEPOT will upload it to whatever storage you want (or the default one, if you don't specify any) and will link it to the model itself.
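As a sketch of that declaration, following the SQLAlchemy example in the DEPOT documentation (the session handling is assumed to be your usual one):

```python
from sqlalchemy import Column, Integer, Unicode
from sqlalchemy.ext.declarative import declarative_base
from depot.fields.sqlalchemy import UploadedFileField
from depot.fields.specialized.image import UploadedImageWithThumb

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    uid = Column(Integer, primary_key=True)
    name = Column(Unicode(16), unique=True)
    # The upload type drives what happens at upload time: this one also
    # generates a thumbnail for the uploaded image.
    photo = Column(UploadedFileField(upload_type=UploadedImageWithThumb))

# Assigning the raw content is all it takes: DEPOT uploads the file to
# the configured storage and links it to the row. A rollback would
# discard the upload together with the row.
user = User(name='rick', photo=open('face.png', 'rb'))
session.add(user)       # 'session' is your usual SQLAlchemy session
session.commit()

print(user.photo.url)        # where the original image is served from
print(user.photo.thumb_url)  # added by UploadedImageWithThumb
```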
So, I told you that one of the things we learned is that web applications change often: maybe the developers change, maybe the technology improves, whatever. So it should be easy to support different technologies, and in DEPOT we focused on making everything a layer over a layer. We have support for SQLAlchemy attachments, we have support for MongoDB attachments, we have support for storing files on S3, on the local file system, in GridFS, and we have implemented everything as plugins. So if you want to support storing files on your own system, or on whatever you invented yourself, you just write the plugin and everything else in DEPOT continues to work. The SQLAlchemy support will keep working even on top of your own plugin, because you just need to implement the storage engine and nothing else. Even serving files is done by a WSGI middleware, so you can use DEPOT with any web framework. We use it with TurboGears, but if you are a Flask user, you can just attach the middleware to Flask and go on. Actually, most of our users are Flask users, because I suppose that's what is most commonly used for web APIs at the moment.

And then it works together with your database. (If you don't know what this on the slide is: it's actually a real query, called "the query of despair", a really, really long SQL query.) Working together with your database means, for example, that DEPOT copes with your transactions. Say you uploaded the avatar of the user by updating the user's photo, and your transaction gets rolled back: as long as you have a properly working transaction manager, DEPOT detects that your transaction rolled back and recovers the previous state of the files. So if you try to save a new state of the user, and that state includes a new avatar and a new name and surname, and storing the name and surname fails for whatever reason (maybe a deadlock or something in your query), DEPOT will detect it and recover the previous state of the avatar too. You don't end up with things saved by halves, with only the name changed: your models change in a consistent way. And whenever you delete an item, DEPOT actually deletes the attachments only if the deletion of the item properly worked on the database. If deleting the item fails, you don't end up with an entry still in your database that has lost its avatar: DEPOT detects that the transaction failed and recovers the files it wanted to delete.

And the last thing is that it should be really easy to extend. So we focused on two types of extensions for providing additional behaviour on top of DEPOT. One is attachment types themselves: whenever you declare an UploadedFileField, you can provide an upload type. Attachments are in charge of changing the file itself, so whenever you want to replace the file with a new file, you go for an attachment type. And on top of an attachment type, you can also apply filters. Filters do not replace the file itself; they are not able to change the content, but they can add additional information to it, which might be extra metadata or, in this case, additional files. And you can of course apply multiple filters: for example, you might have a filter which generates thumbnails and apply three of them, because you want a small, a medium and a big thumbnail. You just declare the same filter three times with different construction options, and you end up with three different thumbnails.

Let me show you a real case of an attachment, taken from the documentation of DEPOT.
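This is essentially the documentation's UploadedImageWithMaxSize example, lightly annotated: it stores the image untouched unless it is larger than a given resolution, in which case it shrinks it first.

```python
from tempfile import SpooledTemporaryFile

from PIL import Image
from depot.io import utils
from depot.fields.upload import UploadedFile

class UploadedImageWithMaxSize(UploadedFile):
    max_size = 1024

    def process_content(self, content, filename=None, content_type=None):
        # Whatever the user provided (a file, bytes, a cgi.FieldStorage),
        # turn it into a real file object we can read multiple times.
        content = utils.file_from_content(content)

        uploaded_image = Image.open(content)
        if max(uploaded_image.size) >= self.max_size:
            # Too big: replace the content with a shrunk version, kept
            # in memory until it outgrows the in-memory threshold.
            uploaded_image.thumbnail((self.max_size, self.max_size),
                                     Image.BILINEAR)
            content = SpooledTemporaryFile(utils.INMEMORY_FILESIZE)
            uploaded_image.save(content, uploaded_image.format)

        content.seek(0)
        # The real logic of saving the file lives in the parent class.
        super(UploadedImageWithMaxSize, self).process_content(
            content, filename, content_type)
```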
The interesting part is that attachment types not only can change the content itself; they can also add additional behaviours to the files. What does that mean? It means that whenever you recover the file from your database, it will be converted back to that upload type. So if the type provides additional methods (like, I don't know, "give me the histogram of the image"), you can call them on your already stored files. DEPOT knows the original type of the upload and will be able to recover its state and provide all the additional features and behaviours your files have, not only change the file itself. Or you can add additional information: say you want to store not only the file but also, for example, its primary colour, because you want to be able to look for the images which are red. You can store that inside the file as metadata, because DEPOT keeps track of the files and of all the metadata attached to them. So you can add additional details to your files.

The example above is a custom attachment which uploads an image as-is, unless it's bigger than a specific resolution; in that case, the image gets shrunk to that size. The first thing we do is get the content and its data, and this is done through helper functions, because we don't know what the content is. We know that DEPOT is going to save files, but we don't know what the user is going to provide: it might be a file, it might be bytes in memory, it might be a BytesIO, it might be a cgi.FieldStorage if it was something uploaded from the web. And we have this pretty convenient function, file_from_content, that converts whatever the content is into a proper file. It's pretty efficient, because it uses in-memory storage for files which are smaller than a given size and stores them on disk only when they are bigger than that maximum size. Then we open the image and check its size. If the size is bigger than the specified limit, we create a new thumbnail of the image at the maximum size and we replace the content: you can see that we replace the content variable with a SpooledTemporaryFile, which is the kind of temporary file that keeps everything in memory until the data grows bigger than the maximum size you specify. Then we save the image itself into the SpooledTemporaryFile, and we go on and hand the replaced content to process_content. So you just call the parent method with the new content, and in the middle you can do whatever you want, because the real logic of saving the files is inside the parent implementation.

Moving to filters: we already know that attachments can have more than one filter, and we already know that filters run after the upload, while the attachment's own processing runs before the file gets uploaded. That is by design: if we fail at generating the shrunk image, we do not want to go on and store the data in the database and end up, again, with a user without an avatar. So if processing the avatar fails, DEPOT raises an exception and the user doesn't get created at all. So not only does DEPOT recover the files if writing to the database fails; if creating the files fails, you get a proper exception before the data is saved to the database. We try to do the best we can to keep the two things in sync: if either of the two fails, you haven't done anything at all.

Filters, instead, do their work not before the file is uploaded, but after. Why? Because filters usually provide additional, derived information, so if a filter fails, DEPOT just goes on and records that the filter failed, but you already have the file: you can recover the additional information from the existing file. Even if the medium-size thumbnail fails, it's not a huge issue, because you can recreate that medium-size thumbnail from the original data. And as I told you, with filters you can add additional data to your files, but not behaviour: you cannot add methods to your objects through filters. Here is a simple example of a filter, which saves a thumbnail of the file at a specific resolution in a specific format.
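This is a condensed version of the documentation's WithThumbnailFilter. The overall shape (a FileFilter subclass whose on_save receives the already uploaded file) is DEPOT's actual interface; the exact helpers for the stored path and URL at the end follow my reading of the DEPOT API, so double-check those lines against the documentation:

```python
from io import BytesIO

from PIL import Image
from depot.manager import DepotManager
from depot.fields.interfaces import FileFilter
from depot.io.utils import file_from_content

class WithThumbnailFilter(FileFilter):
    def __init__(self, size=(128, 128), format='PNG'):
        self.thumbnail_size = size
        self.thumbnail_format = format

    def on_save(self, uploaded_file):
        # Filters run after the upload: the original file is already there.
        content = file_from_content(uploaded_file.original_content)

        thumbnail = Image.open(content)
        thumbnail.thumbnail(self.thumbnail_size, Image.BILINEAR)
        thumbnail = thumbnail.convert('RGBA')
        thumbnail.format = self.thumbnail_format

        output = BytesIO()
        thumbnail.save(output, self.thumbnail_format)
        output.seek(0)

        # Store the thumbnail as a second file next to the original one...
        thumb_name = 'thumb.%s' % self.thumbnail_format.lower()
        thumb_id = uploaded_file.store_content(output, thumb_name)

        # ...and record it as plain metadata: uploaded files behave like
        # dictionaries, so any extra key set here is saved with the file.
        # (path/url construction per my recollection of the DEPOT docs)
        thumb_path = '%s/%s' % (uploaded_file.depot_name, thumb_id)
        uploaded_file['thumb_id'] = thumb_id
        uploaded_file['thumb_path'] = thumb_path
        uploaded_file['thumb_url'] = \
            DepotManager.get_middleware().url_for(thumb_path)
```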
You see that we just receive the on_save event, and inside the on_save event we have the uploaded file. At the end of the code, which otherwise mostly just creates the thumbnail, we add to the uploaded file any information we want: in this case we add the thumbnail ID, the thumbnail path and the thumbnail URL. Uploaded files work like dictionaries: you can add anything you want to them, and you have the file itself, so the content plus all the metadata you added to the file. When you look the file up again, when you query it back from your database, you just have the thumb_url property, because we added it at the end of our code: you get the file back and read that property. If thumb_url is None, your thumbnail generation probably failed, and you can recreate the thumbnail from the original file.

And one of the core parts of DEPOT is that it's meant for the web, specifically for the web. We wanted to make it easy to use content delivery networks, and we wanted to make it easy for people to rely on DEPOT for serving data to the web. So everything which is needed for serving the files is provided by DEPOT itself. When you store a file, DEPOT already records the content type, the last modified time, the content length and the file name, so when you serve the file back, the proper HTTP headers can be set without you having to work on them yourself. And whenever you want to serve files, you just rely on a WSGI middleware: you create the middleware, wrap it around your application, and DEPOT will do the proper thing to serve the files. If the backend you store the files on speaks HTTP itself, as in the case of S3, the middleware will not serve the files itself but will redirect the user to the backend. So in the case of a content delivery network, you will end up serving the files straight from your content delivery network.
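As a minimal sketch of that wiring, here with Flask just because that's what most DEPOT users seem to run (any WSGI application works the same way; serving under a /depot prefix is, as far as I remember, the middleware's default):

```python
from flask import Flask
from depot.manager import DepotManager

app = Flask(__name__)

# Local files as the storage; any other configured backend works too.
DepotManager.configure('default', {'depot.storage_path': '/var/app/files'})

# Wrap the WSGI app: requests under the middleware's mount point are
# served by DEPOT with the stored Content-Type / Content-Length /
# Last-Modified headers, or answered with a redirect to the backend
# when it speaks HTTP itself (S3, and hence your CDN).
app.wsgi_app = DepotManager.make_middleware(app.wsgi_app)
```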
So please try it. If you have questions about anything, let me know; if you find bugs, I'll be more than happy to fix them. Everything is supported from Python 2.6 to Python 3.4; we haven't tested it on 3.5, but it should work. Everything is fully documented, so if you find something missing in the documentation, let me know and we will cover it. And everything is tested with 100% coverage, so you can be pretty sure that it works. We are already using it in production in various environments. So try it and let me know. Thanks. Questions?

He asked how much effort would be required to make it work on an asynchronous framework. Well, we use it in production on gevent, but gevent is not a really asynchronous framework in that sense: it's far different from tulip/asyncio, for example, or Twisted, because it's implicitly asynchronous and not explicitly asynchronous. So I'm not sure how much it would take to adapt the middleware itself to something like asyncio, which would require moving from functions to coroutines and so on. But it should be fairly easy, actually, because it just gets the file and sends its content back to the browser, which is a pretty good use case for an asynchronous framework, and the middleware itself is like a hundred lines of code. Even if you had to rewrite it from scratch, it would take like two hours, no more. The middleware is already divided into utility functions, so the code that actually serves the file is like ten lines, which you can probably port to asyncio or something like that. But I haven't tested it; I've only used it with gevent, and I know that on gevent it works.

Next question: you mentioned that in case of a rollback you restore the files; do you need some sort of storage for DEPOT itself, some metadata? No. What actually happens is that DEPOT generates a unique ID for each file, so if you create a new version of a file, you end up with a different ID, and the old ID gets deleted only when the transaction gets committed. So while the transaction is running, you have both files, with two different identifiers. If the transaction goes on and successfully commits, DEPOT says: hey, this new one is the proper one, delete the old one. If the transaction rolls back, it says: hey, the old one was the proper one, delete the new one. So it just keeps both files available at the same time and then decides which one to keep at the end of the transaction.

And you mentioned it is transparent to switch from one type of storage to another; when you get a request for a file, how do you know whether to serve it from the old storage system or the new one? Okay, that's actually stored in the file metadata itself. Every storage engine needs to provide support for some kind of metadata: in the case of GridFS, it stores the metadata together with the file in the database; in the case of S3, it stores the metadata as HTTP headers of the file itself; in the case of the local file system, it saves a JSON file with the metadata; and so on. Every storage engine is in charge of providing a way to add metadata to the file, and DEPOT relies on that metadata to know from where it should serve the file.

But when you get the request from the user, you only know the file name; how do you know which storage it is in? At the low level, when you store the file, we really only know the file name, so if you use DEPOT at the low level, yes, you have to provide the lookup yourself. But if you bind the file to a column with SQLAlchemy or MongoDB or whatever, the column actually stores a JSON document with various information, including where to look up the additional details of the file. So the high-level APIs already provide it for you.

Does it, out of the box, support uploading to a temporary URL on something like S3? Sorry?
On Swift, and I think on S3 as well, you can be given a temporary URL to upload to directly from the client; does it support that? Okay, I understand. No, currently it doesn't: as most of the logic happens in DEPOT itself, the client needs to upload the file to your server, which processes the data and then uploads it to S3. You cannot have the client send the data directly to S3, because otherwise you would lose all the metadata that DEPOT computes for you. We would need to provide some kind of DEPOT support in JavaScript itself, so that you can get the metadata before uploading there. I don't know if we have more time; we can continue outside of the room. Thank you.