I'm Claudio, and I'm technically self-employed, but right now I mostly work for one startup in Singapore, a second-hand luxury fashion company called Star Tribute. This project, though, was done for a different client back in Switzerland. The topic is on-demand image scaling, and there's also Lambda in there, so I can get a bit deeper into that. The project this was part of was a move to the cloud for an old-school PHP-based CMS application, and that CMS, TYPO3, already has image scaling built into its templating system. TYPO3 is fairly big in Europe, but hardly anyone knows it outside of Europe. So it has image scaling and all that built into the templating language, but it has a couple of problems. Inside the templating system it looks like this: you have an image view helper (in Fluid, something like `<f:image src="..." width="..." height="..." />`), you pass it the file name of an image that lies on local storage, and you specify the width and height. When the page is generated, which usually happens dynamically when a browser requests it, TYPO3 takes that image, checks whether it already has a scaled-down version, scales it down if not, and then outputs the URL. So the way you identify an image is its file name; you could use a database identifier, but the file name works. Internally it's a bit more complicated. TYPO3 keeps two main tables: one for all the files it knows about, and one with an entry for every scaled version of those files. So for each image URL you need to generate in your HTML, there's a very long process that involves checking whether the file exists and whether a database record exists.
If the record doesn't exist, it just creates it; then it checks whether a scaled version of the image exists, both on disk and in the database, and if not, it scales the image during page generation. And if something goes wrong, which is really the worst case here, it simply returns the original file, and that tends to get caught up in caches. If you run this process a hundred times or so to generate one page, it can be quite slow and add a couple hundred milliseconds to your page generation time. As I mentioned, it's also not very reliable. It puts all the files on the local file system, which isn't very cloud-friendly; we want to serve from AWS S3. And it issues lots of database queries, at least two per image, and sometimes writes as well, depending on how fresh the cache is. For my replacement, as I mentioned, I wanted to use S3, and I wanted to keep the same on-demand functionality. I didn't want to scale at upload time, where you have to specify on upload which thumbnail sizes you need. That's a very common approach: when you upload an image to a blog, you know which sizes you need, so you generate all the different sizes on upload and save them. That doesn't work here; this was a highly dynamic project with lots of different templates, and you never know which sizes you'll need your images in. I also wanted it to be faster than the current solution, obviously, and above all robust. Here's what I did, concept-wise first: I put all the information for an image inside its file name. If we use the file name as a reference, we have everything we need in there: the size, and a namespace, which is basically a table name to make the name unique. Say we have multiple tables in our content management system, with images for ads, for articles, and so on.
So I want each record to have a unique image, so that when I delete a record in the database, I can delete the image that belongs to it without having to worry that the image is referenced by some other record. The file name is therefore unique and tied to the record, and it also contains the original image's width and height. The name is generated on upload by the client library I wrote for this project: when you upload a new file, the library renames it and pushes it to S3 under that name, the original file name. The URL for a scaled version of the image is then derived from that file name. If you have the original file name and want the URL of that image in different dimensions, you can derive it completely from the file name itself; that functionality is also in the client library. So now we have an original image file on S3 in ridiculously high resolution, and we can generate the scaled-down file name for it, but that scaled file doesn't actually exist on S3 yet. That's where Lambda comes in. S3 is static hosting, so you can't make it react dynamically to your requests, but you can make it redirect whenever it would return a 404. When someone requests a file that doesn't exist yet, S3 can redirect to AWS API Gateway, which then invokes a Lambda function. So, going back to step two: from the moment I have the original file name in my CMS, I can immediately generate a scaled-version file name for it without talking to S3, so there are no high-latency operations involved. If the file doesn't exist, the request gets forwarded to API Gateway via a routing rule inside the S3 configuration, and that calls the Lambda scaling task.
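The 404 redirect lives in the S3 static-website hosting configuration as a routing rule. A sketch of what such a rule looks like; the hostname and key prefix here are placeholders, not the actual values from the project:

```xml
<RoutingRules>
  <RoutingRule>
    <Condition>
      <!-- Only fire when S3 can't find the requested key -->
      <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <!-- Placeholder API Gateway endpoint fronting the scaling Lambda -->
      <HostName>abc123.execute-api.eu-west-1.amazonaws.com</HostName>
      <Protocol>https</Protocol>
      <!-- The missing key is passed along so the Lambda knows what to scale -->
      <ReplaceKeyPrefixWith>prod/scale/</ReplaceKeyPrefixWith>
      <HttpRedirectCode>307</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>
```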
Now, the Lambda scaling task gets the path of the scaled file name and can derive the original file name from it. It downloads the original from S3, scales it down, puts the scaled version back on S3, and then redirects the client back to S3. So you get two redirects if the file doesn't exist yet, but once it's there, future requests are served directly from S3. It's basically using S3 as a cache that you can control a bit more than CloudFront: the objects stay there indefinitely until you delete them, and if they aren't there, they get generated. On top of that there's a little bit of cleanup. If you delete an original picture, another Lambda task is triggered by, I think it's the DeleteObject event on S3; it checks which file is being deleted, derives all the scaled versions from it, and deletes those as well, so you don't have a lot of stuff flying around. Right. One problem with this: since you can derive a scaled file name from the original file name (or from any other scaled file name, actually), somebody could just write a shell script with a loop and curl a bunch of different scaling configurations for the same image. That would fill your S3 bucket with scaled versions and use a lot of Lambda time. To avoid that, I sign all the URLs with an HMAC, which is a shared-key way of signing messages. That way, only a server that has the shared key can actually generate valid URLs for scaled images. The resulting URL is very ugly, but at least without the key you can't create arbitrary scaling configurations for an image. So this is the complete path for a scaled image.
It contains a content hash, a reference to the table, the size of the original, the size of the scaled version, and the signature to make sure it's actually OK to generate the scaled image. The original here was 240 by 160, and the scaled version is larger, yes, in that case; I picked a bad example, it was a random one. All right, I recorded a very short demo video. I took 16 JPEG images, around one and a half megabytes each, and scaled them down to 1,000 by 1,000 pixels. At the moment the video starts, the scaled versions do not exist in S3; only the originals have been uploaded. I put all the URLs in an HTML file, and the video starts when that HTML file is loaded in the browser, so you get an idea of the execution performance. In this case I think the Lambda function had already been warmed up, because it's quite quick; I ran the same function a bit earlier. Let's see if it plays. Yeah, cool. So it starts loading, and all the images load. The total time was about three and a half seconds for 16 images when they are not yet in S3, i.e. not cached. This is the waterfall. As you can see, there are about three requests per image: the original request to S3, the redirect to API Gateway, which invokes Lambda, and then the redirect back to S3. The middle one, the Lambda scaling task, takes the longest. That's the case where the scaled versions don't exist. When they already exist (in this demo they're actually served from CloudFront, because I have CloudFront in front of it, but S3 would be similarly quick), it's about half a second for 16 images, which is mainly down to the slow internet connection I tested with.

[Audience] Just one question from this side: when you call API Gateway, the client is actually blocked following the redirects between the two locations?

Yeah, exactly. So it's not completely asynchronous.
It's parallel, though, and that's the great thing. I mean, I could have picked 30 pictures. Actually, I don't know how many requests Chrome, or browsers in general, do in parallel nowadays. [Audience crosstalk about browser connection limits; someone suggests eight.] Well, for some reason it did all 16 in parallel here, which surprised me; I thought the limit was lower. Anyway, you can basically open as many HTTP connections as you like until you hit the Lambda concurrency limit, whatever the default parallel execution limit is; I don't remember. But it scales: there's no per-image waiting time, so I could have done the same demo with 50 pictures and it would have taken about the same. Right, I put the code up on GitHub. The first repository is the server component. I used Terraform for the whole infrastructure, so the configuration of API Gateway, S3, Lambda, CloudFront, and all the permissions is inside a Terraform module, which you can reuse very easily. Then there's a PHP client library as a sort of reference implementation, which you can use to generate the image file names and push the images to S3, or to take a file name and generate a scaled-image URL from it. Any questions? Was that reasonably clear? OK.

[Audience] In the Lambda function, what is actually doing the scaling? Is it ImageMagick?

In this case I used a library called sharp; the Lambda function is in JavaScript. Sharp is a bit faster than ImageMagick, I think. It's quite nice, but it has a binary component, and all Node modules with binary components need to be compiled on Amazon Linux, so it was a pain to get that working. I think there are some tools to automate that now. Lambda doesn't install or build dependencies, so I had to get an Amazon Linux-compiled binary version of sharp.
I guess sharp pulls in a lot of libraries and such. All right, any other questions? One over there.

[Audience] Can I find this on npm?

No, because there isn't really an npm module you can use. You can find the PHP part on Packagist for Composer, and the server-side part is a Terraform module; if you go to the GitHub repository, you can see how to include it in a Terraform project. But it's not something you can install with npm. Cool.

[Audience] What about Cloudinary?

I don't know, what about it?

[Audience] It's an image CDN with this kind of scaling in the URL.

Oh, yeah, I know it. I haven't really looked at it, because this project has a lot of images, and for all the CDN services I've seen so far the pricing wasn't really comparable; here you just pay for S3.

[Audience] Is there anything from Amazon itself that does this kind of scaling?

Not that I'm aware of, actually. Yeah. Cool.

Awesome. Thank you, Claudio. Thanks, guys, thank you very much. Shall we just take a five-minute break to stretch our legs?