Good evening. I am Manish and I work for Autodesk. Today we are going to talk about how to optimize your S3 usage: how to detect an anomaly in your S3 usage and how to save costs. This is based on my experience with a previous startup; even though I work for Autodesk, I do not use any AWS services there, so none of this involves Autodesk.

The agenda has three parts. First, what is S3 and what does the AWS billing pattern look like. Second, what is the S3 data transfer cost and why does it matter to us. And last, how do we generate S3 analytics so that we can actually optimize the cost.

So, what is S3? For those who have not used it or do not know what it is, it stands for Simple Storage Service. You can dump all your files into S3 storage and give your services access to those files.

Next, AWS billing. AWS generates the bill monthly, based on your usage of S3, EC2 and the other services. Let me show you what an AWS bill looks like. It gives you a breakup of all the different services you have used and the cost of each service you consumed. If you can see here, the data transfer cost for this particular month, July 2017, is around $31. This account belongs to a startup in the US with an e-learning platform; all their e-learning content sits on the AWS cloud. They had been deployed for close to 5 years and used to get an average monthly bill in the range of $200 to $250.
Then suddenly, from August 2017 onwards, the bill shot up to $600, almost 3 times the usual. If you look at the billing here, the data transfer line went from around $31 to $377, more than ten times. And if you see the billing breakup, close to 4 terabytes of data had been transferred that month, where the usual data transfer is around 500 GB per month. The bill shot up accordingly, and we did not know what was causing this.

During the month of August they had deployed two new features: they moved their existing platform from a Flash-based e-learning tool to an HTML5-based tool, and they added a few more help videos to the platform. They were not sure which of these was causing the jump in S3 usage, and looking at the bill alone you really cannot figure it out.

That brings us to the next section: why do we need to understand the S3 data transfer cost? S3 has two different kinds of costing. One is the amount of storage space you use in S3; that costs you. The other is the amount of data transferred between S3 and whichever client is requesting the files; that is also billed. The first part is clear: use 1 GB of S3 storage and you are billed for 1 GB. But for the second part there is no clear breakup of how much data transfer happened on which file, so you cannot pinpoint the particular file being downloaded by users that is causing the spike in the bill. So now we go to the final step.
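As a back-of-the-envelope illustration of these two cost components, here is a small Python sketch. The per-GB prices and the 500 GB storage footprint are illustrative assumptions, not the startup's actual numbers or current AWS pricing:

```python
# Rough monthly S3 bill estimate from the two cost components described
# in the talk. Prices below are illustrative assumptions loosely based on
# the 2017-era US East price list, NOT current AWS pricing.
STORAGE_PER_GB = 0.023        # S3 Standard storage, $/GB-month (assumed)
TRANSFER_OUT_PER_GB = 0.09    # data transfer out to the internet, $/GB (assumed)

def estimate_bill(stored_gb, transferred_gb):
    # Storage is billed on what you keep; transfer on what clients download.
    return round(stored_gb * STORAGE_PER_GB
                 + transferred_gb * TRANSFER_OUT_PER_GB, 2)

# A usual month vs. the August 2017 spike (4 TB transferred), assuming
# a constant 500 GB stored:
usual = estimate_bill(stored_gb=500, transferred_gb=500)   # -> 56.5
spike = estimate_bill(stored_gb=500, transferred_gb=4000)  # -> 371.5
```

Note how the transfer term dominates the spike month, which is exactly why the storage line in the bill looked normal while the total exploded.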
How do we figure out which culprit file is actually causing this spike in the cost? As I showed, the bill gives no real detail; even if you go to Cost Explorer, you cannot figure out what is causing the issue.

Let me show the application, the e-learning platform they have. In the month of August they added this Getting Started screen you see once you log in, a very basic help section with some five videos to help you with the usage of the app, and they moved the entire platform from Flash to HTML5. These are the two simple changes they made, but they did not know which one was causing the issue. One of the developers suggested it might be the move from Flash to HTML5: Flash might have been caching the videos while HTML5 was not, so every time a user logged in, the videos were fetched afresh from the server.

Then they contacted me and asked whether I could help them out, and I tried to figure out what was really causing the spike. I googled around to see what options we have. If you go to S3, you have your buckets, and on a bucket there is something called server access logging. By default it is switched off, so you need to enable it. Once you enable server access logging, S3 starts logging each and every request made to the bucket. The logs are stored under a prefix, here a directory called logs, as plain text log files with one record per request.
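For reference, the same switch can be flipped from code with boto3's `put_bucket_logging` call. The sketch below only builds the request payload, with hypothetical bucket names, and leaves the actual API call commented out since it needs AWS credentials (the target bucket must also permit the S3 log delivery service to write to it):

```python
# Sketch: turn on S3 server access logging from code instead of the console.
# Bucket names here are hypothetical placeholders.

def logging_config(target_bucket, prefix="logs/"):
    # Logging is off by default; this payload enables it and tells S3
    # where to deliver the log objects (a target bucket and key prefix).
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,
            "TargetPrefix": prefix,
        }
    }

# With the AWS SDK for Python (boto3) and valid credentials:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_logging(
#       Bucket="elearning-content",
#       BucketLoggingStatus=logging_config("elearning-content-logs"),
#   )
```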
Each log record stores the requester's IP address, the time, how much data was sent, and all the other parameters of the request. But as flat files this does not make sense to anyone; nobody can extract meaningful information from this data by eye. So first I thought I could write some kind of small script to parse all of this, but that did not get me very far either. Then I thought there might be third-party providers who could help me understand these logs.

I googled, and there are many small third-party startups built specifically to parse these access logs and show you what is actually happening. One of the tools I used is called S3stat. AWS does not provide this analytics service itself, so there is a startup, S3stat, that provides just that. You sign up and give it read/write access to your S3 buckets. What it does is fetch all the logs that S3 is generating and give you a nice, readable report, which is much better than trying to understand all those raw log files.

In this report I can see that only four or five files are causing all that data transfer. One file alone was accessed almost 200,000 times, close to 900 GB, almost a terabyte, and it cost $87. So close to five files were responsible for the entire spike in billing. With this, you finally know the real culprit, the file that is to blame.
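A minimal version of such a parsing script might look like this. It assumes the standard space-delimited S3 server access log format, where after quote-aware splitting the object key is the 9th token and bytes-sent is the 13th (the bracketed timestamp splits into two tokens); the sample records are made up for illustration:

```python
import shlex
from collections import defaultdict

def bytes_by_key(log_records):
    """Total bytes sent per object key from S3 server access log records.

    Assumes the standard log format: after quote-aware splitting, the
    object key is token 8 and bytes-sent is token 12. Records that do
    not parse are skipped.
    """
    totals = defaultdict(int)
    for record in log_records:
        try:
            tokens = shlex.split(record)
            key, sent = tokens[8], tokens[12]
        except (ValueError, IndexError):
            continue
        if sent.isdigit():          # "-" means no bytes were sent
            totals[key] += int(sent)
    # Biggest bandwidth consumers first -- these are the culprit files.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Two made-up records hitting the same help video:
sample = [
    'OWNER bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 - REQID '
    'REST.GET.OBJECT help/video1.mp4 "GET /help/video1.mp4 HTTP/1.1" '
    '200 - 4500000 4500000 41 28 "-" "Mozilla/5.0" -',
    'OWNER bucket [06/Feb/2019:00:01:02 +0000] 192.0.2.4 - REQID '
    'REST.GET.OBJECT help/video1.mp4 "GET /help/video1.mp4 HTTP/1.1" '
    '200 - 4500000 4500000 39 25 "-" "Mozilla/5.0" -',
]
# bytes_by_key(sample) -> [('help/video1.mp4', 9000000)]
```

Running something like this over a month of logs gives essentially the same "top files by bytes transferred" view that S3stat presents as a report.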
Now that you know which file is causing it, from here on it is up to you how you want to optimize. We had two options. These were high-definition videos, so you could either compress them, maybe go down to 720p instead of full HD, or do what I did: since these were help files and not of much importance, I just moved all those files to YouTube and linked the YouTube videos in the app. Instead of serving them through S3, we pull them from YouTube, and YouTube bears the bandwidth bill for all of it. With this one simple change, the owner of the company saves almost $300 to $400 every month.

This is not a universal solution; it depends on the use case. In this particular case it was very relevant and I could afford to do it. If you have really paid content that you cannot put under public access, then you cannot do this and you have to pay for the transfer. But at least, with the data, you can take that decision. In my case the solution was really simple: I found the five culprits, put them on YouTube, and YouTube pays that $400.

This is what I mean by optimizing with S3 analytics: you should always have analytics enabled. Once, I deployed an update to one of our software packages on S3, and an error pushed the S3 usage into petabytes; Amazon made very good money out of that. A mistake like that can ruin your business. AWS services are really good, but you need the data at your fingertips to take decisions.
If you have data, you can take decisions and optimize your business accordingly. With that, I end my talk. Any questions?

Audience: Why not use CloudFront?

Speaker: This is a small startup; if you pull in too many services, it adds the development cost of someone who knows all those things. They were already using S3, so they just wanted to dump the files there; these were only four or five additional help videos. For a larger organization it makes sense: you can hire a bunch of engineers, one for CloudFront, another for S3, another for RDS, and so on. You can see from the billing that this is not a big startup, $200 to $300 a month; they make a decent amount of money, but you have to be aware of all these costs.

Audience: How about using reduced redundancy?

Speaker: Reduced redundancy, like how?

Audience: S3 has multiple storage classes. You have Standard and you have Infrequent Access, and in between the two there is what we call Reduced Redundancy. You lose one or two nines out of the eleven nines of durability, but the cost is considerably lower.

Speaker: There is also a related option: we were already using only the US East (N. Virginia) region, not even anything global, and even with that the cost was too high. A difference of $400 makes a big difference for a small startup.

Audience: Sorry, I correct myself, because you were talking about transfer cost; reduced redundancy only changes the storage cost.

Speaker: Yes, exactly. Any more questions?
Speaker: A practical solution would probably be to compress the video or enable HTML5 caching. With the latest HTML5 there is an option to store the video locally on the user's machine, so once it is downloaded, the second time it does not have to go to the S3 server; it is served straight from the user's machine. Additionally, you can enable CloudFront. These videos are static content, so once CloudFront caches them, they stay cached at the edge.

Audience: What about internal compliance training inside an organization, where the DNS is on Route 53, for example, so it does not go through CloudFront and the video is served in-house?

Speaker: Route 53 can also route directly to CloudFront.

Audience: We do not want to expose the videos inside the bucket directly, so it has to go through CloudFront.

Speaker: Yes. One option we had was HTML5's caching, where you can download the video into the browser cache and keep it alive there for a long time.

Audience: You can configure the S3 bucket as a private bucket but have CloudFront pull from it, so you are only exposing the CloudFront endpoint, not the S3 endpoint. That will save costs as well, because the object gets cached at the edge locations and your users just pull it from the nearest edge, from CloudFront.

Speaker: Okay.

Audience: Is there any cost to enabling analytics?

Speaker: The server access logging itself is free; it is just an option you tick, and it starts logging. The only cost is that the logs keep accumulating, so your storage usage grows and that gets billed, but it will be very little, really less than a dollar. Okay, thank you.
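The private-bucket-behind-CloudFront setup described in the Q&A comes down to a bucket policy that lets only a CloudFront origin access identity read the objects. Here is a sketch of that policy document, with made-up bucket and OAI names (newer deployments would use origin access control instead, but the idea is the same):

```python
import json

def oai_read_policy(bucket, oai_id):
    # Allow only the given CloudFront origin access identity (OAI) to
    # GET objects, so the S3 endpoint itself stays private. The bucket
    # name and OAI ID passed in here are hypothetical.
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/"
                       f"CloudFront Origin Access Identity {oai_id}"
            },
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }

policy_json = json.dumps(oai_read_policy("elearning-content", "E2EXAMPLE"))
# Apply with boto3 (requires credentials):
#   import boto3
#   boto3.client("s3").put_bucket_policy(Bucket="elearning-content",
#                                        Policy=policy_json)
```

With this in place, users fetch the videos through the CloudFront domain only, and repeat views are served from the edge cache rather than generating fresh S3 data transfer.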