This is the last talk of the day. Oh, but it's okay, because we have actual Tom. Yes, I am definitely sure that this is actually Tom Eastman. Tom is a senior system software engineer. No, got it wrong. No, what are you? What are you? Senior software systems engineer. I didn't even make it up. I like "a gentleman programmer of leisure". He has a real job now, and he is going to be speaking today about the exquisite, dangerous art of safely handling user uploads. So let's make Tom feel welcome.

I'm sorry, I really like doing that. It's fun. I guess I should start by explaining why I was inspired to give this talk. I'm not a hacker or a penetration tester by trade. I'm a developer, right? Like most of us here, I build things and I like helping other people to build things. So my interest in computer security isn't born of wanting to be a hacker. It's more born from a constant, crippling anxiety about screwing up and letting people down. That's definitely not the healthiest approach, I think you'll agree, but it's a motivating one at least, right?

So when it comes to web application development, I've always known that handling user uploads is tricky. But until recently I managed to mitigate the risk in the most effective way possible: by not bothering. Basically, avoid having to do it in as many circumstances as possible. More recently, though, I have been working on projects that do actually require handling user-uploaded files, either because they have data ingest or because they handle images of a certain type, and I made a couple of pretty unpleasant realizations along the way.

The first thing is that getting it right is actually harder than I expected. Most of my web development experience is using Django. Actually, frankly, all of it is Django. Django's got a really good reputation when it comes to secure web application development. You're well protected from the most common security pitfalls that
you end up having to deal with. Its object-relational mapper makes it really difficult to write code that's susceptible to SQL injection. Its template engine gives you really good, strong auto-escaping by default, making it a lot harder to write code that's susceptible to cross-site scripting attacks. Not impossible, because cross-site scripting is a really hard one to deal with, but you're in pretty good shape by default. And of course the built-in user authentication and session management capabilities are really well designed in Django, and they follow really good security best practices. So with Django the defaults keep you pretty safe right out of the gate. You're shielded from the worst of the OWASP Top 10 vulnerabilities.

So let me just do the thing that I often do, which is: who in this room has heard of the OWASP Top 10 and kind of knows their way around it as a web developer? If you're not a web developer, it's a little less important, but it is interesting. So if you are a web developer and you're not familiar with the OWASP Top 10, Google it as soon as you're done with this talk. Find the OWASP wiki and read up on the Top 10. If you're not aware of these security vulnerabilities, you're a liability to the projects that you work on. And this is fixable, because you are educable people and you can learn how to protect your users.
And of course the Top 10 isn't all you need to worry about, but it's a good start and it'll make you a much better developer.

So anyway, to my surprise, when I was dealing with file uploads with Django I discovered that Django's default settings can actually be problematic in some ways: not necessarily insecure, but geared towards getting you up and running fast and not necessarily putting safety first. If you're a Django developer, this will look pretty familiar to you. This is two lines from your settings file and a couple of lines from a models file, and in practice this is all you need to start doing file uploads in a Django web app. The problem with this example is that, behind the scenes, the defaults expect you to be saving files directly to a location where they'll be hosted. That media root, for example, is expected to be hosted by the web server right away, and I'll be explaining in this talk why that's a terrible, terrible idea.

The other thing I realized was that if you get it wrong, the scope for damage is eye-widening. A little bit of clever manipulation could see files saved to locations
you didn't expect, leading to exploitation of the server that it's running on. A misconfigured web server could be led to execute code included in user-uploaded files. A malicious file can cause programs parsing it or validating it on the server to crash or misbehave, just straight up break the system. And finally, and honestly most importantly, a lack of care when handling user-uploaded files could easily turn your own project into a platform for attacking other sites and services and users. It's fair to say that you have a responsibility to make sure that your work can't be exploited to attack other people.

So this afternoon I'm going to give you a short list of concrete steps to help you solve a complex problem. In order to explain why each step is a good idea, I will need to give you some examples of just how badly it can go wrong and what you need to be protecting yourself from. And I'm going to give you the last slide of my talk first, because I'm a sucker for spoilers. I'm four seasons behind in Game of Thrones, but I already know, like, everything that happens, and this is not a mystery show.

So here is my advice. This is the last talk of the conference, so you're welcome to have a beer afterwards, or you can stay and listen to why it's good advice. Try not to play, if you can avoid it. Outsourcing this problem is fantastic. Step one: throw away the entire file name.
The file name is not your friend. Step two: always store freshly uploaded files in some kind of quarantine zone that's not in the web root, that is not being served up by the web server. Step three: always very carefully parse and verify the file once it's been uploaded, to prove to yourself that it is what it needs to be. And step four: don't keep that file. Copy the parts you care about into a new file that you'll then be hosting.

So I'm gonna spend the rest of my time explaining why. In order to explain steps one and two, getting rid of file names and storing them somewhere outside the web root, I'm gonna explain something about web server software and about some of the assumptions that it has historically made about its threat model. Web servers are probably, I think it's fair to say, the most exposed pieces of software on the planet right now, right? They're hit by requests all day, every day, non-stop: legitimate requests, and maliciously crafted ones, and corrupt ones, and ones from old software, ones that aren't running HTTP/2 yet, and older, horrible HTTP/1.0 stuff. By now web servers are very resilient to malicious input from outside. It still scares me that this sort of key software is written in C, but it's hardened C. They've been doing this for a long time.

Here's the problem. Web server software does expect you to be able to trust the files that it's serving. Web servers have been built with the assumption that any files that they're serving up to a web browser are probably there because you put them there, and sometimes they're configured to execute instructions in those files. So this is like one of the first rules of computer security, right? And it's almost blatantly obvious: if an attacker can upload code and get your computer to run their code, they win, right? It's over.
You're done. This is the key premise.

So what are some examples of code that lives inside the web root, something that Apache or nginx or your web server would serve up, that is executed by the web server when it's served? Anybody? PHP, right. No, you haven't, have you? But this is the most obvious one. This is the obvious first call for complaining about this sort of thing, but it's not the only one. Apache server-side includes: it's like a built-in template language for the Apache web server. CGI scripts: the old-fashioned, you know, Web 1.0 way of doing dynamic web apps, where a program is actually just run every time the page is requested and the output of the program is returned to the web browser. Active Server Pages, which is like PHP if you're in the Windows world: it is a programming language, a templating language, that sits inside the web root and is served up. .htaccess and configuration directives: .htaccess in Apache actually changes the configuration of the web server depending on what's being served, and other web server software has similar kinds of configuration files sitting inside the web root. And then you've got whatever additions you're actually using in your web server, like mod_ruby, mod_perl, mod_python, etc.

So there's a lot of code that might actually get executed. If your web server is configured to treat any of these files as special, and an attacker can successfully upload any of these files, then they might have a way into your system.

Now, in this talk I'm gonna be using PHP as an example a reasonable amount, but this, like I just pointed out, is not a PHP-only issue. It's just that its ubiquity combined with its execution model makes it a really common risk factor, especially in shared hosting environments and, well, you know, all over the place.
It's very pervasive.

So let's say you have PHP installed, either intentionally or unintentionally, which I'll talk about in a moment. Most default Apache configurations, including on Debian and Ubuntu... How many of us use Debian and Ubuntu as their publishing platform when they're doing web development? Yeah. I mean, again, most of what I'm talking about is common to a lot of stuff, but these are the systems that I work with. Okay. The default configuration in Debian and Ubuntu will execute any files requested with the .php file extension as if they were part of your program.

So what do you do? You might assume that you need to block any files with the extension .php from being uploaded and saved to the web root, right? I'll spoil it: yeah, you're right. You need to block those from being run. Trouble is, that same default configuration, and this is the locked-down configuration that comes by default on Debian and Ubuntu, and probably Red Hat, but I didn't actually ever get around to checking, also runs as PHP any files with these extensions: .php3, .php4, .php5, .php6, .php7, .php, .phps, and .phtml. So make sure that your upload checker is blocking all those extensions.

That's the standard, conservative configuration provided by distributions. And as you might have heard, the internet is full of bad advice. So depending on what PHP tutorial you or your administrator or your ops team followed when they were learning about PHP and setting it up, they might have used this snippet of configuration code when setting up their Apache web server. Who's seen this sort of thing before in Apache, right?
This is really common: AddHandler, tell it it's a PHP script, and look for that extension. It looks really innocent, but it has this wonderful, interesting bonus feature that it doesn't mind if there are multiple extensions on the file. So now these count as PHP as well: filename.php.jpeg, filename.php.gif, filename.php.txt.jpeg.gif, filename.php.whatever-I-want-to-type-here.txt. Long story short, if your application code is checking file extensions to decide if a file can be trusted, you could be in a world of pain. It's not gonna be enough.

What if you're not even using PHP? Like I said, it's not a PHP-only problem. But you'd actually be surprised how often PHP ends up being installed and configured on servers that aren't even using it. Again, if you're in the Debian and Ubuntu world, and you've gone through the server installation, and you've gone to the tasksel screen and selected LAMP stack or, you know, web application stack for your server install, then it will install PHP with that default configuration ready to go. And certainly even Python shops are often using some kind of PHP as well on their web servers. They might be using it for monitoring, for Nagios... and I guess Nagios is CGI, isn't it? But you might have a WordPress site on the side or something. Your servers may well have it on there without you realizing. So these risk factors apply for any system where the web server is configured to execute code in files that are contained within the document root.

And as if that wasn't enough reason for you to never trust the file name or the extension of an uploaded file, here's seven more. File names can do weird things with case. Like, are you sure that your file name checker is case-insensitive or case-sensitive enough to make sure that it catches those? File names can do other funny things that fool the execution system of the web server. This one in particular used to
work on older versions of IIS. So a file might be uploaded and your code would see .jpeg, and then when it got saved to disk it would be file.asp, and ASP is just like PHP: it'll be executed on the way back out.

What if your file name has dot-dot-slash in it? Hey, you're fine. You're fine with Django in this case, but this will trip things up. And even if you're using Django, which I think does check for that on a reasonably strenuous basis, what if it has a mixture of backslashes and forward slashes? Who knows all of the subtleties of Python's handling of forward slashes and backslashes mixed together in a Windows environment, where I think it actually allows both, to be easily platform-compatible? You could find yourself doing fun things.

On Windows you still have those historic, old, you know, tilde-one file name things, if the file doesn't match that 8.3 syntax from the old pre-Windows 95 days. You could upload a file like that and it might actually overwrite the file called web.config. All these examples, by the way, are from the OWASP wiki. So again, please: this talk is basically the OWASP wiki as interpretive dance, so read up on that site. It's good stuff.

Files could have any mixture of single or double quotes in them, which could be hilarious when you're shelling out to some other program. And the poison null byte, which is a hilarious one. If you're passing this to some sort of back-end function that might be exposed in Python but is actually implemented in C, are you sure that that's gonna handle the null byte in the C code correctly? This one used to catch out PHP stuff all the time. PHP has a lot of file name strings that get passed to back-end C functions, and where's that file gonna get saved in that case?

And finally, you can't trust these file names in the first place because your users screw them up.
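Steps one and two, throwing away the file name and keeping fresh uploads out of the web root, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `quarantine_upload` and the quarantine directory are hypothetical, and naming the file after a SHA-256 hash of its contents is just one reasonable choice.

```python
import hashlib
import tempfile
from pathlib import Path


def quarantine_upload(data: bytes, quarantine_dir: Path) -> Path:
    """Store uploaded bytes under a name derived only from their content.

    `quarantine_dir` is assumed to live outside anything the web server
    serves. Nothing supplied by the client ends up in the path.
    """
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    # 64 hex digits: no extension, no slashes, no quotes, no null bytes.
    digest = hashlib.sha256(data).hexdigest()
    target = quarantine_dir / digest
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The client-supplied name is kept only as untrusted metadata,
        # e.g. a database column; it is never used to build a path.
        record = {"claimed_name": "../../holiday.php.jpg"}
        stored = quarantine_upload(b"not really a jpeg", Path(tmp) / "quarantine")
        record["stored_as"] = stored.name
        print(record)
```

Because the stored name comes only from the bytes, the case tricks, double extensions, path separators, and null bytes described above never reach the file system.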
Okay, most of the time all of us sort of understand the hidden semantic value of file name extensions, but lay people don't, right? It's a picture. It's a picture of my grandkids. It's whatever. It's just what happens. You can't trust that they are what they say they are, even under the best of times.

You've got no choice. You've got to throw the file name away. You can keep it, like keep it in the database, but treat it like untrusted input and don't use it for a file name. If you're saving the file to disk, save it as an anonymous lump of data, maybe with a file name generated from a hash of the contents of the file. You know, just something that you can find your way back to, but you don't have to rely on the file name as a file name. If you're saving it to an object store like S3, do the same thing. And storing files on S3 is actually a really good idea anyway, because it keeps them away from your file system. It closes off that whole potential attack vector in the first place. It's always nice to have your code and your data as far away from each other as possible.

And it's advantageous because if you throw away the entire file name, it forces you into steps two and three of my four-point safety plan. So you've thrown away the file name. We are now up to those two. You've removed from yourself the temptation of trusting it to tell you anything useful, so now you're actually forced to look inside the file to prove that it contains what you're expecting. If this is a data ingest and you're expecting a CSV file or something, you actually have to read the file now to see what's inside it. If you're expecting an image file, you actually have to use an image file parser and prove that it opens successfully. So you have to actually guarantee that the file is of a type that you were expecting.

So you're totally safe now? No. No, you're really not. Reading and parsing potentially malicious files is a dangerous
game, and you actually don't really want to be doing it. But if you're gonna be serving these files, you have to, because the alternative is throwing your users under the bus. It is your job to protect them from your website. It is not the other way around.

The things that can go wrong when you're parsing someone else's files are basically as myriad as the types of file that exist in the world, so that would make this talk a little too long. So here's a couple of quick examples. A file could straight up contain a virus, right? They could just upload a file with a virus in it. A compressed file, like an image file or a zip file or anything else, could be crafted to blow out your memory. It could be, you know, a zip file which has five terabytes of zeros in it, which I swear I never emailed my friends when I was younger, and I'm sure no one in here did. No, everyone sort of won't make eye contact suddenly. If you're uploading XML, an XML-based file can have all kinds of really nasty things going on in it. I think there was a talk by not-Tom-Eastman about that on Friday. And actually, if you're handling zip files, then you have all of the above problems all over again, right?
Because every single file that's in that zip file could have its own surprises waiting for you. So it's not a fun game.

The advice I can give you here is kind of limited, because what you need to do depends a lot on your use case and what kind of files you're expecting. You need to be aware of the failure modes of whatever parsers you're using to read uploads. So with things like XML files, you have the problem that, as was pointed out, they're essentially programming languages and the parser is essentially an interpreter. You need to turn off all the features of the parsers that are dangerous on untrusted input, and in Python the short answer to that is: use defusedxml. If you're dealing with untrusted XML, or XML that could have come from any surprising source, we actually have a good solution for that here in the Python ecosystem. It's called defusedxml. Google it. It's cool.

But other file types are harder. With images, it's really important to keep your systems up to date and patched, because we have a long, long history of really terrible security bugs being found in image parsing libraries. So, speaking of which: who knows what ImageMagick is? Who's unfamiliar with ImageMagick? Cool. Okay, so ImageMagick is a ubiquitous suite of image manipulation tools. It's a whole lot of command-line tools for annotating images, rescaling images, identifying image file types. It's a very useful suite. It's very scriptable, so it's been in the Unix world for a really long time. A couple of months ago, a large number of critical security issues were publicly disclosed and patched in the ImageMagick library, and these bugs were like worst-nightmare-scenario bugs. Anyone using ImageMagick to handle an untrusted image was vulnerable to these, and it was trivially easy to exploit.
So here's an example that would trigger a shell execution on your server. Like, you send this file to anyone using ImageMagick and you get to own their computer from now on. And this is the whole file. This is just a graphics format, and it has a URI in it. That URI got, I think, shelled out to curl, and it didn't have any kind of path checking on it, so you can just start putting shell commands in right here. That's all it took. No buffer overflows, no crazy C stack smashing, no clever social engineering or anything. It's as anticlimactic as it is devastating. Like, that's all it took. Anyone who was behind on their security patching could end up being vulnerable to something like this, something embarrassingly destructive.

And antivirus software has a really bad track record with this stuff too. Is anyone familiar with the name Tavis Ormandy? Tavis Ormandy is a researcher for Google. He works on Google's Project Zero, and he has what I guess must be a dream job for someone like him, where he hacks non-Google software and open-source software, and he calls up those companies and explains to them how to fix it. So Google is spending a lot of resources on making the internet safer for everybody by hiring some very dangerous people. Fun fact: Tavis found a bunch of severe security vulnerabilities in Symantec antivirus a little while ago, and he emailed them to warn them about it. Unfortunately, Symantec uses Symantec to scan all attachments to their corporate email. Their entire email infrastructure crashed when he emailed them.

So look, antivirus is actually necessary sometimes. Sometimes it's a requirement. Sometimes you have to be scanning. If you're required to run antivirus software,
my recommendation is: run it somewhere else. So an option might be to have your antivirus software on a computer where your file upload quarantine is mounted as a read-only network file system or something. Or, if it's not confidential stuff that's being uploaded, there's actually a lot of third-party cloud scanning services that might be really appropriate, assuming it's not confidential information and you don't mind where it goes. But get it away from your application code and get it away from vulnerable things.

If that doesn't scare you about opening up and looking at random files that have been uploaded to you, I'm not sure what will. But like I said, if you're receiving files from the internet, this is your job, right? You have no choice. You can't make your users do this. So remember step zero, right? The only winning move is not to play. This must be the point where you're starting to think that maybe using a third-party service like Gravatar or Libravatar to handle those profile pictures, instead of just letting people upload images to your site willy-nilly, is a good idea.

Keep your tools up to date. Keep your security patches current. And keep your parsers and file ingest mechanisms conservative, paranoid, and stupid, right? Make them throw away anything that doesn't look right. I go through this whole talk and I usually forget to mention that, you know, if someone's uploading a profile photo, they could just upload a DVD image, right? What if they just upload five gigabytes of stuff at you?
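As a rough sketch of what "conservative, paranoid, and stupid" ingest could look like for the oversized-upload and zip cases, here is some Python using only the standard library. The names `MAX_UPLOAD_BYTES`, `UploadTooLarge`, `read_capped`, and `declared_unpacked_size` are hypothetical, and the 5 MiB limit is an arbitrary assumption; pick limits for your own use case.

```python
import io
import zipfile

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hypothetical limit, tune per use case


class UploadTooLarge(Exception):
    """Raised as soon as an upload is seen to exceed the allowed size."""


def read_capped(stream, limit=MAX_UPLOAD_BYTES, chunk_size=64 * 1024):
    """Read a binary stream in chunks, giving up once `limit` is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


def declared_unpacked_size(zip_bytes):
    """Sum the uncompressed sizes listed in a zip archive's directory.

    This reads only the archive's metadata, so nothing is decompressed.
    The declared sizes come from the uploader and can be wrong, so this
    is a cheap first filter, not a guarantee.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sum(info.file_size for info in archive.infolist())


if __name__ == "__main__":
    # A small archive that would unpack to a megabyte of zeros.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("zeros.bin", b"\0" * 1_000_000)
    zip_bytes = read_capped(io.BytesIO(buf.getvalue()))
    print(len(zip_bytes), "bytes on the wire,",
          declared_unpacked_size(zip_bytes), "bytes declared unpacked")
```

A cap like this belongs as early as possible in the request path; many web servers and frameworks also offer their own request body limits, which are worth setting in addition.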
There's a lot of little things where you just want to make sure they can't mess with you.

For bonus points, and again I'm still sort of talking Debian and Ubuntu, use AppArmor profiles. Or if you're in the Red Hat world, use SELinux. Use some of these new security mechanisms that we actually have access to these days. They might qualify as a bit of an extra-credit assignment. AppArmor is a Linux security technology that restricts a program's capabilities. So functionally, for example, you would create an AppArmor profile that says: this program is allowed to open, you know, its shared libraries, its requirements, and it's allowed to look at files in my upload quarantine, but if it tries to open a network connection or start a subshell, kill it.

So going back to that ImageMagick example that we had a moment ago, a well-crafted AppArmor profile would actually probably have preemptively protected you from the worst that it could do, because you would set up a profile that says: this is a program that looks at images. It doesn't need to shell out and run ls. It doesn't need to run commands. So if it does do that, notify me and kill it. So the second ImageMagick behaves in an unexpected way, the process would be killed and a warning would be generated in your log files, which maybe you'll even read one day. It's cool stuff. I really like this stuff, and it's worth learning about if you're a paranoid systems geek like I'm hopefully turning you all into right now.

Okay, so where were we? You're all very sleepy. Where are we? We are up to step four. Cool. I'm sorry.
I'm like the last person keeping everyone from beer, aren't I? Sorry. Okay.

The idea here is that you never serve the original uploaded file. You build a new one yourself that isn't a straight copy of it. The simplest example that demonstrates what I mean is image files again. Once you've validated that it is actually a JPEG that you've received, don't just put it in place for re-hosting. Instead, create a new JPEG with the same picture in it, either by re-encoding the file or rescaling the image, which, you know, you're probably gonna do anyway, right? If you're dealing with image files, this is usually a no-brainer, because you're probably gonna generate thumbnails and scaled versions of the file, or maybe convert the file to another format completely.

There's two goals that you want to achieve when you're doing this. First, you guarantee that you're hosting files that you built. You know that it has the data you need and that you want to present to your users, and you're a lot safer with the assumption that it doesn't have any unpleasant surprises. And secondly, you've successfully thrown away any data that you didn't need or didn't care about, because your parser didn't even notice it.

JPEG files also provide a good example here. By tampering with the image yourself, you might break any last malicious content that might have been in there. You know, ImageMagick isn't the only thing that's been hit with security bugs in image libraries. Both Android and Apple have been hit with really bad ones within the last year. By tampering with the image yourself, you might be protecting users on mobile devices, or you might be protecting users who haven't been as good with security patches as you have been. So forget about exploits for a second.
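The same build-a-new-file idea applies to the CSV data-ingest case mentioned earlier, and it is easy to sketch with only the Python standard library. This is an illustration under assumptions, not a prescribed implementation: `EXPECTED_COLUMNS` and `rebuild_csv` are hypothetical names, and a real ingest would also validate each value. For images, re-encoding through an image library plays the same role.

```python
import csv
import io

EXPECTED_COLUMNS = ["name", "email"]  # hypothetical schema for illustration


def rebuild_csv(untrusted_text: str) -> str:
    """Parse an uploaded CSV and write a fresh one with only known columns.

    The result is a file we built ourselves: unexpected columns and any
    extra trailing fields are dropped because we never copy them across.
    """
    reader = csv.DictReader(io.StringIO(untrusted_text))
    header = reader.fieldnames
    if header is None or not set(EXPECTED_COLUMNS) <= set(header):
        raise ValueError("upload is missing expected columns")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=EXPECTED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Copy across only the parts we care about.
        writer.writerow({column: row[column] for column in EXPECTED_COLUMNS})
    return out.getvalue()


if __name__ == "__main__":
    uploaded = "name,email,notes\nAda,ada@example.com,something unexpected\n"
    print(rebuild_csv(uploaded))
```

Only the rebuilt text would be stored for hosting; the original upload stays in quarantine or is deleted.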
It's also just good data hygiene. So imagine your user is uploading a picture, and that picture contains an EXIF header, and that header includes, say, the GPS coordinates of the location where the picture was taken. That's pretty common. Most of our phones embed GPS coordinates in photos. But it's a potential data leak that your user possibly neither expected nor desired, right? So if you don't care about that data, if there's no compelling reason for the hosted image to have it, it should be removed as a routine part of your upload processing.

This is a fun example. This photo was published on Google+ two months ago to demonstrate the new phone, the new Huawei P9 smartphone, which I've seen billboards for all over the place. That is a beautiful photo from a smartphone. Now, Google+ preserved all of the EXIF data. It had all of the EXIF tags on it. So because of that, we know it might not necessarily have been the phone that took the photo. If you can't read that at the back, that says Canon EOS 5D Mark III, which is actually a really great camera. So always err on the side of your users' discretion. Don't be like Google+.

So we're back where we started. I hope what I managed to do is give you some food for thought and some ideas on how to make both yourself and your users safer. So write beautiful code, be careful out there, and always be on the lookout for new ways you can disappoint bad people. Cool. Thank you very much for your time.

We have time for one question.
Oh, one. There's a short person up the back.

Thank you. So in terms of patching software, I've taken particular delight in looking at specific CVEs and finding they don't get patched in vendor distributions because they're user space. How do you look at those types of libraries and get them patched? Because package managers don't always pick up those security fixes.

When you say vendor libraries, are you, sorry, you're referring to...?

Not the distributions, but, for example, a particular database driver within a language is often a separate package within a distribution.

Gotcha.

And because those are separate packages, they're not necessarily caught by the distribution. So I've found them sort of marked as "will not fix".

Oh, yeah. So just to give you an example of this: Ubuntu has a very wide package selection in their repositories, but their repositories are separated between main, which they guarantee support for, and universe, which is called community supported. And community supported, as I understand it, means "let's hope Debian patches it". Now, you've got to maintain some vigilance on that. So I always recommend using, you know, supported operating systems and supported LTS versions of things, but that's not always enough. You actually need to know if your package is being supported by someone else, but assume it's not. You know, this is a process. This is an ops process that is actually getting a lot worse in the world of Docker and user- and developer-supplied things.
There's a gap that needs to be bridged there, and there are tools slowly filling that space. Ubuntu has, for example, a tool, which I can't remember the name of, but we'll tweet it out. It is for looking at the support lifetime of every Ubuntu package you have installed. So at least in their metadata for their packages now, they'll tell you "this is an LTS one and it has security support for like two years", or "this is a universe package". I got caught out by this recently, because WordPress is not in main, right? So if you're using WordPress, and I always think it's a better idea to use distribution-supported packages, WordPress isn't one. It's there. You can go apt-get install wordpress, but it's not in the main distribution on Ubuntu, so you might not get timely security updates with it.

You need to make sure you have some defined process for inventorying, keeping track of the crap you're using. And yeah, watch out for CVEs. Someone has to have that job, and all you need to do is go home to your company, go home to your project, and ask who has that job. And if no one has that job, you need to have a conversation about it.

Thank you, and that's all the time we have. Let's thank Tom again.