And you're live. All right. Welcome, everyone, to Secure Coding for Developers. Darian's going to give us a presentation about secure coding. So go for it, Darian. Cool. Hey, everyone. My name's Darian. You can't see me right now because my slides are up, but I'm an application security engineer here with the Wikimedia Foundation, relatively new as of a few months ago. And I'm going to be talking about secure coding for MediaWiki developers. So moving into the presentation, this slide gives a breakdown of the ratios of vulnerabilities that we see in our code base, based on bug reports that get submitted to Phabricator. We've omitted the actual counts for all of these vulnerabilities in order not to provide too much information to possible attackers that they could use to footprint our environment. But we can see from this slide that XSS is what we see most often, followed by sensitive data exposure, cross-site request forgery, and injection vulnerabilities. So this talk is going to focus on those four vulnerability classes, and we'll move right into cross-site scripting. Cross-site scripting occurs when an attacker is able to inject client-side code into a page. Client-side code can be anything from JavaScript to HTML to CSS. It could be using HTML injection to embed some sort of Flash object; anything that you can inject into the client side, anything that runs on the client side, is a possible vector for cross-site scripting. It results in the attacker taking over the user's browser, and in the most severe cases they can make requests on the user's behalf, steal session cookies, read page content, things of this nature. So there are three types of XSS. The first we're looking at here is reflected XSS. There's a snippet of PHP code on that top line, and it is receiving a search term from a GET parameter.
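The talk's on-slide examples are PHP; purely as an illustration of the same reflected pattern, here is a minimal Python sketch (the `render_results_*` function names are mine, not from the talk, and `html.escape` stands in for PHP's `htmlspecialchars`):

```python
import html

def render_results_unsafe(search_term):
    # Vulnerable: the user-supplied term is echoed verbatim into HTML,
    # so a <script> payload in the GET parameter runs in the browser.
    return "<p>Results for: %s</p>" % search_term

def render_results_safe(search_term):
    # Fixed: entity-encode the term before it reaches the HTML context.
    return "<p>Results for: %s</p>" % html.escape(search_term)

# Proof-of-concept payload, like the one in the slide's URL.
payload = "<script>alert('XSS')</script>"
```

With the unsafe version the script tag survives into the response; with the safe version it arrives as inert entities (`&lt;script&gt;`).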
You'll see just below that the format of the URL that an attacker might submit. Basically, the attacker is passing a script tag that will execute whatever the attacker chooses. Here it's just a proof of concept, so we're just alerting "XSS". And so when the attacker makes that request, or rather induces a user to make that request, the alert box will pop up, indicating that the attack has landed properly. Normally, an attacker would test this out themselves and then go back and craft something that's actually more nefarious, more invasive, meant to take advantage of the structure of the page and any session tokens that might be vulnerable. The next type of XSS is stored XSS. In the previous slide, we saw that an attacker would actually have to induce a user to follow the URL that's listed on that second line. In stored XSS, the attacker, using some other vulnerability in the web application, is able to store malicious content in a database or some other persistent means of storage, and induce a user to view a page that sources information from that persistent data store. In this example, the attacker has stored malicious code in a database table. That table is then queried, and the information is inserted into HTML content that's returned to the browser. So there are actually two sinks here for XSS: the first is the article ID, and the second is the article title. The two values that the attacker would store in the database are listed below. For the ID, the attacker would have to, actually, there's a typo there, I'll fix that afterwards, but the attacker would have to escape from the attribute context for href. So there would need to be the apostrophe there, followed by an angle bracket, which is missing: the closing angle bracket, which would close out the tag. I'll just fix that here so that you can see it.
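To make the attribute-context escape concrete, here is a hedged Python sketch (the `attr` helper is hypothetical, not MediaWiki code): a stored ID containing `'>` breaks out of a single-quoted href unless quote characters are encoded too, which `html.escape` does by default.

```python
import html

def attr(value):
    # Encode <, >, &, " and ' so the value cannot terminate the
    # attribute it is placed in (html.escape quotes both quote types
    # by default via quote=True).
    return html.escape(value, quote=True)

# The kind of value an attacker might store as the article ID.
stored_id = "1'><script>alert('XSS')</script>"
link = "<a href='/article?id=%s'>view</a>" % attr(stored_id)
```

After encoding, the apostrophe becomes `&#x27;` and the script tag becomes inert entities, so the stored value stays inside the href.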
Followed by that angle bracket, we're placed in a context in which HTML tags can be interpreted by the browser, and there the script alert will be injected and run, again indicating to the attacker that they have successfully landed the XSS. At that point, they may go back and modify that database entry, or create another database entry, containing more malicious content. And then a similar sink in this example is the title. There, there's no need to escape out of one context or into another; the attacker is already in the appropriate context, so they can simply inject the script tag and have it execute in the browser. The third form of XSS is DOM-based XSS. This occurs entirely client-side, when the attacker is able to take advantage of some JavaScript that's in the page to inject content of their choosing. Here you'll see that there's an HTML snippet with a document.write invocation, which is JavaScript, and it is referencing location.hash. location.hash is pulled from the URL. Quite frequently, you'll see a page that allows you to navigate among several headings in the page, which are referenced using a hash value. So document.write is being used to create an anchor tag in the page that will, in this example, allow the user to go to the next section. Unfortunately, location.hash is being used as the identifier for that next section, and the attacker can induce a user to visit example.com/foo.html, then the hash sign, and then that following value. Here, we're escaping out of that attribute context and injecting a script tag again, which will execute in the context of the user's browser. So, best practices to protect against XSS. The two key takeaways are to validate input and escape output.
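One way to defuse the DOM-based example is strict input validation on the fragment before it is ever written into the page. A hedged sketch, in Python for consistency with the other examples here (the `safe_fragment` helper and its allowlist are my own illustration):

```python
import re

# Hypothetical allowlist: accept only fragments that look like simple
# section anchors (letters, digits, hyphens, underscores).
FRAGMENT_RE = re.compile(r"^[A-Za-z0-9_-]+$")

def safe_fragment(fragment):
    # Strip the leading '#' marker(s), then reject anything outside
    # the allowlist rather than trying to escape it.
    value = fragment.lstrip("#")
    if FRAGMENT_RE.fullmatch(value):
        return value
    return ""  # fall back to a harmless default
```

A legitimate anchor passes through unchanged; the attribute-breakout payload from the slide is rejected outright.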
And output escaping is context-specific, based on whether you are placing output in HTML outside of a tag, in an HTML attribute, in a JavaScript context, in CSS for some reason, something like this. There are a bunch of resources on the last slide that talk about specific mechanisms you can use to escape content that's being inserted by developers into each of those areas. Be very careful about using input from cookies, databases, et cetera, input that is stored external to your application. It's best to always validate that input when you're pulling things from the database, or, say, using the contents of cookies that hold something other than session identifiers, or using the Referer value, things like this. All input should be validated to verify that it contains exactly what you're expecting, that it matches a very clearly specified format. And that's what we check for when we do security reviews: we look for sources of input and how they're being used, to ensure that input is being validated. This doesn't just apply to cross-site scripting; many other vulnerabilities can be mitigated with input validation as a frontline defense. Next, use the Html and Xml classes which are built into MediaWiki, and know which functions in those classes will escape or encode output and which won't. Escaping and encoding, when I mention escaping output, are interchangeable names for converting sensitive characters into a format that reduces their efficacy; it basically keeps them from running in the browser. Templating can do this for you sometimes, but you need to ensure that you are either using template parameters that have escaping enabled by default, or that you are invoking them in a manner which enables the escaping that's built into the templating system. And then you want to escape as close to the output as possible.
And the reason for doing this is, let's say you have an application that pulls some data from the database, like in that previous example, does some conversion, some manipulation, and then outputs that data to the browser. Your initial iteration of that application might be fairly simple: let's say you append a value and then print directly to the browser, and your print statement is wrapping htmlspecialchars, the PHP function. Down the road, you might add some other things; you might take that htmlspecialchars out of the print statement and do some other manipulation of that code, and another source of XSS might accidentally be added in that section. So it's very important, and another thing that we look for during security reviews, to ensure that escaping occurs as close to output as possible. A best practice is to use a method for outputting content that both prints and escapes, and then use it consistently everywhere, so that it's easily auditable by you as a developer and by anyone who's reviewing the code from a functional perspective and from a security perspective. In JavaScript, use createElement, setAttribute, and appendChild rather than document.write or simple string concatenation. Avoid innerHTML and document.location.href; document.location.href is similar to the location.hash that we saw in the previous example, where a value was being pulled from the URL. Avoid inserting untrusted data in jQuery selectors: jQuery will inflate any HTML tags placed in a selector into actual DOM objects, which will run in the browser. So you want to validate any input before it's used as part of a selector string in jQuery. Keep in mind that HTML parsing converts entities. The htmlspecialchars function that I mentioned previously converts sensitive characters, the angle brackets, double quotes, single quotes, et cetera, into HTML entities.
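The "one auditable output function that prints and escapes" advice can be sketched as follows; this is my own hedged illustration in Python, not MediaWiki's API (the `emit` helper is hypothetical):

```python
import html
import io

def emit(out, template, *values):
    # One auditable choke point: every value is entity-encoded at the
    # moment of output, so later refactoring upstream of this call
    # cannot quietly reintroduce unescaped data.
    out.write(template % tuple(
        html.escape(str(v), quote=True) for v in values
    ))

buf = io.StringIO()
emit(buf, "<h1>%s</h1>", "Tom & Jerry <3")
```

Because escaping lives inside `emit`, a reviewer only has to check that every output path goes through it, rather than auditing every call site for a separate escaping step.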
HTML parsing will convert those back into regular characters. If you are passing content through a parser, it's important to recognize that the output is going to be unescaped, and you'll need to ensure that you're escaping it again before it's output to the browser. And then keep in mind the DOM context where you're writing out user-controlled data: again, use appropriate escaping for cases where you're writing content into HTML, or a JavaScript context, or an attribute context. So one form of prevention is to use MediaWiki's built-in output functions. Here, what I'm doing is assembling an array of attributes that are going to be added to an HTML element, an input element, and then on the very last line I'm using Html::element to actually create it and store it into the label variable. This function will handle your entity encoding for you. Here, we're using Xml::openElement and Xml::closeElement to create elements in a page, to avoid string concatenation. And then here's some further discussion of various HTML contexts. When you're inserting content into the HTML body, you need to prevent tag creation; when inserting into attribute names, prevent JavaScript handlers from being used. You also want to ensure that values that are added cannot escape from whatever sort of quoting you're using. So use consistent quoting, double quotes or single quotes, and then ensure that you are validating input and escaping appropriately, such that your delimiting quote type cannot be part of the value, which would terminate your attribute context and allow an attacker to add a JavaScript handler like onmouseover or onclick. For URL attributes, don't allow javascript: or data: targets. Both of those can be sinks for XSS. So in our href example, we don't really have one.
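The parsing-converts-entities point is easy to demonstrate; a small Python sketch of the round trip (using the stdlib `html` module as a stand-in for any HTML parser):

```python
import html

original = "<b>bold</b>"
escaped = html.escape(original)   # entities: &lt;b&gt;bold&lt;/b&gt;
parsed = html.unescape(escaped)   # parsing converts entities back

# After any parsing step the data is live markup again, so it must be
# escaped once more before being written to the browser.
reescaped = html.escape(parsed)
```

The parsed value is byte-for-byte the original markup again, which is exactly why content that has passed through a parser cannot be treated as already escaped.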
But let's say we had an anchor tag similar to this example, and we were inserting a value into the href attribute. If an attacker were able to gain control over that value, they could inject javascript:alert('XSS'), which would have basically the same effect as the example that we see here. In CSS contexts, normalize and also validate your input to ensure that no script is being added into your CSS. And we recommend not writing user-controlled values into JavaScript at all. Escaping there is very tricky; there are multiple contexts, and the XSS prevention cheat sheet from OWASP goes into more discussion on that. It's linked from our cross-site scripting page on mediawiki.org, which I've linked to from the last page of the slides. So as you're coding, each time you test, each time you feel like you've reached a milestone in development, you want to review your code: start at the output, trace variables back to the source, and verify that all sources of output are being escaped prior to reaching the browser. And again, it's best to use some function that handles both printing and entity escaping. And then test your input fields. Here's a string that you can copy from the slides and paste into input fields to help you verify that you are escaping properly. You can use this on code that you're not super familiar with, as a simple first method to verify that entity escaping is in place appropriately. There are also a few headers that you can set to aid in preventing cross-site scripting. It's important to specify an appropriate Content-Type; there have been a number of browser-specific cross-site scripting vulnerabilities that take advantage of incorrect content types being set. X-Frame-Options: DENY prevents your application from being framed. X-Content-Type-Options: nosniff and X-XSS-Protection: 1 are both browser-specific headers that will keep client-side script from running in certain contexts.
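Filtering javascript: and data: targets out of URL attributes is usually done with a scheme allowlist rather than a blocklist. A hedged Python sketch (the `safe_href` helper and its fallback value are my own illustration):

```python
from urllib.parse import urlparse

# Allow plain web schemes plus "" for relative URLs like /wiki/Foo.
ALLOWED_SCHEMES = {"http", "https", ""}

def safe_href(url):
    # Reject javascript: and data: targets, both of which the talk
    # identifies as XSS sinks; anything not on the allowlist is
    # replaced with an inert fragment link.
    scheme = urlparse(url).scheme.lower()
    return url if scheme in ALLOWED_SCHEMES else "#"
```

An allowlist is preferable to checking for the string "javascript:" because browsers tolerate case variations and other obfuscations that a blocklist would miss.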
And Content-Security-Policy is a relatively new specification that allows a developer, web application admin, et cetera, to specify what sort of content is allowed to run and from which sources. So it allows you to disallow inline JavaScript in a page from running, and instead only allow JavaScript that's saved into separate files and loaded into the page. And I believe that's it for XSS. So, moving into the next class of vulnerability, cross-site request forgery. Before I start there, are there any questions from IRC or from the Hangout? OK, I don't see any. So, cross-site request forgery. This occurs when an attacker is able to force web requests to originate from a user's browsing session. Because of how the web works, because of how HTTP works, if a user is authenticated to a site that an attacker has forced them to make a web request to, the cookies that relate that user's session to that website will be sent along with the request, causing the request to appear to have been made by the user, even though it was actually initiated from another website. So in our first example, we assume that there's a page on example.com, at foo.html, and the attacker has control over that page. They're assuming that users who are logged into English Wikipedia are going to visit it. What the attacker will do is embed an image, and that image is actually going to take some action on English Wikipedia. As soon as the page loads, the browser will attempt to load an image from the location listed in the src attribute in the example. That, of course, isn't an image, but the request will still be made, and an action will be taken on the target site, English Wikipedia, running as the user who has loaded this page. They'll end up seeing a broken image tag, but that won't necessarily clue them in to what's happening.
And in fact, this image tag could be rendered with display: none CSS, so it would happen completely in the background; the user wouldn't even notice it unless they were monitoring network requests. A possibly more malicious example is submission of a form in the background, invisibly. Here we're using display: none, as I mentioned. There's a form called wikiEdit, and it's targeted at hiddenFrame, which is an iframe with display: none, so that the user doesn't see anything happening. On page load, some JavaScript is used to submit this form, and the output of the form will be "rendered" in the hidden frame, but the user won't actually see anything there, because its display is set to none. And again, the session cookie, all cookies that are related to en.wikipedia.org, will be sent along with this request, and the action will be taken as that user. So here, the user would actually make an edit to a given article. The way to protect against this is to add a random token to HTML forms and then verify that that token has been submitted when the form is submitted to the server. Here, we would add a hidden field that contains some value which is never stored in a cookie, but instead is only stored in the page and on the back end, in whatever session store is being used to hold user information, where it would be related to the cookie that's stored in the browser. So the two things work hand in hand to verify that the user has actually loaded the form from the website themselves, rather than an attacker having crafted the form. MediaWiki has a built-in class called HTMLForm, which handles this for you automatically. So you would create a form that contains the fields you wish to submit from whatever page has been loaded, and transparently, as it outputs the form, it will add the token to the form in a hidden field.
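The token scheme described above can be sketched in a few lines; this is a hedged, framework-free Python illustration (the `issue_token`/`check_token` names and the dict-as-session are my own, not MediaWiki's edit-token API):

```python
import hmac
import secrets

def issue_token(session):
    # Generate a random token, keep a copy server-side in the session,
    # and embed the same value in a hidden form field on the page.
    token = secrets.token_hex(32)
    session["csrf_token"] = token
    return token

def check_token(session, submitted):
    # On form submission, compare the submitted field against the
    # session copy in constant time.
    expected = session.get("csrf_token", "")
    return bool(expected) and hmac.compare_digest(expected, submitted)

session = {}
field_value = issue_token(session)
```

An attacker's page cannot read the hidden field cross-origin, so a forged form lacks a matching token and the check fails.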
And at form submission, it will verify the token when you're parsing the form values and retrieving those. If doing it manually, you can use edit tokens, which are the same thing that HTMLForm uses. This example assumes that a form has been submitted: you would use $request->getVal to retrieve wpEditToken, which contains the edit token value, and then use $user->matchEditToken to verify that the token matches. If so, then you can move forward with whatever processing happens subsequently. And that's essentially it for CSRF protection in MediaWiki. It's fairly simple to do, and as long as you stick with the built-in methods, you should be protected. So, moving into SQL injection. SQL injection occurs when an attacker is able to escape from the intended context when SQL is being executed on the back end. In our case, let's say some custom parsing is being implemented inside of an extension, and that extension is pulling data from the title and putting it into a database directly, without any sort of input validation or context-specific escaping. In the case of a database, that context is the position within the SQL statement where the value is being injected, as well as the relational database management system in use. In the example that we see here, we're taking the username value and placing it directly into a SQL statement without any validation or escaping. Basically, we're assuming in the example that username was pulled directly using $request->getVal. The commented-out line there contains a value that the attacker might submit as the value for username. Basically, the attacker is closing out the username 'test' and adding OR 1=1, resulting in the SQL statement which follows. This is not what the developer intended when they created this statement.
They were expecting an actual username which exists in the database to be passed, some string value that matches whatever their chosen format is for usernames, so that they can verify that the user exists and then subsequently check their password hash. And an attacker has circumvented that in this example. So the way to prevent this is to use MediaWiki's built-in database access functions. This example is the same as what was implemented here: we use a database object, we specify the table, we specify the columns that we want to retrieve from the table, and we specify the where clause using an array; we can specify multiple columns to check as well. The where clause there is just wrapped, which may be confusing. And this will handle escaping and delimiting for you automatically, so that an attacker can supply that value for username, but won't be able to escape out of the appropriate context. Any questions on SQL injection? OK. Private data exposure. So this has been a big concern for us, and it quite frequently occurs when an extension... hang on, there's been one comment on IRC. I'm reading: "Honestly, this hidden iframe/JS submit stuff makes me feel that browsers are just broken." So it's not a question, it's a comment, but yes, I agree with that. Unfortunately, we're left with a lot of legacy features. I guess not even legacy features, because those are features that have legitimate use cases; unfortunately, I think they were not necessarily specced with security best practices in mind. So we have a partially broken web that we have to program defensively for. I think that as standards are developed further, we may move away from having to deal with these legacy issues, but for now, we're stuck with them, and it's important to be conscious of how to work within those bounds until something better comes about.
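The difference between string concatenation and the escaped-and-delimited approach is easy to show end to end. A hedged Python sketch using the stdlib sqlite3 driver (the table layout and function names are my own illustration; MediaWiki's select() with an array where clause plays the same role as the parameterized query here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_name TEXT)")
conn.execute("INSERT INTO user (user_name) VALUES ('test')")

def find_user_unsafe(name):
    # Vulnerable: string concatenation lets ' OR '1'='1 escape the
    # quoted literal and change the meaning of the statement.
    sql = "SELECT user_name FROM user WHERE user_name = '%s'" % name
    return conn.execute(sql).fetchall()

def find_user_safe(name):
    # Parameterized: the driver delimits the value, so it can only
    # ever be compared as data, never interpreted as SQL.
    return conn.execute(
        "SELECT user_name FROM user WHERE user_name = ?", (name,)
    ).fetchall()

payload = "wrong' OR '1'='1"
```

Against the unsafe version the payload matches every row; against the safe version it matches nothing, because the whole string is treated as one (nonexistent) username.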
So yeah, private data exposure. When an extension implements its own sort of revision deletion, or is querying the database directly and looking at revisions that have occurred to pages, looking at messages and histories and things like this, and doing something with that information, including the users who might have made those revisions, it's important to understand that the revision suppression system that's built into MediaWiki has been designed to allow operators of the wikis to protect user privacy and protect content for legal and privacy reasons. So when you implement an extension that queries the database tables directly and does not use the mechanisms that respect the flags set on various fields to indicate that they've been deleted or suppressed, it creates a possible problem in which data that should not be exposed to the public, or to admins or other people, ends up leaking, leading to possible privacy or legal problems. So we're recommending that you don't reimplement the revision deletion and suppression system, because it's super, super complicated to do. There are a lot of tables that need to be verified, and there are some extensions that also create additional tables containing information that must be suppressed. And it's definitely a source of problems for us when information is exposed: inflammatory information, simply inappropriate content, slander, things like this. So we recommend that you do not try to reimplement it. If you do feel that you need to for some reason, please contact the security group at Wikimedia. And if you're adding features to an existing extension or adding features to core, please talk to the developers on those teams to verify that the feature you're adding is absolutely necessary, and to seek their help on the best way to do it in a manner that respects user privacy and legal requirements. I'm not going to read this entire slide.
It's included just as a snippet so that you understand exactly what sort of information is considered private and should not be exposed. Any questions on revision deletion and maintaining private data? OK. So if you need help from us, you can click this link here; the slides will be available right after the talk. Click the link and tag it security-reviews, or find myself or Chris Steipp on IRC. We both work Pacific Standard Time, generally speaking. I'm on a little bit later, maybe 10 to 6 or so, 11 to 7. But find us on IRC or send us an email; we're happy to help out whenever we can. And you should definitely ask for help if you're doing any of these sorts of things. Authentication, authorization, session handling: we see a few vulnerabilities per analysis cycle in this area, and it can definitely get hairy when you're trying to do something there, so definitely reach out to us for assistance. If you're executing external programs via the shell, you can end up with another class of injection, and we want to help you out with that; we want to maybe offer some architectural suggestions, or discuss other ways that you might implement the code such that you can avoid executing external programs. If you're serving up new content types, you want to talk to us, and we can connect you with the appropriate teams that can offer some insight into how that's already been implemented. And any new implementation of encryption and hashing: definitely please reach out, we'd love to be involved in that. And also, if you're designing a page that carries out some action but doesn't actually return something to the browser, give us a shout. And if you find a security issue that you need to report to us, again, you can click the link to open a Phabricator ticket. Select "Software security issue" from the security dropdown, and that will place it in our queue; we triage issues every week and discuss them pretty thoroughly.
So in short order, we'll get back to you to check in about the status of that issue. You can also send an email to security@wikimedia.org; that gets distributed to Chris, myself, and a number of other core developers and security-focused people at the Foundation, who will respond pretty quickly. So that's the talk. Are there any questions about anything in here, or any other issues for discussion? OK, I haven't received any questions. Well, thanks, everyone, for attending. Please, again, feel free to submit a Phabricator ticket or send us an email or catch us on IRC if you have any questions about any of this content.