Thank you for being here. I'm excited to be here. This is my first time speaking at DjangoCon. And today, I'm going to talk to you about OAuth2 and Django, what you should know. A little bit about me. I work for a small company called OpenEye Scientific Software. And we make scientific software for drug discovery. We are also a corporate member of the Django Software Foundation. And at OpenEye, I create web services that allow our customers to access our drug discovery tools, including the authentication and authorization services. Prior to joining OpenEye, I was a computer scientist at the Los Alamos National Laboratory, where I worked in high-performance computing. But enough about me. I'm here to talk about Django and OAuth. What should you know about it? I hope that at the end of this, you'll know a little bit about how it works. You'll know how to use it with Django. And you'll know what common mistakes you should avoid. So in order to understand OAuth, it's useful to understand the problem that it's trying to solve. So let's say we have a user. We'll call her Alice. Alice is an engineer who designs parts in 3D. And she stores those designs in a cloud storage service. And occasionally, Alice needs to print those designs out using a 3D printer. But she doesn't own one. However, there is a service out there that will do that for her. Now, this service needs to be able to access those designs in order to print them. And Alice could upload a copy of each design every time she wanted to print one. But it would be more convenient if that service were allowed to access those designs directly. Her designs are protected by a password in the cloud, and she's not going to give her password to that 3D printing service. That would be foolish. What she needs instead is a way to tell the cloud storage service, yes, this 3D printing service can access these designs, and to do it in a controlled and secure manner. That is what OAuth was designed for.
I'll show you some examples later. But in summary, OAuth was designed to allow authorization for web services without the need to disclose a user's password. It's often referred to as the valet key for the web. In the same way that a valet key will let you drive someone's car without getting into the glove box, OAuth provides a means for specifying authorization. Now, this talk is about OAuth 2. But before OAuth 2, there was, of course, OAuth 1. But nobody uses that anymore, right? We all decided OAuth 2 was better and moved on. No, sadly, that is not the truth. There's a large divide between OAuth 1 and OAuth 2 for a variety of reasons. But I'm not going to talk about those reasons here. To borrow from the spec, that is beyond the scope of this document. So OAuth 2, it's called the next evolution of the OAuth protocol. Although if you want to be technically correct, the best kind of correct, it's a framework, not a protocol. It's a whole lot more general than OAuth 1. It was released as RFC 6749. And I recommend that all of you read it. It's an easy read. Trust me, they left all of the hard bits up to you. I can summarize the differences between OAuth 1 and 2 by letting you know that they've added the requirement for transport layer security. Some of you may refer to this as SSL. In exchange for that requirement, they were able to remove some complexity from the protocol, making it a little easier to use, whereas OAuth 1 was designed to be secure independently of the transport. OAuth 2 has its own vocabulary. And I'm going to go over some of the terms that I'll be using throughout the remainder of the talk. First, we have the resource owner. This is what we would call a user. And this user owns data that is stored on the resource server. A resource server is a server that hosts the data and provides controlled access to it according to the specification. And it provides that access to the client. The client is another application.
And it accesses data residing on the resource server belonging to the resource owner. It can do this by working with the authorization server. The authorization server puts all of this together by taking the authenticated user and allowing that user to specify access for clients. And it enacts that access by granting tokens. Tokens are just credentials. You can think of them like a key. And they've got some metadata attached to them. So using our previous example, Alice is the resource owner. The client is the 3D printing service. And the cloud storage service is both the resource server and the authorization server. There's nothing in the spec that says that one entity has to fill both of these roles. But it doesn't prohibit that either. And it's a common example that you'll see if you use OAuth. That's why I've used it here. OAuth 2 specifies a few ways that authorization can be granted. And those are unsurprisingly called grants. A grant is just a way for a client to get a token. There are several different grant types. And I've listed them here. But there's no need to memorize that list. Each grant has its own requirements. But they all end in the same way. And that is with a client obtaining a token. The sequence of steps required to obtain a grant is called a flow. And there is a flow for each grant type. I'm going to focus on one particular flow, the authorization code flow. And the reason for that is that according to the spec, it is optimized for confidential clients. What is a confidential client? Well, a confidential client is a client that is capable of keeping a secret. So for example, if you implemented the client in JavaScript in a web browser, that would not be considered a confidential client because it can't keep any secrets from the user. On the other hand, a Django server has to be capable of keeping secrets. We all depend on that. And therefore, it's a confidential client.
And the authorization code flow allows us to get a token to the client without the user or anyone else having access to the token. So how does it all work? Well, I'm going to walk you through the authorization code flow, but before we begin, a few things have already been set up. We have to assume that the user has an account on the authorization server. The client has also registered, according to the spec, with the authorization server. And so it has its own ID and its own secret, which you can think of as a username and password. And that's a secret that that confidential client is capable of keeping. When it registers, the client also specifies another field called redirect URIs, which I'll explain later. So suppose you have an application that allows users to log in through OAuth 2.0. The login page might look something like this mockup. Notice that the browser is on clientapp.com, and the link at the bottom says "Log in via example.com", two different domains. The client, in this case, is clientapp.com, and the authorization server is example.com. So if the user clicks "Log in via example.com", they'll be redirected to the authorization server. And the authorization server is going to ask that user to approve or deny that access. If the user clicks approve, they're then redirected back to the client, except now they're logged in. So using the example that I had previously, Alice would attempt to log into the 3D printing service and would be redirected to the cloud storage service. Now, that redirect represents the client's request to access data. So a few arguments have to be sent along with it. This is one example of what that redirect might look like. All the arguments are passed in the query string. I've put them on separate lines so you can read them. And the first one is the client ID. And the client's going to include this in every single authorization request, and it's just there so the authorization server can identify that client. The second argument is the scope.
And this is literally the scope of the data access being requested. And it should be something that is meaningful to the authorization server. Next, we have the state parameter. Now, the state parameter is an opaque string, and it's just here to mitigate cross-site request forgery attacks. I'll explain how this works later, but you should always use this even though it's an optional parameter according to the spec. And finally, we have the redirect URI. This is where the user is going to be redirected to after they have authorized access. And remember I said earlier that this was one of the fields that the client specifies when it registers? Well, the reason is the authorization server is going to look at this URI and make sure that it matches what the client said was a valid URI earlier. Because if it didn't, you could put something evil there and redirect the user anywhere you wanted to after they've authorized access. And the reason this is dangerous is because that redirect is going to get some arguments as well. One of those arguments is a sensitive piece of information called the authorization code. That's what the whole flow is named for. And that authorization code eventually will be used to get an access token. So at this point in the flow, the authorization server has validated the redirect URI, client ID, and the scope, and the user has approved or denied the authorization request. And so using our previous example, Alice has gone to the authorization endpoint and is now redirected back to the client, or the 3D printing service. As I mentioned earlier, that redirect also has arguments. The first one being this state parameter. And it must be the same value that was supplied with the initial authorization request. And here's why. That second parameter, the authorization code, can be used to get an access token eventually. And that authorization code is associated with an account on the authorization server. 
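The authorization request and the state check described so far can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the endpoint URL, client ID, callback URL, and function names are hypothetical, and a real client would keep the state value in the user's session between the two redirects. The response_type=code argument comes from RFC 6749, where it selects the authorization code flow.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical values for illustration; a real client receives its ID at registration.
AUTHORIZE_ENDPOINT = "https://example.com/oauth/authorize"
CLIENT_ID = "my-client-id"
REDIRECT_URI = "https://clientapp.com/oauth/callback"


def new_state():
    """Generate an unguessable, opaque state value to keep in the user's session."""
    return secrets.token_urlsafe(32)


def build_authorization_url(scope, state):
    """Return the URL the client redirects the user's browser to."""
    params = {
        "response_type": "code",  # selects the authorization code flow (RFC 6749)
        "client_id": CLIENT_ID,
        "scope": scope,
        "state": state,
        "redirect_uri": REDIRECT_URI,
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)


def state_matches(expected, received):
    """Check the state returned on the callback against the one saved earlier."""
    if not expected or not received:
        return False
    # Constant-time comparison on bytes, so non-ASCII input cannot raise.
    return secrets.compare_digest(expected.encode(), received.encode())
```

On the callback, the client would call state_matches with the value from its session and the value in the query string, and stop the flow if it returns False.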
So let's say, for example, that that authorization code was tied to my account on the authorization server. And I initiate this flow, but I interrupt it right here before my browser visits this URL. Instead, I take this URL and I embed it in a web page. And I get you to click on a link to go to that web page. Your browser loads this URL. And if the state parameter is not there, there's nothing linking the two requests together. And all of a sudden, my account on the authorization server is linked to your account on the client. This is just one example of an attack, of how things can go wrong if you don't use a state parameter. So now the user has been redirected back to the client. The client's going to exchange the authorization code for a token. And the authorization code expires in 10 minutes or less, according to the spec. And it can only be used once. Now, here's how we get a token to the client without the user having access to it. The user has access to the authorization code. It's in the redirect. But you, or rather the client, have to have both the authorization code and the credentials that belong to the client to actually get an access token. Both pieces of information are required. So even if you leak the authorization code, if you use the authorization code flow and you have a confidential client, then you still don't leak an access token unless that attacker also has your client credentials. I should also mention that tokens expire, but there are things called refresh tokens in the spec. I don't have time to go over that, but they do allow you to get new tokens using existing ones. So now I have a token. What do I do with it? Well, we're going to access the user's data. That's the whole point. And the way you do that is by including that token in the request header of any request that you send to the resource server. And you do it with this syntax.
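A minimal sketch of these last two steps, exchanging the code for a token and then presenting the token, using only Python's standard library. The token endpoint URL and function names are hypothetical. Sending the client credentials with HTTP Basic authentication is one option RFC 6749 describes, and real providers differ in what they accept. The requests are only built here, not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_ENDPOINT = "https://example.com/oauth/token"  # hypothetical endpoint


def build_token_request(code, redirect_uri, client_id, client_secret):
    """Build the POST that trades an authorization code for an access token."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,  # must match the one used in the first redirect
    }).encode("ascii")
    # The client authenticates with its own ID and secret; the user never sees these.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode()
    ).decode("ascii")
    headers = {
        "Authorization": "Basic " + credentials,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(TOKEN_ENDPOINT, data=body, headers=headers, method="POST")


def build_resource_request(url, access_token):
    """Build a request to the resource server that carries the access token."""
    return Request(url, headers={"Authorization": "Bearer " + access_token})
```

Passing either request to urllib.request.urlopen would send it, and both should only ever go over HTTPS.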
You set the Authorization header and you put the word Bearer followed by a space and the value of the token. That's all you have to do. And then we're going to take that data and create a linked account in our client, or in our case, a new Django user. And it's going to be linked to the user's account on the authorization server. So now that we've seen how to use the authorization code flow, how can it fail? Well, there are a few ways. If either party, the client or the authorization server, doesn't use transport layer security, then all bets are off. We have no way of making any guarantees. If the authorization server does not validate the redirect URI, an authorization code can be leaked to an attacker. If either party doesn't use the state parameter and verify the state parameter to be correct, then you're vulnerable to cross-site request forgery attacks. If you don't authenticate the client, you being the authorization server, then a leaked authorization code can be used to get an access token. And finally, if you don't expire authorization codes, you're subject to replay attacks. Now that we know a little bit about how OAuth2 works, how can we use it in our Django app? Well, there are two broad ways that you can use it. You can either make your app a consumer of OAuth or a provider. And that implies that you're fulfilling certain roles. For a consumer, it means you're fulfilling the role of client in the framework. And for a provider, you're fulfilling the roles of authorization and resource servers. Each has its own requirements as well. For the client, you've got to be registered with any authorization server you want to use. And you've got to redirect users to those servers when you need to access their data. And you've got to provide callback URIs so that that authorization server knows where to send the user after they've approved access. You also get to deal with inconsistent implementations, of which there are many.
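Several of the failure modes listed above map onto checks that an authorization server can make. The sketch below is a framework-free illustration, not how any particular library does it: the class, the function names, and the in-memory dictionaries standing in for a database are all hypothetical. It shows exact-match redirect URI validation, client authentication, code expiry, and single use.

```python
import hmac
import time
from dataclasses import dataclass

CODE_LIFETIME_SECONDS = 600  # RFC 6749 recommends a maximum of 10 minutes


@dataclass
class AuthorizationCode:
    client_id: str
    redirect_uri: str
    issued_at: float
    used: bool = False


def redirect_uri_allowed(registered_uris, requested_uri):
    """Accept only a redirect URI the client registered, compared exactly."""
    return requested_uri in registered_uris


def redeem_code(codes, client_secrets, code, client_id, client_secret,
                redirect_uri, now=None):
    """Return True only if this code may be exchanged for a token right now."""
    now = time.time() if now is None else now
    # Authenticate the client so a leaked code is useless on its own.
    registered = client_secrets.get(client_id)
    if registered is None or not hmac.compare_digest(
        registered.encode(), client_secret.encode()
    ):
        return False
    record = codes.get(code)
    if record is None or record.used:
        return False  # unknown code, or a replay of one already redeemed
    if record.client_id != client_id or record.redirect_uri != redirect_uri:
        return False  # code was issued to a different client or callback
    if now - record.issued_at > CODE_LIFETIME_SECONDS:
        return False  # expired
    record.used = True  # single use
    return True
```

A second call with the same code returns False, which is the replay protection the talk describes.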
Now on the provider side, you've got to provide URLs for authorization and tokens. You've got to provide a means for client and user registration. But the best thing is you get to make your own inconsistent implementation. And there are many inconsistencies. In fact, all OAuth2 implementations are consistent in the same way that all Unicode strings are UTF-8. If you get that joke, you have my condolences. Here's an example. These are the token endpoints for Google, Facebook, GitHub, and Foursquare. They are all different, and this is just one example of many. But keep in mind, this is within spec because the spec doesn't say anything about what these endpoints should look like. In fact, there's a lot that the spec doesn't say. OAuth2 was meant to be a lot more general than OAuth1, and it really shows through my rigorous scientific analysis here. You can see that OAuth2 is almost twice as general as OAuth1. There's a lot that's left up to you. I've pulled a few of the gems that are left up to you out of the spec, such as the methods used to access protected resources, the interaction between the authorization and resource server, the location of the authorization and token endpoints, and the methods used to validate the access token. But don't worry, it's not all up to you. The Python and Django communities have your back. There are quite a few libraries available. So I've made this table for you based on my own reading. And listed here, you will see which libraries implement the consumer and provider bits, and whether or not each is compatible with Python 3, if you're the kind of person that cares, and I am one of those. I don't get too religious about test coverage, but I do believe in the simple premise that well-tested code on average is better. And finally, the number of downloads in the last month from PyPI. Two libraries stand out immediately.
One that implements the consumer bits and one that implements both consumer and provider. Python Social Auth is quite popular. You can see it has decent test coverage and it's Python 3 compatible, and I know that it has very good docs. Django OAuth Toolkit implements both consumer and provider, supports Python 3, and has excellent test coverage. And finally, I can't give this talk without mentioning OAuthLib. OAuthLib doesn't actually provide any Django-specific bits. Rather, it provides generic reusable components, and it is the library that powers Django OAuth Toolkit. And to the creator of OAuthLib, I say thank you. I don't know if he's here, Idan, but in addition to being a Django core developer, he has made OAuth usable in Python for all of us. He's done a great job. So let's use OAuth. Before moving on, I need to explain a little bit about how Django authentication works. Django authenticates users in middleware, and that's the code that processes your request before it gets to your view code. Now, the way that it happens is that it iterates through a list of preconfigured backends, and these are set in your Django settings, and if it's able to authenticate a user using one of those backends, it sets request.user to the authenticated user, or anonymous if you allow that. And the backend that you're probably most familiar with, or that you've used the most, is the model backend. I've trimmed out the bits that aren't relevant, but it looks something like this. All it does is compare the hash of the password provided for the given user against the hash that's stored in the database. Any Django package that provides authentication is going to have to implement a backend such as this, although in place of checking the password, it would initiate or complete the OAuth2 flow. So first I'm gonna talk about Python Social Auth and the consumer bit. Notice it's called Python Social Auth, not Django Social Auth, because it supports several frameworks, not just Django.
It also has built-in support for many providers. All of the large providers that you would expect to be there are, and it provides an authentication backend like what I described for each provider. It has some higher-level abstractions known as pipelines, which are mechanisms for associating and disassociating user accounts on the authorization server with your app, and those are quite convenient. I'm happy to say that it implements the state parameter that I mentioned earlier, because that's very important, but not for every provider, because it's configurable and not every provider supports it, and sometimes it's disabled incorrectly, as was the case for Spotify and angel.co. In preparing for this talk, I found out that the built-in configurations in Python Social Auth for these two providers were wrong, and I opened an issue and they fixed it. This is open source working before your very eyes, and if you're curious as to how I found that, I used the example app that's built in, and for each of the providers, I just looked at the redirection URLs and made sure the state parameter was present, and then I compared the presence of it against the documentation for that provider. So it's got excellent Django support, it does things the Django way. login_required still works, just how you think it would. It works with the session and authentication middlewares, and it deals with all of those inconsistent provider implementations for you, and I think that's great. And just to remind ourselves of the component that we're actually implementing, using our previous example, Python Social Auth gives you the client. So in summary of Python Social Auth, if you only need an OAuth2 consumer, I suggest you start there, but carefully choose your backends and verify their settings. So moving on, next I'm going to talk about Django OAuth Toolkit. It's implemented essentially as OAuthLib plus some models, views and URLs, which I think is just perfect.
It's RFC 6749 compliant, which means that it supports other flows, not just the authorization code flow that most packages implement. It also uses a state parameter, which I'm happy to say, and it validates the redirect URI. From their docs, they call it OAuth2 goodies for Django nuts, and it works quite well. And what that gives us is both the authorization server and the resource server. It's going to give us those two components, and again they don't have to be the same entity, but they can be. It gives you built-in views for managing clients, or what they call applications, and for managing tokens, and it also gives you mixin classes for protecting resources by requiring a token to be present in order to reach that view. Here's one example. You can import the protected resource view, and before this code is executed, it's going to ensure that a token has been presented, and it's going to set request.user. And I lifted this straight from their documentation. It's really easy. Who in here uses Django REST framework? Okay. You're gonna be happy to know that they have built-in support for Django REST framework. You can configure your view set just like that. There is not built-in support for Tastypie, but that can be done with a little bit of work. So adding token authentication to your APIs is literally this easy. This is all you have to do. So right now we've used Django OAuth Toolkit to build both the resource server and an authorization server. And remember how I said the methods used by the resource server to validate the access token are, blah blah blah, left up to you? Well, somehow this package has to be validating that access token. So how do you think it works? It does that very simply. It merely checks for the presence of the token, and it can do that because both the resource server and the authorization server are using the same database. So it just checks for the presence of the token and makes sure the scopes are correct.
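To make that check concrete, here is a framework-free sketch of what looking up a token and checking its scopes can amount to when the resource server shares a database with the authorization server. It is not Django OAuth Toolkit's code: the dictionary standing in for the token table, its field names, and the function names are hypothetical, and the expiry check is an assumption added for completeness.

```python
from datetime import datetime, timedelta, timezone


def extract_bearer_token(authorization_header):
    """Return the token from an 'Authorization: Bearer <token>' value, or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def token_allows(token_table, authorization_header, required_scopes, now=None):
    """Return True if the presented token exists, is unexpired, and covers the scopes."""
    now = now or datetime.now(timezone.utc)
    token = extract_bearer_token(authorization_header)
    record = token_table.get(token)  # the shared-database lookup
    if record is None:
        return False
    if record["expires"] <= now:  # assumption: expired tokens are rejected
        return False
    granted = set(record["scope"].split())
    return set(required_scopes) <= granted
```

Because the lookup goes straight to the shared table, no call to the authorization server is needed, which is exactly why this approach stops working once the two servers have separate databases.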
But what you might actually want is a separation of the resource and authorization servers, right? And the spec says that's fine, you can do that. They can be separate entities. But Django OAuth Toolkit doesn't immediately allow for the separation, simply because both components are using the same database. So you could separate it out, but they're still using the same database. So what you have is an architecture that looks like this. The authorization and resource server are together. So imagine every server in your architecture and every service has to also be an authorization server because it needs that database. What you would rather have is something that looks more like a platform, a suite of services. And one of those services is the authorization service, and your services talk to it, and they have separate databases. Now you can do this with Django. You can build a standalone resource server. It's gonna have a separate database, which immediately means that your user accounts are now in two places, because you've got to have linked accounts in your client that are linked to your authorization server. And you can no longer use the Django OAuth Toolkit authentication backend or middleware, which kind of stinks. So now you need a way to validate tokens, because we are relying on that. And remember, those methods used to validate the tokens generally involve an interaction, blah, blah, blah, blah. We can do this, and I'm going to give you two possible solutions. The first one is to add a token validation view to the authorization server. So it's just a view that spits back information about the token. So a client sends a request to a resource server. It includes the token, and the resource server sends a request to the authorization server with the same token. And the authorization server says, yes, this token is good and here's some metadata about it. It's still RFC 6749 compliant.
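The first solution can be sketched as two small functions, one for each side. Everything here is illustrative: the response fields, the function names, and the in-memory token store are hypothetical, and in a real deployment the validate callable would make an HTTPS request to the authorization server's validation view instead of calling it directly.

```python
import time


def token_info(token_store, token, now=None):
    """Authorization-server side: say whether a token is good, plus its metadata."""
    now = time.time() if now is None else now
    record = token_store.get(token)
    if record is None or record["expires_at"] <= now:
        return {"valid": False}
    return {
        "valid": True,
        "user": record["user"],
        "scope": record["scope"],
        "expires_in": int(record["expires_at"] - now),
    }


def resource_server_accepts(validate, token, required_scope):
    """Resource-server side: ask the authorization server, then check the scope."""
    info = validate(token)  # stands in for the request to the validation view
    if not info.get("valid", False):
        return False
    return required_scope in info.get("scope", "").split()
```

The metadata in the response is what lets the resource server learn things it otherwise could not know about a token, such as when it expires.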
And if you're having second thoughts about this, keep in mind that this is exactly what Google does, because there are situations in which you receive a token but you are missing some critical piece of information about it, such as when it expires. So you can use an API endpoint like this to provide that information. Solution number two, out-of-band storage with Django signals. And what I mean by out of band is not your Django database. So you could use a NoSQL database or anything else for that matter. You can use the built-in signal mechanisms in Django for the save and delete signals on the access token. And you just provide functions that synchronize that token to your own data store. So now we can build a Django resource server and we can verify tokens. But without Django OAuth Toolkit, you don't have this nice OAuth2 token middleware that authenticates the user and sets request.user. So you'll need to roll your own authentication backend. I'm not gonna give you code, but it's really easy to do. What you have to do is retrieve the token from the request and verify that token using one of the methods that I just showed you. You'll set request.user to the authenticated user, creating a local account if necessary. And OAuthLib has some nice examples to get you started in their documentation. So now, you've seen Django as a consumer with Python Social Auth, Django as a provider using Django OAuth Toolkit, and some common mistakes to avoid. In the client, always use a state parameter. Even though it's optional, you should always, always use it. On the server, you do the same thing. You need to always support it, all right? And the server must always validate the redirect URI. Very important. The server needs to expire authorization codes. And most of all, everyone involved needs to use TLS. If this is the only slide you pay attention to, good. This is the one you need to watch. But don't take my word for it. Read the spec, all right? Trust but verify. That concludes my talk.
Thank you very much.