Thank you for coming to the presentation, and thank you to SAS for sending me out here. I've been with SAS since January of this year, and that's really when I first started using Python and Django. So what I'll talk about today is some of the pitfalls I encountered in building a project and how we overcame them, and hopefully that will help some of you out. When I joined SAS, the project I had was to build a REST API. They were looking to automate firewall rule additions. I was working for a network team that's part of a larger IT group, and they were looking to automate this because it was a pain point in terms of the time they were spending on it. The requirement was to build an application that would support direct API calls and also provide a self-service portal through our internal web. And they had a preference for Python, because some of the switches they were getting had Python preloaded on them, and they wanted to go ahead and start building expertise in that language. They already had the input for this application defined: we had the customer, the source and destination IP addresses, the destination port, and then an optional business justification where a user could say, this is why I'm requesting this. So I had to select a starting point coming in. They had the Python preference, but they pretty much gave me free rein. They said, look at what would be the best way to build this REST API, and we'll go forward in that direction. So I started looking at Django because it was something I had heard of and was interested in learning; I just hadn't really had a chance to do it before then. It was Python-based, and then there was the built-in admin interface. It looked really nice, and it looked like you could get it with very little work.
And then in doing some more research, I stumbled across the Django REST framework, which offered the Browseable API, which also looked very pretty. It looked like it gave you a lot for very little work. I did have someone recommend looking at Flask, and to me, it just seemed really complex getting into it. If I looked at it today, it'd probably be easier for me, but at that time I didn't know a lot about Python. So I started doing some prototyping with Django and Django REST framework, got something up and running fairly quickly, and I was able to demo that to our management team and then our internal customers. It was well received, so we went forward with that. In learning how to use Django and Django REST framework, I had to go through all the tutorials and things. I think several other presenters have covered a lot of this material, so I won't go in-depth. But you have your models, which are a way to define database tables. You have serializers; for our application, we use JSON as our data format. And then for your views, you're taking in a request, doing something, and sending back a response. You have your view sets, which can be collections of related views, your URLs to map addresses to places, your routers, and then settings, of course, where you can configure the application. For the data formats, you have different renderers available; the one we're using is the JSON renderer. There's XML, and there are some other ones out there too. And of course, the Browseable API renderer. That's a big part of our project. So for our application, we've gone through two iterations at this point. Both of those have gone into a production environment; the most recent one went in last week. And we've definitely hit some pitfalls along the way and worked our way around them, and that's what I'll get into now. So one thing to consider when you're building a REST API is, for your views: who do you want to access those views?
You can certainly use the user and groups tables as a way of maintaining membership in the application and the views themselves from an application standpoint. Django does allow you to set whether or not you want to automatically add a new user, and if you choose to do so, you can even customize how that user object gets created. So in this case, someone tries to get to your application and tries to log in. If Django says, hey, I don't know who you are, then if you've got it set to automatically add them, it'll build that user object the way you told it to, and there are default settings for that. For us, we didn't want that. If someone came to our application and we didn't know who they were, well, we just didn't want them to use the application. So we turned that off. Another way we were controlling access to our views was to just check the user data from the request. The request comes in, and I think it's request.user; you get an object for whoever's accessing the application or that view. Even if someone's not logged in and you have a view that doesn't require a login, you can get an anonymous user object through that. And you can then build logic into your view to check: who is this user, and what do I want them to be able to do? For us, in our second iteration, we added LDAP authentication. So we're actually querying Active Directory to say, OK, this is the user that's accessing our application; do they belong to a specific Active Directory group? If they do, OK, you can use the application. And if they don't, then we'll return an appropriate response code instead. Disabling a view is something you could also do. One thing is, if you're writing your own custom views, so maybe you've written a get view and not a post view, that may take care of itself if you don't have a certain view out there in the first place.
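A minimal sketch of that per-view group check, with the Active Directory lookup stubbed out as a plain dictionary (the group name, helper names, and usernames here are illustrative assumptions, not our actual code):

```python
# Sketch: gate a view on directory group membership, returning an HTTP
# status code. AD_GROUPS stands in for a real LDAP/Active Directory query.
AD_GROUPS = {
    "alice": {"firewall-api-users"},   # hypothetical usernames and group
    "bob": set(),
}

REQUIRED_GROUP = "firewall-api-users"

def check_access(username):
    """Return 200 if the user may use the application, else 403."""
    if username is None:               # anonymous user, e.g. not logged in
        return 403
    groups = AD_GROUPS.get(username, set())
    return 200 if REQUIRED_GROUP in groups else 403
```

In a real view you'd pull the username from request.user and run the LDAP query instead of the dictionary lookup.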
For us, we're using the view sets so that we just have all the REST API actions already available, which meets our internal company REST API standards. The intent is also to give us some growing room. Right now, we don't support a delete method, but we have it in there because eventually we may. So if someone calls delete on any of our views, we're just going to return a 403 Forbidden and say, hey, you can't do this. But depending on your application's needs, you could certainly return a different type of response code. So as we got to iteration two, our requirements changed, and of course this impacted the underlying logic of our application. Instead of accepting a single request for a firewall change, we were now taking a list of requests. And instead of taking just a single IPv4 address for a source and destination, we now had to take that, plus IP ranges, and also subnet masks. And in addition to that, instead of just one value, we might receive a list of values. Similarly for destination port, we could take a single port, or a range of ports, or a list of any combination of those types of input. And then business justification: one of our internal customers came to us and said, hey, we really think that this needs to be required; we don't want you to process anything unless we have that data there, unless someone's entered something in. So we had to build that back in. In iteration one, we built our own web form to serve as our self-service portal and just used Bootstrap.js for that. But in iteration two, we had one of the company's standard IT groups provide our front end. This group was called ITChange, and ITChange had its own user, appropriately named ITChange. So if our view was called, we could get that user and say, hey, this is ITChange calling us. And if they call us, well, we have to keep a record of every firewall change that we actually make, and we need to tie that back to a user.
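The "routable but disabled" idea can be sketched framework-free; in Django REST framework you'd override the viewset's destroy() method instead, and the method names here are just illustrative:

```python
# Sketch: methods the viewset routes but which we actually support today.
SUPPORTED_METHODS = {"GET", "POST", "PUT"}

def dispatch(method):
    """Return 403 for actions we route but don't support yet, 200 otherwise."""
    if method not in SUPPORTED_METHODS:
        return 403  # forbidden until we implement it; pick the code you need
    return 200
```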
So we can say, hey, this is ITChange that called us, and ITChange has to provide us with a user value so that we know who actually submitted that request. Now, if someone directly calls our API, we already have who that user is, so we don't have to check for that value. But for ITChange, we say, hey, are you ITChange? Did you give us a user value? If so, we keep going, and if not, we return, I think it's a 400, which is Bad Request. All right, so validators: we're taking in all kinds of input now. In iteration one, we were using basic things. We were able to use an existing validator for the IPv4 address, there's some text validation, and I think we just did a simple integer validation for the port numbers. When we got to iteration two, that didn't quite work for us. We had to build in some regular expressions so that we could check whether the data we were receiving matched up. Was this a range of IP addresses? Did this IP address come with a subnet mask? What about the port numbers? Was it just a single port number? Was it a range like one through three, something very small, or even one through a thousand? So we had to write our own validators for that. One of the things we were also able to do was to group validators together. For your models, you can declare an actual validator for things, and I think it just points to one, or at least that was my understanding when I was going through the development. So I'd point it to a single validator that was a group, and within that single validator, I could call other validators. And for those, if any of those throws a validation error, because if you have a problem, you want it to raise a validation error, you can kind of put in logic. Maybe you have five validators that are part of this group validator, and as long as the data comes in and passes one of those validators, it's okay.
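A sketch of that group-validator pattern using the standard-library ipaddress module, with plain ValueError standing in for Django's ValidationError; the sub-validators shown are illustrative, not our exact rules:

```python
import ipaddress

def validate_single_ip(value):
    ipaddress.IPv4Address(value)                 # raises ValueError if invalid

def validate_subnet(value):
    ipaddress.IPv4Network(value, strict=False)   # e.g. "10.0.0.0/24"

def validate_ip_range(value):
    start, sep, end = value.partition("-")       # e.g. "10.0.0.1-10.0.0.9"
    if not sep:
        raise ValueError("not a range")
    if ipaddress.IPv4Address(start) > ipaddress.IPv4Address(end):
        raise ValueError("range start is after range end")

def validate_source(value):
    """Group validator: the value is ok if any sub-validator accepts it."""
    for validator in (validate_single_ip, validate_subnet, validate_ip_range):
        try:
            validator(value)
            return
        except ValueError:
            continue
    raise ValueError("%r is not an IP address, subnet, or IP range" % value)
```

The model field would point at validate_source alone, and the group tries each sub-validator in turn, only raising if none of them passes.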
So you just handle any exceptions you come across, and if you deem it invalid, you raise that validation error, and you can pass in data for that so that you're telling your user, hey, this is why it was invalid. Then they can go and adjust whatever it was they were giving to the application so they can have a successful request on their next attempt. All right, so for the data in the models, one thing that was kind of a pitfall for us was that in iteration one, we were just going with single IP addresses. We knew at some point we might bring in ranges or subnet masks, but we didn't put a lot of forethought into that. We just said, okay, we're handling an IP address; we'll deal with that when we get there. We should have put more thought into it and could have planned a little bit better. So what we ended up doing in iteration two was building a bit of an umbrella model, and this may not be a best practice, but we found it useful within our group and for our purposes. This is just a big model for our API entry point, which covers all the types of data that we may need now or eventually, and we put some of that in there. If it's something we need eventually, we can make it an optional field, at least for now. But we never save this model to the database. We simply use it when we're accepting input, and then we have subset models that hold the actual data we're saving to the database for our record-keeping purposes. Another thing to keep in mind is: what do you need to tell the user after you've completed an action? They've called your API, you've done some logic behind the scenes; what do you need to give them back? For us, record keeping is a big deal. We need to be able to map who submitted these changes and why they were done. So we associated an ID with that.
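The umbrella-model idea above — one broad input definition that gets validated but never saved — can be sketched with a plain dataclass; the field names are illustrative, not our real schema:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FirewallRequestInput:
    """Umbrella input model: covers everything we accept now or expect to
    accept later. Never written to the database; smaller subset models hold
    what we actually persist for record keeping."""
    source: List[str]
    destination: List[str]
    ports: List[str]
    business_justification: str        # required as of iteration two
    user: Optional[str] = None         # only required when ITChange calls us
    comments: Optional[str] = None     # hypothetical future field, optional
```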
So our request comes in, we do all our logic behind the scenes, then we write all the records to the database, and we associate that group of actions with a single ID. We give that to the user. So if the user has a question later on, they can come back and reference that ID with us. If they just come back and say, hey, we had a problem, your service gave us a 503 Service Unavailable, we can't do a lot with that. Well, I guess an ID wouldn't apply there, but if they put in a request and say, I put this request in, but I can't reach the destination IP from that source IP, what's going on? If they give us a request number, we can go look it up and see what was actually done on the back end, and see if we have any logs associated with it. Was there an error? Did we capture something? We have that point of reference. Another thing to keep in mind is the error data. For the user, we want to gracefully handle any errors that may occur. Right now we just give them a 503: hey, the service is unavailable, you should try again later, or you can contact the support team. On the other hand, for our support staff, we really need a lot more data than that. The user doesn't really care; if it's not working, that's all they need to know, and they can let us know, hey, it didn't work. But we need to go back and figure out what went wrong. So we try to do two things. One of those is that we capture all the information that's available at that time, put it into an email, and send that email to the support staff so that we can hopefully have an almost real-time alert of that error. The other thing we try to do is log that same error information into a database, so then we have that record and we can go back and look it up. Now, if both of those methods fail, we have something catastrophic going on, but at least we tried.
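That two-channel error reporting can be sketched like this; both channels are shown as stubs (the real code would send actual mail and write a real database record), and every name here is illustrative:

```python
import logging
import traceback

log = logging.getLogger("firewall_api")

def email_support(details):       # stub: real code would send an email
    log.error("emailing support: %s", details["error"])

def log_to_database(details):     # stub: real code would write a DB record
    log.error("logging to DB: %s", details["error"])

def report_error(exc, request_id=None):
    """Try every reporting channel and return the names of those that
    worked. The user still just gets a 503; this detail is for support."""
    details = {
        "request_id": request_id,
        "error": repr(exc),
        "traceback": traceback.format_exc(),
    }
    succeeded = []
    for name, channel in (("email", email_support), ("db", log_to_database)):
        try:
            channel(details)
            succeeded.append(name)
        except Exception:
            pass  # if both channels fail, something catastrophic is going on
    return succeeded
```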
So in going from iteration one to iteration two, we had to start accepting multiple input types, as I mentioned a moment ago. We could have created an entirely different API call to accept these different types of formats. But as part of our iteration one deliverable, we already had internal teams that were looking at our API, learning our API, and starting to write automation that would utilize our API. So we wanted to be as minimally impactful to them as we could be. One big change was the business justification: whereas before it was optional, now it was required, but that was hopefully a minor hit to them. We basically redesigned our existing API call to support these multiple inputs, and we did that by taking the request that came into the view and building in logic to do kind of a pre-check of the input we received. One of the first things we do is ask, for the request that came in, is it a list? If it's not a list, we add it to a list; if it is a list, we're okay. Then for the IP addresses and the ports, we ask, are these lists? If they are, okay; if they're not, we put those into lists. So then all of our logic beyond that point can treat all the input as if it is a list, because we've guaranteed it is, and that made writing the logic further down a lot easier. Another thing we didn't have in iteration one that would have been great was a version for our API. Going into iteration two, we could have just had our customers give us a different version number, and we could have broken out the logic to handle that input differently. So we did version our API in iteration two. When I was versioning it, I came across two schools of thought on how to actually do it: one being to put a version number in the Accept header, and the other putting it into the URL.
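That pre-check that guarantees everything is a list boils down to a small helper; the field names match our input definition, but the exact code here is a sketch:

```python
def ensure_list(value):
    """Wrap a bare value in a list so downstream logic can always iterate."""
    return value if isinstance(value, list) else [value]

def normalize(request_body):
    """Normalize a request (or list of requests) so every field is a list."""
    requests = ensure_list(request_body)
    for req in requests:
        for key in ("source", "destination", "destination_port"):
            if key in req:
                req[key] = ensure_list(req[key])
    return requests
```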
The problem we found with the URL approach is that if we have some address slash v1 and then we move to version two, well, if we want to maintain backwards compatibility, we now have those two addresses to deal with. We've got to keep v1 working, and then we've got to keep v2 working. We really didn't want to go that route, so we went with the HTTP Accept header. If a user doesn't give that to us, we assume, hey, we're going to use version 1.0. As we go forward and increment from here, we can then branch the logic in our code based on the version number we actually receive. So for the save process for a model, you can actually extend that. In iteration one, and I'll chalk this up to just learning, I'd put a lot of logic into the model save method. It was doing a lot of things, checking the data before it actually saved the data to the database. So we extracted all of that out in iteration two, and our logic is now done in our view file and a separate logic file. But you can override the save method if you need to. Now, in iteration two, we had to save a lot more to the database. We were breaking things out into permutations with combinations of all these IP addresses, since we could now receive these lists of input. As we started to save these entries to the database, we immediately saw performance impacts. It was just taking a lot of time to save all that data, and we were trying to throw large quantities of data at it as input to do internal load testing. What we found was that we could use bulk create to have all of these objects written to the database as part of one transaction. The problem we had with that was that we had a lot of foreign keys: these were other objects that really needed to go into a main object, our main request object. So what we ended up having to do was a combination.
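The Accept-header versioning above can be sketched as a small parser that defaults to 1.0 when the client sends nothing; Django REST framework also ships an AcceptHeaderVersioning class for this, and the `version=` media-type parameter shown here is an assumption about the header shape:

```python
def get_api_version(accept_header, default="1.0"):
    """Extract a version from e.g. 'application/json; version=1.1',
    falling back to the default when the client doesn't send one."""
    if not accept_header:
        return default
    for part in accept_header.split(";"):
        name, _, value = part.strip().partition("=")
        if name == "version" and value:
            return value
    return default
```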
For any type of object we were saving where we were just going to have maybe a handful of them and they were foreign keys, we individually saved those models. Then for our bigger request permutations, since we could have a multitude of those, by that time we already had our foreign key objects, because we'd saved those and gotten either a foreign key ID back or just the object itself. So we basically built a big list of all these other objects we needed to save, inserting the foreign keys as we built those objects, and once we had that big list, we could call bulk create on it. We immediately saw the performance issues go away. I've learned some things here at DjangoCon that I can go back and look at, and maybe we can optimize it even more, but that for us was an immediate savior. Documentation considerations: of course, the built-in Browseable API is very powerful for that. Our internal customers have found it very valuable; it allows them to play with our API, we can give them a sandbox system, and they can just have at it. So they've really liked that. You can also use it to see the different data formats, so they can see the JSON response that we're going to give them, and they can go back and write automation on their end that can handle and process it. I looked at Swagger, which is very similar to the Browseable API. It's much more beautiful, in my opinion, but when I looked at it, which was probably in the March or April timeframe, they were having some browser incompatibility issues. I think those have been resolved. But it still just listed every REST API action, and I wanted to go in and customize it and say, well, okay, you can call delete on our API, but we really don't want you to right now, so I don't want that to show up in that web view. And I couldn't find an easy way to remove it.
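Django's bulk_create is what gave us the win; the underlying pattern (save the few foreign-key parents row by row so you have their IDs, then write the many child permutations in one batch) can be shown with stdlib sqlite3 standing in for the ORM — the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE request (id INTEGER PRIMARY KEY, justification TEXT);
    CREATE TABLE rule (id INTEGER PRIMARY KEY, request_id INTEGER,
                       source TEXT, dest TEXT, port TEXT);
""")

# Save the parent first so we have its ID to use as the foreign key.
cur = conn.execute("INSERT INTO request (justification) VALUES (?)",
                   ("migrating app servers",))
request_id = cur.lastrowid

# Build every source/dest/port permutation in memory...
sources, dests, ports = ["10.0.0.1", "10.0.0.2"], ["10.1.0.5"], ["80", "443"]
rows = [(request_id, s, d, p) for s in sources for d in dests for p in ports]

# ...then write them all in one batched statement, which is the same idea
# as calling bulk_create on a list of unsaved model instances.
conn.executemany(
    "INSERT INTO rule (request_id, source, dest, port) VALUES (?, ?, ?, ?)",
    rows)
conn.commit()
```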
So it's something I'll go back and look at, and it may be a lot easier to do by now. Another thing we do is we have our own internal team wiki, which is accessible to anyone in our company, and we're documenting our API there. We're putting out all the different use cases: if you give us a successful request, this is the type of data you can expect back; if you give us a bad request, here's what you'll get back; if you have an error, this is what we'll give you back. That way, all the teams building automation on top of our API have that available to them. Some other lessons I learned along the way: we did have some legacy databases, and I think another talk has already covered some of that. We're running MySQL for our database, and we found that we could run the inspectdb command and it would take all the preexisting database tables and generate model code for them. The problem we found there is that we were looking to not only pull this in for the models, but also to manage these legacy databases from our admin interface, and we needed to write to them as part of our request processing. If we were to save one of these models using the automatically generated code, well, inspectdb had made the auto-increment field an IntegerField in the Django model code, which meant that Django did not automatically give us an ID back after we saved that model. The quick fix we found was to change it from an IntegerField to an AutoField; then we could save that model, get that ID back, and continue our processing using it. That may be improved in 1.7; I'm not sure, we were using 1.6.5. So another thing: once again, we were using MySQL, still are. We started seeing issues where it said MySQL server has gone away. The server was there, we could access it, there was no problem with it, we could access the database; this just seemed to happen.
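The fix was a one-line change in the generated model code; this fragment assumes a Django project, and the table and class names are illustrative:

```python
class LegacyRule(models.Model):
    # inspectdb generated:  id = models.IntegerField(primary_key=True)
    # Changed to an AutoField so save() populates id on the new object:
    id = models.AutoField(primary_key=True)

    class Meta:
        managed = False          # Django doesn't manage this legacy table
        db_table = "legacy_rule"
```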
We couldn't find any reason behind it, like we'd had a connection open too long; it just seemed random to us. In the information I could find online, the appropriate way to handle this was to just close the database connection before you tried to have any database transaction. I didn't like that answer, and I'm hopeful that maybe it will change eventually, but that was the way it was at that time, so we put that in. So we had, I think it was Django's connection.close(), we put that in there, and then we do a model save. The issue went away, but then we ran our unit testing and that broke, because it just didn't like that you were trying to close the connection to the test database. So we ended up going to our settings file and setting a flag in there, unit testing, true or false. When we run our code, we go based on that, so our code logic will actually say: if we're not unit testing, close that connection and then do the model save. If we are unit testing, it just ignores that and we save the model, and that's worked for us so far. Another thing we're doing is using the coverage module. That way we can run our unit tests and have some nice HTML generated where we can look at the code coverage, see what lines of code we haven't actually tested, and go back and try to hit those. We've successfully maintained 80% code coverage for all of our production iterations so far, and we'll try to keep that higher, but at least 80% is our minimum. Some other things: we had ITChange providing our web form. They're using JavaScript to build their form, and it could not work with our code until we added certain HTTP response headers. It was really easy to do; just in your view code, you can add logic to set the headers, so we did that, no problem. But we encountered yet another issue. Our application is running on its own servers, running Apache.
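The workaround looks roughly like this; it assumes a Django project, and the settings flag name is just our own convention:

```python
# settings.py
UNIT_TESTING = False   # flipped to True by the test settings

# before any database write, in the view/logic code:
from django.conf import settings
from django.db import connection

def save_record(record):
    if not settings.UNIT_TESTING:
        connection.close()   # dodge "MySQL server has gone away"
    record.save()
```

Newer Django versions also offer the CONN_MAX_AGE setting and django.db.close_old_connections(), which may address this more cleanly.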
We have to support two different methods of accessing our application, just based on our own internal limitations. ITChange has to call our API through the server address directly, and any other user in our company has to call it through basically a proxy, and when you're trying to run both of those together on the same system, you start to have some problems. For us, we tried having multiple settings files that basically inherited from a master settings file; then, based on the server we were running on and whether ITChange or some other user called us, we would select the appropriate settings file and set those settings. Well, what I found with running Apache, and this is something I'll deal with when I go back, is that once the first call is made, those settings are set; if someone else comes in, the settings don't change. So I've got to go back and look at that, and we ended up having to manipulate our settings files a little bit more so that a single settings file worked regardless of whether ITChange or just some generic user called us. Another thing we hit there was that the Browseable API, as great as it is, was automatically generating some of our links, and it was generating them for our actual direct server address. We didn't want that for our general users; they were the ones that were actually going to use the Browseable API, and ITChange wouldn't. We wanted the general users to have links that went to the correct address so they could still use the API. So we had to go in and customize the template, and customize the router that actually generated the links. The good thing there was that it was all customizable; it was just a matter of learning how to do it. As for the forced script name: we basically had our internal site.sas.com slash our application, so you had to set our application's path as FORCE_SCRIPT_NAME to get that to show up in the link.
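The setting in question is a one-liner in the Django settings file; the path shown here is just the pattern, not our real application name:

```python
# settings.py -- makes generated links include the application prefix
# when the app is served behind the proxy at <internal-site>/ourapp
FORCE_SCRIPT_NAME = "/ourapp"
```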
Alright, so in closing: I found Django and Django REST framework to be a very easy entry point for myself. I was definitely a novice at the time, and still am in some ways, but it was very easy for us to get something up and running quickly, and we found it to be very customizable as our usage needs advanced. For me, my management team was more than willing to give me the time I needed to explore and figure out how to get things to work, and that's definitely key: if you have the time to put into it, you can customize it with no limit, it seems. Alright, I've got email and Twitter if you want to follow up with me, and I've got my slides posted; they've been up there the whole time. Any questions?