All right, hi everyone. My name is Kyle Knapp, and I'm a software developer at AWS, where I primarily focus on developing the AWS Python SDK, also known as Boto, and the AWS CLI, which is a Python-based command line interface for managing your AWS resources. And this is my talk: It Works on My Machine, Writing Python Code for Any Environment. So does this look familiar? You start developing on your own machine, and it works fine. So you decide to push it to another computer, say a server, or a GitHub repo, or another colleague who might be using it. There's supposed to be a graphic here that cycles through those and shows it failing at the end right after the success, but let's see how that goes. Okay, so you may be wondering: why does compatibility matter? If it works in my environment, why would I care that it works in somebody else's? Well, if you ever want your program to gain more users, most users are not going to be running it in the same environment as you. They're going to be using a different Python version or a different operating system, and therefore it's important that they can run it successfully. Secondly, usability helps drive popularity. When there are no barriers to entry to actually using your program, it widens the range of possible users. Many popular Python programs out there, such as Django, Flask, and Requests, are all compatible across a bunch of different operating systems and Python versions. And finally, most of the time it is not difficult to ensure compatibility across different environments. Oftentimes it's just a line here and there; occasionally it's a little more difficult. So let me go over what will be covered in this talk.
First I'm going to go into background about a sample application that most of the topics are based around. Then I'm going to talk about the different Python versions and how to ensure compatibility across them. Then I'm going to talk about how to get compatibility across operating systems. And finally, how to write tests to ensure compatibility and help improve it. So the sample application I'm going to be talking about is the AWS command line interface. Like I said before, it's a Python-based CLI used for managing your AWS resources. I decided to use this application because I have a lot of experience with it, and because, in the end, it's a CLI: most users are not Python developers. They're not concerned with what Python version they're running or what operating system; they just want it to work in their shell. So many of the topics I'm going to cover come from my experience working on the CLI. The AWS CLI is compatible with Python 2 versions 2.6.5 and higher, and Python 3 versions 3.3 and higher. And it's also compatible with a bunch of different operating systems, such as Linux, Mac, and Windows. So the first topic I'm going to cover in terms of compatibility is Python versions. Some of the sub-topics I'm going to cover are renaming, the differences in string types across Python versions, and the possible limitations you may run into. So with renaming: a lot of times when you're working across Python major versions, modules or classes or functions may be renamed, and when developing on the CLI I sometimes run into this. The question you may be asking is, how do you handle that renaming? The most general solution is to use six. six is a compatibility library that helps you write Python code that's compatible across Python 2 and Python 3, and it will handle a lot of these conversions for you. So here's an example: the StringIO class, which wraps string data in a file-like object. In Python 2, you import it from the StringIO
library. But in Python 3, you import it from the io library. The issue here is that if you wrote the Python 2 style of import and ran it on Python 3, you'd get import errors. Fortunately, the solution is quite simple: thanks to six, you can just import StringIO from the six library. However, this brings up the edge case of what happens when there's no six equivalent. For example, in the CLI we have to use the formatdate function, which returns a timestamp string in the form you'd expect to see in an email. In Python 2, you import it from email.Utils, with a capital U, and in Python 3 you import it from email.utils, with a lower-case u. And like I said before, there's no six conversion for it. So the solution is to have your own compatibility module to handle all these conversions. In a lot of the projects I work on, it's called a compat.py file, and it simply contains this logic: first you check what Python version you're running on; if it's Python 3, you do the Python 3 style of importing formatdate, and if it's Python 2, you do the Python 2 style. Then, building on top of that, whenever you need formatdate, you just import it directly from the compat module. Note that it's a lot better to use that latter style of import than to have the version-checking block in every single file that needs formatdate, because if you ever want to change the logic, you'd have to change it in every single spot. So the lessons learned here: use six, which is great for handling all those name changes across Python versions, and when six doesn't cover something, use a compat.py file. So now we've talked about renaming. There are also string type differences across the Python versions, so you get a change in naming and also a change in functionality that you actually have to handle. One example of where string types matter is input to the CLI.
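A compat.py along those lines might look like the following. This is a stdlib-only sketch of the pattern just described, not the actual AWS CLI source; in practice you would lean on six wherever it has an equivalent:

```python
# compat.py -- illustrative sketch of the compatibility-module pattern.
import sys

if sys.version_info[0] >= 3:
    from io import StringIO             # Python 3 location
    from email.utils import formatdate  # lower-case utils on Python 3
else:
    from StringIO import StringIO       # Python 2 location
    from email.Utils import formatdate  # capital-U Utils on Python 2

# Every other module now imports from one place, so the version check
# lives in exactly one spot:
#     from compat import StringIO, formatdate
buf = StringIO()
buf.write(u"hello")
rendered = formatdate(0, usegmt=True)  # format epoch 0 as an email date
```

If you later need to change the fallback, say to a backport package, only compat.py has to be touched.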
So with input to the CLI, we could be receiving a bunch of different types of input. Here's an example of a CLI command: we're trying to run the EC2 describe-regions command, which just prints out what EC2 regions are available to you, and the input we supply is the region names, with the value us-west-2. Underneath, the CLI uses the argparse library to create a namespace of all the inputs it received, and inside this namespace the values may differ, even across string data types. So across different Python versions, you can get different string types. Now, I keep talking about this Python string type, so let me cover it. There are two ways to represent Python strings: string data represented as bytes, and string data represented as unicode. In Python 2, the str class is string data represented as binary, and the unicode class is string data represented as unicode. This changes, though, in Python 3: there, bytes is the class that represents string data as binary, and str is the class that represents string data as unicode. So that can be kind of confusing right off the bat. On top of that, there's different functionality between the two Python versions. For example, if you try to mix the two string types, here concatenating the byte string foo with the unicode string bar, Python 2 will do some implicit decoding for you, such that when you concatenate the two together, you get the unicode string foobar, by decoding foo first and then adding the two. However, this can cause issues. If instead you concatenate the byte string '\xe2' with the unicode string bar, you'll get a decoding error: it tries to decode the byte with the default ASCII codec, and it can't, because '\xe2' is out of ASCII's range. Python 3 is a little more strict about how it handles this, so if you take the same line as before, you'll just get an error:
you can't concatenate bytes to str, straight up, and the message is not vague at all. So the problem is how to handle this, especially when you have no control over the inputs you may be receiving, and you know that mixing string data types is not good. The solution I found through my development is to ensure string type consistency right off the get-go for any input to your program. Take this method that we use when we receive an argparse namespace. It takes the namespace, first determines what terminal encoding is being used for standard input, and then iterates through each value in the namespace to determine whether any are binary string types. Here we're using six, which has a useful helper that determines whether string data is represented as binary, depending on which Python version you're running. And if it is binary, we decode it to string data represented as unicode, such that every string in the namespace is represented as unicode. So the lessons learned here: Python strings differ in naming and in functionality across versions, and you do not want to mix the string data types, especially in a large project, because it can be very hard to trace back what went wrong. And to avoid that, make sure the type is consistent as soon as possible. So I've been talking about renaming and changes of functionality, but what happens when the functionality straight up doesn't exist? When developing on the CLI, there were a couple of occasions where functionality was missing, especially for Python 2.6, and our solution was to backport the functionality. An example of a backport is the OrderedDict class. It's a dictionary that keeps track of the order in which you insert your keys.
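Before the backport discussion, here's a sketch of the string-type handling just described. ensure_text and normalize_namespace are illustrative names, not the CLI's actual code; six provides similar helpers, but this version sticks to the standard library:

```python
# Python 3 refuses to mix the two string types outright:
try:
    b"foo" + u"bar"
except TypeError:
    mixing_fails = True  # clear error: can't concat str to bytes

def ensure_text(value, encoding="utf-8"):
    # Decode binary string data to unicode; leave everything else alone.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def normalize_namespace(values, encoding="utf-8"):
    # Applied once, at the program boundary, so no downstream code
    # ever sees a mix of bytes and unicode.
    return {key: ensure_text(v, encoding) for key, v in values.items()}

# A stand-in for parsed argparse values arriving with mixed types:
parsed = {"region_names": b"us-west-2", "output": u"json"}
normalized = normalize_namespace(parsed)
```

To get back to the OrderedDict backport: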
It's in Python 2.7 and higher, but not in Python 2.6, and we didn't want to add this functionality from scratch; it's a pretty large class to be writing by hand. So how would you go about this? First, you add a conditional dependency. Fortunately, there's a third-party package that implements OrderedDict for us, and we can pull it in as a conditional dependency. Take this snippet of code from our setup.py file: we take our normal requirements, then use the sys.version_info tuple to determine whether the Python version is 2.6, and if it is, we append the conditional dependency to the requirements. The second step, in order to hide the lack of functionality, is to import it via the compat.py file. Here's an example snippet where we first try to import OrderedDict from where we'd expect it on Python 2.7 and higher, which is the collections library, and if that fails, we import it from the third-party package we pulled in. So the lessons learned: backport when needed, but make sure you limit the effect on untargeted Python versions, especially with dependencies. You want to make sure you're not pulling in an unnecessary dependency: on Python 2.7 you don't need the ordereddict package, because the class is already in the standard library. So next, I'm going to talk about operating systems and how to get compatibility across all the different operating systems you may run into. First I'm going to talk about file handling, then how to handle file paths, and finally how functionality, or the lack of it, may differ across operating systems. So, file handling. If you have a Python application, you're most likely going to have to deal with files, and the interface for files across operating systems is pretty similar.
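Stepping back to the OrderedDict backport for a moment, both steps can be sketched as follows. The requirement strings here are illustrative of the pattern, not lifted from the CLI's actual setup.py:

```python
# setup.py sketch: only Python 2.6 installs pull in the backport.
import sys

requires = ["six>=1.1.0"]  # stand-in for the normal requirements
if sys.version_info[:2] == (2, 6):
    requires.append("ordereddict==1.1")  # the third-party backport

# compat.py sketch: hide where the class actually comes from.
try:
    from collections import OrderedDict  # Python 2.7+
except ImportError:
    from ordereddict import OrderedDict  # 2.6 fallback package

d = OrderedDict([("b", 2), ("a", 1)])
insertion_order = list(d)  # keys come back in insertion order
```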
You can pretty much open a file for read or write, in text or in bytes; it's really the undocumented functionality you've got to be aware of, which may be a limitation of the operating system. As my introduction to this, I'm going to talk about the aws s3 cp command. What this does is upload files from your local file system to Amazon S3, which is Amazon's storage service; you can similarly download files from S3 and copy them around your buckets. And this command works in parallel: it makes requests to S3 in parallel and does all the writing and so on. The original implementation was that we would have multiple threads, in this case let's say three, and we'd make concurrent requests to S3 doing range GETs for a specific object. Once each thread got its data, it would write to its specific part of the file. Note that each thread knows exactly where to write, since it made a range GET, so it can seek to where it wants in the file, and it knows the content length, so it knows when it's done writing. Thus no thread should corrupt another thread's part of the file. However, we ran into problems: if you start with a brand-new file and seek past the end of it, the operating system will extend the file with a bunch of zeros. And when you're extending the file and writing at the same time, we found that some versions of Windows don't actually support that, or have limitations in it, and what happens is we get corrupted files. So our solution was to add an IO thread to limit the number of concurrent interactions with the file. We take the normal three threads, and they pass their data to this IO thread, which then sequentially writes the data to the file.
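A minimal sketch of that single-writer pattern, with hypothetical names and a queue of in-memory chunks standing in for the real download threads. Pre-sizing the file once up front also sidesteps the concurrent seek-past-the-end behavior that bit us on Windows:

```python
import os
import queue
import tempfile
import threading

SENTINEL = None  # tells the writer thread to stop

def io_writer(path, chunks):
    # The only thread that ever touches the file: it drains the queue
    # and performs each seek/write sequentially.
    with open(path, "r+b") as f:
        while True:
            item = chunks.get()
            if item is SENTINEL:
                return
            offset, data = item
            f.seek(offset)
            f.write(data)

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.truncate(9)  # pre-size the file once instead of seek-extending it

chunks = queue.Queue()
writer = threading.Thread(target=io_writer, args=(path, chunks))
writer.start()
# Stand-ins for three download threads delivering range-GET chunks
# out of order:
for offset, data in [(6, b"baz"), (0, b"foo"), (3, b"bar")]:
    chunks.put((offset, data))
chunks.put(SENTINEL)
writer.join()
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```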
Note that the bottleneck here is not the IO thread: making the requests across the network and retrieving the data is much slower than writing it to the file. So the lessons learned are that file handling behavior can vary, not greatly, but subtly, and you need to be aware of it. And one way to steer clear of these undocumented behaviors is to limit the number of concurrent interactions you have with a file. I didn't cover all the other issues we've run into, but one example: if you try to remove a file while a handle to it is still open, you'll get a bunch of tracebacks on Windows. The best way to avoid all of this is just to limit how many interactions you're having with the file. Now, if you're handling files, you're most likely also dealing with file paths, that is, how the location of a file is represented. For the intro, I'm going to build off my previous example with the aws s3 cp command, which has a recursive option that lets you upload or download a whole directory to S3 and back; it relies on being able to list files. The problem is that file paths differ across operating systems. For example, on Linux, if you get the current working directory, all of your paths will use a forward slash as the separator, but on Windows it will be backslashes. And the way S3 keys are represented in the CLI is the Linux style. So if you take a local Linux machine and you're transferring a file to S3 that's located under mydir with the name myfile, you don't have to worry about any renaming; it will transfer just fine, and the object in S3 will be named myfile with a prefix of mydir. However, on Windows it's a little different. Note that on Windows it's a backslash, and thus, in order to handle it, you actually have to normalize the separator.
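That normalization fits in a couple of lines. to_s3_key is an illustrative name, and the sep parameter is only there so the example can exercise the Windows case from any platform:

```python
import os

def to_s3_key(local_path, sep=os.sep):
    # S3 keys always use forward slashes, so replace the platform's
    # separator (os.sep is "\\" on Windows, "/" on Linux and Mac).
    return local_path.replace(sep, "/")

windows_key = to_s3_key("mydir\\myfile", sep="\\")  # simulate Windows
linux_key = to_s3_key("mydir/myfile", sep="/")      # Linux is a no-op
```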
Normalizing the separator lets you transfer to S3 and still get the desired functionality of having the prefix mydir with the name myfile. So the general solution here is to use the os library. There's a lot of neat functionality in there that handles many of these things for you. For the example I just gave, you can use the os.sep attribute, which tells you what separator your platform uses for paths, and replace it with the normalized version, which was a forward slash in my example. So the lessons learned: file paths can be represented differently across operating systems, and you want to be wary and use os whenever you're handling file paths, because a hard-coded value in there will mess up your Python program on some platform. And finally, just as with the major Python versions, I'm going to talk about differences in functionality across operating systems. One example of where we ran into differing functionality is the CLI help command. What the help command does is, if you type any command plus help on the command line, a man page shows up that lets you scroll through it and collapses when you're done. Implementation-wise, all we do is generate the output and then pipe it to a pager, such as the command line utilities less or more. One thing to note is that less is like more but has a bit more functionality; for example, when you hit the bottom of a page while paging, you can scroll back up. The problem we ran into is that Linux has the ability to use both more and less, but on Windows you can only use more.
We did not want to subtract functionality based on the operating system, such as deciding to use more regardless of the operating system, because Linux has less available, which is the better pager to use. So ideally, we want Linux users to get less and Windows users to get more. The solution here is to have platform-specific code. The platform.system call lets you determine what operating system you're running on, and in this snippet I'm checking whether the system is Windows: if so, I return a help renderer specific to Windows, and if not, I return a POSIX help renderer. If you look at these subclasses, the pagers assigned to them are what you'd expect: for Windows we have the utility more, and for POSIX we have less. However, there's a little more you can apply here to help the health of your program. When you have to work around functionality differences like this, it's a good idea to follow LSP, the Liskov substitution principle: make sure the objects in the group share the same interface, such that anything calling them can rely on that interface. For example, if you take the parent class of the Windows help renderer and the POSIX help renderer, you'll notice it has only one public method, a render method, and that render method hides all the differences between Windows and non-Windows machines; it just pipes the generated output to the appropriate pager. Similarly, the help command that actually runs all of this only has to call get_renderer. It doesn't have to worry about which operating system it's on; it just retrieves the appropriate class and calls the render method on it to get the job done.
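Structurally, that looks something like the following. The class names and pager details are a simplified sketch of the idea, not the CLI's real help renderers; in particular, render here returns the command string instead of actually spawning a pager process:

```python
import platform

class BaseHelpRenderer:
    PAGER = None

    def render(self, contents):
        # The one public method; callers never learn which pager is used.
        # A real implementation would pipe `contents` into the pager.
        return "%s | %s" % (contents, self.PAGER)

class PosixHelpRenderer(BaseHelpRenderer):
    PAGER = "less -R"

class WindowsHelpRenderer(BaseHelpRenderer):
    PAGER = "more"

def get_renderer(system=None):
    # The platform-specific branch lives in exactly one place.
    if (system or platform.system()) == "Windows":
        return WindowsHelpRenderer()
    return PosixHelpRenderer()

windows_cmd = get_renderer("Windows").render("help text")
posix_cmd = get_renderer("Linux").render("help text")
```

Because both subclasses honor the same render interface, the caller is identical on every platform.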
So the lessons learned here: operating system functionality may differ, and if it does, it's all right to have platform-specific code. But if you are going to add platform-specific code, make sure you create parity in the public interfaces, such that any compromises you make in one class don't affect the functionality of your other classes or the rest of your program. So now that you've learned some tips and tricks for compatibility across Python versions and operating systems, I'm going to cover how to write and run tests to improve your compatibility across the various environments your code may be run in. First, writing tests. I can't stress enough the importance of having a good suite of tests, just for all the possible issues you may run into. It's ultimately your safety net, because there's a bunch of different compatibility issues out there and you're not going to catch them all up front; the tests will really help you in the end. Most of my examples are going to use the unittest module, and remember that all the techniques I talked about before still apply: even if your source code is compatible across a bunch of Python versions and operating systems, if your tests aren't compatible, they're still going to fail. So it's important to have compatibility there as well. An important note: if you're going to use unittest, it's good to pull in unittest2 for Python 2.6. Python 2.6 does have unittest, but the functionality is not the same as in Python 2.7 and higher. So, similar to how we used the compat.py file before, we can do the same here: if it's Python 2.6, we import unittest2 as unittest, and if not, we just use the regular unittest module. So, best practices.
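That conditional import is one more instance of the compat pattern; a stdlib-only sketch, where on Python 3 the unittest2 branch simply never runs:

```python
import sys

if sys.version_info[:2] == (2, 6):
    # unittest2 backports the 2.7 test features (assertIn, skipIf,
    # assertRaises as a context manager, ...) to Python 2.6.
    import unittest2 as unittest
else:
    import unittest

# The rest of the test suite imports the name `unittest` from here
# and gets the modern asserts either way:
has_modern_asserts = hasattr(unittest.TestCase, "assertIn")
```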
One of the best ways to find out whether your program can run in a bunch of different environments is to actually throw a bunch of common problem-causing inputs at it and see what happens. For example, one good kind of value to put in there is unicode, and note that just making the string a unicode type is not enough: actually putting some non-ASCII unicode characters in there is great for finding out whether your program has any decoding errors and the like. Similarly, throwing in some bytes is a good idea, as you can often catch the different kinds of decoding errors you may run into, or spots where you're mixing string types inside your project. Finally, for file paths, it's a good idea to have some nested files, so there's a directory involved and it's not just a file sitting in your current working directory, and even to throw some unicode into your file names, to see how whatever is manipulating the paths handles that. Another thing I'd like to talk about is file handling in terms of testing. A lot of times there's a pattern where you mock out a lot of the os library, or maybe the open function, in order to substitute a file-like object such as a StringIO or BytesIO. Ideally, and especially for integration tests, it's better to use actual files, because there's too great a difference in parity between file-like objects like StringIO and actual file objects. If your program is using files, it only makes sense to use the real thing. So here's an example of what one of your test cases could look like in setting up a file to use. In the setUp method, we create a temporary directory, and in that temporary directory you can add any file you want and write some data into it. Once you're done setting that up, any test case can run and use the file you just created in the setup.
And at the end, you just tear it down. It's a great way to test on the real kinds of files you'll see in your program. Finally, you may run into the situation where you actually need to test environment-specific behavior. Ideally, you want to share code paths in your tests, but when you can't, it's all right to skip tests. To skip tests, you can just use the skipIf decorator; in this case, I'm skipping if it's a Python 3 version. You can also skip based on the platform; for example, here I'm skipping if it's not Mac or Linux. So you may be asking, when do I know to skip a test? Ideally, you should only be skipping tests of functionality you expect to differ, like output that differs based on the operating system or Python version you're using, or functionality that straight up doesn't exist. So if you're testing special files, such as symbolic links or socket files on Linux, which have no representation on Windows, it's appropriate to skip. So to reiterate the main points here: test common trouble inputs to get a realistic idea of how your program handles possible issues; use realistic types, meaning the trouble inputs and actual files; and skip only when you absolutely need to, as I said on the previous slide. So now that you have your test suite written, you want to know how to actually run the tests and make sure they run in all the different environments your code may be used in. One utility to make sure you have in your toolbox is virtualenv. Creating virtual environments for specific Python versions is great when you're developing and testing, so you know how well your code runs on a specific version. Here are a couple of great tools for testing, too: nose is a test runner that can run through your test suite.
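Pulling the testing advice together, a test case along these lines uses a real temporary directory, a nested unicode file name as a trouble input, and a platform-based skip. The names and the symlink test are illustrative, not taken from the CLI's suite:

```python
import os
import platform
import shutil
import tempfile
import unittest

class FilePathTest(unittest.TestCase):
    def setUp(self):
        # A real file, nested in a directory, with unicode in its name.
        self.tempdir = tempfile.mkdtemp()
        nested = os.path.join(self.tempdir, u"nested")
        os.mkdir(nested)
        self.filename = os.path.join(nested, u"r\u00e9gions.txt")
        with open(self.filename, "wb") as f:
            f.write(b"us-west-2")

    def tearDown(self):
        shutil.rmtree(self.tempdir)

    def test_reads_real_file(self):
        with open(self.filename, "rb") as f:
            self.assertEqual(f.read(), b"us-west-2")

    @unittest.skipIf(platform.system() == "Windows",
                     "symlinks need extra privileges on Windows")
    def test_symlink_is_detected(self):
        link = os.path.join(self.tempdir, "link")
        os.symlink(self.filename, link)
        self.assertTrue(os.path.islink(link))

suite = unittest.TestLoader().loadTestsFromTestCase(FilePathTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```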
There's also tox, which you can use to set up your virtual environments, run your tests, and tear the environments down at the end, and it can do that for a bunch of different Python versions. And there's pytest, which provides a lot of similar functionality. So here's an example of a tox.ini file like the one you'll find in the CLI, and it's quite simple: you just list the different environments you want to support, so here we support 2.6 through 3.4. Then all you have to do is specify the dependencies, say a requirements.txt file stating what the tests require, and the commands you want to run; here we're just running nose to run the actual test suite. And similarly, we can have conditional dependencies for Python 2.6: note there's a requirements file specifically for 2.6 carrying the unittest2 dependency. Finally, you may be wondering how to handle the different operating systems when you only have access to, say, a Linux machine: how do I make sure it runs on Windows? I'd recommend using CI systems; two good ones are Travis and Jenkins. Travis links up to your GitHub repo automatically, so it can test the pushes you make and make sure they work across all the different Python versions. And with Jenkins, you can launch an EC2 Windows instance with Jenkins installed and run builds of your tests based on what was recently pushed to your repo. So, in conclusion, some of the major points I'd like to reiterate: use six whenever you run into Python renaming issues, or for determining what Python version you're running on, or what string type you're dealing with.
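Circling back to the tox setup described a moment ago, a tox.ini following that shape might look like this; the env list matches the versions from the talk, but the requirements file names are illustrative, not copied from the CLI repo:

```ini
[tox]
envlist = py26,py27,py33,py34

[testenv]
deps = -rrequirements-test.txt
commands = nosetests tests/

[testenv:py26]
# Conditional test dependency: only 2.6 needs the unittest2 backport.
deps =
    -rrequirements-test.txt
    -rrequirements-test-2.6.txt
```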
For string types, remember that mixing the two is bad, that there are different naming conventions across the major Python versions, and that the best way to handle it is to make them all the same at the very beginning of your program. For file handling, one of the major issues is that there may be undocumented behavior you're not aware of, and the best way to avoid it is to limit concurrent interactions with the file. With limitation handling, what you often need to do is backport the functionality, or compensate for it, and then hide that, whether by importing it differently or by using LSP to hide the functionality differences. You also want to test as closely as possible to reality; that will often catch a lot of these issues. And then finally, read the documentation. I've said finally a bunch of times, but oftentimes it's just a one-liner saying this Python version is supported, or this operating system is supported, and it can save you great travails the next time you're writing your project. Thanks again. Here is the GitHub repo for the AWS CLI, and here is my profile. I thought I was about over time, but it looks like I've still got minutes left. Okay. Yeah, so here's the GitHub repo and my profile; if you have any questions about any of the projects I work on, feel free to ask me, I'm going to be here all week. Thanks again.

So we now have some time left; any questions?

I've been using the AWS CLI for a while, and I found it quite hard to get used to how it handles the documentation and what's available on the command line, because usually when you work on Unix-based systems, the convention is dash-dash-help at the end of the command to see the actual help, and then man for the manual pages.
Is it because of compatibility with Windows systems that you decided to use just help at the end, and no manual pages?

For that, I think it was just an implementation detail, how we decided to design it. I believe we just decided to use help. I mean, Windows doesn't have man pages, right? So having help is intuitive in that case.

Okay. There is also the python-future library; I think it's the alternative to six. How does it compare to six? I know that six is quite old, and some of the rumors are that python-future takes some different approaches.

Yeah, for this example I was just going off what the CLI and a lot of our projects do. We just use six because it's a very lightweight way of handling these compatibilities. For some of our projects we actually do use futures; for example, in Boto 3 we use futures for some of our multi-threaded processing.

What's the hardest to support? Windows, or Python 2.6, or 3.2?

I would say it's just getting it all working across all the different environments that's hard; I wouldn't say one is harder than the others. Certainly some Python versions have limitations and Windows has limitations, but it's trying to get it all to work well, because sometimes you have logic for one that doesn't work on another, and then you change it and it breaks on yet another machine. It's sometimes tricky, but it's always doable.

I'll ask you a question. If I'm developing with some Python library that's packaged in Debian or Ubuntu, sometimes the package gets updated, and then I'm not sure if my code still works, so I can use virtualenv as you said. But there are some Python packages that depend on other external packages that are not directly connected with Python, such as C libraries.
How would you virtualize those packages?

Can you repeat the question? I didn't quite catch it.

How would you manage the different versions of Ubuntu or Debian packages that are not Python packages, C libraries that get updated in the background, where I'm not sure whether they still work with the new Python package?

Usually there's a version attached, so you can actually lock down to a version that you feel comfortable with. That doesn't always work, and I'm not too sure beyond that; I would just suggest making sure you're pinned to a specific version you're comfortable with. That's probably the best way to go about it.

Okay, thank you.

Any more questions?

The only problem I see is that you need to have a clear idea of what you need to do differently for a different environment. It's not something that comes on its own, so how can I know what I need to do differently? I mean, Windows works differently from Linux, and 2.7 works differently from 2.6 or whatever, but I don't know the differences, so I can't foresee them. How can I know them before I have to test them?

Okay, well, probably the best way to do that is, like I said, using virtualenv, or spinning up an EC2 instance for the operating system you need. That's a great way to fiddle around with your project as you're developing it and see whether there are any issues you run into.

No more questions? Okay, Kyle, thank you very much.