It's important to understand that in Python 3 a string, the str type, is an immutable sequence of Unicode characters. Immutable, again, means that once the object is created, it can't be modified. So you can't take an existing string object and capitalize all its letters. You can take an existing string object and produce from it a new string object that has all the same characters but capitalized, but the original object cannot change. And when we say that a string is made up of Unicode characters, understand that how those characters are actually represented in memory, the encoding, is left up to Python. It's basically an abstracted-away detail. As far as you're concerned, just as a list is a sequence of items, a string is a sequence of Unicode characters. The first character is at index 0, and the last character is at the index equal to the length of the string minus 1.

Now, of course, when we take a string and write it to a file, or take some existing file, read its data, and get back a string, then encodings matter. The way Python handles that is that when you open a file in text mode, the file object itself has an associated encoding. You don't have to specify one; if you don't, Python uses a platform-dependent default, which on many systems is UTF-8. But when you need to, you can open a file specifying some other encoding. Any time you read from that file, the data you get back will be Unicode strings decoded on the assumption that the file is encoded with that encoding, whether UTF-8 or UTF-16 or whatever. And when you write to the file, the Unicode characters of the strings you pass in get encoded as bytes using the file object's encoding and written to the file. So if you have some file that is encoded in, say, UTF-16, but you open it specifying a different encoding and then read from it, the strings returned by the file's read method are going to be messed up, or the read may fail outright with a UnicodeDecodeError.
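Here is a small sketch of these points in Python 3. The sample text, the temporary file name, and the choice of Latin-1 as the "wrong" encoding are illustrative assumptions; Latin-1 is used because it can decode any byte sequence, so the mistake shows up as garbled text rather than an error. The example passes encoding explicitly so it doesn't depend on the platform default.

```python
import os
import tempfile

s = "hello"
print(s[0])           # first character, at index 0
print(s[len(s) - 1])  # last character, at index len(s) - 1

# Strings are immutable: upper() returns a new string and s is unchanged.
shout = s.upper()
print(shout, s)       # HELLO hello

# Assigning to an index of a str raises TypeError.
try:
    s[0] = "H"
except TypeError as exc:
    print("TypeError:", exc)

# Write text to a file as UTF-16, then read it back with two encodings.
text = "naïve café"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-16") as f:
        f.write(text)
    with open(path, encoding="utf-16") as f:
        right = f.read()      # decoded with the matching encoding
    with open(path, encoding="latin-1") as f:
        wrong = f.read()      # decoded with the wrong encoding: garbage

print(right == text)  # True
print(wrong == text)  # False
print(repr(wrong))
```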
The text you get back from a file opened with the wrong encoding won't be correct, because Python is decoding the file's bytes under the wrong assumption.

Now, if you want to deal with raw byte data, as we sometimes do, especially when reading and writing files, Python has two types for that purpose, bytes and bytearray. The only difference is that bytes is an immutable sequence of bytes, whereas bytearray is a mutable sequence of bytes, so with a bytearray you can actually modify the existing bytes of that object. If you wish to read or write bytes from or to a file, you open the file in binary mode. The read method of that file object will then return bytes objects, and its write method expects you to pass in bytes or bytearray objects, not strings. So it's very important to keep straight the distinction between string objects, which are sequences of Unicode characters, and bytes and bytearray objects, which are sequences of bytes. It's easy to get confused about this because when we read and write strings from and to files, the text in the file is of course stored as a sequence of bytes, so there's some encoding involved.

Unfortunately, the way things work in Python 2 tends to make this much more confusing. In Python 2, the default string type, str, the type of object you get when you write a string literal, is not actually a sequence of Unicode characters but rather an immutable sequence of bytes. So it's roughly the equivalent of Python 3's bytes type. If you wanted an immutable sequence of Unicode characters in Python 2, there's a separate type for that called unicode. Remember that when Python was first created, Unicode barely existed, and it didn't really take over the world until a good decade later.
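Going back to Python 3's bytes and bytearray types, here is a minimal sketch contrasting them and showing binary-mode file access; the byte values and the file name are arbitrary choices for illustration.

```python
import os
import tempfile

data = bytes([104, 105])      # immutable sequence of byte values
print(data)                   # b'hi'
print(data[0])                # indexing bytes gives an int: 104

buf = bytearray(b"hi")        # mutable sequence of bytes
buf[0] = ord("H")             # modify an existing byte in place
buf.append(ord("!"))          # grow the bytearray
print(buf)                    # bytearray(b'Hi!')

try:
    data[0] = 72              # bytes objects cannot be modified
except TypeError as exc:
    print("TypeError:", exc)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.bin")
    # "wb" is binary mode: write() accepts bytes and bytearray, not str.
    with open(path, "wb") as f:
        f.write(data)
        f.write(buf)
        try:
            f.write("text")   # a str is rejected in binary mode
        except TypeError as exc:
            print("TypeError:", exc)
    # "rb" is binary mode: read() returns a bytes object.
    with open(path, "rb") as f:
        raw = f.read()

print(raw)                    # b'hiHi!'
```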
In those early days it made some sense to conflate strings representing text with strings representing bytes, because text was almost always ASCII, and ASCII uses one byte per character. Python 3 very sensibly corrects this mistake and makes str a proper Unicode type. Unfortunately, this change in the basic string type is the major obstacle to converting Python 2 code to Python 3. If you take some Python 2 code and rewrite it for Python 3, your biggest headache is probably the logic around strings, because if you're not careful, you can easily end up mishandling some text or binary data. In any case, if you ever find yourself confused about the nature of strings and binary data, I suggest you focus first on understanding how things work in Python 3, because Python 3 is much more sensible. Even though a lot of code out there is still written in Python 2 rather than 3, the way Python 2 handles strings and binary data really is a mistake.

Now, when you write a string literal in Python, you can use either single quotes or double quotes. It doesn't matter; the only difference is that within single quotes you have to escape a single-quote character, otherwise it would be mistaken for the end delimiter of the string. Conversely, in a double-quoted string you have to escape double-quote characters rather than single-quote characters, because, again, otherwise they would be mistaken for the end delimiter. Here in the top example, we have a single-quoted string, so the single-quote character is escaped but not the double-quote character, and in the bottom example it's a double-quoted string, so the double-quote character is escaped but not the single-quote character. Python also has what it calls triple-quoted strings, which are enclosed in three quote marks: either three single quotes on both ends or three double quotes on both ends.
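For instance, these two literals (illustrative, not the ones from the slides) produce the same string, each escaping only the quote character that matches its own delimiter:

```python
a = 'She said "hi" and it\'s fine'   # single-quoted: escape ' but not "
b = "She said \"hi\" and it's fine"  # double-quoted: escape " but not '
print(a)
print(a == b)   # True: both literals denote the same string
```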
Of the two forms of triple-quoted string, I generally favor triple single quotes. The special thing about a triple-quoted string is, first, that we don't have to escape individual single-quote or double-quote characters; only in the rare case of three consecutive single-quote or three consecutive double-quote characters do you have to escape one of them. And the other special property of triple-quoted strings, the big selling point, is that they can span multiple lines. Once you start a triple-quoted string, you can put as many newlines in it as you want, and it ends only when the closing triple-quote delimiter is encountered. So if you have some big chunk of text you want as a string literal, it makes a lot of sense to write it as a triple-quoted string rather than a single- or double-quoted one. Generally, though, it's considered bad practice to embed too much string data in your code. If you have a whole bunch of string data, like really big chunks of text, you would generally store it in a database or some external file and then, in the course of your program, read the text from that file or database. Generally speaking, code files should contain code, not text data.

A docstring in Python is simply a string that documents some class or function. It's a string that gets assigned to the special __doc__ attribute of a function or class object, and it's the string displayed when you invoke the built-in help function on that class or function object. So here, for example, we define a function foo and then assign a string to its __doc__ attribute, and if we invoke help on that object, it displays that string. Python, however, has a special syntactic allowance for attaching a docstring to a function or class: if you write a string as the first line of the function body or class body, that becomes the docstring.
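A short sketch of a multiline triple-quoted string and of the two ways of attaching a docstring; foo and bar are placeholder names. In an interactive session, help(bar) would display bar's docstring; this sketch reads __doc__ directly instead so it runs without opening the help viewer.

```python
# A triple-quoted string can span lines and contain both kinds of quote.
poem = '''Line one
Line two with 'single' and "double" quotes
Line three'''
print(poem.count("\n"))  # 2 newlines inside the literal

def foo():
    pass

# Attaching a docstring by assigning to the special __doc__ attribute.
foo.__doc__ = "Assigned docstring for foo."

def bar():
    '''Return nothing useful.

    A string literal as the first statement of the body
    becomes the function's docstring.
    '''
    pass

print(foo.__doc__)
print(bar.__doc__.splitlines()[0])  # Return nothing useful.
```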
Because docstrings may run to several lines, we most commonly write them as triple-quoted strings; this is actually the most common use of triple-quoted strings. Note that the docstring is written with the same indentation as a normal line of code in that body.

A very common thing to do is to take a number object and get a string from it, or take some string and parse it into a number. To convert a number into a string, we simply pass that number to the str constructor. So here, str(3.5) returns a string of three characters reading 3.5. Likewise, if you invoke int with a string argument consisting of the single numeral 4, it returns an actual number object representing the value 4, an int object. Similarly, float with the string argument 3.5 returns an actual float object representing 3.5. And in the last example here, notice that if we try to get an int from the string 3.5, well, 3.5 isn't a valid int value, so this raises an exception, a ValueError. So understand that when you parse an int or a float, the string must consist only of characters that can be parsed into a number of that type. Float is more forgiving, though: float with an argument of just the character 4 returns the float value 4.0. Also, any whitespace surrounding the number in the string is okay; it won't cause an exception. Any other additional text, however, will trigger an exception. Now, when converting between booleans and strings, if you simply pass the boolean value True or False to the str constructor, you predictably get the string True with a capital T or False with a capital F.
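The conversions just described, as a quick sketch; the particular values are arbitrary examples.

```python
as_text = str(3.5)             # '3.5': a string of three characters
four = int("4")                # 4, an int object
three_and_half = float("3.5")  # 3.5, a float object
float_four = float("4")        # 4.0: float also accepts an integer-looking string
padded_int = int("  42\n")     # 42: surrounding whitespace is ignored
print(as_text, four, three_and_half, float_four, padded_int)
print(str(True), str(False))   # True False

failures = []
for bad in ["3.5", "4 apples"]:
    try:
        int(bad)
    except ValueError as exc:  # not a valid int literal
        failures.append(bad)
        print("ValueError:", exc)
print(failures)                # ['3.5', '4 apples']
```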
Surprisingly, though, going the other way doesn't work as you might expect. If you pass a string to the bool constructor, you get back either True or False, but all that's really happening is that a string with text in it returns True, because non-empty strings are true, while an empty string returns False, because Python considers empty strings to have the truth value false. So even bool('False') returns True. There isn't actually any built-in way to parse booleans expressed as strings back into booleans. It's really not something you want to do very often, but if you do, you basically have to write your own function that checks the string: if it equals a string reading True, return True; if it instead matches a string reading False, return False.

Somewhat similarly with the built-in collection types: yes, you can pass them to the str constructor and get back a string representation of the collection, but there isn't really a built-in means of going the other way. The closest thing is the built-in function eval, short for evaluate, which takes a string representing any Python expression, parses that expression as Python code, and evaluates it, returning the resulting value. So here, if we pass in a string that looks like a Python list literal, we of course get back that list. However, using eval like this is generally frowned upon, because eval is considered quite dangerous: it accepts any arbitrary string and just executes it as Python code. That's inherently a dangerous thing for a program to do, because it means you have to be very careful about what kind of string ends up getting passed to eval.
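A sketch of both points. parse_bool is a hypothetical helper of the kind described above; accepting any capitalization, ignoring surrounding whitespace, and raising ValueError for anything else are my own design choices, not a standard API.

```python
print(bool("False"))   # True: any non-empty string is truthy
print(bool(""))        # False: only the empty string is falsy

def parse_bool(text):
    """Hypothetical helper: map 'true'/'false' (any case) to a bool."""
    normalized = text.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"not a boolean string: {text!r}")

print(parse_bool("True"), parse_bool(" false "))  # True False

items = [1, "two", 3.0]
as_text = str(items)        # "[1, 'two', 3.0]"
round_trip = eval(as_text)  # parses and runs the text as Python code
print(round_trip == items)  # True
# Only ever do this with trusted text: eval runs ANY expression it is given.
```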
If you're taking user input and passing it to eval, you've provided a potential attack vector: some malicious user can pass in data that then gets evaluated, and in the course of that evaluation you might end up invoking, say, some function you don't want invoked at that time, which can be quite dangerous. So when it comes to collection types and other kinds of complex objects, how exactly you serialize them into strings and then deserialize them from strings back into objects is a fairly involved topic. There are many different ways you might want to do that, each with different virtues, so it's not something we'll cover here.

Now, if you look at the Python string type, you'll find it has many methods for doing all sorts of commonly useful things with strings. Going over all of them thoroughly would take quite a bit of time, so we'll just quickly go down the list and discuss what purpose they serve. First off, there are the special operator methods __add__ and __mul__. The __add__ method, usually invoked with the + operator, concatenates two strings. So you take a string reading foo and a string reading bar, concatenate them, and get a string reading foobar. And again, be very clear that Python strings are always immutable. This operation and all the ones we'll discuss subsequently never actually mutate or modify a string; they just produce some new string that gets returned by the operation. So when we say that __add__ concatenates two strings, it doesn't modify either of those two strings; it produces a third string, which is returned by the operation. The __mul__ method, the multiplication operator method, multiplies a string by an integer, producing a string that is that string concatenated with itself as many times as the number specifies.
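A quick sketch of both operators, showing that the operands are left unchanged; calling __add__ and __mul__ directly is only there to show which methods the operators invoke.

```python
a = "foo"
b = "bar"
c = a + b          # the + operator calls a.__add__(b)
print(c)           # foobar
print(a, b)        # foo bar: neither operand changed

d = a * 7          # the * operator calls a.__mul__(7)
print(d)           # foofoofoofoofoofoofoo
print(a.__add__(b) == c, a.__mul__(7) == d)  # True True
```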
If you multiply the string foo by seven, for example, you get the string foofoofoofoofoofoofoo, the original string foo repeated seven times. And again, this doesn't modify the existing string; it returns a new string, and the original string object remains just as it was.

Moving more quickly through the rest, the find, rfind, index, and rindex methods are all about locating a substring within a string. For example, if you have some long string and want to find whether it contains the word hello somewhere, you pass as argument the substring you wish to find. What gets returned is the index at which that substring is found within the string, so if the substring is found starting at index seven, the number seven is returned. If the substring is not found at all, find returns -1, whereas index raises an exception, a ValueError. That's actually the only difference between find and index: index raises an exception where find returns -1 when the substring isn't found. As for rfind and rindex, they're just like find and index except that instead of searching from the start of the string, they search from the end of the string, the right side, so to speak, and so they return the index of the last occurrence. That's why they have an r in front. The startswith method is passed a substring, and if the string object begins with that substring, it returns True; otherwise it returns False. endswith works just the same, except it checks for the substring at the end of the string rather than the start. The methods isalnum, isalpha, isdecimal, isdigit, isnumeric, isidentifier, islower, isupper, istitle, and isspace are all simple Boolean tests that determine whether or not the string meets some criterion.
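A small sketch of the searching methods; the sample sentence is arbitrary, and the indices in the comments apply to that sentence.

```python
text = "say hello, then hello again"
print(text.find("hello"))    # 4: index where the first match starts
print(text.rfind("hello"))   # 16: searching from the right finds the last match
print(text.find("bye"))      # -1: find signals "not found" with -1
print(text.index("hello"))   # 4: same as find when the substring exists
print(text.rindex("hello"))  # 16

try:
    text.index("bye")        # index raises instead of returning -1
except ValueError as exc:
    print("ValueError:", exc)

print(text.startswith("say"))   # True
print(text.endswith("again"))   # True
print(text.endswith("hello"))   # False
```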
Among these tests, the isalpha method, for example, returns True only if the string is non-empty and all of its characters are letters; if there's a space, a digit, or some other symbol, it returns False. For another example, the islower method returns True only if the string contains at least one cased letter and all of its cased letters are lowercase; otherwise, it returns False. The methods upper, lower, title, swapcase, and capitalize are all about changing the case of the letters in the string. upper, for example, takes the string and returns a new string that is the same except with all the lowercase characters converted to uppercase. The methods center, rjust, and ljust all pad a string with extra spaces: rjust pads at the start, right-justifying the text, ljust pads at the end, and center pads both the start and the end. You pass as argument the length of the padded string you want. So if I have a string consisting of just a 10-character word and I invoke center with an argument of 100, I get back a string 100 characters long with 45 spaces tacked onto the front and 45 onto the end. And again, remember, we're not mutating the original string; we're producing a new one. The methods strip, lstrip, and rstrip remove leading and trailing whitespace, that is, whitespace at the start or end of the string. The l and r stand for left and right, so lstrip removes just the whitespace at the start of the string, rstrip removes just the whitespace at the end, and strip removes both. The methods partition, rpartition, split, rsplit, and splitlines are all about breaking a string into its constituent parts.
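A sketch exercising a few of the Boolean tests, case methods, padding methods, and strip methods; the sample strings are arbitrary.

```python
print("hello".isalpha())        # True
print("hello world".isalpha())  # False: the space is not a letter
print("abc123".islower())       # True: the only cased characters are lowercase
print("123".islower())          # False: no cased characters at all

print("python".upper())             # PYTHON
print("hello world".title())        # Hello World
print("Hello".swapcase())           # hELLO
print("hello WORLD".capitalize())   # Hello world

ten = "abcdefghij"              # a 10-character word
centered = ten.center(100)
print(len(centered))            # 100
print(len(centered) - len(centered.lstrip()))  # 45 spaces in front
print(repr(ten.ljust(14)))      # 'abcdefghij    ': padding at the end
print(repr(ten.rjust(14)))      # '    abcdefghij': padding at the start

padded = "  spaced out \n"
print(repr(padded.strip()))     # 'spaced out'
print(repr(padded.lstrip()))    # 'spaced out \n'
print(repr(padded.rstrip()))    # '  spaced out'
```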
The split method, for example, is passed a separator, often a single character, though it can be any string, and it returns a list of strings, each a substring of the original string delimited by that separator. One common use: you have a string containing a bunch of items separated by commas and want them as separate strings, so you invoke split, pass in a string consisting of just a comma, and get back a list of strings, each one something that was separated by the commas in the original string. The join method essentially does the opposite of split: it takes a list of strings and joins them together. Confusingly, though, the string on which you invoke join is the string that gets inserted between the elements of the list, and the list itself is what gets passed as the argument. So, for example, if you have a list of strings and want to join them all together separated by commas, you take a string consisting of just the comma character, invoke its join method, and pass it that list; that returns a single string with all the strings in the list concatenated together but separated by commas. The expandtabs method replaces all the tabs in a string with spaces; the argument you pass sets the tab size (8 by default), and each tab is replaced with enough spaces to reach the next tab stop. The format method is really quite complicated, but it's basically about interpolating values into designated places within a string. It's more involved than I can explain here, but it's very, very useful. And finally, the encode method returns a bytes object. Again, our string is a sequence of Unicode characters, and when we invoke encode, we specify some encoding (UTF-8 if we don't), and encode returns a bytes object representing that sequence of Unicode characters in that encoding. So those are almost all of the string methods. There are a couple I left out, but that's most of them.
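A sketch of splitting, joining, tab expansion, formatting, and encoding; the sample strings are arbitrary, and the format call shows only the simplest positional {} placeholders. The last lines use the bytes method decode, the inverse of encode, to turn the bytes back into a string.

```python
csv_line = "red,green,blue"
parts = csv_line.split(",")        # ['red', 'green', 'blue']
joined = ", ".join(parts)          # the separator is the string join is called on
print(parts)
print(joined)                      # red, green, blue

print("key=value=more".partition("="))   # ('key', '=', 'value=more')
print("key=value=more".rpartition("="))  # ('key=value', '=', 'more')
print("one\ntwo\nthree".splitlines())    # ['one', 'two', 'three']

tabbed = "a\tb"
print(repr(tabbed.expandtabs(4)))  # 'a   b': tab filled out to the 4-column stop

greeting = "{} is {} years old".format("Ada", 36)
print(greeting)                    # Ada is 36 years old

encoded = "café".encode("utf-8")
print(encoded)                     # b'caf\xc3\xa9'
print(encoded.decode("utf-8") == "café")  # True
```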
If you want to see the string methods described in more detail, you can look in the supplement, and of course you can always look in the Python docs.