In this video, we're going to look into the string data type, in particular how we create strings and how we obtain strings from various sources. So let's open a new notebook file and rename it to text.ipynb because we are modeling textual data here.

We have seen a couple of times that using the double quote notation creates text objects. So if I write, for example, "random text" and execute that, Python understands it. The reason is that this is the so-called literal notation to create text objects, so-called string objects. Whenever Python sees a starting double quote and an ending double quote, it knows that everything in between is part of the text that is to be modeled.

However, the representation of the object that is shown below the cell has single quotes. So you may wonder, what is the difference here? Well, there is no real difference. Double and single quotes are perfect synonyms, so you can use either one. You have to be consistent, and the closing quote must of course match the opening one, but both are perfect synonyms.

So which one do I personally use? I prefer the double quote notation, mainly because with two double quotes it is clear that this is a string without any characters inside. If you use the single quote notation instead, two single quotes can easily be confused with one double quote. And if you work with code all day, you don't like confusing things on the screen. So I prefer double quotes.

Another major reason why the double quote notation has, I guess, become more popular in the Python community in recent years is a very popular code formatting tool, a tool that automates the formatting of your source code. It's called Black, and the name goes back to a quote attributed to Henry Ford.
Henry Ford, the car manufacturer, said you can have the car in any color as long as it's black. In other words, there's only one color to choose from. In a similar spirit, the Black code formatting tool gives you back your Python source code in one and only one way. And this tool simply uses double quotes instead of single quotes, which is one reason why double quotes have become so popular in recent years.

So let's quickly check the type of that object. It is, of course, str, as we could have guessed.

Now what are further things that we should know about when talking about textual data? Let's go ahead with another example, a bit more complex, but not too hard. First I make a markdown cell here and write a quote by Albert Einstein. Einstein said, "If you can't explain it, you don't understand it." Period and quote ending. I hope that in my videos I can explain what I want to explain. Otherwise, this would be an indication that I don't understand a couple of things. And probably some of the videos are a bit hard to understand because I still need more feedback from you on better ways to talk about the content.

But let's continue with this quote. Let's say we want to model this entire sentence in our computer's memory. How could we do that? Let's simply copy paste it and try to use the double quote notation to create a text object. So we paste the sentence in between the double quotes. We can already see from the weird coloring, red here and black there, that JupyterLab is indicating that Python will probably have a hard time understanding that. So let's quickly run the cell, and indeed we get a syntax error. A syntax error means that Python cannot even read what we are giving it.
The reason is that Python thinks the first double quote indicates the beginning of a string literal. Anything that comes after it should be part of the string's value. The second double quote then indicates the end of the string. However, it is not really the end. The line goes on, and that is the trouble here.

What we can do is give Python a hint that the second double quote should not be the end of the string. How do we do that? We use the so-called escape character, and in Python this is the backslash. So backslash followed by a quote basically means that the quote has a meaning other than the end of the string. In general, the escape character implies that the one character following it has a meaning other than its usual meaning. In this case, it simply means: put a quote there as part of the value of the string object. And of course, we do that as well for the second double quote inside the sentence.

Now if I run the cell, it works. Python understands it, and we don't get a syntax error. What Python gives me back is a representation of the object that is now in memory, but it defaults to single quotes. So just like above, whenever I enter something with double quotes, Python usually defaults to single quotes. There are a couple of exceptions to this rule, but usually we get back a notation with enclosing single quotes.

What we see in this notation is that now the single quotes inside the sentence are escaped. Backslash single quote means we escaped the single quote character. The double quotes, of course, do not need to be escaped here because Python can clearly understand that if the string literal begins with a single quote, then a double quote is simply part of the value of the string.
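To make the quoting rules concrete, here is a minimal sketch of what the notebook cells described above might look like. The variable names are chosen for illustration only and are not taken from the video.

```python
# Double and single quotes are synonyms: both create equal str objects.
text_a = "random text"
text_b = 'random text'
print(text_a == text_b)  # True
print(type(text_a))      # <class 'str'>

# The Einstein quote contains double quotes, so inside a double-quoted
# literal each inner double quote is escaped with a backslash.
quote = "Einstein said, \"If you can't explain it, you don't understand it.\""

# repr() gives the text representation that the notebook shows below a
# cell: single quotes as delimiters, with the inner single quotes escaped.
print(repr(quote))
```

The last line shows the same backslash-escaped single quotes that appear below the cell in the notebook.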
So let's copy paste everything into another code cell, just to prove the point that the representation we see here is a so-called literal notation. Literal means we can simply copy paste it as it is into another code cell and it works. It is valid Python code, and it creates another object with the same value.

Okay, so now you may be confused. Below the code cell we see these weird characters that we actually don't want to be part of the text that we are modeling. The thing is, the backslashes are only part of the syntax in the source code to make this happen. They are not part of the semantic value of the object that we create. Remember back in chapter one, there was a video on object orientation and the three properties every object has, one of them being the semantic value. The semantic value is basically what the object means to us as humans. And to us humans, the backslash, the escape character, has really no meaning. It is really just syntax to not cause an error in Python.

What that means is, if I take this entire piece of code, which is valid Python code, and I pass it as the argument to the print function between the parentheses, then we are printing out only the value of the string object. And now what we see is that the weird backslashes are not part of it.

Also, as a minor detail, there is no red number seven here. If I go back up, there is a blue five and a red five, and a blue six and a red six. But here there is only a blue seven. The reason is that those two cells give me back a reference to an object in memory. This cell here does not, because the print function has no meaningful return value; it returns None. Instead, the print function has a side effect. The side effect in this case is that it prints something to the screen, and the output just happens to be shown below the code cell as well, right?
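The difference between the text representation and the printed value, and the fact that print returns None, can be checked with a short sketch like the following. The names `quote` and `result` are illustrative assumptions.

```python
# The literal notation, copy pasted from the output of an earlier cell.
quote = 'Einstein said, "If you can\'t explain it, you don\'t understand it."'

# The text representation contains the backslashes ...
print(repr(quote))

# ... but the value that print() writes to the screen does not.
print(quote)

# print() works through a side effect; its return value is None.
result = print(quote)
print(result is None)  # True
```

This is why a cell that ends in a call to print shows no numbered output in the notebook.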
So up here we have a so-called literal notation that we can copy paste into a code cell, and then it's valid Python code. This is what the text representation of an object in memory usually is. If we instead copy paste the printed value of the string, then I get a syntax error again. So the value of a text object is not necessarily valid as a Python expression, whereas the text representation of an object usually is valid Python. We will see that much later in this course, in chapter 11, when we talk about classes and how we can design our own data types. There we will, among other things, talk about the text representation of objects. The built-in objects, the string object in this case, also have a text representation, and for strings Python's text representation defaults to single quotes. So note that the value of a string is not necessarily what you can read in the text representation of the string.

Okay, so let's go ahead and see further escape sequences and a little bit more syntax before we go on with other stuff. Let's say you want to write a list of things and you want to separate it into several lines. So let's write "a list of", then a colon, then break the line and say "step one", break the line and say "step two", and do it one more time and say "step three". So I want to model a text object that contains a list of things, and I just want to model that as a string.

Let's do that with the double quote notation. Again, I get a syntax error, and it says EOL, which is short for end-of-line, while scanning the string literal. The end-of-line is there because we have a newline character. I'm breaking the line right here, and that is the end of the line. And Python, for syntactic reasons, does not expect that when we start a string using either a single quote or a double quote. It cannot handle that.
It expects that the entire string must be on one line in the source code. Okay? So how can we fix that? There is an easy fix, and that is to use the so-called triple-double-quote notation, or simply triple-quote notation for short. We have seen it previously when we defined functions and used the triple-quote notation to write the docstrings that document the input-output relationship behind the functions we implemented.

If I now execute the code, I don't get a syntax error. Python understands it. However, as we see, Python gives me back a text representation with these backslash n's in there. A backslash n is simply an escape sequence that indicates a new line. So just like backslash single quote or backslash double quote means "put a single or double quote within the text", backslash n simply means "break the line". But this only concerns the value of the text object.

So where do we see that? Well, if I copy paste that into a new cell and pass it as the only argument to the print function, I now see "a list of", "step one", "step two", and "step three" on four different lines. So now this string, which is a so-called multi-line string, is shown as we would expect it.

Again, what we saw before is just the text representation of the string object in memory, which we can copy paste. It is valid Python code, so we get back another object that happens to have the same value. And of course, if we copy paste this one-line notation with the backslash n's into the print function, then we also get exactly the same output as above, okay?
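As a small sketch of the two notations just compared, the following creates the same multi-line string once with triple quotes and once on a single line with backslash n. The variable names are hypothetical.

```python
# Triple quotes allow a string literal to span several source lines.
steps = """A list of:
step one
step two
step three"""

# The same value written on one line with explicit newline escapes.
one_line = "A list of:\nstep one\nstep two\nstep three"

print(repr(steps))        # the text representation shows the \n escapes
print(steps)              # the printed value shows four separate lines
print(steps == one_line)  # True
```

Both literals lead to objects with the same value, which is why the copy pasted one-line representation prints identically.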
So we really have to get used to the idea that the way text is portrayed in syntax, the text representation, that's the formal word for it, is different from the semantic meaning of what is being modeled by the text. The latter is what we see printed out when we use the print function, for example. But we could also do different things. We don't have to print below the cell; we could also write to some file. And when you write to a file later on, you may also want to break a line in there. That line break then only shows up as an actual break in the file you write to. Inside the Python process, the object simply contains a newline character. It looks maybe a bit weird for now, but you will get used to it. It's not too hard. So backslash n is just another escape sequence.

Okay, so what else do you need to know about the triple double quote notation? Well, you can also use triple single quotes. Let's say I write a, b, c on separate lines. If I enter that, we get backslash n, a, backslash n, b, and so on. Why do I get a leading and a trailing backslash n? Well, we have a line break right after the opening quotes, and we also have a line break down here before the closing quotes. This is why we have the backslash n's at the beginning and at the end.

And this is why you often see a simple .strip() at the end in code. We have seen the strip method before; I think the last time was in the video on modeling interactive guessing games. So what does the strip method do? It simply removes leading and trailing white space. This is something that you often see in source code. You could even put in some more empty lines at the beginning and the end. This will not change the stripped value in any way. So you can format everything nicely in the source code.
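The leading and trailing newlines and the effect of the strip method can be seen in a sketch like this one. The names `letters` and `stripped` are made up for illustration.

```python
# The line breaks right after the opening quotes and right before the
# closing quotes become part of the value.
letters = '''
a
b
c
'''
print(repr(letters))   # '\na\nb\nc\n'

# strip() removes leading and trailing whitespace only; the newlines
# between the letters stay part of the value.
stripped = letters.strip()
print(repr(stripped))  # 'a\nb\nc'
```

Note that strip returns a new string object; the original `letters` object is left unchanged.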
But the backslash n's between the a, b, and c will not go away, of course. They are part of the value of the text object.

So this is everything that you need to know about the different kinds of literal notations. There are two kinds. The single quote and the double quote notation is one kind, and the triple quote notation for so-called multi-line strings is the other built-in literal.

Of course, you must not forget that we can use the str constructor. If I pass 7 to the str constructor, I get back "7" as a string object. That is almost trivial by now.

Then, just as a reminder, there is a built-in function called input, which takes as an argument a prompt, which happens to be a string object. So maybe I say "enter something". If I do that, a text box opens and I can enter text. The return value of the input function is simply the text object, and as we see, it's a proper text object shown in single quote notation. If I now enter a text with one single quote in it, maybe "you don't know that", then what I get back is a literal notation using double quotes. Python is smart enough to know that it doesn't have to escape the single quote with a backslash if it uses double quotes as delimiters instead. The problem with the Einstein sentence above was that it has both single and double quotes inside. If we only have one of them, then Python uses the other one as the delimiters. So the input function is also a source from which we can obtain string objects.

Let's look at another very important source of string objects. On the left-hand side, you see a file that I created, which is called lorem_ipsum.txt. I can open it with a double click. It is simply a file holding textual data. So now I show you how you can open a file in Python. We do that using the built-in function called open.
The open function takes as its first argument a string with a path to the file. So let's simply put lorem_ipsum.txt there. If I execute this cell, what I get back is an object of the special type TextIOWrapper. This may be a bit weird for now, but let's store the object in a variable. Let's call it file.

You may already guess what the TextIOWrapper object is. It is really a model of a file. And how does this work? Let me show you a little diagram of what happened in memory as I executed the cell. When I run the open function, Python creates a new object. Let's put as the type of the object simply IO wrapper, or IOW for short. It contains, of course, a couple of ones and zeros. What these ones and zeros do is the following.

Let's assume that this line is the border between the Python process and the operating system. So out here is now OS world, the operating system on your computer. The operating system not only handles the random access memory, the so-called RAM, basically the short term memory of your computer. It also handles what happens on disk. So let's assume the operating system gives you access to a file that lives somewhere on your computer's hard drive. The file contains, for example, textual data, so let's call it TXT something. It's a text file.

This IO wrapper object is really just a pipeline that goes from inside the Python process to the outside world. This way, the Python process can look into the outside world and obtain data. In particular, it can load data from a file on disk. In the example, I assigned the object to the variable file, so we should also put the variable file into the diagram. Okay, so this is what has happened in memory so far when I executed the line with open and a path.
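The following sketch shows what calling open returns. Since the text file from the video is not available here, the example first creates a small stand-in file in a temporary folder; the file name `sample.txt` and its three lines are assumptions made only for this illustration.

```python
import tempfile
from pathlib import Path

# Create a hypothetical sample file as a stand-in for the text file
# used in the video.
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")

# open() does not read any content yet; it only returns a file object
# that acts as a pipe from the Python process to the file on disk.
file = open(path, encoding="utf-8")
print(type(file).__name__)  # TextIOWrapper
print(file.closed)          # False

file.close()  # tidy up; closing files is covered later in the video
```

Passing an explicit `encoding` is optional, but it makes the sketch behave the same on different operating systems.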
So why is opening a file so weird? The reason is that so far I haven't read any of the content of the file yet. The only thing I have done is create this pipe to the outside world, which gives me access to a file on disk. However, I have not yet read in anything from disk.

So how can I do that in Python? One way is to simply loop over the file object. I could say for line in file, and then print line. This shows all the text that we just saw in the text file that I opened in a different tab. So this is one way to load data from a text file.

You may wonder why we have to go so far to make that happen. Why couldn't we just load all the text data at once? Well, the reason is quite simple. This part of the diagram models the Python process's memory. Any program that is running on your computer, a so-called process, only occupies space inside your computer's short term memory. This is the memory that gets erased when you restart the computer. Over here, this is supposed to be the disk, which is not erased when you restart your computer.

Now the problem is this. Disk space these days usually goes into the terabytes, or at least gigabytes. In other words, the space on disk is very big. Your computer's working memory, the RAM, is usually a lot smaller than the disk drive. RAM is usually very expensive, and hard disks are usually very cheap. Therefore, it could be the case that the file we want to read is simply too big for our computer's random access memory. So the Python core developers designed the open function such that we only get a so-called proxy to the outside world.
This proxy then allows us to obtain the content, for example, line by line. So we are only reading in one line at a time, and we do that by iteration. In a previous video, when we talked about the difference between abstract and concrete data types, I introduced the term iterable. An iterable is any object over which I can loop, for example, with a for loop. And obviously, I can loop over the file object. So what do we learn from that? We learn that file objects are modeled as some sort of proxy to the outside world and that we can iterate over them. So they are iterables, and we iterate over them with the for loop, one line at a time.

Now it happens that the iteration variable, called line here, also exists after the for loop is done, and it is still set to the value of the last line. Let's look at that. As we see, it is still set to the last line in the text file.

What we also see here is a backslash n. Every line in the text file that we just opened ends with a new line. Let me open the text file one more time in the tab. We see that at the end of almost every line there is simply a line break. These newline characters are also read in, and this is what the backslash n stands for. And we have seen above that when I say print line, these special characters are not shown as backslash n; they are printed as actual line breaks.

However, we see that the variable line points to an object of type string. Let's prove that: the type of line is, of course, str. What we see from that is that every line in the file becomes a string object when we loop over it. So this is just another way to obtain string objects. Above, we looked at using the constructor or the literals. And now, by opening a file, we have another way to obtain string objects in the Python process.
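Here is a minimal sketch of the loop described above. It again uses a small hypothetical file, `sample.txt`, created in a temporary folder as a stand-in for the file from the video; the list `lines_seen` is added only so the result can be inspected afterwards.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the text file used in the video.
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")

file = open(path, encoding="utf-8")

# A file object is an iterable: each iteration reads one more line.
lines_seen = []
for line in file:
    lines_seen.append(line)
    # Each line still ends with "\n" and print() adds its own newline,
    # so the output shows an empty line after every printed line.
    print(line)

file.close()

# The loop variable still exists and refers to the last line read.
print(repr(line))  # 'third line\n'
print(type(line))  # <class 'str'>
```

Only one line at a time needs to fit into memory with this approach, which is the motivation given above for the proxy design.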
Okay. So before we end this video, there is one thing that I want to get across. What we did just now is loop over the file object, so we looped over this proxy object here. I drew it as a pipe to the outside world. But what this proxy really is, in a bit more detail, is something like a cursor. At first, the cursor looks at the very first character inside the file. If I make one iteration in the for loop, the cursor moves. Let's draw it like this: it simply moves to the second line, and so on. As we go through the for loop in the example, at some point we reach the end of the file. That means the cursor that goes from the proxy object is now at the end.

Now one thing you need to understand is this. If we go ahead and loop over the file object one more time, we get no output. Why? Well, the reason is that the cursor does not automatically move back to the beginning. There are two ways of solving this problem. One, there is a method on the file object that puts the cursor back to the beginning or to any other spot where we want to put it. Another way is to simply create another proxy object, which then automatically starts by pointing at the beginning of the file.

So when we loop over files, we must always remember that the file object is a proxy to the outside world. It not only points at some file, but at some particular spot inside this file, and it always remembers that spot. This way we can move through a file line by line, or one character at a time, or however we want to move. That is how this proxy object works.

And now there's one more thing I want you to understand. For technical reasons, any operating system can only handle so many open files at the same time. Usually the limit is around 1,000.
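The cursor behavior can be sketched as follows. The video does not name the method that moves the cursor; the sketch assumes it is `seek`, which file objects do provide, and it again works on a hypothetical `sample.txt` created just for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the text file used in the video.
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")

file = open(path, encoding="utf-8")

# First pass: the cursor moves from the beginning to the end of the file.
first_pass = 0
for line in file:
    first_pass += 1

# Second pass: the cursor is still at the end, so nothing is read.
second_pass = 0
for line in file:
    second_pass += 1

# seek(0) puts the cursor back to the beginning of the file.
file.seek(0)
third_pass = 0
for line in file:
    third_pass += 1

file.close()
print(first_pass, second_pass, third_pass)  # 3 0 3
```

Creating a fresh file object with another call to open would have the same effect as `seek(0)` here, at the cost of one more open file.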
So let's use the file object that we still have and look at an attribute using the dot operator. I write file, then a dot, and then the closed attribute. The closed attribute gives us back the value False, which tells us that the file is still open. And we shouldn't leave it like that. When we are done reading a file, we should close it because, again, for technical reasons, your operating system can only handle so many open files at the same time. So whenever you are done with some file, the good practice is to simply close it.

How can you do that? Well, you can simply say file dot close. Before, we accessed the closed attribute on the file object, and now we are calling the close method on the file object. It takes no argument, so we call it just like that. Now if I ask Python, hey, is the file closed, it tells me yes. And if we go back and try to loop over the file object one more time, I now even get a ValueError.

So what did calling the close method do? What it really did is remove this cursor that points to the outside world. The file object, the proxy object, still exists in our computer's memory, but the cursor going to the outside world is gone. So if we now try to do anything with this object, we get an error message. In particular, we see "I/O operation on closed file". This basically means the file is closed and we can't do anything with it anymore. So now this file object is totally worthless. The only thing we could do is create another one using the open function again. Then we can loop over everything as before and properly close it.
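A short sketch of the closed attribute, the close method, and the resulting error might look like this. The try/except block is an addition that is not shown in the video; it is used here only so that the sketch runs to the end instead of stopping at the error. The file `sample.txt` is again a hypothetical stand-in.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the text file used in the video.
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")

file = open(path, encoding="utf-8")
print(file.closed)  # False: the file is still open

file.close()        # close is a method, so it is called with parentheses
print(file.closed)  # True

# Looping over a closed file object raises a ValueError.
message = ""
try:
    for line in file:
        print(line)
except ValueError as error:
    message = str(error)

print(message)  # mentions an I/O operation on a closed file
```

The file object still exists after closing, but the only useful thing left to do is to create a new one with open.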
There is a nice syntax that we have not yet seen in this course, which is the so-called with statement. The with statement works together with what we call a context manager. Let's do that to end the video. Let's open the file, not by using the open function just like that, but by using the open function together with the with statement. So we say with open, then the path, and then as file. And then I say for line in file, print line.

This is the same code that we executed above. But now I'm executing it in the context of the with statement. This is why we speak of a context manager: the code block is indented by one level, and that means the for loop is executed in the context of this context manager. For the open function, the context manager has only one purpose, and that is to automatically close the file so that you don't forget it.

So after the with statement, I ask Python, hey, is the file closed? We know that this line is not inside the context anymore. It comes after the context because it's not indented. If I execute the cell, I see all the content of the text file, and we also get back True because the file is now closed. It was closed automatically, so we don't have to close it manually. That's a nice trick to know.

We will see a couple of other examples of the with statement in future videos. For now, simply treat it as you see it here. It executes some code in its context, and with regard to the open function, it closes the file automatically so that you don't have to do it, nothing else. So it's the same as above, but we don't have to close the file manually.
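The with statement version can be sketched like this, once more on a hypothetical `sample.txt`. The variable `inside` is an assumption added for illustration; it records that the file is still open while the indented block runs.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the text file used in the video.
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")

# The indented block runs in the context of the with statement.
with open(path, encoding="utf-8") as file:
    for line in file:
        print(line)
    inside = file.closed  # still inside the context: the file is open

# This line is not indented, so it runs after the context has ended,
# and the file has been closed automatically.
print(inside)       # False
print(file.closed)  # True
```

The file is also closed if an error occurs inside the indented block, which is another reason this form is generally preferred over calling close by hand.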
Okay, so we have learned quite a lot here about different ways of creating text data and about other sources of textual data, in particular the input function, which gives you back a text object, and the open function, which gives you back a file object, which in turn gives us string objects. So these are various ways of dealing with textual data. Also remember that there is a difference between the syntax used to create a text object and its value. Its value is the thing that we see when we print it out. So there are a couple of subtleties in this video that you should understand.

In the next video, we will see that the string data type shares some nice abstract properties with the list data type. The two are very much related. So I will see you then.