Welcome back to 105. So today we get to talk about strings, finally. It only took us like halfway through the entire course, so let us jump right into it, because they sure introduce a lot of fun that we get to have.

So a string is just an array of characters. It's kind of like the exam question I did before, where I had a variable number of integers and all my arrays were different sizes: I just put a minus one at the end to indicate that that is the last element and to not include it. Same thing for strings. They're an array of characters, and then C will go ahead and add a zero byte at the end, or just the character zero. It's going to be the literal value zero, because remember, characters are just some magic numbers. For the purposes of this course, we're just going to assume that they're ASCII-encoded strings, so you can go to asciitable.com if you ever want to look up their values for some weird reason. Basically, they're just the characters represented on a US keyboard, because, well, they were the first ones that did it, so they got to name everything.

We've already seen strings before, but we didn't really know enough to actually explain them. String literals are the array of characters between the double quotes. So for example, if I have double quotes, Hello World, and then double quotes, that is a string literal, and it is an array of characters. The only difference is that it is one bigger than it looks, because there is a zero byte at the end to tell you that, hey, that is the end of that character array.

Other fun rules: this string literal is actually stored somewhere in memory, and the bytes used for string literals are stored in some place called read-only memory. Which means if I try to modify it, I'm not allowed to, and I will get a segmentation fault, and that will be a lot of good fun to debug. So if you write a string literal, the result of that string literal is a pointer to the start of the string, and all we need to know is that a
string is an array of characters that ends with a zero byte.

So in other words, I can actually tell you this now: the type of a string literal. For our purposes it is effectively const char star. So it's a pointer to a character, and that can represent an array, and this const here says you're not allowed to change the characters, so I cannot reassign individual characters. So just remember: array-to-pointer decay applies here, and then it represents an array of characters. Const, again, means we cannot assign to the individual characters. We will see an example of it really quick.

But first, a little thing to know is that there's a format specifier for C strings, if we want to use a string in something like printf. Some of you have discovered this already, some of you may have not. The format specifier is just percent s. So s stands for string, kind of one that makes sense. So if I do printf with percent s, space, percent s, and then a newline, well, the first one gets replaced with Hello, then we get a space, and then the next one gets replaced with World. So I see Hello World and then a newline. Yeah, sorry? Oh, whoops. Yes, there should be a comma right here. I will fix that later. Yes.
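The printf call just described could be sketched like this. The function name print_hello_world is made up for illustration; only printf and the percent s specifier come from the lecture. Note the commas between the format string and each argument.

```c
#include <stdio.h>

/* Hypothetical helper: prints two strings using the %s format specifier.
   Returns whatever printf returns, i.e. the number of characters written. */
int print_hello_world(void) {
    /* Each %s is replaced, in order, by the matching string argument. */
    return printf("%s %s\n", "Hello", "World");
}
```

Calling it from main should print Hello World followed by a newline; the zero bytes that end each literal are never printed.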
Yeah, there should be a comma there, just like any other printf. So C goes ahead and prints all the characters in order, up to and not including that zero byte. So this would be like H, e, l, l, o, and then a zero byte, but C is smart enough, or the way they defined it is, that it will not include that zero byte as part of the output.

You might be tempted to use percent s in scanf. We will go over why not to do that, because that is, as we will see soon, a terrible idea, and we had to understand dynamic memory allocation before we could actually say why it's a terrible idea.

So first off, I'll have an example: you cannot modify values that are part of the string literal. So if I have a char star, a pointer to a char, and I call it s, and I assign it equal to the string literal Hello World, well, the string literal is a pointer to the beginning, so s is a pointer to the beginning, the capital H. And I can go ahead and use it as an array if I want, I can do all that pointer arithmetic. Then if I try to access the element at index zero, and let's say I try to make it lowercase here, I am not allowed to do this. So if I go ahead and compile this program and then try to run it: it compiles fine, it doesn't really complain at you, but if I try to run it, I get a segmentation fault. Yay, my program dies immediately.

And, you know, we saw Valgrind before, maybe it can help us out. So if we run that, it says process terminated, with some things we can't understand, bad permissions for something. Just a lot of gibberish for now, but basically it says line 6 failed: I'm not allowed to write to it. So I will show you something you could do to actually turn this into a compiler error. So we know that, hey, this is read-only memory. It's like const char star.
So if I actually just write a const there, then I get a compiler error here, and it says error, cannot assign to read-only location s. We get a compiler error instead of a segmentation fault when we run it, and we can actually go ahead and debug it, and things act a lot better. So one thing you should definitely do: if you are assigning a string literal to a pointer, you should use the type const char star, so that you don't accidentally break that rule where you're not allowed to modify it. Is everyone good with that? So, everyone: if I use a string literal, I will use const char star, and then that will save you a lot of headache.

There are some strange rules for it. Like I said, you should use const char star for string literals. But it turns out you can just create an array in a function and then you can modify that string, because there are special rules for when we define an actual array. When we define an actual array, those bytes are on the stack, so they exist as long as our function is running, and this read-only rule does not apply to them. So if I say char s and then an array, you know, I could just write out the array initializer myself, put in the values, and then that's stored on the stack. I'm allowed to modify it, I'm allowed to do whatever I want with it.

So there's a special rule for string literals. If I say char s and then the empty brackets, for, like, C, please figure out the size of this array for me, and I set it equal to a string literal, C will go ahead and copy all those char values into that array. Then it will be on the stack, and then you are allowed to modify it. So if we do this: all I did is I changed the declaration of s to not be a pointer, so it won't point to that read-only memory. I create an array and I assign it equal to Hello World. It gets copied, and then because it is a copy and it's on the stack, I can go ahead.
I can modify the values to whatever I want. So I go to index zero, which is this capital H, I change it to a lowercase, and then suddenly everything is lowercase. So if I run that, whoops, can modify: I get the lowercase hello world. So, questions about that? Yep.

Yeah, in the previous example, what I had here means that I have a pointer to a character that I cannot change. Yeah, so there's a good question here, like, there's two things I can't modify, right? There's the pointer, and then there's what the pointer is pointing to. So you are allowed to use const in different ways. So it turns out that if I want to, here I could say, whoops, say I do const char star t equals test, something like that. I could do s equals t. That's perfectly fine. But if I write the declaration with a second const, that means I'm not allowed to change the pointer and I'm not allowed to change what it's pointing to. And I'm the right person to ask about that, because that was what I did for too long, and no one really knows these rules. So don't worry about them too much. People that have been programming for years don't really understand it. Just know: write const char star for string literals, and the compiler will complain at you instead of seg faulting.

All right. So here is this example. I declared it on the stack, I modify it, so I'm changing this capital H to a lowercase h. Perfectly fine, I'm doing it on the stack. Everything we've learned about arrays still applies: I could iterate over it, I could use that macro for array length, all that fun stuff. So if I use that array length macro: C knows how many bytes to copy from the string literal. So if I do that char star array, or sorry, that char array called s, and I assign it equal to Hello, the array length is six, not five. Because, well, there's five characters, so H, e, l, l, o, and then because it's a C string, there's always a null byte. So there's always a zero byte at the end, and it gets copied as well. So the array length is six.
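As a rough sketch of the two declarations being contrasted: a const pointer to the literal versus a stack array that receives a copy. The function name modify_copy is invented for illustration, and sizeof stands in for the array length macro, which works here only because a char is one byte.

```c
#include <stddef.h>

/* Hypothetical demo: copies "Hello" into a stack array, lowercases the
   first letter, and reports the array length including the zero byte. */
size_t modify_copy(char *first_char_out) {
    const char *literal = "Hello"; /* points at read-only bytes; the const
                                      turns a write into a compile error */
    char s[] = "Hello";            /* C copies all six bytes onto the stack */
    (void)literal;                 /* unused; kept only for the contrast */

    s[0] = 'h';                    /* fine: s is our own modifiable copy */
    *first_char_out = s[0];
    return sizeof(s);              /* 6: five letters plus the zero byte */
}
```

Writing literal[0] = 'h' instead would be rejected by the compiler because of the const, which is the whole point of declaring it that way.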
So this would be the exact same thing as if I just did char s, which is an array, and then I set it equal to the characters H, e, l, l, o, and then backslash zero to get a literal zero. Yep?

Sorry, like if I just did this with char star? No, so if you changed this to char star, you would have this earlier example: it'd be equal to a string literal, and then that points to read-only memory, even without the const. Yeah, so before, it didn't have a const here, right, but it pointed to read-only memory. So when I tried to modify it, I got a segmentation fault. So, a little weird behavior here. Yeah?

Yeah, so the question is a bit of an implementation thing: if I just have this string literal, is it still stored in read-only memory, and the compiler just copies it for me? And the answer to that is yes, it would get stored in read-only memory. The compiler's also smart: if you had like 10 string literals that all said Hello, it would only make one of them. Yeah, but that's more of an implementation thing. That's like a compiler course, how a compiler works. We don't really care. We just hope it works. So, any questions about this? Yeah?

Why we need what? So, why we need the null byte is because, well, for the printf function, we just give it a pointer, right? We don't say how many elements are in it. So what they decided is, instead of you having to say how many elements are in the string, they just put a zero byte at the end, and that means the string's done. So in this case, I don't need to know that the length is five if I want to print the string. I can just, you know, print the H and just keep on going while I don't see a zero. So I could just say print the H, print the e, print the l, print the l, print the o. Oh, it's a zero, I should stop. And we'll see where this gets us into problems. All right, any other questions with that?
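That "keep printing until you see a zero" idea can be written out as a small loop. This is only a sketch of the principle, not how printf is actually implemented, and the name print_until_zero is made up.

```c
#include <stdio.h>

/* Hypothetical helper: prints s one character at a time and stops at the
   zero byte. Returns how many characters were printed. */
int print_until_zero(const char *s) {
    int printed = 0;
    while (s[printed] != '\0') {   /* the zero byte marks the end */
        putchar(s[printed]);
        printed++;
    }
    return printed;                /* the zero byte itself is never printed */
}
```

The caller never passes a length; if the zero byte is missing, this loop simply keeps reading whatever memory comes next.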
All right, so what if we go ahead and just create a larger array? So if I did char s of size eight, and then Hello: Hello is five characters plus a null byte, so I only need six. But the same rule applies as before, when the initializer was shorter than the array with 2D arrays. Same thing with 1D arrays: it will just pad it out with zeros. So if I did that, the last three elements would all be zero. It's just all zeros at the end. It turns out it doesn't really matter: it wastes a bit of space, but the string still works. So it would just be H, e, l, l, o, oh, I hit the zero, I'm done. I wouldn't access anything after that anyway. Doesn't really matter, just wasting space. So like I said, it just stops when it hits the first zero, the others don't matter. That's okay.

The other problem is a much larger issue. If I have an array smaller than what can actually hold the string, that causes issues. So if I have char s of size eight, so I can only hold eight characters, and I have the string Hello World: well, this is one, two, three, four, five, six, seven, eight, nine, ten, and then eleven for the null byte. So it would be a lot bigger than just eight. So if I go ahead and run this, well, the only thing that actually gets copied to this array is Hello Wo. So the first eight characters, and it wouldn't have a newline, or sorry, it wouldn't have the null byte or anything like that.

So it turns out when I execute that, I just see random stuff after it, right? So I'm essentially reading invalid memory. This printf function will just keep on printing out characters until it hits a zero. It turns out that in my array, I didn't have enough room to put a zero, so it just didn't exist. So it will read the first eight characters and then just read random memory until it randomly reads a zero, and then it stops. So in this case I read a tilde, whatever the hell that is, a backtick thingy, and then a bunch of question marks. And then if I run it a few times,
oh, I got some question marks. Any time I see question marks, it basically doesn't know how to print it, it's not a normal character. So I just get random stuff. Oh, I get some really strange ones sometimes, like an ampersand, M, T. I'm just reading random memory, so it's all nonsense. So this is one of the major reasons why strings are hard, because if you forget that null byte, things just go off the rails really, really quickly.

And, you know, Valgrind is your friend. So if I tried to run that, it would also tell me that, oh, in fact Valgrind doesn't really care. It thinks it's all good. So it turns out it's even harder, because in this case we didn't malloc it or use dynamic memory, so Valgrind doesn't care. It thinks you know what you're doing. So you don't get any help, you just get some weird output.

So, knowing that, we can write our own functions that use strings if we want. If we want to write a function to count the number of spaces, it looks pretty similar to how we wrote functions before. We can just take a const char star, so that is the string we're going to count the number of spaces in. Declare a variable count, so that's where we're going to keep track of how many spaces we have encountered. And then in our while loop, well, we just keep reading characters until we read something that is a zero. As soon as we read a zero, we want the condition to be false and to exit the while loop.

So if we just dereference it, that is the first character of the array. In the case that it is Hello World, we'd read a capital H. Then we dereference it again and check what the value is: is the value a space? If it is, we'd increase count. In this case it's a capital H, so we wouldn't. And then we do plus plus s, which would do pointer arithmetic, right? So that is like s plus one, so that would move that pointer one character. I'm just doing pointer arithmetic for the fun of it. And then I move one character, and then I dereference s again.
So now that's the second character. Check if it's equal to the null byte; if it's not, check if it's a space and increase count. In this case it would be e, so we wouldn't go into the if. It would advance and go to the next character. Read an l, not a space. l, not a space. o, not a space. Oh, space is a space, so count would increase to one. And then we'd read all these characters. The last character we would read is a d, and then we would increment the pointer, and then we're pointing at a zero byte. We would dereference it, and it turns out zero is equal to zero, so this condition would be false and then we would stop. We know we're at the end of our string now, and then we can return count. So, any questions about this program at all?

So like I said before, we don't need to know the length of the array, because we know when it ends: we encounter the null byte, and that signifies it ends, and we don't consider that as a valid character otherwise. Everyone good with that? Not too much different than what we've been doing.

I could write that a bit differently if I wanted to. This uses pointer arithmetic, which might be hard to read. We could also count just the number of characters in the string. So maybe I write a function called string length that just counts the number of characters in the string. That does not include that null byte.
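The space-counting function walked through above might look roughly like this. The name count_spaces is an assumption; the transcript only describes a const char star parameter, a count variable, and a while loop that advances the pointer with plus plus s.

```c
/* Counts the space characters in a C string using pointer arithmetic.
   Assumes s points to a valid, zero-terminated string. */
int count_spaces(const char *s) {
    int count = 0;
    while (*s != '\0') {    /* stop when we dereference the zero byte */
        if (*s == ' ') {
            count++;
        }
        ++s;                /* move the pointer forward one character */
    }
    return count;
}
```

With the lecture's input, count_spaces("Hello World") would return 1.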
So I can just declare a variable called i that I'll use as an index, and then I can have a while loop that says: while s at index i is not equal to the null byte, I increment i, so I increase my index by one. I just keep on doing that until I actually encounter a null byte, and then, well, i is actually equal to the length of the string by the time I exit the loop. So if I use string length with Hello, we would start with zero, encounter the H, then increase it to one; encounter the e, increase it to two; encounter the l, increase it to three; l, increase it to four; o, increase it to five; and then we'd be done.

So here's our string length program. If we give it Hello, it should work. We execute it: the length of the string is five. So five characters in the string, but we need six to store it, right? Because there always has to be that null byte at the end. Keeping this in mind when you use library functions, when you use other people's functions, is going to be very, very important, because if you miss that null byte, then it just reads random memory and you are in a world of hurt. Most of the designers of this really, really regret making C strings work like this, because it has caused many, many a security issue: you just forget that you need one more byte than the length of the string and you're screwed. Or if you forget to put the zero byte at the end, you're also screwed, and life is very, very difficult.

So let's go into some other functions. There's another function called puts that we can use instead of printf if we want. puts is a bit easier if you're just doing strings. It just takes a single string argument. So you could use puts of s, and that's the exact same thing as if we did printf with the format specifier for a string and then a newline, and then gave it the string.
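A minimal sketch of the index-based version described above. The name string_length is a guess at how "string length" was spelled on the slide; the standard library's strlen does the same job.

```c
/* Counts the characters in s, not including the terminating zero byte.
   Assumes s points to a valid, zero-terminated string. */
int string_length(const char *s) {
    int i = 0;
    while (s[i] != '\0') {  /* keep going until we hit the zero byte */
        i++;
    }
    return i;               /* i is now the length of the string */
}
```

For "Hello" this returns 5, while storing that string takes 6 bytes because of the zero byte at the end.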
So it will just go ahead and print the string directly for you and then put a newline at the end of it, so you don't have to remember to do it.

Other things that we might want to use printf for: it turns out the rounding that we used for doubles actually works for strings, but it just kind of prints the first few characters. So if I declare a string s equal to Hello there, and I do puts, that should just print out Hello there and then a newline. If I do printf with just the string format specifier and a backslash n, and I do some pointer arithmetic here, so this is s plus six, it will move forward six characters. Well, s initially would point to the first character, which would be the capital H, and if I move it six characters, well, one, two, three, four, five, six, that would be a pointer that starts at the t. So it would print all the characters starting at the t until it hits a null byte, which would be at the end of the string. So this line should just print there.

And then similarly, if I do this format specifier but with percent dot five s, like, round the string to five decimal places, it seems a bit strange, but I guess they consider rounding a string to five decimal places to be just printing the first five characters. So that will print the first five characters, which would be H, e, l, l, o, and then stop immediately. So if I run this, I would see Hello there, then there, and then Hello, all on separate lines. Everyone all right with that-ish? Rounding a string seems weird, but it works. Might be useful to you, might not be useful.

All right, so now we get into scanf. I put it right on the slide: red means I'm serious and you shouldn't do it. We could, but should not, because it is impossible to write a correct program using scanf with percent s. So, some rules with it. It might seem appealing at first, scanf.
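The three print calls from the Hello there example could be sketched as follows. The wrapper name print_pieces is invented, and the function assumes its argument is at least six characters long so that s plus six stays inside the string.

```c
#include <stdio.h>

/* Hypothetical demo of puts, pointer arithmetic with %s, and a precision
   on %s. Assumes s has at least 6 characters before its zero byte.
   Returns the character count reported by the last printf. */
int print_pieces(const char *s) {
    puts(s);                     /* whole string, then a newline */
    printf("%s\n", s + 6);       /* starts printing at index 6 */
    return printf("%.5s\n", s);  /* prints at most the first 5 characters */
}
```

Called with "Hello there", this should print Hello there, then there, then Hello, each on its own line.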
It'll ignore some whitespace before the characters the user enters. So if you write like space, space, space, Jon, and then hit a newline, you wouldn't get any of the starting spaces, you'd just get Jon. And it matches any characters until a whitespace character, so it doesn't matter if I do enter or spaces. If I did space, space, Jon, space, space, space, enter, it would just be Jon, just those three characters. And it also does us a solid and terminates that string with a zero byte, so it'll be a valid C string. But yeah, the string produced won't have any whitespace characters.

But it turns out this function is impossible to use correctly, because you don't know how much memory you need, right? You have to give it a pointer. You essentially have to give it an array, an array will have a certain size, and you have no earthly idea how many characters the user is actually going to enter. Even if you make it a thousand, well, as soon as they enter exactly a thousand, it wouldn't have space for that null byte, then you are screwed, and you will never know.

So let us use it, because we can see that it is a silly thing to do. So here is my program. Here, just to simulate us having a more complicated program, I'm going to define an integer called x, make it equal to 123456789, and then I will never modify it, and I'll just print it at the end. So the last print I should see is x, 123456789, correct? All right, so let's see if that's true. Here I will prompt the user, input your last name, and then I will just create an array of four characters, so I can only hold four characters in this, and then I will do a scanf, and then I will print off whatever is in that array. So if I go ahead and run this: it said my last name, and let's say that I can't read for some reason and I just type Jon. If I run that, well, that's fine, because Jon fits in it: that is three characters and a null byte.
So it's exactly four, it exactly fits into it. It's all good: it prints Jon, and then x is 123456789.

Well, Jon's not my last name. So if I run this and I type my actual last name, I get the correct string, because scanf doesn't know how much memory you set aside for it. It always assumes it's valid. So it put Eyolfson and then a null byte, but it turns out the memory that it used for that was x, and it just changed the value randomly. And now imagine debugging your program. Oh yeah, good luck. You are screwed.

Nice thing about this: if we run Valgrind on this one, let's see if it helps us. Yeah, it turns out Valgrind doesn't help us either, because, well, again, we declared it on the stack, and Valgrind doesn't really care about anything you do on the stack. So if I wanted to, I could probably get some help if I did something like this: I just malloc four characters, and of course I free it, and then I fix the sizeof. So if I do that and run it, well, it turns out I'll just overwrite some other random memory in the heap, and it will probably actually work this time, and yeah, I see Eyolfson and then x unmodified. But if I look at Valgrind, it tells me it's very unhappy. It's just throwing a bunch of errors after errors after errors that are quite unreadable. The only thing we need to know is: invalid read, invalid read, invalid read, invalid write, invalid write. That's basically telling us that, hey, we are accessing memory that we shouldn't. We're accessing invalid memory, and the rest of it is fairly incomprehensible.

So let's rewind. The moral of the story is that scanf with percent s is basically impossible to use correctly, so do not use scanf with strings. If you see this, that is not good. So what should we use instead?
Well, there's a puts function. Well, there's also a gets function. That reads all the characters until a newline, but it doesn't keep the newline character. The issue with this is the same as with scanf with just the string format specifier: there's no way to know how much space we need to actually store what the user inputs. That issue is actually called a buffer overflow, and that has caused so many security issues and cost billions and billions of dollars over the years.

So what that term basically means: the space in memory you set aside to store something the user inputs is called a buffer. It basically just means some memory I want to set aside, and we call the memory we intend that function to use a buffer. So in this case, we're using scanf or gets, so we give it some memory. We would say s is a buffer, because we want it to write to that memory, and because it might write past the end of the valid section of memory, it is called a buffer overflow. So I wrote past the number of bytes I actually allocated for it.

So there's another function that is slightly better, called fgets. fgets actually allows you to specify the size of your buffer, and it only writes up to that many characters, and it won't overfill that buffer.
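A minimal sketch of a bounded read with fgets, assuming a deliberately tiny four-byte buffer. The names BUFFER_SIZE and read_short are chosen here for illustration; fgets itself takes the buffer, its size in bytes (including room for the zero byte), and the stream to read from.

```c
#include <stdio.h>

#define BUFFER_SIZE 4  /* room for 3 characters plus the zero byte */

/* Hypothetical helper: reads at most BUFFER_SIZE - 1 characters from
   stream into out. Returns 1 on success, 0 if nothing could be read. */
int read_short(FILE *stream, char out[BUFFER_SIZE]) {
    if (fgets(out, BUFFER_SIZE, stream) == NULL) {
        out[0] = '\0';  /* leave a valid empty string on failure */
        return 0;
    }
    return 1;           /* out is always zero-terminated here */
}
```

Passing stdin as the stream reads from the keyboard; anything the user types beyond three characters is simply not written into the buffer.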
So its API is, well, it takes a pointer to a string that should be valid memory, so an array of characters that it can actually write to, and then an int size, so how many characters it is actually allowed to write, and that will include the null byte. And then there's this weird FILE thing we haven't seen before, that's called a stream. This basically will work with files on your hard drive, if we know how to do that, but there is a standard one that kind of represents your terminal.

So the string is a pointer to size bytes of valid memory, because a character fills one byte. Then size is the integer representing how many bytes to write at most by this function, so if the user inputs something that is larger, it will just go ahead and ignore the rest. And then the stream: in order to represent the terminal, there is a variable declared for you in one of the header files, called stdin, or standard input is what it is short for, and it represents terminal input. It's declared in stdio.h, so you should just use that. The produced string here will always have a null byte. So remember that the size includes the null byte, so there could be at most size minus one characters in the string. If I set aside four bytes of memory, well, it can use three of those for the actual contents of the string and then a null byte at the end.

So if I go ahead and use fgets, this is what a program that uses it would look like. In order to not repeat myself, I'll just define buffer size equal to four. I'll prompt the user again, input your last name, declare an array of characters on the stack of size, in this case, four, and then I can use fgets. I give it a pointer to the start of that array, and then I say n is equal to the buffer size, so use at most four bytes, because that's all the memory I have that is valid. Then we use stdin to represent actually just typing something on your keyboard, and then we just print out whatever string we get.

So now if we use fgets and I actually input my last name, well, then I just see E, y, o, because that's all that fits. I only have four bytes for it, so I can only store three actual characters. The rest gets cut off. Maybe this is something you don't expect, but, you know, this is something you can actually fix, something you can actually debug. And if I wanted to, I could go ahead and just increase this. If I change this to an eight, could I input my last name? Yeah? No, my damn last name causes problems all the time. So if it's only eight: my last name is one, two, three, four, five, six, seven, eight characters, but again, it needs that null byte at the end. So if I do that, it's Eyolfso, and I was so close. That's my username here, Eyolfso and then the number. I almost got my last name. Kind of disappointing. All right.

So, any questions with fgets? Yeah, so the question is what is stdin. According to this, stdin is stdin. You will actually learn what it is in the operating systems course. The answer is it depends what operating system you use. If you're using Linux or Mac, it corresponds to file descriptor zero, but you don't need to know that for this course, and the story is way different on Windows too. All right, any questions about fgets? So it's the only one that works: you should not use scanf with percent s, and you should not use gets.

Another one you could use, that I would probably recommend because it does a lot of the work for you, actually, it does all the work for you, is something like getline. It looks a bit intimidating. So its API is getline and then char star star.
Oh, a pointer to a pointer. All right, we've seen this before. So it's a pointer to a buffer, and then it takes a size_t star, which is a pointer to a size. Well, size_t is basically an integer that is a different size depending on your machine. And then it takes that same file stream.

So the nice thing about getline is it will malloc memory for you, and tell you how much memory it has actually malloced. It turns out why it needs a char star star is because, well, you just declare a char star, which is just a pointer to somewhere, and then, just like if you're using scanf for other things, you give it the address of the pointer, and it will go ahead and update the value for you. It will do the malloc for you and make sure that whatever the user types actually fits. So you don't actually have to think about malloc at all, think about how big it has to be, do anything like that. This function will do all of it for you.

And what it will do is match the entire line the user enters, including whitespace and ending in a newline. So you might have to do something with that line if you want to use it for anything, but the good thing is you get every single byte, and then you can choose what you want to do with it. It will return the number of characters actually written, and that is excluding the null byte. So it will tell you essentially how big the string was that the user inputted. So we will finish today, well, after the summary, with getline.

It looks a bit intimidating, but basically how it works is I can just declare a pointer to a char, and then instead of doing malloc and guessing a size, I just set it equal to NULL, and I say, well, this takes up zero bytes, so I initialize size equal to zero. And then for getline, I give it the address of s and then the address of size, because it will modify those for me. So it will malloc, and then whatever pointer it gets through malloc, it will write it to s, so s will
point to something, and then the number of bytes that are valid, it will store in the size variable. And then we're reading input from stdin, and it returns the number of bytes we actually inputted.

So it will include the newline and stop at a newline. If I want to get rid of the newline, well, I could just take the number of bytes written and then minus one to turn that into an index, and, well, I know the last byte it matched was a newline. So this is the index of the last byte, and I just change it to a zero. I just overwrite the newline character with a zero, and then I don't have a newline in my input anymore.

So now, if I want, I can output all of these. Size will be how many bytes are valid to use through the pointer s, and it mallocs it for me. Then the number of bytes that the user actually inputted, and then the string with the null byte, and I got rid of the newline character.

The only caveat with this getline function is that it called malloc for me. So when I'm done with that memory, I have to be good and I actually have to free it. So here I free s. I have to remember to free it even though I'm not the one that malloced it.

So if I run this now, the getline function just waits for you to input something. If I input Jon and then a newline, well, it turns out it grabbed 120 bytes of memory for that. It didn't need all of them, but it didn't really hurt anything. And then bytes written was four.
So I entered four bytes of information: J, o, n, and then the newline character, like when I hit enter, so that counts. In this case bytes written was four, so the newline character would be at index three. So that's what this line did right here. If I didn't have this line, it would have a newline, and I'd have a separate newline at the end there. So if I went ahead and commented this out, then it would say it wrote four bytes. It would be J, o, n, and then a newline, and then if I go ahead and print it, well, it would print with the newline here, and then I see an empty line here, and it looks a bit ugly. It's a valid string: it would be J, o, n, newline, and then a null byte. And if I go ahead and uncomment this, I have two null bytes in a row, because I replaced the newline with a null byte. But the effect of that doesn't matter, because it'll just go until the first one. And if I wrote my whole name, that's just my whole name. I input 13 characters. Yeah?

Yeah, so as a test, let's see. You know, we might be like, oh, I can just say 120, whatever, that's fine. So let's write something a bit longer. So, efficient development was... All right, that's longer than 120. So let's see what it does. Oh, it just printed. Okay, I should probably change the order. All right, let's not print this, so, too much information. If I just bring up the string again: so there's a lot of input. I press enter, it had to allocate 4,100 bytes for me, and it turns out I actually entered 4,096. So it works. Could you imagine writing that yourself? Oh, read the user's input, and then, oh, I ran out of memory. Okay, I need to allocate some more memory. Oh crap, I need to allocate some more memory. So it'll go ahead and do it for us.

So, questions about this? Yeah. So the 120 it had before, it's just a default value that they picked, so you don't really have to worry about it.
It's a small enough value that if you waste a bit of memory, it doesn't really matter. So with most of these, just let them pick it. It works, until you actually need to change it. Working correctly is better than being efficient and broken. All right, any other questions about this one? So, I would recommend using getline, and then if you have to modify the string, you can.

Whoops. So, because strings use memory, they're difficult to use correctly. You've got a lot of rules. Because they defined C strings like this, you always need to ensure there's a null byte at the end. Sometimes you have to read the documentation for the functions: sometimes they will include it, sometimes they won't, and them treating it differently might cause confusion if you misread it. There is a format specifier you can use, the percent s, but you should only use it in printf. Never use it in scanf, or bad things will happen, because you always need to make sure there's enough memory to hold the string. Buffer overflows are a serious security issue that has, again, like I said before, cost lots and lots and lots of money, and that's why people do not like C. Because, well, it's really easy to have this problem. Other languages may be slower, but, you know, they actually save you from having this issue, which sometimes is worth it in and of itself.

So for user input, you should either use fgets, and then if the amount of memory you set aside is not big enough, well, it will not write past that, so you might get something shorter. Or, if you don't want to allocate memory yourself, you should just use getline. It will malloc for you. I recommend using that one, but it matches literally every single character you write in a line, and if you want to ignore whitespace or ignore spaces or something like that, you have to do that yourself. But you're sure that you actually capture all of the user's input, and it works a bit better. So, sweet, we can end two minutes early.
So just remember: pulling for you. We're all in this together.
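For reference, the getline pattern from this lecture can be sketched as a small helper. The name read_line is invented here, and this assumes a POSIX system, since getline is a POSIX function rather than part of standard C. The variable names s, size, and bytes_written follow the walkthrough.

```c
#define _POSIX_C_SOURCE 200809L  /* getline is POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: reads one full line from stream and overwrites the
   trailing newline, if present, with a zero byte. Returns a malloc'd
   string that the caller must free, or NULL at end of input or on error. */
char *read_line(FILE *stream) {
    char *s = NULL;    /* getline will malloc and store the pointer here */
    size_t size = 0;   /* getline will store the allocated size here */
    ssize_t bytes_written = getline(&s, &size, stream);
    if (bytes_written < 0) {
        free(s);       /* the buffer may have been allocated even on failure */
        return NULL;
    }
    if (bytes_written > 0 && s[bytes_written - 1] == '\n') {
        s[bytes_written - 1] = '\0';  /* last index holds the newline */
    }
    return s;
}
```

A caller would use it as char *name = read_line(stdin); and then free(name) when done, since the memory came from malloc inside getline.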