Hey everyone, Mr. Gibson here with the next lesson in cryptography. Today we're going to take a look at writing a function that'll help us sanitize, or clean, the text that we're about to work with when implementing a cipher. This is really important when we're doing computer programming. We want to have a predictable format for our input so we know how we can expect to work with our text. If we have a mix of uppercase and lowercase letters, symbols, and spaces, it can be really hard to figure out how to consistently change our plain text into cipher text or vice versa. So the first step that we're going to implement before we ever do a cipher is to change whatever format our current string is in, whether it's representing plain text or cipher text, into a predictable format. We call that text cleaning.
In this course, the way that we're going to do that is to take all of our text and make sure that it's all uppercase, regardless of whether it's going to end up being plain text or cipher text. We'll worry about formatting our final output from the cipher later, but for now we just want to make our text predictable and consistent. So it's all going to be uppercase. There aren't going to be any spaces. There aren't going to be any symbols. Really, all we want to have are the 26 uppercase English letters for this course. You could of course change this later on down the road. Maybe you don't want to encode things in English. Maybe you want to work with a different language. Maybe you do want to find a way to incorporate numbers or spaces. There are a lot of ways that we can do that. But for now, for getting started, we're going to focus on just the 26 letters.
So let's go ahead and get started here. I've already created a skeleton of our function, which I've named text clean. It takes in one variable, text, which will be a string. And that's the string that we're going to clean.
And then it's going to return a string, and that string is the clean text. Now what we want to do is create a string that's going to hold our clean text when we're done. So I'm going to go ahead and create that now. This is called initializing your variable. It's something that we're going to be adding things to later, and we're going to add to it in a way that presumes it already exists. So when we're working with strings, the way to do that is to create the variable that you're going to work with and just assign it an empty string. You'll notice that's two single quotes with nothing in between them. We call that the empty string. It's not a space. It's just empty. There's literally nothing there, but it creates a string for us to work with.
Now, one way that you can think about cleaning text would be creating an unapproved list of characters. This is not going to be the strategy that ultimately ends up being successful for us. We're going to see that it becomes a little bit too tricky to do this. We could start by saying, okay, now that I've got my clean text string, I want it to be the same thing as my text string. I said I wanted it all to be uppercase, so I can use my dot upper. I said I wanted to get rid of spaces, so maybe I can use my dot replace to replace any spaces with empty strings. Oh, yeah, I don't want periods either, so I need to look for those and replace those. Oh, question marks. Yeah, that's another punctuation mark. And so on. But you're going to end up having to predict every possible input. And in fact, if you think about the number of characters, just look down at your keyboard, there are probably more characters that you don't want to include than you do want to include.
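To make the trouble with the unapproved-list idea concrete, here is a minimal sketch of it. The name text_clean_blocklist and the sample strings are made up for illustration; only upper and replace come from the lesson.

```python
def text_clean_blocklist(text):
    """Hypothetical unapproved-list version: remove only what we thought of."""
    clean_text = text.upper()
    # Every unwanted character needs its own replace call.
    clean_text = clean_text.replace(' ', '')
    clean_text = clean_text.replace('.', '')
    clean_text = clean_text.replace('?', '')
    return clean_text


print(text_clean_blocklist('Hello there. How are you?'))  # HELLOTHEREHOWAREYOU
print(text_clean_blocklist('Wait, what?!'))               # WAIT,WHAT!
```

The second call still contains a comma and an exclamation point, because nobody wrote a replace for those. Every new symbol a user types means another line of code.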
So going this way, creating a list of characters that you don't want to allow into your clean text, is going to be longer and a lot harder to implement successfully. Never underestimate the power of your user to give you something you didn't account for. So what we're going to use instead is what we'll call an allowed list. Instead of trying to define all the characters we don't want, it's usually a lot easier to just define the ones that we do want. So we can use this letters string, which is just going to contain the letters that we do want to keep. And now all we want to do is double check: are the characters that are in the string text, the one that we pass into the function, in this string? If they're already in this string, that means they're good to go. We want to move those over. Now we want to be careful, because if there's a lowercase letter, say a lowercase g, in the string text, we actually want to keep that. We just want to make it uppercase. So we want to take all of these things into account when we're planning what comes next.
This seems like a great opportunity to implement a loop. We're going to use a for loop. We're going to iterate over the variable text, and I want to do that by taking it one substring at a time, so one character at a time, and storing it in the variable character. I'm going to initialize this variable character right here in the for loop header. So we're going to iterate over the string text, storing each character in the variable named character, and we're going to have it do a set of operations. Really, what we want to think about is what we want to do if the uppercase version of whatever character we're working with is in the string letters. That would be considered a good thing in our case.
That means that it's either an uppercase A or a lowercase a that was in the string text. We make it upper and we check whether it's in the string letters. If it is, we keep it. And if it's not, we just toss it aside. We want to ignore it. So the clause of this statement would be: if the uppercase version of whatever character is currently stored in the variable character is in letters, then I want to copy that character over to the string clean text. We can do that by using one of our shortcut operators, our plus equals, which will take the uppercase version of whatever our current character is and concatenate it onto the end of clean text. So that should do it. It's only going to add a character onto the end of that string if it's in the string letters, which is our approved list of characters right up here.
You might be thinking, well, do I need an else statement? In this case, there's really not much that we want to do if one of the characters in the string text isn't in the string letters. We don't want to do anything. We just want to move on to the next character. So you could say else, and then there's this statement called pass, which basically just tells Python to do nothing here. There's nothing left in that block, so we just go back up to the for loop, to the next character, and try again. But that actually ends up being extra. We don't need it. We can just leave it as it is. If a character doesn't meet the condition of the if statement, it's done in that for block. There's nothing left indented underneath the for loop, so it will move itself back up to the top and go through again. That's probably the most concise way to write it.
Now, once we do that, we want to make sure that we actually return that string back to the main part of the program so we can do something with it.
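Putting the pieces described so far together, the function might look something like the sketch below. The underscored names text_clean and clean_text, the name LETTERS, and its placement above the function are assumptions about how the spoken names are written in the on-screen code.

```python
# Assumed spelling and placement of the "letters" string from the lesson.
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def text_clean(text):
    """Return text with only the allowed letters kept, all in uppercase."""
    clean_text = ''  # initialize with the empty string
    for character in text:
        # Keep the character only if its uppercase form is on the allowed list.
        if character.upper() in LETTERS:
            clean_text += character.upper()
        # No else/pass needed: anything else is simply skipped.
    return clean_text
```

Notice that the function returns the cleaned string instead of printing it, so the calling code decides what happens next.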
Usually when we use the function text clean, we're going to store that clean text somewhere else, maybe in a variable inside of a different function, or we might print it, but we're going to do something with it. And it's not always going to be the same thing, which is why this function does not actually print the clean text. It just returns it. Then whatever function or piece of code called the text clean function can decide what to do with the result once it gets it.
Let's check and make sure this works. For now, that means I'm just going to print the cleaned version of a string I'm about to create. Let's start with something simple: Hello, World, with uppercase and lowercase letters. If this works, it should come back with no spaces, and all those letters should be capitalized. Let's check the code. That looks pretty good. Let's make sure it's actually taking out punctuation. So I'll put a little mix of characters in here to make it a little more odd. That still works too.
Now you'll notice that when we had numbers in there, it did not attempt to replace those numbers with, say, a spelling of those numbers. A 3 could have become the word three, but instead it was just removed. So do be mindful that this text clean function that we've written so far will not keep any information that numbers might carry in your message. We can update this function later in the course once you learn about dictionaries. That's another data type in Python that will allow us to do something a little more complicated, like I just mentioned. But for now, our text clean function will definitely lose information that's represented by numbers.
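As a rough illustration of that information loss, here is a short test run. The function is repeated so the example runs on its own; the underscored names and the sample messages are assumptions, not the exact strings typed in the video.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'  # assumed allowed list


def text_clean(text):
    clean_text = ''
    for character in text:
        if character.upper() in LETTERS:
            clean_text += character.upper()
    return clean_text


# The caller decides what to do with the returned string; here we print it.
print(text_clean('Hello, World'))      # HELLOWORLD
print(text_clean('Meet me at 3 pm!'))  # MEETMEATPM
```

In the second message the 3 is gone entirely, so whoever deciphers it will not know what time to meet.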
So keep that in mind when you're thinking about the messages that you'll use to test your code or to transmit to friends: if you clean a message before enciphering it, then when they decipher the ciphertext and get back the plain text, any of the numbers in your original message will be gone. So that's just something to think through. Right, that's it for the text clean function. I would definitely add this to your toolkit of functions that you can pull up at a moment's notice, and we'll catch you on the next one. Thank you for watching.