I am going to quickly outline lists. You have already seen a simple example of creating a list. The easiest way to create an empty list is to type an open bracket and a close bracket: empty = []. For more complicated lists we have already seen several examples: a list containing 'spam', 'x', 100 and 1.234 is a list, and p is also a list. Notice that a list can hold all kinds of elements, including other lists. Now let us try one little thing. q is range(5), so q is [0, 1, 2, 3, 4]. Now I create a as [1, 2, q], putting q itself in as an element, which is also legal. So a is now [1, 2, [0, 1, 2, 3, 4]]. The nice thing about lists is that you can access their elements and you can also change them. If I say q[0] = 100, q becomes [100, 1, 2, 3, 4]. Notice that a is now [1, 2, [100, 1, 2, 3, 4]]: a reflects the change as well. When you store a variable inside a list like this, the list holds that same q; they are the same object, not copies. So the points to note here are: creating an empty list is simple, a list can be empty with no elements in it, and a list can be heterogeneous, with every element of a different type, as seen in these examples. We have already seen element access for strings, so I am not going to belabor it. p[0] is the first element, p[1] the second, p[3] the fourth. p[-1] gives you the last element, p[-2] the penultimate one, and p[-4] the fourth element from the end, just like in strings. And just like in strings, an index outside the list, such as q[-40], raises an IndexError that says "list index out of range". Indices start from 0, indices can be negative, and indices must always be in the valid range. To find the length of a list you use len, which we have already discussed. So the valid indices run from 0 to len(p) - 1, and the negative ones from -len(p) to -1. So how do you add and remove elements? The nice thing about lists is that they are mutable, that is, they can be changed, unlike strings and unlike integers.
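A small sketch of the points above, written in Python 3 (the lecture uses Python 2, where range(5) already returns a list; in Python 3 it has to be wrapped in list()). The contents of p are illustrative.

```python
# Python 3 sketch: range() is lazy here, so wrap it in list() to get a real list.
empty = []                       # an empty list
p = ['spam', 'x', 100, 1.234]    # heterogeneous elements
q = list(range(5))               # [0, 1, 2, 3, 4]
a = [1, 2, q]                    # a stores a reference to q itself

q[0] = 100                       # change q in place ...
print(a)                         # [1, 2, [100, 1, 2, 3, 4]]  ... and a sees it
print(a[2] is q)                 # True: the same object, not a copy

print(p[0], p[-1], p[-2])        # spam 1.234 100
# valid indices run from -len(q) to len(q) - 1
try:
    q[-40]
except IndexError as err:
    print('IndexError:', err)    # IndexError: list index out of range
```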
An integer is a constant. If I say a = 1, there is nothing more I can do to that 1. I can say a = 2, which means the variable a now refers to 2, but the value 2 itself cannot be changed. Lists, however, can be changed; a list is a mutable entity. So let me take p as an empty list. How do I find the various methods of a list? Type p. and press Tab, and you will see append, count, extend, index, insert, pop, remove, reverse and sort; ignore all the methods with underscores. Let us see what p.append does. Its help says L.append(object): append object to the end. So if I do p.append(1), p, which was an empty list, becomes [1]. Now what do you think p.append([1, 6]) will do? It adds the list [1, 6] as a single element, so p becomes [1, [1, 6]], not [1, 1, 6]. Please note the difference. If you want the elements added individually, you need p.extend, which takes the elements of the given list and adds them one by one. So p.extend([1, 6]) does what you may have expected in the original case, and p is now [1, [1, 6], 1, 6]. Now how do I remove elements? Let us say I want to remove 6. I can say p.remove(6), which removes the 6 at the top level of p, here the last element; it does not remove the 6 inside the nested list. If instead I want to remove the element at a particular index, I say del p[1]; del is a keyword. So to remove an element by index you must use the del keyword, and to remove an element by value you use remove. Now let us look at a simple case: p = [1, 2, 6, 3, 4, 6]. If I do p.remove(6), which one do you think it will remove? The documentation is very clear: it removes the first occurrence of the value. So after p.remove(6), p is [1, 2, 3, 4, 6]. Note that remove changes p in place and returns nothing. Concatenating lists is very simple; we have already seen this.
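The append/extend and remove/del sequence described above can be checked with this Python 3 sketch; the values follow the walkthrough, and the final print calls show the state of each list.

```python
p = []
p.append(1)          # [1]
p.append([1, 6])     # [1, [1, 6]]        the list goes in as ONE element
p.extend([1, 6])     # [1, [1, 6], 1, 6]  each element is added separately
p.remove(6)          # first top-level 6 -> [1, [1, 6], 1]; nested 6 untouched
del p[1]             # remove by index   -> [1, 1]
print(p)             # [1, 1]

r = [1, 2, 6, 3, 4, 6]
result = r.remove(6) # removes only the FIRST 6
print(r)             # [1, 2, 3, 4, 6]
print(result)        # None: remove works in place and returns nothing
```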
So if you have these two lists, a + b gives me [1, 2, 3, 4, 4, 5, 6, 7]: the elements of a followed by the elements of b. Remember that a + b returns a new list, and the original lists do not change. Let us try this: c = a + b, and c is this combined list. Now if I say a.append(b), c is still the same. The + operator builds a new list containing a copy of the elements; it is not as though changing a afterwards magically changes c. Next, slicing and striding. You will notice that many of these concepts are common to all kinds of sequences, so the results are the same; I will go through them again for repetition so that the concepts are clear. Here we have the primes 2, 3, 5, 7, 11, 13, 17, 19, 23 and 29. What do you think primes[4:8] will give? Please try the following: before you type each expression into the interpreter, take a couple of minutes and predict what you will see. This is the only way you will learn. If you do not think before you type, you will not learn anything; you will only be a typist. We want you to become a programmer, not a typist, so think first, form an expectation, then check the answer and correct your understanding. Make sure you get all the slicing and striding examples right; if the interpreter's answer differs from what you expected, your understanding is flawed somewhere. If you have a doubt, feel free to ask me now. This is the third time you are seeing slicing and striding, so by now you should be a master of it. I will do the first two examples. primes[4:8] covers indices 4, 5, 6 and 7 (the stop index 8 is excluded), so it should give me 11, 13, 17, 19. Let us try: I am correct.
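A Python 3 sketch of concatenation and the first two slices. The contents of a and b are an assumption, chosen so that a + b matches the result quoted above.

```python
a = [1, 2, 3, 4]
b = [4, 5, 6, 7]
c = a + b            # a brand-new list
a.append(b)          # modifying a afterwards ...
print(c)             # [1, 2, 3, 4, 4, 5, 6, 7]  ... leaves c unchanged
print(a)             # [1, 2, 3, 4, [4, 5, 6, 7]]

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(primes[4:8])   # indices 4..7 -> [11, 13, 17, 19]
print(primes[:4])    # start defaults to 0 -> [2, 3, 5, 7]
```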
So my Python is good; I do not know about yours. What about primes[:4]? If I do not specify the start, it means 0, so the slice covers indices 0, 1, 2, 3 and should give me 2, 3, 5, 7. OK. If you got this far, you have understood the basics of slicing. Now take this list: 0, 1, 2, and so on up to 11. Can you tell me a simple way to create it? You have already seen the answer: use range. num = range(12) creates 0 through 11. Very good. Now what is num[1:10:2]? Again, you should be able to predict it. It starts at 1, stops before 10 and takes every second element, so you get 1, 3, 5, 7, 9. Let us try. The next one is easy, you have already seen it: num[:10] gives everything up to 9. And num[10:] gives 10, 11. The next one gives me all the even numbers up to 8: 0, 2, 4, 6, 8. Now the last one, num[::-1], is a little bit of a trick. What do you think it will do? With a positive step, an omitted start means 0 and an omitted stop means the end of the list. If those defaults still applied with a step of -1, you would get at most the first element: start at the first element and go backwards, and there is nothing before it. But when the step is negative and you do not specify the start and the stop, Python flips the defaults: it starts from the last element and goes all the way through the first one, so you get the list reversed, because that is the most useful behavior. Otherwise, how would I reverse the list? I would have to scratch my head and write a start of -1 and a stop of 0, and even then it would not work, because the stop index is excluded: the slice stops at index 1, just before index 0, so I do not get the 0th element.
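The slicing examples above, as a Python 3 sketch (list() is needed around range() in Python 3). The slice used for the even numbers, num[:10:2], is one possible choice; the lecture does not show which slice was typed.

```python
num = list(range(12))            # [0, 1, ..., 11]
print(num[1:10:2])               # [1, 3, 5, 7, 9]
print(num[:10])                  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(num[10:])                  # [10, 11]
print(num[:10:2])                # [0, 2, 4, 6, 8]
print(num[::-1])                 # all twelve elements, reversed
print(num[-1:0:-1])              # stops before index 0, so 0 is missing

copy = num[:]                    # same as num[::1]: a shallow copy
print(copy == num, copy is num)  # True False
```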
So I must leave the start and the stop out; when you write num[::-1], it actually reverses. However, if I write num[::1], it simply gives me a copy, and that is equivalent to num[:], which is also a copy. If you understand all of this, you have understood slicing and striding rather well. If you do not follow it, I suggest you go through the slides and repeat these exercises yourself until your understanding is clear. This is very important. Now we come to some useful things lists also have. Let us take a = [5, 1, 6, 7, 10]. Typing a.s and pressing Tab gives me sort. Do not be put off by its documentation; the actual signature is a little more complicated than you may be used to. L.sort takes a bunch of arguments, but notice that they all have defaults: cmp=None, key=None, reverse=False. Do not let this fool you; it simply means I can call a.sort() without any arguments and it is legal. Notice that when I did a.sort(), nothing was displayed: it did not return anything. If I now type a, I see all the elements sorted. So sort does an in-place sort: it does not create a new list, it modifies the same list efficiently. What the help documentation also tells you is that if you want to sort in a different fashion, you can supply cmp, a function that takes two arguments x and y and returns a negative number, zero or a positive number depending on whether x is less than, equal to or greater than y. So once you know how to create functions, you can sort like this. You can also sort in reverse by passing reverse=True, in which case it sorts in descending order. Now, some things do not have a sort method. For example, if I take b = 'hello', b.sort does not exist, because b is a string and strings are not mutable, so an in-place sort is impossible and the method is not provided.
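A Python 3 sketch of in-place sorting. One caveat: Python 3's list.sort no longer accepts the cmp argument shown in the Python 2 help; an old-style comparison function can be adapted with functools.cmp_to_key, as below. The compare function and the word list are illustrative.

```python
from functools import cmp_to_key

a = [5, 1, 6, 7, 10]
result = a.sort()        # sorts in place ...
print(result)            # None  ... and returns nothing
print(a)                 # [1, 5, 6, 7, 10]

a.sort(reverse=True)     # descending order
print(a)                 # [10, 7, 6, 5, 1]

def compare(x, y):
    """Old-style comparison: negative, zero or positive."""
    return (x > y) - (x < y)

words = ['pear', 'Apple', 'fig']
# case-insensitive ordering via a comparison function
words.sort(key=cmp_to_key(lambda x, y: compare(x.lower(), y.lower())))
print(words)             # ['Apple', 'fig', 'pear']

# strings are immutable, so they have no sort method
print(hasattr('hello', 'sort'))   # False
```

For a simple case like this one, key=str.lower gives the same ordering without a comparison function.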
For that there is a built-in function called sorted, which will sort any sequence. If I say sorted(a), it does the same task. So use sort when you want an in-place sort of a list, or use the sorted function on any iterable, and it will return a new list. If you want to learn more, type sorted? in IPython. Remember that it always returns a new sorted list. Reversing is also available: besides a.sort, there is a.reverse. a.reverse() does not sort; it simply reverses the elements in place. For example, let me go back to a = [5, 1, 6, 7, 10]. a.reverse() simply gives me [10, 7, 6, 1, 5]; it is not sorting. Unlike sort with reverse=True, which sorts in descending order, a.reverse() just takes the elements in their current order and reverses them. It is essentially equivalent to a = a[::-1]. The difference is that a[::-1] returns a new list, whereas reverse reverses the elements in place, essentially by swapping them. Is this clear? reverse does an in-place reversal, and [::-1] creates a new list. So we are done with lists, and we are on target. But someone asked me a question. We had this list num, and he wanted to know what happens when I do num[-1:0:-1]. Let me clear the screen so it is easier for everyone to see. This is num. What does num[-1:0:-1] say? The start index is -1, which means the last element, 11. Start at 11 and go in steps of -1: 11, 10, 9, 8 and so on down towards 0, but not including 0. Remember that the stop index you specify is never included. So if I specify 0 as the stop, the slice only goes as far as index 1.
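A Python 3 sketch contrasting sorted, reverse and [::-1] on the same list used above.

```python
a = [5, 1, 6, 7, 10]
s = sorted(a)            # new sorted list; a is untouched
print(s, a)              # [1, 5, 6, 7, 10] [5, 1, 6, 7, 10]
print(sorted('hello'))   # works on any iterable: ['e', 'h', 'l', 'l', 'o']

r = a[::-1]              # new list in reverse order; a unchanged
a.reverse()              # reverses a in place, no sorting involved
print(a)                 # [10, 7, 6, 1, 5]
print(r == a)            # True
```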
So this gives me 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1. It will not give me 0. However, if I do not specify the stop, it means "run off the end in the direction of travel", so it will also give me 0. And if I do not specify the start (the -1), it does not mean start at 0: because the step is -1, it starts with the last element and goes through the first element. So the negative-step case is a little special. On the other hand, what do you think happens with num[-1:0:1]? Those of you who have understood well will get the right answer. Let me go through it. num[-1:0:1] says: start at the last element, stop before index 0, and move forward in steps of 1. I cannot do that. I am already at the last element; stepping forward takes me outside the list, and the stop index 0 lies behind me. Therefore this returns an empty list. It did not do what you may have expected: with a start of -1, a stop of 0 and a step of 1, you are asking to walk from the end towards the beginning while moving forwards, which is impossible. Whereas if I write num[-1::1], the stop defaults to the end of the list. That says: from the last element, go to the end in steps of 1, which gives me just the last element, [11]. So these are all corner cases. Now somebody has asked about num[-1,4,1], with commas. What do you expect in this case?
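The corner cases from this exchange, checked in Python 3. The last case writes commas instead of colons, so the index becomes a tuple and Python raises a TypeError.

```python
num = list(range(12))
print(num[-1:0:-1])    # [11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(num[::-1])       # [11, 10, ..., 1, 0]
print(num[-1:0:1])     # []   start at 11, step forward, but stop 0 lies behind
print(num[-1::1])      # [11] stop defaults to the end of the list
print(num[-1:12])      # [11] a stop past the end is simply clipped

try:
    num[-1, 4, 1]      # commas instead of colons: the index is a tuple
except TypeError as err:
    print('TypeError:', err)
```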
This will give you an error. You have never seen me type this syntax so far: num[-1,4,1] is wrong, and it raises a TypeError, because the index is then a tuple rather than an integer or a slice. And it is not a semicolon either; it must be a colon. A stop index past the end is a different matter: for example, num[-1:12] gives me [11], because I am asking for the elements from the last one up to index 12, which lies beyond the end of the list and is simply clipped. I think most of you should have got the concepts up to this point. If you have not, you have not asked me a question, so I assume you have understood everything properly. Let us do a quick recap of what has been covered today. We looked at the basic language and its features. Then we looked at the basic data types, specifically integers, floating-point numbers and Booleans. Then we looked at sequences: strings, lists and, very briefly, tuples. The idea was to get an overall perspective of what is possible with sequences. We looked specifically at how you can check for membership, how you can iterate over the elements and how you can access the elements. All of that was considered in the overall big picture. Then we spent a considerable amount of time on slicing and striding. Then we looked at strings: how to define them, why there are so many ways of writing strings, simple assignment and arithmetic, and how to access elements. We mentioned that strings are immutable, so you cannot change a string in place. Then we took a simple problem and solved it with strings in one line of code; to do this, we had to review slicing and striding. We also looked at a few string methods.
In all of this, we made heavy use of IPython's tab completion and history, as well as the question mark, which I have been constantly reinforcing. We then looked at one very powerful and useful feature called join. You may not see its use right now, but later, when you are writing code that manipulates strings, you will find it very convenient. Then we looked at conditionals, the if, elif, else block. We mentioned that the else is optional. We also looked at the ternary (conditional) operator, which lets you write in one line what would normally take four lines. We looked at pass, and then we looked at loops. We looked at the while loop: while, a condition, a colon and then a block. The block is the body of the loop. To get out of the loop, you dedent: go back to the outer indentation level and you are out of the loop. Then we looked at for. We looked at several examples, and we also looked at range. I explained that the syntax of range is very similar to the slicing and striding operation. We looked at break. I mentioned that break gets out of the innermost loop only. I showed you a nested for loop where I demonstrated that break only exits the innermost loop, the one in which it appears; it does not exit the entire nest of loops. We looked at continue. Someone asked me the difference between pass and continue, and I answered that. Then we looked at lists in some detail: how to create empty lists, how to manipulate lists, how to create complicated lists with elements of different types, and how to access elements. We elaborated on negative indices and on which indices are valid. Then we looked at certain important list methods like append, and I also demonstrated extend.
Once again, if you are confused or have doubts, you can always use the interpreter, try your own experiments and use the question mark feature of the IPython session. We showed how you can remove an element using del or remove, how to concatenate lists, and then we spent a fair bit of time on slicing and striding, going through all of these examples. Then we looked at a few additional methods like sort, which sorts in place, whereas sorted produces a new list. Then we looked at reverse, which works in place, and the special form [::-1], which produces a new list. We now begin the afternoon session with input and output. So far we have used the print statement. Consider the example that has been typed out: a = 'this is a string'. When you type a in your IPython session, your code is not actually printing anything. IPython is simply offering a convenience: the expression evaluates to 'this is a string', and IPython shows you that value. On the other hand, you could type print a. IPython shows no Out[] value for it, but it actually prints the string on the screen. This slide highlights the difference between a and print a: typing a shows the value, while print a prints it. Note that when I typed a, you saw quotation marks around the string, because IPython shows you its representation, whereas print a writes the string itself, without quotes. So if you had a Python script and just put a bare expression b inside it, the script would not show you anything; only print b actually prints that line. Now, in addition to the elementary printing you can do with print, such as printing a string, Python also supports a special formatting syntax, commonly used in C, as follows.
Let us say you have x = 1.5, y = 2 and z some string. Then I do print 'x is %2.1f, y is %d, z is %s' % (x, y, z): a format string, followed by a percent sign and x, y, z enclosed in parentheses. This substitutes x, y and z into the string. %2.1f says: give me x as a float in a field at least two characters wide, with one digit after the decimal point. So let us try it with x = 1.5, y = 2 and our string z. Before I use print, let us just evaluate the expression itself: 'x is %2.1f, y is %d, z is %s' % (x, y, z). %f means float, %d means integer and %s means string. This has returned a string. If I put print in front, it does not show me the string value; it actually prints it on the screen: x is 1.5, y is 2, and then the value of z. Now what if I change it to %2.3f? Notice that it shows 1.500, because you are asking for three decimal places. The nice thing is that this lets you do simple formatting, and it is often useful. It is the standard C way of formatting strings, and it is documented in the Python documentation. Now do the following exercise. Create a file in a text editor; do not use Microsoft Word. Are you familiar with this from yesterday? Fine. If you have used gedit, please use gedit to create a file and type out the following code: print 'hello', then print 'world', then print 'hello', with a trailing comma, then print 'world'. Notice that there is a comma after the third statement. I am going to use vim; please use whatever you want. I am typing something slightly different to illustrate a point, and I am going to save this as print_example.py. To illustrate something else, I am also going to type a bare a here, just as in the interpreter. Now, to see what is in the current directory, I do !ls, and I can see print_example.py in that directory.
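The %-formatting walkthrough as a Python 3 sketch. The value of z is a hypothetical placeholder (any string works); the operator itself behaves the same in Python 2 and 3.

```python
x, y = 1.5, 2
z = 'zed'                 # hypothetical value; any string works

s = 'x is %2.1f, y is %d, z is %s' % (x, y, z)
print(s)                  # x is 1.5, y is 2, z is zed
print('x is %2.3f' % x)   # x is 1.500
print('[%6.2f]' % x)      # [  1.50]  width 6, two decimals, right-aligned
```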
Notice that I typed !ls. The exclamation mark (bang) is a special IPython feature that lets you escape to the underlying shell, so you can run shell commands from within IPython. Similarly, cd is supported specially by IPython: cd .. changes to the parent directory. You need not do this; just make sure the file you are editing is in the directory from which you started IPython. Now I can do %run print_example.py. Notice that it printed hello; I will run it again. Hello, world, hello world. Let me show you the code; that was the code. The answer is given on this last slide: print x adds a newline, whereas print x, with a trailing comma, adds a space instead. So print 'hello' prints hello and a newline, print 'world' prints world and a newline, print 'hello', prints hello followed only by a space, and then world, which is what you see on the screen: hello and world on separate lines, then hello world on one line. Notice that in my code I also typed a bare a, as we did in the interpreter, and it printed nothing. Is this clear? When you just type a, it is only an expression; IPython, or the interactive Python interpreter, shows you the value of that expression for your convenience. An actual Python script will not do this. So what we have done here is create our first Python script: this file is a Python script. I ran it with %run, but I could also run it as python print_example.py, which does the same thing. %run is very convenient, though: you can run the script without leaving your IPython session. It has some other features which I am not discussing. The essential point is that creating Python scripts is very easy. You just open a text file, put the Python code inside it, and run it either from IPython with %run print_example.py or from the command line of your bash shell as python filename.py.
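A Python 3 sketch of the same behavior. In Python 3, print is a function, and end=' ' plays the role of Python 2's trailing comma; the output is captured with contextlib.redirect_stdout so it can be inspected. The last lines show the difference between the representation IPython displays (repr) and what print writes (str).

```python
import io
from contextlib import redirect_stdout

buf = io.StringIO()
with redirect_stdout(buf):            # capture what print writes
    print('hello')                    # newline after
    print('world')
    print('hello', end=' ')           # like Python 2's  print 'hello',
    print('world')
    print('hello', 'world')           # several items, separated by a space
output = buf.getvalue()
print(repr(output))  # 'hello\nworld\nhello world\nhello world\n'

a = 'this is a string'
print(repr(a))       # 'this is a string'   (what typing a in IPython shows)
print(str(a))        # this is a string     (what print shows)
```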
The only new thing you have learnt about the print statement here is that a trailing comma stops it from adding a newline; it adds a space instead. You can also put several items in one print statement, separated by commas; they are printed on the same line. You do not have to put each on a separate line, and you can put as many as you like. So now we have learnt how to print things on the screen, with or without a newline, and how to create a simple Python script. Have all of you been able to create your first Python program? Do you need some time? If you need time, please raise your hand, because otherwise I am going to move on. Move on or wait? Wait. I will give you two minutes. Please make sure you are able to create the following file; I will go back to the relevant slide. Save it as print_example.py. Do not use Microsoft Word or OpenOffice or anything like that; use a simple text editor: gedit, vim, emacs, any text editor will work. I will give you exactly two minutes to run the Python file. Use %run print_example.py, and make sure you are in the same directory as the file; if not, use the cd command to move to that directory. Another 30 seconds. Is everyone done? You can also run the example from the shell. So we are moving on now; two minutes are up. I hope all of you have been able to create a simple file. If you have not, please do this during the break or some other time later. It is very simple: you create a text file and put your code in it. Someone asks: how do I open gedit from IPython? I suggest you do not, but if you want to, you can type !gedit and it works. You do not have to do it that way. I would suggest you run it from a separate terminal, because the IPython shell is not as powerful as the system shell. It lets you do many things, but it is not necessarily a replacement for your shell. So I suggest you use another terminal, or start gedit from your applications menu or something like that.
But you can use it like this also. Let us move on. Sometimes you want to get input from a user. Unfortunately, the raw_input function is very often abused, but I do not want to get into those details. In case you have to get input from the user for whatever reason, you use the raw_input function. Let us try it: ip = raw_input(). It simply waits for me. Let me type some nonsense and hit Return. Now, what is the type of ip? String. raw_input always gives you a string. Now someone has asked: can we use Linux commands in Python? The answer is no, not directly, not in plain Python. There are certain functions that let you call the underlying shell, but what I have been doing so far with gedit and all those bang commands is special IPython functionality. !ls is not legal Python at all; it is only a convenience for interactive exploration. That is all. Here is a simple example: c = raw_input(). If we enter 5.6 and hit Return, the value of c will be '5.6', but its type will be string, not float. Sometimes you also want raw_input to display a message. You can pass a prompt, for example asking for your name; it then pauses and waits for your answer. Here I just pressed Return, so in this case the result is an empty string. So you can display a prompt and obtain the user's response. Many people abuse this to ask for input in every single program they write. It is not a good idea unless you actually need the user to give you direct input; asking on a prompt is not a very convenient way of getting data into a program. But it is useful functionality if you have to ask the user for some input. So this is input and output using standard input and standard output: how do I get input from the user, and how do I give them output?
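In Python 3, raw_input was renamed to input, and it still always returns a string. The sketch below converts the text to a number; read_float is a hypothetical helper, and the reader parameter lets us simulate a user typing 5.6 instead of waiting at the keyboard.

```python
def read_float(prompt, reader=input):
    """Hypothetical helper: input() always returns a str, so convert it."""
    text = reader(prompt)          # e.g. the user types 5.6
    return float(text)             # raises ValueError for non-numeric text

fake_user = lambda prompt: '5.6'   # stands in for the keyboard
raw = fake_user('Enter a number: ')
print(type(raw), raw)              # <class 'str'> 5.6
c = read_float('Enter a number: ', reader=fake_user)
print(type(c), c)                  # <class 'float'> 5.6
```

Interactively, you would simply call read_float('Enter a number: ') and type the value.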
More often, however, we have to deal with files. So let us see how to work with files in Python. On Moodle, I believe Srikant has made several data files available as part of this course. If you untar them, you will find inside that directory structure a file called pendulum.txt. Let us explore how to read it. In my case the files are in a Python files directory on my Desktop, so I simply did cd to that directory. If I am somewhere else, I can give the path starting from ~/Desktop. Now I have all the files. To open this file, I simply say f = open('pendulum.txt'). Remember that f is a variable, so it could be any name you want. Notice that IPython also does tab completion for file names, so you can just press Tab. f is now an open file object for pendulum.txt, opened in read mode, so I can read it. To read the entire contents of the file, I say pend = f.read(). Before I print pend, what type do you think pend will be? It is a string. So let us print the first 100 characters of the string. How do I get them? With a slice: pend[:100]. If I print pend itself, I get the full file. So it is that easy to read a file: f = open('pendulum.txt'), and pend is now a string containing all of the data. Now suppose I do not want the whole file as one string but split into lines. It turns out that strings have a method called splitlines. Let us see what it does: it returns a list of strings, broken at the line boundaries. So if I say plist = pend.splitlines(), plist has 90 lines. Let us print the first 5. Let me repeat what we did. We first got f by saying open(filename). We read the entire file contents with f.read(). What is read is a string, and we convert that string into a list of lines with plist = pend.splitlines().
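A self-contained Python 3 sketch of the open, read, splitlines, close sequence. Since the course's pendulum.txt is not available here, it first writes a small stand-in file with three made-up lines into a temporary directory.

```python
import os
import tempfile

# Hypothetical stand-in for pendulum.txt: three made-up lines of data.
path = os.path.join(tempfile.mkdtemp(), 'pendulum.txt')
with open(path, 'w') as out:
    out.write('0.1 0.69\n0.2 0.90\n0.3 1.19\n')

f = open(path)              # read mode 'r' is the default
pend = f.read()             # the whole file as one string
f.close()
print(f.closed)             # True

print(pend[:10])            # the first 10 characters
plist = pend.splitlines()   # one string per line, newlines removed
print(len(plist), plist[0]) # 3 0.1 0.69
```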
plist is a list of strings: each line is stored in a separate element of the list, so plist[0] is the first line, plist[1] the second, and so on. Now that we have read the file, what do we do? We should close it, so we say f.close(). Now if I type f, it says closed file 'pendulum.txt', mode 'r', which means the file is closed. Recapping: open. The file name given to open can also be a full path; it does not have to be relative. You can say '/home/username/Desktop/python files/pendulum.txt' and that is perfectly fine. You read the contents with f.read(), split the lines and close the file. I have been asked to repeat the part about files. I have gone through it three times already, and I will do it again. To open a file, you use the open function, which takes a file name as an argument. It has other, optional arguments which I am not specifying; by default it opens the file for reading. So f = open(filename). To read the entire contents as a string, I do f.read(). It is very simple: f.read() gives the entire contents, and splitlines() splits them line by line into the elements of a list, the first line in the first element, the second line in the second element, and so on. Finally, to close the file, you do f.close(). Is that enough? I have repeated this four times now, so I am going to move on. Very often you do not want to read everything in one shot and produce one massive list. Imagine the file has a million lines. In that case you want to iterate over each line, and you can simply write for line in open('pendulum.txt'): followed by a block.
In this case I am simply printing the lines, but you can imagine doing all kinds of complicated things inside this for loop. For example, I could create an empty list outside the loop and append every line to it, giving a list containing every line. What did I do in the first case? I wrote print line. Each line still contains its newline character, so that newline is printed, and since I did not put a trailing comma, print adds another newline. That is why each line appears with an empty line in between. If I write print line, with a trailing comma, it does not print the additional newline. Now, so far I only told you about for n in range(...) or for n in some list. Underneath, Python has a concept known as an iterable. A list is iterable; a string, or any sequence, is iterable. It turns out that a file is also iterable. That is why you are able to say for line in open('pendulum.txt'). This is a common pattern that you will see across a lot of Python code. Many objects are iterable, which means they can be iterated over element by element. A file is a very natural example: when you iterate over it, each element is one line of the file. Now suppose that instead of printing I want to collect the lines in a list. I can simply say line_list = [], then for line in open('pendulum.txt'): line_list.append(line), which is exactly what I described. Now let us look at a nice problem that we can solve with all that we know about Python so far. Say we are given the string A;010002;ANAND R;058;037;42;35;40;212;P;; with its fields separated by semicolons.
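A Python 3 sketch of iterating over a file line by line and collecting the lines. It again uses a made-up stand-in for pendulum.txt, and it opens the file with a with block, a common Python idiom (not shown in the lecture) that closes the file automatically.

```python
import os
import tempfile

# Hypothetical stand-in for pendulum.txt.
path = os.path.join(tempfile.mkdtemp(), 'pendulum.txt')
with open(path, 'w') as out:
    out.write('0.1 0.69\n0.2 0.90\n0.3 1.19\n')

line_list = []
with open(path) as f:            # a file object is iterable, one line at a time
    for line in f:
        line_list.append(line)   # each line still ends with '\n'
print(line_list)                 # ['0.1 0.69\n', '0.2 0.90\n', '0.3 1.19\n']
```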
So imagine what kind of file this record comes from. As teachers, you will recognize it as a student's mark-sheet record. The first field is the region code, in this case A. Then come the roll number and the name, then the first-language marks, second-language marks, mathematics marks, science marks and social studies marks, and then the total marks. After that, P or F means pass or fail, and W means withheld; otherwise that field is empty. Now suppose we have a file with a million records like this, where every line contains one student: his region, roll number, name, marks, total marks and pass or fail. We want to calculate the mean of the math marks in region B. This student is in region A; we want to take all the students in region B and calculate the average math mark they obtained. How do we solve this problem? We will learn the necessary bits of Python in order to solve it. First, the file is a text file, and we know how to read it line by line: for line in open(filename) gives us every line. Each line contains all of these fields. I now need to process each line: find out the region code, find out the math marks, collect the math marks of all the region-B students, and then find the average. That is the structure of our solution, and that is how we are going to approach the problem. We will slowly build up our Python knowledge to be able to solve it; these are typical problems that you will need to solve later on. The first step is tokenization: we want to split each line into the elements it is made of. So let us say I have this record s. s is a string, and strings have a method called split.
So let us look at what split says. Given an optional separator, it returns a list of the words in the string, using sep as the delimiter string. In our record, the delimiter is the semicolon. But before we get to that, let us look at the original example. line is a string; let us see what happens if I simply do line.split(). Notice that it removed all the white space and gave me three elements in a list: 'parse', 'this' and 'string'. Notice also the help documentation: if sep is not specified, any white space is a separator and empty strings are removed from the result. Now s is another string. What do you think s.split() will give? You will get a list of strings, because split always returns a list, but only a few of them, because it splits on white space, and the only white space in s is inside the student's name. There are two spaces in the name, so you get three pieces: everything up to the first space, the middle part of the name, and then everything from the last part of the name to the end of the record. Let us see what we get. That is exactly what we get: without a separator, split gives me three pieces. But if I say s.split(';'), it does what I want. It splits the record into the region A, the roll number, the name, the first mark, second mark, third mark and so on, the total marks, pass/fail, and two additional fields. So this is how easy it is to do tokenization with Python. Now let us add a little wrinkle. Remember that we had s.split(';'). Imagine that in my text file I had padded the fields with extra spaces, so let us say my string was this.
Now if I split, notice that the A suddenly has spaces around it, because I only split on semicolons. If I say fields = s.split(';') and take the first element, fields[0] is not just 'A'; it also has the spaces. Why? Because I did not split on white space, I split on semicolons. To remove this white space, remember that fields[0] is a string, and strings have a whole bunch of methods. One very useful method is strip, which removes leading and trailing white space. Let us try another case: what do you expect if there is white space in the middle? strip only removes the leading and the trailing white space; it does not remove white space in the middle of the string. That works for us in this case: fields[0].strip() simply gives me 'A'. So even if the fields are padded with spaces, after stripping I can easily find out whether the region code is A or B or whatever it is. And please remember that strip returns a new string; it does not change the existing string. Let me recap. We are trying to take this record, split it, get the fields, and then do our calculations. split without arguments splits on white space; with an argument it splits only on what you have given it. So you may end up with extra spaces around the fields, and you can remove them with strip. Now look at fields again: A, the roll number and so on. Notice that all of the marks are strings. We have already seen how to convert a string to a number: int('42') gives an integer, and float('42') gives a floating-point number. So you can take each element of the list and convert it to an integer or a float. Now we have enough machinery to solve our little problem.
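The tokenization steps above (split with and without a separator, strip, and int/float conversion) can be tried on a hypothetical record in the same layout. The student's name below is a placeholder, not the real record from the lecture, and the sketch is written in Python 3.

```python
# The whitespace example from the lecture.
line = "parse this string"
print(line.split())             # → ['parse', 'this', 'string']

# Hypothetical record in the layout described above; the name is a placeholder.
s = "A;010002;STUDENT NAME X;058;037;42;35;40;212;P;;"

# With no separator, split() only breaks at the two spaces inside the name.
print(len(s.split()))           # → 3

# Splitting on ';' gives one element per field, plus the two empty trailing fields.
fields = s.split(";")
print(len(fields))              # → 12
print(fields[5])                # → 42  (math marks, still a string)

# If the fields are padded, the region code keeps its spaces after split(';').
padded = " A ;010002;STUDENT NAME X;058;037;42;35;40;212;P;;"
region = padded.split(";")[0]
print(repr(region))             # → ' A '
print(repr(region.strip()))     # → 'A'
print(repr(region))             # → ' A '  (strip() returned a new string)
print(repr("  two  words  ".strip()))   # → 'two  words'  (inner spaces kept)

# Marks are strings, so convert them before doing arithmetic.
math_mark = float(fields[5])    # 42.0
first_lang = int(fields[3])     # 58; int() accepts the leading zero in '058'
print(math_mark + first_lang)   # → 100.0
```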
So let us look at the solution. I will go through it line by line, and after that, if you have any questions, please ask me. First, what is the task? Let us go back and look at the problem. I have a file with a lot of records; let us see, it has about 1.8 lakh (180,000) lines. This is actual data. Each record represents a student, and we want to find the average math marks of all the students in region B. That is our task. How do we do this? We iterate through every line of the file and identify the region code. If the region is B, we add that student's math mark to a list, and keep on adding. At the end we can find the average. Let me look at the code. First we create an empty list, math_B, to store the marks. Then we iterate over every line: for line in open('SSLC.txt'). (The file name on the slide has a 1 in it; I do not know if that file is there, so use whichever file you have.) Then we split the line into fields using the semicolon as the separator. The region code is fields[0], and region_code.strip() removes the spaces so that you get a clean code. Remember, if I am checking whether two strings are equal and one of them has an extra space, they will not be equal, so you remove the spaces. Then you find the math mark. It is fields[5], the sixth field, since indices start at 0. I convert it to a floating-point number. Then I say if the region code is 'B', math_B.append(math_mark). Finished. Then the mean, math_B_mean, is the sum of the elements of math_B divided by the length of math_B. Finished. So it is extremely easy. Let us now run this quickly; I am going to write slightly shorter code. marks is an empty list; for line in open('SSLC.txt'), I split every line on the semicolon.
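Putting the pieces together, the solution walked through above can be sketched as a function. This is one possible arrangement in Python 3, not the code from the slides: the function name mean_math_mark, the blank-line guard and the None return for a region with no students are additions for illustration, and the four sample records are made up so the sketch runs without SSLC.txt.

```python
import os
import tempfile


def mean_math_mark(lines, region="B"):
    """Average of the math marks (field index 5) over records in `region`."""
    math_marks = []
    for line in lines:
        if not line.strip():                  # skip blank lines
            continue
        fields = line.split(";")
        region_code = fields[0].strip()       # drop any padding around the code
        if region_code == region:
            math_marks.append(float(fields[5]))
    if not math_marks:                        # no matching students: avoid dividing by zero
        return None
    return sum(math_marks) / len(math_marks)


# Hypothetical records standing in for SSLC.txt (names and marks are made up).
sample = [
    "A;010002;STUDENT ONE;058;037;42;35;40;212;P;;",
    "B;020001;STUDENT TWO;060;050;70;55;65;300;P;;",
    "B;020002;STUDENT THREE;045;040;50;38;47;220;P;;",
    " B ;020003;STUDENT FOUR;070;065;81;72;68;356;P;;",
]
path = os.path.join(tempfile.mkdtemp(), "sslc.txt")
with open(path, "w") as out:
    out.write("\n".join(sample) + "\n")

with open(path) as f:
    math_B_mean = mean_math_mark(f)
print(math_B_mean)                            # → 67.0
```

Because the function accepts any iterable of lines, the same code works on an open file or on a plain list of strings, which makes it easy to check on a few records before running it on the full file.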
Then I get the region code, writing it a little shorter. The math mark is float(fields[5]). If the region code is equal to 'B', I append the math mark to marks. Finished. It has read the whole file. Let us see how many students there are: there are 41,000 students in region B. How do I find the total of all their marks? There is a built-in called sum: given any sequence of numbers, it returns their sum. So sum(marks) divided by len(marks): the average math mark in region B is 63.63. And all it took was about eight lines of code. We have read 1,80,000 records, parsed each of those lines, found all the students in region B, and calculated the average. This is the power of Python: I am able to quickly read files, do string processing and manipulation, do some basic arithmetic, and get things done. Next we go to a very big and very important topic, functions. So if you have any questions, please ask me now; I will give you five minutes. Is this code example clear? If you do not have questions, no problem. Good question: how do we use files for output? I do not think we have slides for that, but it is very easy. Before we do it, let us look at open: you can give it the name of a file, and you can also give it a mode. So I say out = open('/tmp/test.txt', 'w'): I am opening a file called test.txt in /tmp, and I specify write mode. Now this is a file that I can write to. out. followed by Tab shows what I can do with it: close the file, flush it, read, and at the bottom, which is what is of interest to us, write. So I say out.write(...). Notice that write does not add a newline; if I want one, I have to put '\n' in explicitly. Oops, I did not close the quote properly.
Please note that the error message is very clear: at hello world it reached the end of the line without finding the closing quotation mark. Now, if I look at the file, there is no guarantee that the text has actually been written yet. If you want to make sure it is written, you can do out.flush() if you want to continue writing, or out.close(), which is even better. Close the file and look at test.txt: done. So it is as easy as that to write text (ASCII) files. You open the file with open, specifying the additional argument 'w'; then you use out.write(...), and you do out.close() once you are finished. Files can also be opened in other modes. For example, 'a' (append mode) lets you append to an existing file, whereas 'w' (write mode) creates a fresh file. Let us try it: if I open the same file again with 'w', the old contents are gone. So with 'w' you start from an empty file, and with 'a' new writes go to the end of the file. If you do not specify a mode, the file is opened for reading only: out would then not be an output file; I cannot write to it, I can only read from it. Does that answer your question? Any more questions? What does B refer to, for example? It is a region code: some code that the data uses for a region, for example North or South Maharashtra, East, West; some encoding that they have. Can we think of sum as a procedural feature of Python? sum is simply a function that Python provides by default, a built-in; it is not a special procedural feature. The other question: are 'b' and 'B' equal, or do I need to use the lower or upper methods?
Well, if the data is recorded consistently, everything will be capital; if that is not the case, you will have to do something like that: always lower-case the code and then check for a or b. So whether little b and capital B make a difference depends on your data. Someone else has collected the data, so you have to know something about it to make the decision. And once you know, it is clear what to do: if little b and capital B mean the same thing, you simply do region_code.lower() and compare it with the code you are interested in, 'b'. That is it. Next question: do I need the flush method in my code if I close the file? No. flush flushes out the stream: normally when you do a write, the contents are buffered somewhere and may not actually be written to the file yet. When you do flush, the buffer is forced out. close does the same thing: it flushes the buffer and then closes the file. So flush is a convenience for when you are continuously writing and want to be certain the data has been written, for example because somebody else is looking at the file while you write. I think we have spent five minutes, and most of the ideas are clear.
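The file-output operations discussed in this Q&A (write mode, explicit newlines, flush, close, append mode) and the lower-case comparison can be tried together in one short script. This is a sketch in Python 3; it writes to a temporary directory instead of /tmp, an assumption made so it also runs on systems without /tmp.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")   # stand-in for /tmp/test.txt

# 'w' creates the file (or empties an existing one) for writing.
out = open(path, "w")
out.write("hello world\n")        # write() adds no newline; put '\n' yourself
out.flush()                       # push the buffered text out so other readers see it

with open(path) as reader:        # no mode given: read only
    first_read = reader.read()
print(first_read, end="")         # → hello world

out.close()                       # close() also flushes before closing

# 'a' appends to the end of the existing file.
with open(path, "a") as out:
    out.write("second line\n")
with open(path) as reader:
    after_append = reader.read()
print(repr(after_append))         # → 'hello world\nsecond line\n'

# Opening with 'w' again starts from an empty file: the old contents are gone.
with open(path, "w") as out:
    out.write("fresh start\n")
with open(path) as reader:
    contents = reader.read()
print(repr(contents))             # → 'fresh start\n'

# Case-insensitive region check, as in the answer about b and B.
print(" B ".strip().lower() == "b")   # → True
```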