If you major in computer science, you'll almost certainly take a first-year course called something like Intro to Data Structures and Algorithms. It covers the basic collection types most commonly used in code and the various algorithms associated with those collections, primarily searching and sorting algorithms. The course also tends to serve as a student's introduction to the formal analysis of algorithms, which is to say the student's real introduction to proper computer science. In this unit, we're going to cover at least the essential highlights of such a course, and we'll even take a brief look at formal analysis when we discuss time complexity and what's called big O notation. Our treatment of these topics, though, is going to be kept quite informal. We're not going to get into any real math, not just because I'm not qualified to teach that sort of thing, but because we're just trying here to hit the broad conceptual highlights.

Now we're going to look at algorithms, particularly those associated with arrays and lists, namely algorithms for searching through arrays and lists and for sorting them. The simplest of these is surely a linear search, a search through a list or array for a particular matching value. In a linear search, we simply start at the beginning and iterate through every value, comparing each one against what we're looking for until we find a match. So here, for example, is a function in Python which performs a linear search on a list. The first parameter takes in the list itself, and the second parameter is the value we're searching for. What we want returned is the index, the location of the value in the list, so it should be an integer from 0 up to but not including the length of the list. And understand that we search from the start of the list, so if matching values are located at multiple places in the list, what we get back is the index of the first occurrence.
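A minimal sketch of the kind of function being described, reconstructed from the narration rather than copied from the original slide. The names `linear_search`, `items`, and `target` are assumptions:

```python
def linear_search(items, target):
    """Return the index of the first occurrence of target in items, or -1."""
    # Visit indexes 0, 1, ..., len(items) - 1 in order.
    for i in range(len(items)):
        if items[i] == target:
            return i  # first match wins, so later duplicates are never reached
    return -1  # loop exhausted without finding a match


print(linear_search([8, 3, 5, 3, 9], 3))  # prints 1 (the first of the two 3s)
print(linear_search([8, 3, 5, 3, 9], 7))  # prints -1 (no match)
```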
In the case where the list contains no such value, when there is no match, we simply return negative 1 as a special value indicating not found. This is a common convention with search functions: negative 1 indicates that there is no match. So, looking at the body of our function, our for-in loop iterates over a sequence, which is the range from 0 up to but not including the length of the list. So, say, if the length of the list is 5, then our loop will go from 0 to 1 to 2 to 3 to 4, and 4 will be the last iteration. In the loop, we simply test whether the value at that index in the list matches the value we're searching for, and if so, we return the index. If our loop is exhausted with no matches found, then we return negative 1.

Now, you'll note that there's nothing really clever at all about the linear search algorithm. It performs a search for a particular value in really the most obvious way possible. For this particular problem, though, there really isn't a better solution. There are plenty of more sophisticated search algorithms, but those algorithms don't apply to the general case of having just a big dump of stuff in no particular order.

For example, what's called a binary search is a much more efficient kind of search, but you can only use it on lists or arrays where the items are sorted. The gist of the binary search is that we start our search not at the beginning, but in the middle. If the value in the middle happens by luck to match what we're searching for, then our search is done. Otherwise, if the value there is greater than what we're searching for, then we know we need to look to its left, and if it is less than the value we're searching for, then we need to look to its right. And so, with our first comparison, we have effectively eliminated a whole half of the list from consideration. We know the value can't be found there.
In the remaining range of the list where the value still might be found, we continue with the same strategy. We look in the middle of that range and make a comparison. If the value there happens to match the value we're searching for, then we're done, we've found the value. Otherwise, we know to look either to its left or to its right, depending upon whether the value there is greater than or less than the value we are searching for.

So here, for example, we have a sorted list where the first item is negative 7 and the last item is 1881, and notice that all the values are in ascending order. If we wish to find the value 340 in the list, to retrieve the index at which it is located, assuming that it is in the list at all, we start in the middle and we see that the value there, 178, is not equal to 340. It's in fact less than 340, so we know that the value we're searching for is somewhere to the right of that index. So we know now that the value isn't in the range from the first index, index 0, up to and including the index we just compared. In the subrange that remains, we jump to the middle and compare that value against 340, and we see that it is greater than 340. So the value we're looking for, 340, must occur earlier than this index, if in fact it is present at all in the list. So one last time we jump to the middle of the remaining range, though in this case the range left has an even number of elements, so there's no precise middle. Our algorithm has to either round down or round up in such cases. In this case, we're rounding up, and it turns out that we've actually landed on the value we're searching for. Having found the value, the algorithm returns the index, which in this case is 7.

So one way to think about this algorithm is that we are effectively shrinking the range in which we are searching. It contracts as we search.
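To make the shrinking range concrete, here is a small trace of that walkthrough. Only the values -7, 178, 340, and 1881 and their positions come from the example; the other list entries are made-up filler, and `trace_binary_search` is a hypothetical helper. It rounds the midpoint up when the range has an even number of elements, as in the walkthrough:

```python
def trace_binary_search(items, target):
    """Binary search that records each (start, end, mid) range it examines."""
    steps = []
    start, end = 0, len(items) - 1
    while start <= end:
        mid = (start + end + 1) // 2  # rounds up when there is no exact middle
        steps.append((start, end, mid))
        if items[mid] == target:
            return mid, steps
        elif items[mid] < target:
            start = mid + 1  # discard mid and everything to its left
        else:
            end = mid - 1  # discard mid and everything to its right
    return -1, steps


# Filler values are invented; -7, 178, 340, and 1881 are from the walkthrough.
values = [-7, 4, 19, 62, 110, 178, 251, 340, 512, 903, 1881]
index, steps = trace_binary_search(values, 340)
for start, end, mid in steps:
    print(f"searching [{start}..{end}], middle index {mid} holds {values[mid]}")
print("found at index", index)
```

With this list the search examines indexes 5, 8, and then 7, so three comparisons are enough for eleven items.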
So in our binary search function implementing this algorithm, again we take in two parameters, the list itself and the value we're searching for, and we return the index of the found value, or negative 1 if it's not found at all. We're going to use two local variables, start index and end index, which denote the range within the list that we are searching. And of course at the start we're searching the entire list, so we initialize start index to 0 and end index to the last index of the list, which is 1 less than the length of the list. As long as we have a sub-range left to search, the start index will be less than or equal to the end index.

In each iteration we find the middle index, mid_idx, retrieve the value at that index, and compare it against the value we're searching for. If it's equal, then we've found it: the middle index is the index we are searching for. Otherwise, if the middle value is less than the value we're searching for, then we know that that index and everything to its left is no longer part of the valid search range, so we adjust the start index to be 1 greater than the middle index. And conversely, if the middle value is greater than the value we're searching for, then we know that the value will not be found at that index or anywhere to its right, so we bring the end index in to be 1 less than the middle index.

What then happens when we're searching for a value not found in the list is that, in the last iteration, the start index or end index is adjusted such that the two cross over: the end index ends up less than the start index. And so our loop condition tests false and we return negative 1. Notice at the line at the top of the loop, where we get the middle index, that we use the int function to turn whatever gets returned from the division by 2 into a whole number.
If the result happens to be something-point-5, then we want the fractional part dropped (int truncates, so in effect we round down), because of course indexes into a list are always integers, not floating-point numbers.
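Putting the description together, a minimal sketch of such a function might look like the following. It is reconstructed from the narration, not copied from the original slide, so the names `binary_search`, `items`, `target`, `start_idx`, `end_idx`, `mid_idx`, and `mid_val` are assumptions:

```python
def binary_search(items, target):
    """Return an index of target in the sorted list items, or -1 if absent."""
    start_idx = 0
    end_idx = len(items) - 1  # last valid index
    while start_idx <= end_idx:  # a sub-range is still left to search
        # int() drops any .5, so an even-sized range uses the lower middle.
        mid_idx = int((start_idx + end_idx) / 2)
        mid_val = items[mid_idx]
        if mid_val == target:
            return mid_idx
        elif mid_val < target:
            start_idx = mid_idx + 1  # mid and everything left of it is out
        else:
            end_idx = mid_idx - 1  # mid and everything right of it is out
    return -1  # the indexes crossed over: no match


print(binary_search([-7, 178, 340, 1881], 340))  # prints 2
print(binary_search([-7, 178, 340, 1881], 5))    # prints -1
```

For non-negative indexes, `(start_idx + end_idx) // 2` gives the same result as the `int(... / 2)` form and avoids the detour through a floating-point number.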