Hi, I'm Chris Brooks. I'm faculty here at the School of Information at the University of Michigan. I teach in our data science courses here, I work daily in Python, and I use this development environment called the Jupyter Notebook. I'm pleased that I'll be able to share that with you through this course. The Jupyter Notebook is a great piece of technology that allows you to use the web, essentially, to write real-world Python programs that leverage really impressive APIs, including optical character recognition, detecting faces, and so forth. In my work here, I run a research lab called the Educational Technology Collective. We do research in educational technology and learning analytics. I'm really interested in how students like yourself interact with technology like the Coursera platform, including the content, videos like these, and your peers on the discussion forums. So my research is really focused on building things like predictive models of student success and studying how you interact in these different avenues to exceed learning expectations. This course is going to be more project-based, with a little different focus than the previous courses. In the first four courses in this specialization, you learned the fundamentals of Python, and now we want you to practice those fundamentals to complete a project. We're going to introduce you to new libraries. Now, the goal here is not actually to learn those libraries in detail, but to learn enough about them, and moreover to have this meta-learning, to learn how to approach a new library, so that you can start using your skills in the wild to solve the projects that you might be interested in.
Those libraries are going to include image recognition libraries: Pillow, for image manipulation; Tesseract, which is an optical character recognition library, so how we take pictures of books, pull the text out of them, and turn it into something we can process; and Kraken, which is a layout library for text, where you'll get a sense for the challenges that come with taking images and trying to recognize characters in them. And then the last one we'll introduce is called OpenCV, where CV stands for computer vision. It's used for a lot of things, but we're going to focus on using it to detect faces in pictures, and your project will be about that. So, one last thing to note: along with some of the other faculty here at the School of Information, I teach a data science specialization on Coursera. We use Python heavily in that, along with the Jupyter Notebook, and we think that after you finish a course like this and come all the way through it, you're more than ready to take that specialization. So if you want to continue your learning on the platform, and continue your learning with us, please join us in that specialization. Let's just dig into the class now. Hi, in this video I'm going to share with you the Jupyter Notebook. The Jupyter Notebook is a great way to get started with Python on the Coursera platform, and it's a great way to do data science and other more advanced Python work on the platform, too. So let's dive in and take a look. When you log into the Jupyter platform, you'll be greeted with a screen that looks like this. Jupyter is really built around this notion of code cells. Here I have one cell. Now, there's a full Python interpreter running in the background behind this, so I can do things like create variables. Here I'll just say x = 10 and then we'll print x. You'll see that there's no output until you actually go to run the cell, but when you run the cell, the interpreter returns you the result.
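Written out as plain Python, the two cells just described behave like this, a minimal sketch of the same state-keeping behavior:

```python
# First cell: create a variable and print it. Nothing appears until the
# cell is run; then the kernel prints 10.
x = 10
print(x)

# A later cell: the kernel keeps its state between cells, so x is
# still defined here.
print(x + x)  # prints 20
```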
You also get a little number indicating how many cells have been run. You can see in this example I've run ten practice cells before actually showing this. Now, after a cell runs, it's not like the application is finished running. The kernel is still running in an interpreter mode, and we can continue to send queries to it. So if we wanted to say x + x and print that, you can see that prints out as we expected. If you want to, you can stop the interpreter. If you feel that the state is confusing, or you just want a fresh Python instance, it's actually pretty easy to do that: you go up to Kernel and you say Restart. Often I use Restart & Clear Output, as a way to remind me that the interpreter has actually been restarted. You can cut, copy, and paste cells, and move them around, so you can do those normal kinds of things. The Jupyter Notebook has a special feature (it actually has a number of them): if the result of your last statement is an object, so some value was returned but you didn't do anything with it, the notebook automatically tries to print that to the screen. A good example might be this: if we say x = 10 and then just the value x, and we actually run that, we'll actually see the output. So you'll often see in some of the videos people just leaving a value at the end of a cell. You can individually run cells. So say we want a loop: for i in range(10): print(i). We could just not run this yet, and we could say down here, wait, what was the value of i, and just run that individual cell. Oh, i isn't actually defined. If we then decide we want to run the loop, okay, there's a bunch of i values, and now we can run this cell again. So you can see that we have a nonlinear editing format. And that can be a little tricky, actually, because you can do things like change the value in one cell. But then, you know, this seems to suggest that i should be nine.
But then when we run it, we actually see that it's minus one. So you can sometimes get this irregular state, or at least it feels irregular. Actually, the order you've run the cells in determines the interpreter state. And you can often see that if you look at the cell numbers here: six is greater than five, which is greater than three, so you can see that we were running some things in a different order. One of the benefits of the Jupyter platform is that you can add plain text in here as well. So let's say I want to describe this as a great example of a loop, and let's say I want this section to even be called Loops. You can change the format of a cell here, and there are a number of different formats; you'll mostly use Code and then something called Markdown. When you change a cell to Markdown, you'll note that the In/Out label goes away, because the cell won't be sent to the interpreter. But when you run the cell, it'll render it in a text format called Markdown. So a pound sign here, or a double pound sign, means give me a title. When you actually run that, you get a nice bit of text. So you can actually mark up your code execution in a way that's a lot like a textbook might be; you could create a whole textbook in this environment. So those are the main features of the Jupyter Notebook. There are a bunch of other options. One that I often use is Restart & Run All: if I want to run all of the cells in a notebook and just see the whole execution trace, I can do that. In this case, we'll do that here. You'll see that it ran through the loop, printed out i, and set up our Markdown. That's a pretty common method. If it runs into an error, it will pause execution. That's important, because sometimes in a lecture video you might see that we intentionally put an error in there for a teachable moment. Under View, there are some options as well.
Line numbers is one that I will often turn on, and that'll number our cells as well. Each Jupyter Notebook has a title, and this determines the file name; I'll just call this demo. If you click on the Jupyter logo, you'll be taken to what's called the tree interface, which is just a directory interface of the files that you have for this project. And within a given course on Coursera, or a specialization or a degree on Coursera, you may have any number of different Jupyter systems with different file spaces mounted on them. I also really like to be able to download my Jupyter Notebooks. Sometimes I download them as HTML to show them to people, or as a PDF version, so that can be quite helpful as well. So that's a quick tour of the Jupyter Notebook. Now, most of the work you'll be doing in this course, this specialization, or this degree will be in the Jupyter Notebook. It's a great environment for doing Python, and there are ways to create assignments and submit assignments directly to auto-graders from within the platform. And I hope that we'll be able to share some of our research tools on educational technology that are built into the Jupyter Notebook as well, but we'll see how that goes. All right, we'll see you in the course. Hi, everyone. My name is Daniel Shoren, and I'm a student assistant in this class. Today, we'll be going over how to set up your local programming environment on a Windows computer. We'll go through and install all of the libraries, packages, and software you'll need to run the files in this course. While the Jupyter console in the Coursera module comes included with all of the libraries and packages you'll need to run the lessons in this class, some people may prefer to work on their own computer. The only prerequisites for this tutorial are a computer running Windows with administrative access that is connected to the Internet.
We'll be completing the installation using the command line, which is a way to pass instructions to your computer using text. The command line is also known as a shell. It is a powerful tool for modifying, automating, and organizing tasks on your computer. Before we get too deep into the command line, let's first install Python. Navigate to https://www.python.org/downloads/windows/ and download the latest version. At the time of this video recording, that's Python 3.7.3. This downloads an installer which will automatically configure the paths and dependencies, allowing the programming language to be interpreted by your computer. Follow the instructions and you'll be good to go. Python automatically comes with a package called pip, which allows you to install libraries super easily. We'll touch more on pip in just a little bit. In order to download and manage libraries, we'll need to download a package manager. A package manager is a set of software tools that automate complex installation processes, which include downloading, upgrading, configuring, and removing software. The most robust and common package manager for Windows is Anaconda, which we'll be using in this guide. Anaconda is a free and open-source package and environment management system that makes installing software on Windows pain-free. Additionally, Anaconda is a Python data science distribution: it comes loaded with lots of useful libraries for data mining, machine learning, and statistics programming work. If you're interested in using these libraries, be sure to check out the University of Michigan's Applied Data Science with Python courses on Coursera after this class. To download Anaconda, go into your web browser, navigate to www.anaconda.com/distribution, and download the package manager corresponding to the Python version you downloaded. This will launch another installer program similar to the one used for the Python installation.
Follow the instructions, install it to the default location, and you'll be ready to go. If you already have Anaconda installed on your computer, you can update it to the latest version by typing conda update conda in your command line interface. Now that we have our package manager installed, let's talk about virtual environments. Virtual environments allow developers to have a separate space for each programming project, ensuring that the dependencies of one project don't inadvertently affect another project. Using virtual environments can prevent a lot of compilation issues, as well as giving us more control over our Python projects. It's a best practice to create a programming environment for each programming project, and you can create as many of them as you'd like. Let's go ahead and create a virtual environment for this class, py3. To do this, we'll use Anaconda's built-in virtual environment capabilities via the command line. On Windows computers, we can use the Command Prompt application to access the command line interface, which can run scripts, download software, and more. You can find it by opening the Start menu and scrolling through your applications, or by using the search bar. Once you have your command line open, let's create a home directory for the files in this class. This tutorial will place it in a folder on the desktop, although you can put it wherever you'd like. Type cd ~ (cd followed by a tilde); that'll just take you back to your home directory. Then type cd Desktop. cd stands for change directory; it's how we move between directories and folders as we navigate using the command line, so cd Desktop takes us to our desktop. Then we're going to make a folder using the make directory command, mkdir, followed by the name of our home folder for this class, which we'll call py3. Now that we have that folder created, we'll go into it using the change directory command.
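If you'd rather script those directory steps than type them, the same cd-and-mkdir sequence can be mirrored with Python's standard library. This is just a sketch: it uses a temporary directory as a stand-in for the Desktop, so it doesn't touch your real file system layout.

```python
import os
import tempfile

# Mirror of: cd ~  ->  cd Desktop  ->  mkdir py3  ->  cd py3
# (a temporary directory stands in for the Desktop here)
base = tempfile.mkdtemp()
class_dir = os.path.join(base, "py3")
os.makedirs(class_dir, exist_ok=True)  # the mkdir step
os.chdir(class_dir)                    # the cd step
print(os.path.basename(os.getcwd()))   # py3
```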
Now that we're in our class directory, we'll create our virtual environment simply by typing this command: conda create -n py3m python=3.7.2 anaconda, where py3m is the name of our virtual environment and the Python version corresponds with the Python version you downloaded. You can check your Python version by typing python --version. Here on this computer, we're using Python 3.5. Now that we've created our virtual environment, we can activate it using the following command: simply source activate py3m, where py3m is the name of our environment. As you can see, the Windows built-in Command Prompt is having a hard time working with Anaconda. To work around this, we're going to use a command line interface program called Git Bash. Git Bash is a shell that works on top of the Command Prompt to make it easier to download libraries and easier to work with your command line as a whole. To download Git Bash, just Google search "git bash", click on the downloads page, click on your Windows operating system, and your download will start immediately. Follow the instructions in that download and you'll be all set to use Git Bash. We'll go to our main directory, go to our desktop, and go to our py3 folder. Now, since we already created the environment using the Windows command line interface, we'll go ahead and see if it works with Git Bash. So we'll try source activate py3m. And wonderful, we're in our Python 3 virtual environment. You can tell that you're in your Python 3 environment if the name of your environment appears in parentheses before the command prompt. Now that we've created and activated our virtual environment, let's install the packages we'll need for this course. We'll describe more about what these packages do and how to use them in different lessons. For now, run the following commands one at a time in your py3 folder with your virtual environment activated.
Due to the power of editing, we're going to speed through the process of copying and pasting these commands into your Git Bash shell or your Command Prompt. Additionally, on this computer these packages were already installed, so don't be alarmed if you see different output in your command prompt. We'll run pip install pillow, pip install pytesseract, pip install numpy, pip install matplotlib, and pip install opencv-python. With all of your libraries downloaded, we're nearly ready to go; we just need to get our files ready to run. Navigate to the py3 Coursera class and download the Jupyter Notebook files, with their .ipynb extensions, and their accompanying data files into the py3 directory on your local computer. To run these files, we'll simply navigate to our folder in the command line and type jupyter notebook. And as you can see, our notebook is up here, all ready to go. Once you've installed the files, we're ready to go. Let's get started. Hi, everyone. My name is Daniel Shoren, and I'm a student assistant in this class. Today, we'll be going over how to set up your local programming environment on a Mac computer. Starting off, while the Jupyter console included in Coursera comes with all the libraries and packages necessary to run the lessons in this class, some people may prefer to run the files on their local computer, so we'll go over the steps to help you do that in this video. The only prerequisites for this tutorial are a computer running macOS with administrative access that's connected to the internet. We'll be completing the installation using the command line, which is a way to pass instructions to your computer using text. The command line is also known as a shell, and it's a powerful tool for modifying, automating, and organizing tasks on your computer. On Mac computers, we use the Terminal application to access the command line interface.
You can find Terminal by opening Finder and navigating to Terminal in the Utilities folder within the Applications folder. In order to download and manage libraries, we'll need to download a package manager. A package manager is a set of software tools that automate complex installation processes on your computer, which include downloading, upgrading, configuring, and removing software. The most robust and common package manager for macOS is Homebrew, which we'll be using in this guide. Homebrew is a free and open-source package managing system that makes the installation process on Mac computers pain-free. To install Homebrew, copy the install command from the Homebrew site and paste it into the terminal, as such: we'll go ahead and copy it, navigate to our terminal, Ctrl-click, paste, and press Enter. You'll see a script asking for permission to download, and we'll go ahead and press Return; then you'll see it download and install Homebrew. Homebrew is software programmed in the language Ruby, and the installation works by modifying your computer's Ruby path, meaning where Ruby is installed on your computer. You'll need to confirm the download and enter your computer's password. Note that your keystrokes will not display in the terminal window when you're entering your password, for security reasons, so simply press Enter when you've finished typing your password, and follow the instructions in the terminal to finish the installation. Once that's finished, you can check whether Homebrew installed successfully by typing the following command in your terminal: brew doctor. We'll go ahead and copy that, go back and paste it, and we see that our system is ready to brew. Once Homebrew is finished installing, we can download Python. Homebrew comes with a ton of packages available for easy installation. You can search for libraries to install using the brew search command. Feel free to browse Homebrew's packages on your own time.
Now we can go ahead and install Python 3 using the following command in the terminal: brew install python3. Again, we'll copy that, Ctrl-click to paste it, and press Enter. Because Python 3 is already installed on this computer, you'll see a few different instructions in your own terminal; as you can see on our computer, it says it's already installed and up to date, but if you just follow the instructions in your terminal, your Python will install correctly. Upon entering that command, the terminal should be flooded with information about the download. In addition to Python 3, Homebrew will install pip, setuptools, and wheel. These are all libraries and packages for Python. Pip assists Homebrew in Python package management, and we'll be using pip momentarily to download the Python packages we'll be using in this module. We can check the version of Python we have using the following command: python --version. As we see, we are working with Python 2.7.1; that's the older Python that ships with this version of macOS, and we recommend using Python 3 instead. To update the version of Python on your computer, we first recommend updating Homebrew. You can do so with the following commands: brew update (as we can see, Homebrew is already up to date on our computer), and finally brew upgrade python3. Great, now we can see that Python 3.7 is installed. Now that we have Homebrew and Python installed, let's talk about virtual environments. Virtual environments allow developers to have separate spaces for different programming projects, ensuring that the downloaded packages of one project don't inadvertently affect another project. Using virtual environments can prevent a lot of compilation issues, and it also gives us more control over our Python projects. It's best practice to create a programming environment for each programming project, and you can create as many of them as you like. Let's go ahead and create a virtual environment for this class, py3.
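The version check just mentioned can also be done from inside Python itself, which is handy for confirming which interpreter a notebook or script is actually running:

```python
import sys

# Equivalent of running `python --version` at the shell:
print(sys.version.split()[0])

# sys.version_info gives the components as numbers; this course
# assumes a Python 3 interpreter.
print(sys.version_info.major)
```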
First, we'll have to create a home directory to house the files for this course. For this tutorial, we'll put it in a folder on our desktop, although you can put it wherever you like. In the terminal, type cd ~ (cd followed by a tilde), and that takes us to our home directory. Then we'll type cd Desktop, and then we type the make directory command, mkdir, and the name of your folder, which we'll call py3. We'll navigate into this folder using the change directory, or cd, command. Now that we're in our class directory, we can create our virtual environment simply by typing this command. In this command, the 3.7 corresponds to the version of Python, and py3m is the name of our environment. This command creates a new directory inside of our py3 home folder that houses a few files that allow our virtual environment to run correctly, isolating the project files so that they don't mix with the system files on our computer. The most important of these is the lib subdirectory, which starts out empty but will, by the end of this lesson, hold the data for all the libraries we install in this environment. To use the environment we created, we need to activate it. We do this by invoking the activate script in the terminal. Wonderful. You know your virtual environment is activated when you see the name of your environment in parentheses before the terminal prompt in the application. Now that we've created and activated our virtual environment, let's install the packages we'll need for the course. We'll describe more about what these packages do and how to use them in different lessons. For now, just run the following commands one at a time in your py3 folder with your virtual environment activated. Due to the power of editing, we're going to speed through this installation process, but understand that these packages may take some time to install, so be patient with them. We'll run pip install pillow, pip install pytesseract, pip install numpy, pip install matplotlib, and pip install opencv-python.
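Once those pip installs finish, a quick way to confirm that all five packages are importable, without paying the cost of actually importing the heavier ones, is to probe them with the standard library. Note that opencv-python installs under the module name cv2, and Pillow under PIL:

```python
import importlib.util

# The module names behind the five pip packages used in this course:
packages = ["PIL", "pytesseract", "numpy", "matplotlib", "cv2"]

# find_spec returns None when a module can't be located.
status = {name: importlib.util.find_spec(name) is not None for name in packages}
print(status)
```

Any name reported False just means that particular pip install needs to be re-run inside the activated environment.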
With all of our libraries downloaded, we're nearly ready to go; we just need to get our files ready to run. Navigate to the py3 Coursera course, download the Jupyter Notebook files, with their .ipynb extensions, and their accompanying data files, and put those into the py3 directory on your own computer. To run these files, we simply navigate to our folder in the terminal and type jupyter notebook. There you should see a folder for your files, and you'll be all ready to go and run these files within Jupyter. Thank you so much, and have fun with this course. So let's demystify the Python runtime environment a bit. Up until now you've been learning how the code itself works: that is, how the Python interpreter, which is a computational process running on your computer, considers the commands you give it, which are loops, control structures, variables, functions, and how it executes the underlying code on hardware such as the processor, video card, and memory of your computer to create some experience for the user. In much of this course you've been using a web-based simulation environment, Runestone, to create these experiences. With Jupyter, though, we'll be using an environment that is more traditional. A key aspect of this traditional environment is the set of installation files for Python itself. Let's take a look. I'm going to open up a new terminal. Don't worry if this seems unfamiliar to you; we won't be using the terminal much in this course, I just want to help you explore the Jupyter system a bit. You'll notice that there are a bunch of characters when we open up the terminal. The first set of characters is our username. You can ignore this; it should be the same, jovyan, for all Coursera users. Then there is an at sign, and the next set of characters is the machine name. Again, the machine name is just auto-generated by the Coursera system and isn't really relevant for our discussion right now. Finally, the rest of the string is the current path, or the location that we're working in.
This is useful, and actually if we type the characters l and s in there, ls for list, and hit Enter, we'll see a list of all files and some folders in this directory. But I actually want to show you where Python lives on this machine, so we're going to change the directory with the command cd. On the Coursera system we're using a specific kind of installation for Python called Anaconda; don't worry too much about that. Let's just change to the site-packages directory. This is the real heart of the Python library ecosystem. So I'll go cd /opt/conda/lib/python3.6/site-packages. When we do this, then do a directory listing, we see a whole bunch of things. First, there are a lot of interesting files and directories in here, and these are the third-party packages for Python which are installed on the system. We're going to be dealing with a lot of new packages, but I want you to feel empowered to explore a bit. These packages are just Python files, or sometimes other languages as well, which have been configured to work with the current Python environment. Let's take a look at one that I'm familiar with, called Pillow. Pillow is an imaging library for Python. We can see that it's installed here because there's a Pillow egg file. We can actually look at the source code of the Pillow library by going into the PIL directory. So let's cd PIL. You'll see that when we do an ls, most of these files are just .py files, Python code itself. We can even take a look at the Python source code behind this library using the more command. Here, let's look at the main Python file in this library, called Image.py. So, more Image.py. We can see that there's a bunch of comments at the top, reaching all the way back to 1995. We see a few import statements, then some top-level variables like a logger. We won't talk much about loggers, but they're handy when debugging code.
Then we see that there are a few classes which are created, then a math expression, then a try/except block, and so forth. You can feel free to explore this library more by hitting the space bar, or you can exit the more command by hitting q. So that's a very brief overview of where Python libraries exist on your system. Now, reading the source code is not exactly a user-friendly way to interact with a library, but it's a great way to learn how libraries work and how programmers create complex Python solutions. Let's go back into the Jupyter Notebook and explore how to actually use this library. Recall that we import a library using the import keyword: just import PIL, and we run this. Documentation is a big help in learning a library, and there exist standards that make this process easier. For example, most libraries let you check their version using the version attribute. So we go PIL dot and then double underscore version double underscore; this is called "dunder version dunder". We see the version of the one that I'm using is 4.2.1. You might be using a different version, because we might have upgraded it. Let's figure out how to open an image with Pillow. Python provides some built-in functions to help us understand the functions and objects which are available in libraries. For instance, the help function, when called on any object, returns the object's built-in documentation. Let's try it with our new library module, PIL. So, help(PIL), and this renders nicely inline a bit of help file that actually comes from the documentation in the source code itself. This shows us that there's a host of classes available to us in the module, as well as version information, and even the file, called __init__.py, which has the source code for the module itself. We could look up the source code for this in the Jupyter console if we wanted to. These documentation standards make it easy to poke around and explore a library.
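As a compact recap of those conventions, here's what the version and location lookups look like as code, assuming Pillow is installed in your environment:

```python
import PIL

# Most libraries expose their version through the "dunder version" attribute:
print(PIL.__version__)

# __file__ shows where the package lives on disk, e.g. somewhere under
# the site-packages directory we explored in the terminal:
print(PIL.__file__)
```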
Python also has a function called dir, which will list the contents of an object. This is especially useful with modules, where you might want to see which classes you might interact with. Let's list the details of the PIL module: dir(PIL). We can see here that a list comes back with a bunch of strings. Some of these strings are intended to be internal, so they've got a dunder before and a dunder after (dunder is just double underscore; that's the fancy way in Python of saying it), and we see that there are a couple that do not have dunders and are expected to be used more generally. At the top of this list there's something called Image, and this sounds like it could be interesting for us, so let's import it directly and run the help command on it: from PIL import Image, then help(Image). Running help on Image tells us that this object is the image class wrapper. We see from the top-level documentation about the Image object that there's hardly ever any reason to call the Image constructor directly, and they suggest that we use the open function; that's what we should be using to get images. Let's call help on the open function to see what it's all about. Remember that since we want to pass in the function, and not run the function itself, we don't put parentheses behind the function name. So, help(Image.open); we're passing an object, but that object is actually a reference to the function. So it looks like Image.open is a function that loads an image from a file and returns an instance of the Image class. Let's give it a try. In the read-only directory there's an image I've provided, which is from our Master of Science in Information program recruitment flyer. Let's try and load that now. We'll make the file a string, with the read-only directory and then msi_recruitment.gif, and we'll call Image.open and pass it this path to the file. That should return to us an image object, which we're going to put into the image variable, and let's just print out this image.
Okay, so we see it printed a PIL.GifImagePlugin.GifImageFile, and it gives us some other information there. So we see that this returns to us a kind of PIL.GifImagePlugin.GifImageFile. At first this might seem a bit confusing, because we were told by the docs that we should be expecting a PIL.Image.Image object back, but this is actually just object inheritance working. In fact, the object returned is both an Image and a GifImageFile. We can use the Python inspect module to see this: the getmro function will return a list of all of the classes that are being inherited by a given object. Let's give it a try. We'll import inspect (now, this is not a third-party library; it comes with Python), and then we'll write a bit of code. We'll print the type of the image, so type will tell us the type of the image, but we want to convert that to a string; then we're going to call inspect.getmro and pass it the type of the image, and see what that inheritance chain looks like. So we see the result is a tuple that's returned to us, which is actually all of the different objects that are being inherited from here, with GifImagePlugin at the very bottom (we would usually call this the most specific version in the inheritance), up to an ImageFile, up to an Image, and then finally up to an object. Now that we're comfortable with the object, how do we view the image? It turns out that the image object has a show function. You could find this by looking at the properties of the object, if you wanted to, using the dir function. So we'll call image.show(), and that didn't seem to have the intended effect. The problem is the image is stored remotely on a Coursera server, but show tries to show it locally to you. So if the Coursera server software was running on someone's workstation in Mountain View, California, where Coursera has its offices, then you just popped up a picture of our recruitment materials. Thanks for that.
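The inheritance check just described can be reproduced without the course's GIF file by building a small in-memory image. This is a sketch: an image created with Image.new is a plain PIL.Image.Image, so its chain is shorter than the one you'd see for a file opened from disk, but it still ends at object in the same way.

```python
import inspect
from PIL import Image

# A small in-memory image stands in for the file-based one:
img = Image.new("RGB", (10, 10))

print(type(img))
# getmro returns the method resolution order, from the most specific
# class up to the base `object`:
print(inspect.getmro(type(img)))
```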
Instead, though, we want to render the image in the Jupyter Notebook. It turns out Jupyter has a function which can help with this. So from IPython.display (we'll talk a little bit more about that, but IPython was one of the early terms for Jupyter, which started as just an interactive Python interpreter before moving into a much larger project) we want to import the display function, and then let's call display and pass it the image. Okay, so there we see our inline display of happy Master of Science in Information students. For those who would like to understand this in more detail, the Jupyter environment is running a special wrapper across the Python interpreter called IPython. IPython allows the kernel back-end to communicate with the browser front-end, among other things. The IPython package has a display function which can take objects and use custom formatters in order to render them to the screen. There are a lot of different formatters provided, including ones which know how to handle different image types, and that's what we're using here. That's a quick overview of how to read and display images using Pillow. In the next lecture we're going to jump in a bit more into detail to understand how to use Pillow to manipulate images. First, let's import the PIL library and the Image object. Import PIL, and then from PIL we'll import Image. And let's import the display functionality from IPython, so from IPython.display we'll import display. And finally, let's load the image we were working with last time. So that image is in 'readonly/msi_recruitment.gif', and now we'll open it into the image object. Let's execute that cell. Great. Now let's check out a few more methods of the Image library. First we're going to look at copy, and if you remember, we can do this using the built-in Python help command. So help(Image.copy). All right, so here's some information about copy. We see that copy takes no arguments and that the returned object is an Image object itself.
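The copy behavior just described can be verified with a short sketch (again using an in-memory image in place of the course file):

```python
from PIL import Image

img = Image.new("RGB", (10, 10), "blue")

# copy() takes no arguments and returns a brand new Image object,
# so later edits to the copy never touch the original.
duplicate = img.copy()
print(duplicate is img)          # the two are distinct objects
print(duplicate.size == img.size)  # but share the same dimensions
```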
Now let's look at save. So help(Image.save). Okay, so there's quite a bit to the Image.save method. The save method has a couple of parameters which are interesting. The first, called fp, is the file name we want to save the object to. The second, format, is interesting: it allows us to change the type of the image, but the docs tell us that this should be done automatically by looking at the file extension as well. Let's give it a try. This file was originally a GIF image file, but I bet if we save it with a .png extension and read it in again, we'll get a different kind of file. So let's call image.save and we'll just call it 'msi_recruitment.png', and that's going to save it in your home directory. And then we'll open a new image, so Image.open('msi_recruitment.png'), and take that into an image variable. Now let's import inspect like we did in the previous lecture and call inspect.getmro, wrapping the image in the type function. Indeed, this created a new file, which we could view by going into the Jupyter Notebook file list by clicking on the logo at the top of the browser. And we can see this new object is actually a PngImageFile object. For the purposes of this class the difference between image formats really isn't so important, but it's nice that you can explore how a library works using the functions help, dir, and getmro. The Pillow library also has some nice image filters to add some effects. It does this through the filter function. The filter function takes a filter object, and those are all stored in the ImageFilter module. Let's take a look. So from PIL we'll import the ImageFilter object, and then run help on ImageFilter. So there are a bunch of different filters here, but let's just try and apply the BLUR filter. Before we do this we have to convert the image to what's called RGB mode. This is a bit magical. Images like GIFs are limited in how many colors can be displayed at once, based on the size of the palette.
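The save-and-reload round trip above can be sketched like this; a temporary directory and a synthetic image stand in for the course's home directory and GIF:

```python
import os
import tempfile
from PIL import Image

img = Image.new("RGB", (10, 10), "red")

# save() infers the output format from the file extension, so a
# .png suffix produces a PNG regardless of the source format.
path = os.path.join(tempfile.mkdtemp(), "example.png")
img.save(path)

# Reopening confirms the new file really is a PNG.
reloaded = Image.open(path)
print(reloaded.format)
```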
This is similar to a painter's palette, which only has so much room. GIF is actually a very old image format. If we convert the image into something more sophisticated, we can apply these interesting image transforms. Sometimes learning a new library means digging a bit deeper into the domain the library is about. We can convert this image using the convert function. So we'll just call image.convert('RGB'), where RGB stands for red, green, and blue mode, which is a pretty common mode for images. Then we'll create a new variable, blurred_image, and we'll set that to image.filter, and we'll pass in the PIL.ImageFilter.BLUR parameter. You'll note that the parameter we're passing in is really a placeholder: it's not a function that we're running, it's an object we're passing in. And then let's display that blurred image. I encourage you to pause the video here, jump into the notebooks, and start experimenting with some of the other filters. The EMBOSS and SHARPEN filters, for instance, are interesting. Or, for a challenge, check out the BoxBlur or MedianFilter functions and look at their parameters to get a sense of how they're being used. Okay, let me show you one more function in this lecture, which is crop. This removes portions of the image except for the bounding box that you describe. When you think of images, think of individual dots, or pixels, which make up the image, lined up on a grid. You can actually see the number of pixels high the image is and the width of the image. So let's do that here. We'll print the image dimensions, using string formatting to pass in two parameters: the first is going to be image.width, the width of the image in pixels, and the second image.height, the height of the image in pixels. We see it's 800 by 450. This means that this image is 800 pixels wide (we call that the x axis) and 450 pixels high (we call that the y axis).
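The convert-then-filter step can be sketched as below. The palette image is created in memory here for illustration; the course notebook converts the GIF it loaded earlier:

```python
from PIL import Image, ImageFilter

# Palette ("P") mode images, like GIFs, must be converted to RGB
# before many filters can be applied; convert() returns a new image.
img = Image.new("P", (32, 32)).convert("RGB")

# ImageFilter.BLUR is a filter object we pass in, not a function
# we call: note there are no parentheses after BLUR.
blurred = img.filter(ImageFilter.BLUR)
print(blurred.mode, blurred.size)
```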
If we take a look at the crop documentation, we see that the first parameter to the function is a tuple, which is the left, upper, right, and lower values of the x, y coordinates, so the two corners of the box. So help(Image.crop), and we run that and we see that there's a box that we provide. With PIL images, we define the bounding box using the upper left corner and the lower right corner, and we count the number of pixels out from the upper left corner, which is (0, 0). This might seem odd if you're used to coordinate systems which start in the lower left; a lot of us learnt those in primary school mathematics. Just remember that we define our box in the same way we count out the positions in the image. So if we wanted to get the Michigan logo out of this image, we might start with the left at say 50 pixels and the top at 0 pixels, then set the right bound at 190 pixels and the lower bound at say 150 pixels. So let's call display with image.crop, and we pass it a parameter which is a tuple. So remember that we're going to put all four of these values in one tuple object and pass it in, and this should display the image. There we go. That's the School of Information logo cropped out of the image. Of course, crop, like other functions, only returns a copy of the image and doesn't change the image itself. A strategy I like is to draw the bounding box directly on the image while I'm trying to line things up. We can draw on images using the ImageDraw object. I'm not going to go into this in detail, but here's a quick example of how I might draw the bounding box in this particular case. So from PIL we'll import the ImageDraw object. We'll instantiate this, so we'll call ImageDraw.Draw and pass it the image, and you can check the documentation to learn more about how you would use this object.
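The crop call above can be sketched like this; a blank 800 by 450 image stands in for the flyer:

```python
from PIL import Image

img = Image.new("RGB", (800, 450))

# The bounding box is (left, upper, right, lower), counted in
# pixels from the upper-left corner, which is (0, 0).
logo = img.crop((50, 0, 190, 150))

# Width is 190 - 50 and height is 150 - 0.
print(logo.size)
```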
Then we say drawing_object.rectangle, and we give it the rectangle outline that we're actually interested in, and we give it a fill value (you can set the fill to different colors or types) and an outline, and in this case we want to make it red. And because we ran this on the drawing object, it actually executes, and the drawing object has a reference to the underlying image. So let's display this image now in the Jupyter notebook. There we go. We have our main image with the logo outlined in a red box for our flyer. Okay, that's been an overview of how to use PIL for single images. But a lot of work might involve multiple images, and in fact your assignment involves multiple images. So in the next lecture we're going to tackle that and set you up for this assignment. I'll see you there. Let's take a look at some other functions we might want to use in Pillow to modify images. First, let's import all of the library functions we need. So we'll import PIL like we've been doing; from PIL we'll import Image, and from IPython.display we'll import display. And let's load the image we're working with, and we can just convert it to RGB inline. So that image was in 'readonly/msi_recruitment.gif', and here we'll just Image.open the file; that returns an Image object, on which we call convert, passing it the 'RGB' string, and now we've got our image. Let's just display that image inline here. All right, so that's the image we've been working with. A task that's fairly common in image and picture manipulation is to create contact sheets of images. A contact sheet is one image that actually contains several other different images. So let's try and make a contact sheet for the Master of Science in Information advertisement image. In particular, let's change the brightness of the image in ten different ways, then scale the images down to smaller sizes and put them side by side, so that we can get a sense of which brightness we might want to use.
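The bounding-box drawing trick can be sketched as follows. A white canvas stands in for the flyer, and the rectangle coordinates are the same ones used for the crop:

```python
from PIL import Image, ImageDraw

img = Image.new("RGB", (800, 450), "white")

# Draw(...) returns a drawing object holding a reference to the
# underlying image, so drawing on it modifies the image in place.
drawing_object = ImageDraw.Draw(img)
drawing_object.rectangle((50, 0, 190, 150), fill=None, outline="red")

# A pixel on the outline is now red on the original image.
print(img.getpixel((50, 0)))
```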
So first up, let's import the ImageEnhance module, which has a nice object called Brightness. So from PIL import ImageEnhance. Checking the online documentation for this function, it takes a value between 0.0, which is a completely black image, and 1.0, which is the original image, to adjust the brightness. All of the classes in the ImageEnhance module work in the same way: you create an object, in this case Brightness, then you call the enhance function on that object with the appropriate parameter. Let's write a little loop to generate ten images of different brightnesses. First, we need the Brightness object with our image. So create a variable called enhancer and set this to ImageEnhance.Brightness, passing in the image. Now let's create a new list for our images, and then for each step from 0 through 9 let's do some processing. We divide i here by 10 to get the decimal values we want and append the result to the images list. We actually call the brightness routine by calling the enhance function, as I said. So remember that you can dig into the details of this using the help function if you want to, or by consulting the web docs. So we'll do images.append, and we'll do enhancer.enhance, and we'll just take our counter variable i and divide it by 10. First we'll get 0, and then 0.1, 0.2, and so forth. We can see that the result here is going to be a list of ten PIL.Image.Image objects. Jupyter nicely prints out the value of Python objects nested in lists. So let's just print out the images here. And there we go. We can see that we have these PIL.Image.Image objects, and when we call print on them, the wrapper function has been written so that we get to see a little bit more information about each of these objects.
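The brightness loop just described can be sketched like this, with a small white image standing in for the flyer:

```python
from PIL import Image, ImageEnhance

img = Image.new("RGB", (10, 10), "white")

# Brightness is an enhancer built around the image; calling
# enhance(factor) returns a new image, where 0.0 is completely
# black and 1.0 is the original.
enhancer = ImageEnhance.Brightness(img)
images = [enhancer.enhance(i / 10) for i in range(10)]

# Ten images, from fully black (factor 0.0) up to factor 0.9.
print(len(images), images[0].getpixel((0, 0)))
```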
We see that the mode is RGB (of course, we knew that because we converted the image), but we also see the size, and then they give us a memory location, which for our purposes is not particularly interesting, but you can see that it changes for each of the images, so we know they're pointing to different images. Let's take these images now and composite them one above another into a contact sheet. There are several different approaches we could use, but I'll simply use a new image which is like the first image but ten times as high. Let's check out the PIL.Image.new functionality, which we can do using help: so help(PIL.Image.new), and run that cell. The new function requires that we pass it a mode. We're going to use the mode RGB, which stands for red, green, blue, and is the mode of our current first image. There are lots of different image mode formats, and this one is actually one of the most common. For the size, we have a tuple which is the width of the image and the height. We'll use the width of our current first image, but for the height we'll multiply this by ten. This will make a sort of canvas for our contact sheet. Finally, the color is optional, and we'll just leave it as black. So we'll create a new variable here called first_image (we'll use that a lot; it's just the first image from the list, but we're going to use it as a reference for sizes). From PIL we want to import Image, and then let's create the contact sheet: we call PIL.Image.new, so we're invoking the Image new constructor (or what will act as the constructor), and give it the first image's mode, that's RGB, as the first parameter, and then we give it the width and then ten times the height. So now we have a black image that's ten times the size of the other images in the contact_sheet variable. Now let's just loop through the image list and paste the results in.
The paste function will be called on the contact_sheet object, and it takes in a new image to paste as well as an x, y offset for that image. In our case the x position is always 0, but the y location will change by 450 pixels each time we iterate through the loop. So let's first create a counter variable for the y location; it will start at 0, and we'll call it current_location. Then for each image in our images list, let's paste the current image into the contact sheet: we call contact_sheet.paste, we give it the image, and we give it a 0, which is the x location (the far left), and then current_location, which is our y location. Then let's update the current_location counter by increasing it by 450 pixels, which is the height of our image. This contact sheet has gotten big: 4,500 pixels tall. Let's just resize this sheet for display. We can do this using the resize function. This function takes in a tuple of width and height, and we'll resize everything down to the size of just two individual images. So we just call contact_sheet.resize and give it the tuple size that we want to scale to. Now let's just display that composite image: we'll call display with the contact sheet. You can see here that we have all of these images stacked on one another at different brightnesses. They really go from quite dark, maybe even black, to almost our full normal image. So that's a nice proof of concept, but it's a little tough to see. Let's instead change this to a 3 by 3 grid of values, and this is something you'll use in your assignment. So the first thing we should do is make our canvas. We'll make it three times the width of our image and three times the height of the image: a nine-image square. So we'll call PIL.Image.new. Again, we want to pass in the mode of the image we're using, and then the width and the height each multiplied by three, and we'll save this as our new contact sheet.
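The vertical contact sheet built above can be sketched like this, with blank 800 by 450 images standing in for the ten brightness variants:

```python
from PIL import Image

# Ten stand-in images of the lecture's 800x450 size.
images = [Image.new("RGB", (800, 450)) for _ in range(10)]
first_image = images[0]

# A canvas as wide as one image but ten times as tall.
contact_sheet = Image.new(first_image.mode,
                          (first_image.width, 10 * first_image.height))

# Paste each image at x=0, moving the y offset down one image
# height (450 pixels) per iteration.
current_location = 0
for img in images:
    contact_sheet.paste(img, (0, current_location))
    current_location += 450

# Shrink the 4500-pixel-tall sheet down to two image heights.
contact_sheet = contact_sheet.resize((800, 900))
print(contact_sheet.size)
```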
Now we want to iterate over our images and place them into this grid. Remember that in pill we manage the location of where we refer to an image in the upper right hand corner. So this will be 0.0. Let's use one variable for the x dimension and one y dimension. So x equals 0 and y equals 0. Now let's just iterate over our images. Except we don't want to bother with the first one because it's just solid black. Instead we want to just deal with the images after the first one and that should give us a 9 total. So for image in images and we just use slicing here to pull out from the first image, that is the one in the one position as opposed to the zeroth position to the end of the list and that gives us a sub-list that we're going to iterate over. Let's paste the current image into the contact sheet and so we do this with contact sheet dot paste and we pass it in the image and the x, y position. Now we need to update our x position. If it's going to be the width of the image then we set it to 0 and we update y as well to the point to the next line of the contact sheet. So if x plus first image dot width is equal to the contact sheet dot width we'll set x to 0 and we'll update y and we'll make y equal to the first image dot height. So if this isn't true then we'll just update x and so x will be equal to x plus the first image dot width and this will move us over a little bit horizontally but won't change our vertical alignment. Now let's resize the contact sheet. We'll make it just half the size by dividing it by 2 and because the resize function needs to take round numbers we need to convert our divisions from floating point numbers into integers using the int function and so we'll take our contact sheet we'll resize it. 
We want to pass two parameters, but we need to wrap those in int calls to make them integers, whole numbers, because we're doing a division, and when you divide one integer by another in Python you always get a floating point value back, and we want to typecast this into integers. Now let's display that composite image, and just like before we display the contact sheet in Jupyter. We see here we have our 3 by 3 grid. In the upper left we start with very dark, and it gets lighter as we go across; then it wraps around and gets lighter and lighter until we get to the lightest image in our brightness levels at the lower right. And you can see how a photographer who might be looking to change the brightness of an image, say to put it in the background of a video, or as the background of a page with text over it, might want to see it in a bunch of different variations, a bunch of different brightnesses, and we've written code which can do that. So if something like Photoshop were written in Python, you could wire up a button with this logic that just shows a bunch of alternative versions of the image. Well, that's been a tour of our first external API, the Python Imaging Library, or the Pillow module. In this series of lectures you've learned how to read and write images, manipulate them with Pillow, and explore the functionality of third-party APIs using features of Python like dir, help, and getmro. You've also been introduced to the console and how Python stores the libraries on the computer. While for this course all of the libraries are intended for you to use on the Coursera website in the Jupyter system, and you won't need to install your own, it's good to get an idea of how this work is done in case you wanted to set things up on your own PC.
Finally, while you can explore Pillow from within Python, most good modules also put their documentation up online, and you can read more about Pillow at Read the Docs; this will be very useful for your assignment, which comes up soon. All right, now it's time for the assignment. The first assignment follows on from the first set of lectures, which used PIL, the Pillow library. I've done a bunch of digital photography, just as a hobby, and something I'm very interested in is creating variations on images. So I'll take a nice photo; I did some black and white photography, and I wanted to see a photo in sepia tones, the little browns or warm colors, or in cool gray tones, which gives it a more architectural feel, more distant, and changes the emotion. Even though it's a black and white image, usually there's not just black and white, and this is interesting both in the images that we can create digitally, and actually there's a whole world of pigmentation and color science as well when it comes to printing these kinds of images. So in this assignment I want you to improve upon the variations that we did in the main lecture. There we looked at variations of brightness, but I'm going to let you dig a little bit further into the RGB setup of images. In an image, a given one-by-one pixel can be represented as red, green, and blue color channels, and those three things are mixed together to produce an overall color; purple might have a lot of red and a lot of blue in it and a little bit of green, for instance. So in this assignment you're actually going to manipulate those color channels directly and create a contact sheet that shows us what some of those manipulations look like.
So have some fun with this assignment. I think you can explore quite a bit about images here; if you do any digital photography, or if you do stuff with Photoshop or the like, you'll probably enjoy this assignment, and I'll see you next week when we talk about Tesseract and optical character recognition. Optical character recognition, or OCR, is the conversion of text captured in images into text usable by a computer. In other words, an OCR tool can read the text in images. OCR is a common method of processing large volumes of printed text, especially when that text isn't available in a digital format. In practical application, OCR has been used to scan the pages of books, to recognize license plates, and even to convert handwriting into digitized text. Let me share an example from my own work. During my doctoral degree I was working on an open source system called Opencast Matterhorn. This system allows for the automated recording of lectures within universities, similar in some ways to the video you're watching now. The system generally ran automatically when an instructor was teaching a course, and the video was uploaded to the web immediately following the lecture without any input from technicians. This was great, but it was hard to find videos about a given topic that might have been covered during a lecture. To deal with this, we built a search index from the contents of the videos themselves. We essentially broke a video up into a sequence of images, ran OCR on each image to determine what text might have been shown to the students from the slides in the classroom, then created a search index using this technique. In the project for this course you're going to be doing something similar, but we're going to do it with images of newspapers instead. In this module of the course we're going to use an OCR engine called Tesseract. Tesseract was originally developed between 1984 and 1994 as a PhD research project at HP Labs.
The engine vastly outperformed commercial products at the time, but then development stopped until HP released Tesseract as open source in 2005. In 2006 Google began maintaining the tool and has since released updated versions of Tesseract with support for over 100 languages. I think Tesseract is a great tool and a wonderful example of how companies can engage in open source software development. So before we spend a lot of time talking about Tesseract, let's talk more about open source software. Have you noticed that the source code for all of the libraries we've discussed is openly available to the public? Publicly available software is often known as open source software, or OSS. Specifically, open source software is software whose creator released the source code under an open source license, thereby granting anyone the right to access, modify, and distribute the software. The Open Source Initiative (OSI) defines open source software as software that can be freely accessed, used, changed, and shared in modified or unmodified form by anyone. You can find the full criteria for open source software on the OSI website. The first three stipulations are the core of OSS: that the software is available without charge, that the source code is public and accessible, and that the license, whatever it is, is adhered to by all derivative works. Open source development was popularized by computer scientist Eric Raymond in his landmark 1997 essay The Cathedral and the Bazaar. It is well regarded as a software development technique, as it lowers consumer cost and increases code flexibility, security, and accountability due to its community-sourced nature. And there's good business sense to using open source software too. According to a 2008 study, OSS saves consumers roughly $60 billion annually, and much open source software allows companies to build on top of it, lowering infrastructure costs.
There are different types of open source licenses, and the field can be a bit confusing to understand, especially for businesses. For example, the Apache license allows the linking of Apache-licensed code with differently licensed code. As a developer, you may find such a license feature useful if you want to include a closed source or proprietary library in your open source project. On the other hand, under the GNU General Public License, or GPL, one can link only other GPL-compatible libraries. You may find this license feature desirable if your open source project is composed entirely of open source code and you wish to ensure that this will always be the case, regardless of who uses your code in the future. The GPL is probably the most well-known, and perhaps most common, open source software license, in part due to its viral nature, which requires all linked software to also be GPL licensed. One of my favorite licenses to use, and a license I love to see on libraries that I use, is called the BSD license. It was originally put together by the University of California, Berkeley for the release of the BSD operating system. Here's an example of the BSD license. As we can see, it's very short and to the point, and it maintains just a minimal amount of legalese. The issues surrounding the features of free and open source licenses quickly become political and philosophical, and the choice of license often depends on the project that you have in mind. If you're interested in understanding open source licenses in more detail, I would encourage you to check out the Wikipedia article on the topic at the link in the course. For the rest of the lectures in this module, we're going to use two projects: the Tesseract project, which is now run by Google, and the pytesseract bindings, which allow us to use the Tesseract system from within Python. If we look at the source code repository for Tesseract, we see that it is released under the Apache license.
This means we can use Tesseract in any of the code we produce and keep our code licensed however we want, even commercially licensed for instance, unless we change Tesseract itself. However, when we look at the pytesseract license, we see it is released under the GPL. That means that by importing this library into our own code, if we share our code with others we must also license it under the GPL. This is the viral clause, and depending on the project I was working on, I would have to consider this very carefully. However, for this course it's no problem, since I really do want to share it with you and others broadly. So let's dig in and start using Tesseract. We're going to start experimenting with Tesseract using just a simple image of nice clean text. Let's first import Image from PIL and display the image text.png. From PIL we'll import Image, and image = Image.open('readonly/text.png'), and then we'll display that image. Great, we have a base image of some big clear text. Let's import pytesseract and use the dir function to get a sense of what might be some interesting functions to play with. So import pytesseract, then we can use dir to see what's inside of it. Okay, it looks like there's just a handful of interesting functions, and I think image_to_string is probably our best bet. Let's use the help function to interrogate this a bit more. So help(pytesseract.image_to_string). This function takes an image as the first parameter, there's a bunch of optional parameters, and it returns the results of the OCR. I think it's worth comparing this documentation string with the documentation we were seeing from the Pillow module. Let's run the help command on the Image resize function: so help(Image.Image.resize). Notice how the Pillow function has a bit more information in it. First off, it's using a specific format called reStructuredText, which is similar in intent to document markup such as HTML, the language of the web.
The intent is to embed semantics in the documentation itself. For instance, in the resize function we see the words param size with colons surrounding them, as :param size:. This allows documentation engines, which create docs from source code, to link the parameter to the extended docs about that parameter. In this case, the extended docs tell us that the size should be passed as a tuple of width and height. Notice how the docs for image_to_string, for instance, indicate that there's a lang parameter which we could use, but then fail to say anything about what that parameter is for or what its format is. What this really means is that you need to dig deeper. Here's a quick hack if you want to look at the source code of a function: you can use the inspect.getsource command and print the results. So let's import inspect (remember, this module comes from the Python 3 standard library), and then we'll create the source with inspect.getsource, passing it a function pointer. Note that we're not calling the function; we're just passing a reference to the function. Then let's print that source to the screen. So it's kind of interesting: you can actually look at the source code behind a given function, and that's one of the powers of an interpreted language like Python. There's actually another way in Jupyter, and that's to append two question marks to the end of a given function or module. Other editors have similar features, and this is actually a great reason why you should be using a software development environment. So pytesseract.image_to_string, just as if we were going to call the function, but with two question marks added to the end; we run that and we see that it pops up at the bottom of the screen with a lot more information, nicely syntax highlighted for us too. We can see from the source code that there really isn't much more information about what the parameters are for or what this image_to_string function does.
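The getsource hack can be sketched as below. To keep the example self-contained it inspects a standard-library function (inspect.getmro) rather than pytesseract, but the pattern is identical:

```python
import inspect

# Pass a reference to the function itself (no parentheses; we are
# not calling it). getsource() returns the implementation as a
# string, which we can then print or search.
source = inspect.getsource(inspect.getmro)
print(source)
```

The same call works for any pure-Python function, including third-party library functions like pytesseract's, as long as their source files are available on disk.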
This is because underneath, the pytesseract library is calling a C++ library which does all of the hard work, and the author just passes through all of the calls to the underlying tesseract executable. This is a common issue when working with Python libraries, and it means that we need to do some web sleuthing in order to understand how we can interact with Tesseract. In a case like this, I just googled tesseract command line parameters, and the first hit was what I was looking for. Here's the URL to the GitHub. This goes to a wiki page which describes how to call the tesseract executable, and as we read down we see that we can actually have Tesseract use multiple languages in its detection, such as English and Hindi, by passing them in as eng+hin. That's very cool. One last thing to mention: the image_to_string function takes in an image, but the docs don't really describe what this image is underneath. Is it a string path to an image file, a Pillow image, or something else? Again, we have to sleuth and/or experiment to understand what we should do. If we look at the source code for the pytesseract library, we see there's a function called run_and_get_output. Here's a link to that function on the author's GitHub account. When we look at this function, we can see what actually happens when we call image_to_string. One of the first things which happens is the image is saved through a save image function. Here's that line of code. And we see that there's another function which prepares the image, actually loading it as a Pillow image file. So yes, passing a PIL image file is appropriate for this function. It sure would have been useful for the author to have included this information in reStructuredText, to save us from having to dig through the implementation itself. But this is an open source project. Maybe you would like to contribute back some better documentation?
Just a hint if you're interested in it: the doc line we need is ":param image:", and then we just say that it's a PIL Image.Image file or an ndarray of bytes. In the end, we often don't do this full level of investigation; we just experiment and try things. It seems pretty likely that a PIL Image.Image would work, given how well known PIL is in the Python world. But still, as you explore and use different libraries, you'll see a breadth of different documentation norms, so it's useful to know how to explore the source code. And now that you're at the end of this course, you've got the skills to do so. Okay, let's try and run Tesseract on this image. So text is equal to pytesseract.image_to_string, and we pass in the image, and then let's just print out the text. Looks great. We can see that the output includes newline characters and faithfully represents the text, but doesn't include any special formatting. Let's go on and look at something with a bit more nuance to it next. In the previous example, we were using a clear, unambiguous image for conversion. Sometimes there will be noise in the images you want to OCR, making it difficult to extract the text. Luckily, there are techniques we can use to increase the efficacy of OCR with pytesseract and Pillow. Let's use a different image this time, with the same text as before but with added noise in the picture. We can view this image using the following code. So from PIL we'll import Image (pretty common for us now), then we'll do an Image.open and we'll pull out this noisy_ocr.png, and then we'll use the display function in Jupyter to display it inline. As you can see, this image has shapes of different opacities behind the text, which can confuse the Tesseract engine. Let's see if OCR will work on this noisy image. So import pytesseract, then we'll call pytesseract.image_to_string, and we'll just pass it the image that we open (this noisy_ocr.png), and then let's print out the text directly.
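As an aside, the ":param:" documentation style discussed above looks like this in practice; the function here is a made-up helper of my own, not part of pytesseract:

```python
import inspect

def scale_to_width(size, base_width):
    """Scale a size tuple to a target width, preserving aspect ratio.

    :param size: tuple of (width, height) in pixels
    :param base_width: int, the desired output width in pixels
    :returns: a new (width, height) tuple scaled proportionally
    """
    w, h = size
    return (base_width, int(h * base_width / w))

# help() and documentation engines read this structured text back out
print(inspect.getdoc(scale_to_width))
```

A docstring written this way is exactly what we wished the pytesseract author had provided: it names each parameter and states its expected type and format.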
This is a bit surprising, given how nicely Tesseract worked previously. Let's experiment on the image using techniques that will allow for more effective image analysis. First up, let's change the size of the image. So first we'll import PIL, then we set the base width of our image; we'll set it to 600 (these are pixels). Now let's open the image (that's old hat for us now) and we'll assign this to img. We want to keep the correct aspect ratio, and we can do this by taking the base width and dividing it by the actual width of the image. So I'm going to create a new variable called wpercent and make this equal to the base width divided by img.size[0], which is the width value. With the ratio, we can get the appropriate height of the image too, so I'll make something called hsize and set this to img.size[1] times that percentage; we're just scaling here. Finally, let's resize the image. Anti-aliasing is a specific way of resizing lines to try and make them appear smooth. So here I'll just call img.resize, I pass it a tuple which is the base width and the height size, and then I use PIL.Image.ANTIALIAS to really just create better lines. Now let's save this to a file, so I'll call img.save with resized_noise.png (you can call it whatever you'd like), and finally let's display it inline, so we'll call display. Then let's run OCR: again pytesseract image_to_string, and I'm going to open this new image underneath (I guess I could have just passed the image here), and then print the text. So there's not actually any improvement from resizing the image, and this is sometimes life when you're experimenting and trying to get things like this to work. Let's convert the image to grayscale. Converting images can be done in many different ways.
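The resizing steps just described can be sketched end to end on a synthetic image, so no file is needed. Note that ANTIALIAS was renamed in newer Pillow releases, so I fall back to Resampling.LANCZOS when ANTIALIAS is gone:

```python
from PIL import Image

# Stand-in for the noisy image: a solid white 1200x900 canvas
img = Image.new("RGB", (1200, 900), "white")

base_width = 600
wpercent = base_width / img.size[0]      # ratio of target to actual width
hsize = int(img.size[1] * wpercent)      # scale height by the same ratio

try:
    resample = Image.ANTIALIAS           # older Pillow releases
except AttributeError:
    resample = Image.Resampling.LANCZOS  # Pillow 10+, ANTIALIAS removed
resized = img.resize((base_width, hsize), resample)
print(resized.size)                      # (600, 450): aspect ratio kept
```

Because wpercent is applied to both dimensions, the resized image keeps the original proportions rather than being squashed to an arbitrary shape.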
If we poke around in the Pillow documentation, we find that one of the easiest ways to do this is with the convert function, passing in the string capital "L". So let's open the image that we're working with, then let's call img.convert and pass in a capital L. Now let's save that image; I'm going to call it grayscale_noise.jpg here. Remember, Pillow always worries about the file format for you based on the name of the image, so ending in .jpg here versus ending in .png is fine. Then let's run OCR on the grayscale image. To prove there are no shenanigans, I'll open that grayscale image that we saved, pass it to image_to_string in pytesseract, and print out the text. Wow, that worked really well. If we look at the help documentation using the help function, as in help(img.convert), we see that the conversion mechanism used is the ITU-R 601-2 luma transform. There's more information about this out there, but this method essentially takes a three-channel image, where there's information for the amount of red, green, and blue (R, G, and B), and reduces it to a single channel representing luminosity; that's what the L is for. This method actually comes from how standard definition television sets encoded color onto black and white images. If you get really interested in image manipulation and recognition, learning about color spaces, and how we represent color both computationally and through human perception, is a really interesting field. Even though we now have the complete text of the image, there are a few other techniques we could use to help improve OCR detection in the event that the above two don't help. The next approach I would use is one called binarization, which means to separate into two distinct parts, in this case black and white. Binarization is enacted through a process called thresholding: if a pixel value is greater than the threshold value, it will be converted to a white pixel; if it is lower than the threshold value, it will be converted to a black pixel. This
process eliminates noise in the OCR process, allowing greater image recognition accuracy. With Pillow this process is very straightforward, so let's open the noisy image and convert it using binarization. Here we just use Image.open to read our noisy image in, and then we call convert and pass in the character "1". Note that we're passing it in as a character, not as a number; this is a string value we're passing in. Now let's save and display that image: img.save, we'll call it black_white_noise.jpg, and display it. You can see here the image is mottled, with various different patterns in it, but definitely this is a black and white image. So that was a bit magical, and it really required a fine reading of the docs to figure out that the string "1" is the special parameter to the convert function that actually does the binarization. But you actually have all the skills you need to write this function by yourself. Let's walk through an example. First, let's define a function called binarize, which takes in an image and a threshold value: so def binarize, with an image_to_transform and then some threshold value. Now let's convert the image to a single-channel grayscale image using convert. Here we just create some new output image (this is what we'll end up returning), transforming the image that's passed in from color to luminosity values only. So far there's nothing new or magical to be done here; this is just creating a grayscale image. The threshold value is usually provided as a number between 0 and 255, which is the maximum value that fits in a single byte. The algorithm for the binarization is pretty simple: go through every pixel in the image, and if it's greater than the threshold, turn it all the way up (to 255), and if it's lower than the threshold, turn it all the way down (to 0). So let's write this in code. First we need to iterate over all the pixels in the image: for x in range, and we'll just go over the width (values along the x-axis), and then for y in
range, and we'll go through the height, so these will be our values along the y-axis. For a given pixel at some width and height, let's check its value against the threshold. We can do this with if output_image.getpixel, and we'll just pull the pixel at (x, y). You'll note lots of brackets here; that's because we're actually passing a tuple value in. We just check to see if it's less than the threshold value. If it is, then in our output image we call putpixel, pass in the same (x, y), and put it to 0; we're just changing it to 0 if it's less than the threshold. Otherwise we want to set this to 255, so output_image.putpixel((x, y), 255), and now we just return the new image. So let's test this function over a range of different thresholds. Remember that you can use the range function to generate a list of numbers at different step sizes; range is called with a start, a stop, and a step size. So let's try range(0, 257, 64), which should generate 5 images of different threshold values. So for thresh in range(0, 257, 64), let's print out a string to tell us what threshold we're trying; remember the thresh value is an integer, so we'll change it to a string here using the str function. Then let's display the binarized image inline, and the way we do this is the display function; then we're going to call our function binarize, we're going to pass it Image.open on the read-only noisy OCR image (we could of course cache this, open it once and pass it around as a parameter, but it's okay for a demonstration to do it this way), and then we'll send in the threshold value, which will be 0 the first time, 64 the second time, and so forth. And let's use Tesseract on it. It's inefficient to binarize it twice, but this is really just for a demo. So here we'll call print on pytesseract.image_to_string, passing in a call to binarize, which passes in a call to Image.open. So there are a lot of Image.opens here; there's lots of room this code could
be improved, but it should generate an example for us. So you can see the result: with threshold 0 the output is pretty empty; with threshold 64 we actually get a very faint looking image, but it seems like we get all of, or most of, the text. When we increase the threshold to 192 from 128, we see that we actually pick up a new space between the words "of" and "this", so we're getting more definition in the text. But then when we increase the threshold all the way to 256, we lose a lot of text, because a whole segment of the image becomes black, and at the very top end threshold we get nothing, because the whole image is black at that point. We can see from this that a threshold of 0 essentially turns everything white, that the text becomes more bold as we move towards a higher threshold, and that the shapes which have a filled-in gray color become more evident at higher thresholds. In the next lecture we'll look a bit more at some of the challenges you can expect when doing OCR on real data. Let's try a new example and bring together some of the things that we've learned. Here's an image of a storefront; let's load it and try and get the name of the store out of that image. So from PIL we'll need the Image package, of course, and then let's bring in pytesseract as well. Let's read in the storefront image I've loaded into the course and display it. So I put this in the read-only directory, and we'll just open that as an image and display it inline. Then, finally, let's try and run pytesseract on that image and see what the results are, so we'll call image_to_string on that. We see at the very bottom that there's just an empty string; pytesseract is unable to take this image and pull out the name. But we learned how to crop an image in the last set of lectures, so let's try and help pytesseract by cropping out certain pieces. So first we have to set the bounding box. The store name is in a box bounded by roughly 315, 170, 700, and 270, so I'll make a bounding box equal to this tuple. Remember, that's the
upper left corner and then the lower right corner of the region; you can go back to the PIL lecture if you want to be reminded how this works. Now let's crop the image. We just call image.crop and we pass in the bounding box, and it doesn't change the image; it returns a new image, so we save this to a title_image variable that we'll use later. Now let's display it and pull out the text: we'll call display, and then we'll call pytesseract's image_to_string and pass in the title image. Great, so we see how, with a bit of problem reduction, we can make that work. So now we've been able to take an image, preprocess it where we expect to see text, and turn that text into a string that Python can understand. If you look back up at the image, though, you'll see that there's a small sign inside of the shop, and that also has the shop name on it. I wonder if we're able to recognize the text on that sign. Let's give it a try. First we need to determine a bounding box for that sign. I'm going to show you a shortcut to make this easier in an optional video in this module, but for now let's just use the bounding box that I decided on. We'll set this to a tuple of 900 by 420 for the upper left and then 940 by 445 for the lower right. Now let's crop the image: we just call image.crop, pass in the bounding box, and we'll call this little_sign for fun, and display that little sign. Alright, this is a little sign. OCR works better with higher resolution images, so let's increase the size of this image by using the Pillow resize function. Let's set the width and the height equal to 10 times the size it is now, in a (w, h) tuple. So we'll take the new size and make it equal to little_sign.width times 10 and little_sign.height times 10. Now let's check the docs for resize. We can see here that there are a number of different filters for resizing the image; the default is Image.NEAREST. Let's see what that looks like. So we'll take our little_sign.resize, we'll pass in the new bounding box size (that's new_size), and then we'll say Image.NEAREST, all in caps, and pass that to display. So here you can see that it actually resized the image, and now it's maybe much more readable. I don't know, I didn't have much trouble seeing it before, although it was little, and it says the word FOSSIL, I think. We should be able to find something better, though; I can read this, but it looks really pixelated. Let's see what all the different resize options look like. You can go back up to the documentation to look at the names, so here I'm going to make a list of all the different names as options: Image.NEAREST, Image.BOX, Image.BILINEAR, Image.HAMMING, Image.BICUBIC, and Image.LANCZOS (that's how you say that one). So for each of the options, let's print out the option name, and then let's display what this option looks like on our little sign. Here we're actually going to call little_sign.resize, pass in the new size, pass in the option that we're looking at, and call display. Okay, so you can see that this has run, and we have a whole bunch of different numbers printed, and then different images that are interesting. From this we can notice two things. First, when we print out one of the resampling values, it actually just prints an integer, and this is actually really common: the API developer writes a property such as Image.BICUBIC and then assigns it to an integer value to pass around. Some languages use enumerations of values, which is common in, say, Java, but in Python this is a pretty normal way of doing things. The second thing we learn is that there are a number of different algorithms for the image resampling. In this case, the Image.LANCZOS and Image.BICUBIC filters do a good job; everything else, not so much. So let's see if we're able to recognize the text off this resized image. First let's resize to the larger size. I'm going to create something called bigger_sign, and I'm going to take little_sign.resize, I'm going to pass in the new size that we want, and then
I'm going to use Image.BICUBIC for lack of any personal preference; feel free to try one of the different methods. Then let's print out the text: we'll call pytesseract image_to_string and pass in the bigger sign. Well, not really any text there. Let's try and binarize this. First, let me just bring in the binarization code we did earlier. Now let's apply binarization with, say, a threshold of 190, and try and display that as well as do the OCR work. So binarize (remember, this function takes in the sign, or whatever image we want to binarize, and a value between 0 and 255, and it's going to walk through the image pixel by pixel, setting each one to either 0 or 255, so changing it to straight-up black and white), and then we'll display what the binarized sign looks like, and then let's actually try and get the text out with pytesseract too, in the hopes that 190 is actually a good number for us to use. Well, that looks pretty abysmal, I would say. It doesn't look at all like FOSSIL; I guess you can kind of see some of the S's there, but really not much in that image at all. Okay, so the text is pretty useless. How should we pick the best binarization to use? There are a number of different methods, but let's just try something very simple to show how this can work. We have an English word that we're trying to detect: "fossil". If we tried all binarizations from 0 through 255 and looked to see if there were any English words in that list, this might be one way. So let's see if we can write a routine to do this; we're problem solving on our own here. First, let's load a list of English words into a list. I put a copy in the read-only directory for you to work with. So create something called english_dict (it's just an empty list to start), and then I'm going to open readonly/words_alpha.txt as read. You can go back into one of the previous courses if this looks familiar to you, on how to work with files. We're going to call the file f, and then I'm just going to read all of f in, in one giant
chunk, and put that in data. Now we actually want to split this into a list based on those newline characters. If you go look in that data file, words_alpha, you'll see it's one word per line. So I'll call data.split on "\n" (backslash n is the newline character), and this will return a new list which is all of the different words, and I'll put this into english_dict. Now let's iterate through all the possible thresholds and look for an English word, printing it out if it exists. So for i in range(150, 170): I'm just going to binarize between those thresholds. Let's binarize and convert this to string values, so strng will be set to pytesseract.image_to_string, and we'll binarize, passing in the bigger sign and the i value. So this will binarize with 150, 151, 152, 153, and so forth; we're going to try them all between these two threshold values, 150 and 170. Now we want to remove all non-alphabetical characters (that includes parentheses, brackets, percentage signs, dollar signs, etc.) from the text, and here's a short method to do that. First, let's convert our string to lowercase only, with strng.lower, and we'll just change strng. Then let's import the string package; it's got a nice list of lowercase characters. So import string, and now let's just iterate over our string, looking at it character by character, putting each into the comparison text. So create some new value, comparison, and then for every character in our string (remember, this is lowercase), if that character is in string.ascii_lowercase (this is actually just checking to see if a single character is in a list of characters; remember, a string and a list of characters work the same when you use in), then comparison is equal to comparison plus that character. We just append it to our output string. Alright, finally let's search for the comparison in the dictionary file. That's easy in Python; in other languages you would have to do a lot of work, but here we just use the in
comparator and see if comparison is in english_dict, and then we're going to print it out if we find it, so we'll print comparison. Alright, let's run that. You should start to see various characters come up, and in my case "fossil" came up, and "w" came up; "w" is also in this dictionary, and it was detected in the data we sent in for at least one of the binarizations. So, well, this is not perfect, but we can see "fossil" there among other values, and this is actually not a bad way to clean up OCR data. It can be useful to use a language- or domain-specific dictionary in practice, instead of all of the English language words, especially if you're generating a search engine for specialized language, such as a medical knowledge base, or locations (so, like, cities). If you scroll up and look at the data we're working with, this tiny little wall hanging on the inside of the store, it's really not so bad. A lot of this comes down to the purpose you're actually doing the OCR for. If you're using it, for instance, to back a search engine, that's one thing; if you're using it to do text-to-speech, and somebody is going to use this to listen to a lecture, that's completely different, and you have to have a very, very strong method for generating the actual data. So at this point you've now learned how to translate images and convert them into text. In the next module in this course, we're going to dig deeper into a computer vision library which allows us to detect faces, among other things, and then we'll go on to a culminating project. I'll see you there. In this brief lecture, I want to introduce you to one of the more advanced features of the Jupyter Notebook development environment, called widgets. Sometimes you want to interact with a function you've created and call it multiple times with different parameters. For instance, if we wanted to draw a red box around a portion of an image to try and fine-tune the crop location, widgets are one way to do this quickly in the browser, without having
to learn how to write a whole large desktop application. Let's check it out. First we want to import the Image and ImageDraw classes from the Pillow package, so from PIL we'll bring in Image and ImageDraw. Then we want to import interact from the widgets package, so from ipywidgets (this is included automatically because we're using the IPython interpreter) we'll bring in interact. We will use interact to annotate a function. Let's bring in an image that we know we're interested in, like the storefront image from a previous lecture, so we'll bring that in from read-only and save that into the image object. Okay, so our setup is done. Now we're going to use the interact decorator to indicate that we want to wrap the Python function; we do this using the at sign. This will take a set of parameters which are identical to the function to be called, and then Jupyter will draw some sliders on the screen to let us manipulate those values. Decorators (which, again, are what the at sign is describing) are standard Python statements, and just a shorthand for functions which wrap other functions. They're a bit advanced, though, so we haven't talked about them in this course, and you might just have to have some faith. So we do at sign interact (this is our decorator), and we're going to put in four values: we'll give it the left and the top (I'll set the defaults for these at 100), and the right and the bottom, and this is where we actually want to draw our red box. So now we just write the function as we would have before: def draw_border, and again it's going to take the exact same parameters as the decorator, so left, top, right, bottom. We'll make a copy of the image (we're working here with a global image, and we're just making a copy), we'll create a drawing object based on the image that we've made a copy of, and then we'll draw a rectangle. So we'll just pass in the left, top, right, bottom, and we'll set the fill to None and the outline to red, and you can go back and look in
the PIL lectures if this doesn't look familiar to you, and then we'll just display it inline. Let's execute that. So here we have our storefront image, and we have four sliders: left, top, right, and bottom. You'll see that as we take one of these and drag it over, the image changes; this one is changing the left-hand side. If we wanted to, we could take the top, or why don't we take the right and stretch this out a bit, so you can see that we can stretch that out. You'll see that it's a little laggy. There's a lot of documentation about how to use these well, and there are ways you can speed it up so it's not computing all of these intermediate values, but I essentially just did this and built out my sliders, so that I could actually pull out the text that I was interested in. So Jupyter Widgets is certainly advanced territory, but if you'd like to learn more and explore, you can check out the website for ipywidgets at Read the Docs, and I would encourage you to do so and play with this nice platform for doing Python. The next library we're going to look at is called Kraken, which was developed at Université PSL in Paris. It's actually based on a slightly older codebase, OCRopus, and you can see how flexible open source licenses allow new ideas to grow by building upon older ideas. And in this case, I fully support the idea that the kraken, a mythical, massive sea creature, is the natural progression of an octopus. What we're going to use Kraken for is to detect lines of text as bounding boxes in a given image. The biggest limitation of Tesseract is the lack of a layout engine inside of it. Tesseract expects to be using fairly clean text, and gets confused if we don't crop out other artifacts. It's not bad, but Kraken can help us by segmenting pages. Let's take a look. First we'll take a look at the Kraken module itself, so import kraken, and let's run help on kraken. There isn't much of a discussion here, but there are a number of submodules that look
interesting. I spent a bit of time on their website, and I think the pageseg module, which handles all of the page segmentation, is the one we want to use. Let's look at it: from kraken we'll import pageseg, and then help on pageseg. It looks like there are a few different functions that we can call, and the segment function looks particularly appropriate. I love how expressive this library is on the documentation front. I can see immediately that we're working with PIL.Image files, and the author has even indicated that we need to pass in either a binarized (mode "1") or a grayscale (mode "L", for luminance) image. We can also see that the return value is a dictionary object with two keys: text_direction, which will return to us a string with the direction of the text, and boxes, which appears to be a list of tuples, where each tuple is a box in the original image. Let's try this on an image of text. I have a simple bit of text in a file called two_col.png, which is from a newspaper on campus here. So from PIL we'll import Image as normal, and then Image.open on readonly/two_col.png. Let's display the image inline, so we'll just call display with im, and let's now convert it to black and white and segment it up into lines with Kraken. For this we'll make a new variable: bounding_boxes is equal to pageseg.segment of im.convert("1") (so we binarize it), sub "boxes", and let's print those lines to the screen, so I'll just print bounding_boxes. Alright, so we see the image here and we see the bounding boxes: pretty simple two-column text, and then a list of lists, which are the bounding boxes of the lines of that text. Let's write a little routine to try and see the effects a bit more clearly. I'm going to clean up my act a bit and write real documentation too; it's good practice. So def show_boxes, we'll call it, and we'll take in a parameter, img. For the docs, I say it modifies the passed image to show a series of bounding boxes on the image, as found by Kraken. Our parameter img is a PIL.Image
object; that makes it easier for other people to use this function. And our return is also going to be an image: the modified PIL.Image object. Okay, let's bring in our ImageDraw object first, so from PIL import ImageDraw (this was covered in earlier lectures; you can go back if you're interested), and let's grab a drawing object to annotate that image. So we'll create a new variable, drawing_object, equal to ImageDraw.Draw, and we'll pass in the image that we want to draw on. We can create the set of boxes using pageseg.segment: bounding_boxes is equal to pageseg.segment, we'll convert our image (remember, we have to binarize it), and pull out sub "boxes". Let's go through that list of bounding boxes: for each box in the boxes, we're just going to draw a nice rectangle, so drawing_object.rectangle, we'll give it the box we're interested in, we'll set the fill to None and the outline to red. And to make it easy, we're just going to return that image object, so return img. To test this, let's use display: so display, then show_boxes, then we'll read in the image with Image.open. We could of course reuse the image, but this is good practice when you're using Jupyter notebooks. Alright, so we see our image here with a bunch of red boxes; not bad at all. It's interesting to see that Kraken isn't completely sure what to do with this two-column format. In some cases, Kraken has identified a line in just a single column, while in other cases Kraken has spanned the line marker all the way across the page. Does this matter? Well, it really depends on our goal. In this case, I want to see if we can improve on this a little bit, so we're going to go a bit off script here. While this week of lectures is about libraries, the goal of this last course is actually to give you confidence that you can apply your knowledge to actual programming tasks, even if the library you're using doesn't quite do what you want. I'd like you to pause the video for a moment and collect your thoughts. Looking at the image above, with the two-column
example in red boxes, how do you think we might modify this image to improve Kraken's ability to detect lines? So, thanks for sharing your thoughts; I'm looking forward to seeing the breadth of ideas that everyone in the course comes up with. Here's my partial solution. When looking through the Kraken docs on the pageseg function, I saw that there are a few parameters we can supply in order to improve the segmentation. One of these is the black_colseps parameter: if set to True, Kraken will assume that the columns will be separated by black lines. That isn't our case here, but I think we have all of the tools we need to go through and actually change the source image to have a black separator between columns. So the first step is that I want to update the show_boxes function; I'm going to just do a quick copy and paste from above, but add in the black_colseps=True parameter. The next step is to think of the algorithm we want to apply to detect a white column separator. In experimenting a bit, I decided that I only wanted to add the separator if the space was at least 25 pixels wide, which is roughly the width of a character, and six lines high. The width is easy; let's just make a variable: char_width equals 25. The height is harder, since it depends on the height of the text, so I'm going to write a routine to calculate the average height of a line: calculate_line_height, and we'll pass it the image. Our docs for this: calculates the average height of a line from a given image; we'll take a PIL.Image object, and we'll return the average height of a line in pixels. Let's get a list of the bounding boxes for this image, so we'll segment it using pageseg.segment (remember, binarize always) and we just want to pull out the boxes. Each box is a tuple of (left, top, right, bottom), so the height is just the bottom minus the top. Let's calculate this over the set of all boxes: we'll set a height accumulator to 0, and for each box in the bounding boxes, the height accumulator
is equal to the height accumulator plus box[3] minus box[1]. This is a bit tricky; remember that we start counting in the upper left corner in PIL, so for those of you who are used to starting in the lower left corner, as in mathematics, that's not true with images: we start in the upper left. Now let's just return the average height, and let's change it to the nearest full pixel by making it an integer: we'll just return, and we'll typecast to an integer (which just truncates to a whole number), the height accumulator divided by the number of bounding boxes. Let's test this with the image that we've been using up till now: line_height equals calculate_line_height of Image.open on readonly/two_col.png, and we'll just print out the line height. Okay, so the average height of a line is 31 pixels. Now we want to scan through the image, looking at each pixel in turn, to determine if there is a block of white space. How big of a block should we look for? That's a bit more of an art than a science. Looking at our sample image, I'm going to say an appropriate block should be one character width wide and six line heights tall, but honestly, I just made this up by eyeballing the image, so I would encourage you to play with the values as you explore. Let's create a new box called gap_box that represents this area: gap_box is equal to (0, 0, char_width, line_height * 6). Let's just look at that gap box. It seems we'll want to have a function which, given a pixel in an image, can check to see if that pixel has white space to the right of and below it. Essentially, we want to test to see if the pixel is in the upper left corner of something that looks like the gap box; if so, then we should insert a line to break up this box before sending it to Kraken. Let's call this new function gap_check: def gap_check, and we'll pass in an image and a location. Here are our docs: checks the image at a given (x, y) location to see if it fits the description of a gap box. Our first parameter is a PIL.Image file; our
Our second parameter, location, is a tuple (x, y), which is a pixel location in that image; we're not going to pass x and y separately, we're going to pass them together in a tuple. We're going to return True if the location fits the definition of a gap box; otherwise we'll return False. Recall that we can get a pixel using PIL's Image.getpixel function; it returns the value as a tuple of integers, one for each color channel. Our tools all work with binarized, black-and-white images, so we should get just one value: if the value is 0 it's a black pixel, and if it's white the value should be 255. We're going to assume that the image is already in the correct mode, in that it's already binarized. The algorithm to check our bounding box is fairly easy. We have a single location, which is our start, and then we want to check all the pixels to the right of that location up to gap_box[2]: so for x in range(location[0], location[0] + gap_box[2]), where location[0] is our x value and gap_box[2] is our offset. The height is basically the same; let's iterate a y variable up to gap_box[3]: for y in range(location[1], location[1] + gap_box[3]). We want to check whether the pixel is white, but only if we're still within the image, so: if x is less than image.width and y is less than image.height. If the pixel is white we don't want to do anything; if it's black we just want to finish and return False. So if image.getpixel((x, y)) is not equal to 255, then we return False. If we've managed to walk through the whole gap box without finding any non-white pixels, then this is actually a gap, so we'll just return True. Alright, we have a function to check for a gap, called gap_check. What should we do once we find a gap? For this, let's just draw a line in the middle of it. Let's create a new function, def draw_sep, and it'll take an image and a location.
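Pulling those pieces together, here's a runnable sketch of gap_check using only PIL; the gap_box values are the ones eyeballed above, and the image at the bottom is a small synthetic stand-in for the scanned page.

```python
from PIL import Image

# One character width wide, six line heights tall, as eyeballed above
gap_box = (0, 0, 25, 31 * 6)

def gap_check(img, location):
    """Return True if the pixel at location (x, y) sits in the upper-left
    corner of an all-white region the size of gap_box.

    :param img: a binarized PIL.Image in mode "L" (white == 255)
    :param location: an (x, y) tuple
    """
    for x in range(location[0], location[0] + gap_box[2]):
        for y in range(location[1], location[1] + gap_box[3]):
            # Only test pixels that actually fall inside the image
            if x < img.width and y < img.height:
                # Any non-white pixel means this is not a gap
                if img.getpixel((x, y)) != 255:
                    return False
    return True

# Quick check on a small synthetic image
white = Image.new("L", (50, 200), 255)
print(gap_check(white, (0, 0)))   # True: everything is white
white.putpixel((5, 5), 0)
print(gap_check(white, (0, 0)))   # False: a black pixel inside the box
```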
Its docstring: this draws a line in the image in the middle of a gap discovered at the given location. Note that it doesn't draw the line at the location itself, but in the middle of the gap box that starts at the location. So the first parameter is a PIL.Image file, and then our tuple (x, y), which is the pixel location. First, let's bring in our drawing code, so from PIL import ImageDraw, and then we'll create a drawing object, which equals ImageDraw.Draw(image). Next, let's decide what the middle means in terms of coordinates in the image. So x1 is equal to location[0] plus the gap box's x size divided by two, rounded to an int; and x2 is actually just the same thing, since this is a one-pixel vertical line, so x2 = x1. Our starting y coordinate is just the y coordinate that was passed in, which is the top of the box, so y1 = location[1]; but we want our final y coordinate to be the bottom of the box, so y2 is equal to y1 plus the gap box height, which is gap_box[3]. Then we'll actually do the work: drawing_object.rectangle, passing in (x1, y1, x2, y2), setting the fill to black, and I'll set the outline to black too, and that'll draw us a nice vertical rule. We don't have anything we need to return from this, because we actually modify the image directly, in place. Alright, now let's try it out. This is pretty easy: we can just iterate through each pixel in the image, check whether there's a gap, then insert a line if there is. So def process_image takes an image; we're going to take in an image of text and add these black vertical bars to break up columns, PIL.Image file both in and out. We're going to start with a familiar iteration process: for x in range(width), and for y in range(height), we check to see whether there's a gap at this point, so if gap_check of the image and the tuple (x, y) is True, then we update the image and draw a separator in it by calling our draw_sep(image, (x, y)). And for good measure we'll return the image we modified, so return image.
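Here's draw_sep as a runnable sketch on its own, using a synthetic white image rather than the two-column scan; the gap_box values are the same ones assumed above.

```python
from PIL import Image, ImageDraw

gap_box = (0, 0, 25, 31 * 6)  # one char width wide, six line heights tall

def draw_sep(img, location):
    """Draw a one-pixel black vertical rule down the middle of the
    gap box whose upper-left corner is at location; modifies img
    in place, so there is nothing to return."""
    drawing_object = ImageDraw.Draw(img)
    x1 = location[0] + int(gap_box[2] / 2)  # middle of the gap box
    x2 = x1                                 # vertical line: same x
    y1 = location[1]                        # top of the box
    y2 = y1 + gap_box[3]                    # bottom of the box
    drawing_object.rectangle((x1, y1, x2, y2), fill="black", outline="black")

img = Image.new("L", (50, 250), 255)
draw_sep(img, (0, 0))
print(img.getpixel((12, 0)), img.getpixel((12, 186)))  # 0 0: now black
print(img.getpixel((0, 0)))                            # 255: still white
```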
Alright, let's test it out. Let's read in our test image and convert it through the binarization process: i is our new image, we'll read it in and convert it to luminance here, then call process_image, and since we returned the image, we can call display on the result. Now, you'll notice immediately that this function didn't return right away; in fact, you're sitting there wondering what's happening. You can see the asterisk in the margin in Jupyter, which tells you that the backend processor is still working, and this will actually take a fair bit of time on the Coursera system. So reflect a little on what's happening in the code we've written: we're iterating over every pixel in the image, in both the x and y directions, and for each one we're looking to see whether there's a gap box to the right of and below that pixel; we draw a line if there is, and then we immediately go on to the next pixel. So you can see there are lots of opportunities for optimization in this code. It's really just meant to be a demonstration of what you can do yourself when you start combining these libraries. We're going to use the magic of video to speed this up a little for the video lecture, but if you're following along in the Jupyter notebooks, and I hope that you are, please think about how you might change that code to modify the image. Not bad at all! The effect at the bottom of the image is a bit unexpected to me, but it makes sense; you can imagine there are several ways we might try to control for it. But let's see how this new image works when we run it through the Kraken layout engine. So we'll say display(show_boxes(i)), and because we stored i, that's easy. It looks like this is actually pretty accurate, and it fixes the problems we faced. Feel free to experiment with different settings for the gap height and width and share in the forums. You'll notice that this method is really quite slow, which is a bit of a
problem if we wanted to use it on larger texts, but I wanted you to see how you can mix your own logic in and work with the libraries you're using. Just because Kraken didn't work perfectly doesn't mean we can't build something more specific to our use case on top of it. I want to end this lecture with a pause, and ask you to reflect on the code we've written here. We started this course with some pretty simple use of libraries, but now we're digging in deeper and solving problems ourselves with the help of libraries. Before we go to our last library: how well prepared do you think you are to take your Python skills out into the wild? OpenCV supports reading images in most file formats, such as JPEGs, PNGs, and TIFFs. Most image and video analysis requires converting images into grayscale first; this simplifies the image and reduces noise, allowing for improved analysis. Let's write some code that reads in an image of a person, Floyd Mayweather, and converts it into grayscale. First, we're going to import the OpenCV package, cv2, so import cv2 as cv. Then we'll load the floyd.jpg image, so image = cv.imread('readonly/floyd.jpg'), and we'll convert it to grayscale using the cvtColor function, so gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY); cvtColor is a function in OpenCV, and we pass in the image and the conversion flag. Now, before we get to the result, let's talk about the docs. Just like Tesseract, OpenCV is an external package written in C++, and the docs for Python are really poor; this is unfortunately quite common when Python is being used as a wrapper. Thankfully, the web docs for OpenCV are actually pretty good, so hit the website at docs.opencv.org when you want to learn more about a particular function. In this case, cvtColor converts from one color space to another, and we're converting our image to grayscale. Of course, we already know at least two different ways of doing this, using binarization and PIL color space conversions.
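If OpenCV isn't installed where you're reading this, you can try the same grayscale idea with PIL and NumPy alone; the solid red image below is a made-up stand-in for floyd.jpg, and the luma weights are the ones Pillow documents for mode "L".

```python
import numpy as np
from PIL import Image

# A tiny solid-red RGB image standing in for floyd.jpg
rgb = Image.new("RGB", (4, 2), (255, 0, 0))

# Pillow's "L" mode uses the ITU-R 601-2 luma transform:
# L = R * 299/1000 + G * 587/1000 + B * 114/1000
gray = np.array(rgb.convert("L"))
print(gray.shape)   # (2, 4): height first, then width
print(gray[0, 0])   # about 76, i.e. 0.299 * 255
```

Note that OpenCV loads images in BGR channel order, which is why the lecture uses cv.COLOR_BGR2GRAY rather than an RGB flag.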
So let's inspect the object that has been returned. The inspect module is in the standard library; we call inspect.getmro(type(gray)), and we see that this is a kind of ndarray, the fundamental array type coming from the numerical Python (NumPy) project. That's a bit surprising: up until this point, we've been used to working with PIL.Image objects. OpenCV, however, wants an image as a two-dimensional sequence of bytes, and the ndarray, which stands for n-dimensional array, is the ideal way to do this. Let's look at the contents of the array, so just gray. The array shown here is a list of lists, where the inner lists are filled with integers. The dtype=uint8 definition indicates that each item in the array is an 8-bit unsigned integer, which is very common for black-and-white images. So this is a pixel-by-pixel definition of the image. The display function, however, doesn't know what to do with this array, so let's convert it into a PIL object to render it in the browser. So from PIL import Image; PIL can take an array of data in a given color format and convert it into a PIL object, and this is perfect for our situation, as the PIL color mode "L" is just an array of luminance values stored as unsigned integers. So the new image equals Image.fromarray(gray, "L"), where "L" tells it these are luminance values, and then we display the image. Let's talk a bit more about images for a minute. NumPy arrays are multi-dimensional. For instance, we can define an array in a single dimension: import numpy as np, and then single_dim = np.array of a list of integer values. This is analogous to a single row of five grayscale pixels. But actually, all imaging libraries tend to expect at least two dimensions, a width and a height; in other words, a matrix. So if we put single_dim inside another array, this would be a two-dimensional array with the equivalent of one element in the height and five in the width: double_dim = np.array([single_dim]). Here we've just taken the single-dimension array and put it inside another
bracket. And so double_dim; this should look pretty familiar, it's a lot like the lists of lists we saw above. Let's see what this new two-dimensional array looks like if we actually display it to the screen: display(Image.fromarray(double_dim, "L")), converting to a PIL image and saying it's luminance data. So that's pretty unexciting; it's just a little line, five pixels in a row to be exact, of different levels of gray. The NumPy library has a nice attribute called shape that lets us see how big an array is in each dimension. The shape attribute returns a tuple, here the height and width of the image: double_dim.shape. Let's take a look at the shape of the initial image that we loaded into the image variable, so image.shape. This image has three dimensions: that's because it has a width, a height, and what's called a color depth, and in this case the color is represented as an array of three values. Let's take a look at the first pixel, which is image[0][0], index zero on the height and zero on the width. We see that the color value is provided in full RGB using unsigned integers. This means that each color channel can have one of 256 different values, and that the total number of unique colors that can be represented by this data is 256 times 256 times 256, which is roughly 16 million colors. We call this 24-bit color, which is 8 plus 8 plus 8. If you find yourself shopping for a television, you might notice that some expensive models are advertised as having 10-bit or even 12-bit panels. These are televisions where each of the red, green, and blue color channels is represented by 10 or 12 bits instead of 8. 10-bit panels are capable of over 1 billion colors, and 12-bit panels are capable of over 68 billion colors. We're not going to talk much more about color in this course, but it's a fun subject. Instead, let's go back to this array representation of images.
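The shape experiments above can be reproduced with a few lines of NumPy; the arrays here are made up for illustration.

```python
import numpy as np

single_dim = np.array([50, 100, 150, 200, 250])   # one row of "pixels"
double_dim = np.array([single_dim])               # wrap it: a 1x5 matrix
print(double_dim.shape)    # (1, 5): height first, then width

# A color image adds a third dimension for the color depth
color = np.zeros((2, 3, 3), dtype=np.uint8)       # 2 high, 3 wide, 3 channels
print(color.shape)         # (2, 3, 3)
print(color[0][0])         # one pixel's three channel values: [0 0 0]

# 8 bits per channel gives 256**3 distinct 24-bit colors
print(256 ** 3)            # 16777216, roughly 16 million
```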
We can do some interesting things with this representation. One of the most common things we can do with an ndarray is to reshape it, changing the number of rows and columns represented, so that we can do different kinds of operations. Here's our original two-dimensional image: print("Original image") and then print(gray). If we want to represent it as a one-dimensional image, we just call reshape, which takes the array as the first parameter and the new shape as the second. So image1d = np.reshape(gray, (1, gray.shape[0] * gray.shape[1])); we multiply the two dimensions together to get the total number of pixels, and then print(image1d). So why are we talking about these nested arrays of bytes when we're supposed to be talking about OpenCV as a library? Well, I wanted to show you that libraries often work on the same kinds of principles, in this case that images are stored as arrays of bytes. They don't represent the data the same way in their APIs, but by exploring a bit, you can learn how the internal representation of data is stored and build routines to convert between formats. For instance, remember in the last lecture when we wanted to look for gaps in an image so we could draw lines to feed into Kraken? We used PIL to do this, using getpixel to look at the individual pixels and see what their luminosity values were, then ImageDraw.rectangle to actually fill in a black bar separator. This was a nice high-level API, and it let us write routines to do the work we wanted without having to understand too much about how the images were actually stored, but computationally it was very slow. Instead, we could write the code using matrix features within NumPy. Let's take a look. So import cv2 as cv, and we'll load the two-column image as well, image = cv.imread('readonly/two_col.png'), and we'll convert it to grayscale.
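The reshape call described above can be tried on a small made-up array:

```python
import numpy as np

gray = np.arange(12, dtype=np.uint8).reshape(3, 4)   # stand-in "image"
print(gray.shape)          # (3, 4)

# Flatten to one row: the new width is the total pixel count
image1d = np.reshape(gray, (1, gray.shape[0] * gray.shape[1]))
print(image1d.shape)       # (1, 12)

# Same bytes, different shape: no pixel values were changed
print(image1d[0, 5] == gray[1, 1])   # True
```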
We do that using cvtColor: gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY). Now, remember how slicing on a list works. If you have a list of numbers, say a = [0, 1, 2, 3, 4, 5], then a[2:4] returns the items at positions 2 and 3; the end index is exclusive, and don't forget that lists start indexing at 0. If we have a two-dimensional array, we can slice out a smaller piece of it using the format a[2:4, 1:3], with 2:4 for the first dimension and 1:3 for the second. You can think of this as first slicing along the row dimension, then along the column dimension; so in this example, that would be a matrix of rows 2 and 3 and columns 1 and 2. Here's a look at our image: gray[2:4, 1:3]. We see that it's all white. We can use this technique as a window and move it around our big image. Finally, the ndarray library has lots of matrix functions which are generally very fast to run. One that we might want to consider in this case is count_nonzero, which just returns the number of entries in the matrix which are not 0: np.count_nonzero(gray[2:4, 1:3]). This crops out a piece of the image as a matrix and sends it to np.count_nonzero. The last benefit of this low-level approach to images is that we can change pixels very fast as well. Previously, we were drawing rectangles and setting a fill and line width. That's nice if you want to do something like make the fill a different color from the line, or draw complex shapes like polygons, but we really just want a line here, and that's really easy to do: all we have to do is change the luminosity values from 255 to 0. Here's an example. Let's create a big white matrix, white_matrix = np.full((12, 12), 255, dtype=np.uint8); we want a 12-by-12 matrix, so we pass that in as the tuple for the shape, 255 because we want everything white, and we set the dtype to np.uint8, which means one byte per pixel.
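The slicing rules and count_nonzero can be checked on a small made-up matrix:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
print(a[2:4])              # [2 3]: the end index is exclusive

b = np.arange(25).reshape(5, 5)
window = b[2:4, 1:3]       # rows 2 and 3, columns 1 and 2
print(window)              # [[11 12]
                           #  [16 17]]

# A fast matrix reduction: how many entries are non-zero?
print(np.count_nonzero(window))   # 4
```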
Now let's display that; remember, we have to use Image.fromarray to convert to a PIL image, passing in white_matrix and telling PIL it holds luminosity values, and let's print out white_matrix as well. So this looks pretty boring: it's just a giant white square, which we actually can't see against the page. But if we want, we can easily color a column black. If we go white_matrix[:, 6], the colon means I want all row values, and 6 selects the seventh column (remember, we count from zero). I'll set this to np.full((12,), 0, dtype=np.uint8), a column of twelve zeros, and display it to the screen with Image.fromarray(white_matrix), which now won't be all white, and then print out white_matrix. And that's essentially what we wanted to do. So why should we do it this way, when it seems so much more low-level? Really, the answer is speed. This paradigm of using matrices to store and manipulate the bytes of image data is much closer to how low-level API and hardware developers think about storing files and bytes in memory. How much faster is it? Well, that's up to you to discover; there's an optional assignment this week to convert our old code over to this new format and compare both the readability and the speed of the two approaches. OK, we're just about at the project for this course. If you reflect on the specialization as a whole, you'll realize that you started with probably little or no understanding of Python, progressed through the basic control structures and libraries included with the language with the help of a digital textbook, moved on to higher-level representations of data and functions with objects, and have now started to explore the third-party libraries that exist for Python which allow you to manipulate and display images. This is really quite an achievement. You've also no doubt found that, as you progress, the demands on you to engage in more self-discovery have increased as well.
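The white-matrix demonstration from above, end to end:

```python
import numpy as np

# A 12x12 all-white block: one uint8 (one byte) per pixel
white_matrix = np.full((12, 12), 255, dtype=np.uint8)

# Paint column index 6 (the seventh column) black; assigning a
# scalar broadcasts it down the whole column
white_matrix[:, 6] = 0

print(white_matrix[0])                        # row 0: 255s with a 0 at index 6
print(np.count_nonzero(white_matrix == 0))    # 12 black pixels in total
```

Assigning the scalar 0 is the simplest form; np.full((12,), 0, dtype=np.uint8) would work equally well on the right-hand side.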
Where the first assignments were maybe straightforward, the ones in this week require you to struggle a bit more with planning and debugging code as you develop. But you've persisted, and I'd like to share just one more set of features with you before we head over to the project. The OpenCV library contains mechanisms to do face detection on images. The technique used is based on Haar cascades, which is a machine learning approach. We're not going to go into the machine learning bits; we have another specialization on that, called the Applied Data Science with Python specialization, and you can take that after this one if you're interested. Here, we're going to treat OpenCV more like a black box. OpenCV comes with trained models for detecting faces, eyes, and smiles, which we'll be using. You can train models for detecting other things, like hot dogs or flutes, and if you're interested in that, I'd recommend you look at the OpenCV docs on how to train a cascade classifier; here's a URL. However, in this lecture we just want to use the current classifiers and see if we can detect portions of an image which are interesting. So the first step is to load OpenCV and the XML-based classifiers: import cv2 as cv, and then we'll create a face cascade classifier with cv.CascadeClassifier, loading it from readonly, where I've put it: haarcascade_frontalface_default.xml. OK, with the classifier loaded, we now want to try and detect a face. Let's pull in the picture we played with last time, so image = cv.imread of the Floyd picture, and we'll convert it to grayscale using cvtColor, gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY). The next step is to use the face cascade classifier. I'll let you explore the docs if you'd like, but the norm is to use the detectMultiScale function. This function returns a list of objects as rectangles. The first parameter is an ndarray of the image, so faces = face_cascade.detectMultiScale, and we just pass it in our
ndarray, our image, gray. Now let's just print those faces out to the screen, as a list. The resulting rectangles are in the format (x, y, w, h), where x and y denote the upper-left corner of the rectangle in the image, and the width and height describe the bounding box. We know how to handle this already in PIL, so from PIL import Image. Let's create a PIL image object, pil_img = Image.fromarray(gray, mode="L"). Now let's bring in our drawing object, so from PIL import ImageDraw, and let's create a context, drawing = ImageDraw.Draw(pil_img). Now let's pull the rectangle out of the faces object. We know there's one item in the list, so rect = faces[0], and now we'll just draw a rectangle around those bounds: drawing.rectangle(rect), setting the outline to white, and display the image inline with display(pil_img). So, not quite what we were looking for. What do you think went wrong? Well, a quick double check of the docs and it's apparent that OpenCV is returning the coordinates as (x, y, w, h), while PIL's ImageDraw is looking for (x1, y1, x2, y2). So it looks like this is an easy fix. Let's wipe our old image: we'll reload pil_img, set up our drawing context, and draw the new box with drawing.rectangle, taking rect[0] and rect[1], and then adding rect[2] and rect[3] to them as appropriate, setting our outline and displaying it inline. OK, we see the face detection works pretty well on this image. Note that it's apparent this is not head detection; the Haar cascade file we're using keys on facial features like the eyes and mouth. Let's try this on something a little more complex. Let's read in that MSI recruitment image, image = cv.imread of msi_recruitment.gif, and let's take a look at that image to remind ourselves what it looks like, so display(Image.fromarray(image)); remember we have to do this conversion, but we don't pass in a luminosity mode, since it's in full color.
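That coordinate mismatch is easy to capture in a tiny helper; OpenCV isn't needed to try it, so the face tuple below is a hypothetical detectMultiScale result and the helper name is mine.

```python
from PIL import Image, ImageDraw

def xywh_to_corners(rect):
    """Convert an OpenCV-style (x, y, w, h) rectangle into the
    (x1, y1, x2, y2) corner form that PIL's ImageDraw expects."""
    x, y, w, h = rect
    return (x, y, x + w, y + h)

face = (30, 20, 40, 50)                 # hypothetical detection result
print(xywh_to_corners(face))            # (30, 20, 70, 70)

img = Image.new("L", (100, 100), 128)
drawing = ImageDraw.Draw(img)
drawing.rectangle(xywh_to_corners(face), outline="white")
print(img.getpixel((30, 20)))           # 255: the outline passes through here
```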
Whoa, what's that error about? It looks like there's an error raised on a line deep within PIL's Image.py file, where it's trying to call an internal private member, __array_interface__, on the image object, but the object is None. It turns out that the root of this error is that OpenCV can't work with GIF images. This is kind of a pain and unfortunate, but we know how to fix it, right? One way is that we could just open the image in PIL, save it as a PNG, and then open that in OpenCV. So let's use PIL to open our image, pil_img = Image.open of our GIF. Now let's convert it to grayscale for OpenCV: the OpenCV version is pil_img.convert("L"), which returns a new grayscale image, and now let's just write that out to a file, saving it as msi_recruitment.png. OK, now that the format conversion is done, let's try reading it back into OpenCV: cv_img = cv.imread of the PNG. We don't need to color convert this, because we saved it as grayscale. Let's try and detect faces in that image, faces = face_cascade.detectMultiScale(cv_img), passing in the ndarray. Now, we still have our PIL color version of the GIF, so pil_img = Image.open of the GIF, and we'll set a drawing context, drawing = ImageDraw.Draw(pil_img). Now, for each rectangle in faces, let's surround it with a box. So for x, y, w, h in faces: this might actually be new syntax for you. Recall that faces is a list of rectangles in (x, y, width, height) format, that is, a list of lists. Instead of having to do an iteration and then manually pull out each item, we can use something called tuple unpacking to pull the individual items of each sub-list directly into variables; this is a really nice Python feature. Alright, so now we just need to draw our box: drawing.rectangle with x, y, and then x plus width (can't forget this) and y plus height, with the outline set to white.
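The GIF workaround can be exercised entirely in memory; the tiny palette image below is a made-up stand-in for msi_recruitment.gif.

```python
import io
from PIL import Image

# A tiny palette-mode ("P") image, the same mode a GIF decodes to
gif_like = Image.new("P", (8, 8))

# Convert to grayscale and round-trip through PNG, the same fix the
# lecture applies so that OpenCV can read the file
buffer = io.BytesIO()
gif_like.convert("L").save(buffer, format="PNG")
buffer.seek(0)

reloaded = Image.open(buffer)
print(reloaded.format, reloaded.mode)   # PNG L
```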
Then we display the PIL image. Whoa, what happened here? We see that we've detected faces, and that we've drawn white boxes around them, but the colors have gone all weird. This, it turns out, has to do with the color limitations of GIF images. In short, a GIF image has a very limited number of colors; this is called a color palette, after the palette artists use to mix paints. For GIFs, the palette can only hold 256 colors, but they can be any 256 colors. When a new color is introduced, it has to take the place of an old color; in this case, PIL adds white to the palette, displacing another color, and thus messes up the image. Who knew there was so much to learn about image formats? We can see what mode an image is in with the .mode attribute, so pil_img.mode. There's a list of modes in the Pillow documentation, and they correspond to the color spaces we've been using. For the moment, though, let's just change back to RGB, which represents each color as a 3-byte tuple instead of an entry in a palette. So let's read in our image, bringing back our GIF nice and clean, and convert it to RGB mode; that's pretty easy, we just do pil_img.convert("RGB"), and let's print out the mode. OK, we're pretty convinced that we've changed it; now let's go back to drawing rectangles. We get our drawing object and iterate through the face sequence again, tuple unpacking as we go, so for x, y, w, h in faces, and remember again the width and height, so we have to add these appropriately: drawing.rectangle with x, y, x plus width, y plus height, setting the outline to white. Finally, let's display that. Awesome, we managed to detect a bunch of faces in that image. It looks like we've missed four faces, though. In the machine learning world we would call these false negatives: something which the machine thought was not a face (so, a negative), but which it was incorrect about. Consequently, we would call the actual faces that were
detected true positives: something that the machine thought was a face, and it was correct about. This leaves us with false positives: something the machine thought was a face, but which wasn't. We can see that there are two of these in the image, where the classifier is picking up shadow patterns or textures in shirts and matching them against the Haar cascades. Finally, we have the class of true negatives: out of the set of all possible rectangles the classifier could consider, those where it correctly indicated that the result was not a face. In this case, there are many, many, many true negatives. There are a few ways we could try and improve this, and really it requires a lot of experimentation to find good values for a given image. First, let's create a function which will plot rectangles on the image for us: def show_rects, and we'll pass in faces. We'll read in our GIF and convert it to RGB on the same line, set our drawing context, and plot all of the rectangles in faces, so for x, y, w, h in faces, and we've seen this before, drawing.rectangle with the outline set to white, and finally we'll display the result. Alright, so now we have a function, show_rects. First up, we could try to binarize this image, and it turns out that OpenCV has a built-in binarization function called threshold. You pass in the image, the midpoint, and the maximum value, as well as a flag which indicates whether the threshold should be binary or something else. So let's try it: cv_img_bin = cv.threshold, where we pass in the image we're interested in, 120 for our threshold, 255 for our top end, and then cv.THRESH_BINARY; and we pull out element [1] of the result, because this function actually returns a pair of values and we want the second one. Now let's do the actual face detection on this, faces = face_cascade.detectMultiScale, passing in this new cv image after it's been binarized, and then let's call our show_rects with the faces to see the results.
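If you want to see what THRESH_BINARY does without OpenCV installed, a NumPy stand-in is only a line long; the parameter names mirror cv.threshold, but the function itself is mine.

```python
import numpy as np

def binary_threshold(gray, midpoint=120, maxval=255):
    """A NumPy stand-in for cv.threshold(gray, midpoint, maxval,
    cv.THRESH_BINARY): pixels strictly above midpoint become maxval,
    everything else becomes 0."""
    return np.where(gray > midpoint, maxval, 0).astype(np.uint8)

gray = np.array([[0, 100, 120, 121, 200]], dtype=np.uint8)
print(binary_threshold(gray))   # [[  0   0   0 255 255]]
```

Note that 120 itself maps to 0, since THRESH_BINARY uses a strict greater-than comparison.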
So that's kind of interesting: not better, but we do see that there is one false positive towards the bottom, where the classifier detected the sunglasses as eyes and the dark shadow line as a mouth. If you're following along in the notebook with this video, why don't you pause things and try a few different parameters for the threshold value? The detectMultiScale function from OpenCV also has a couple of parameters. The first of these is the scale factor. The scale factor changes the size of the rectangles which are considered against the model, that is, the Haar cascade's XML file; you can think of it as changing the size of the rectangles which are slid across the image. Let's experiment with the scale factor. Usually it's a small value, so let's try 1.05: faces = face_cascade.detectMultiScale, passing in our image and 1.05, and we'll render the results to the screen through show_rects. Now let's also try this with 1.15, so we'll just put that in there quickly, and then finally let's try 1.25 as well, put that in, and give it a run. We can see that as we change the scale factor, we change the number of true and false positives and negatives. With the scale set to 1.05, we have seven true positives, which are correctly identified faces; three false negatives, which are faces that are there but not detected; and three false positives, which are non-faces that OpenCV thinks are faces. When we change the scale to 1.15, we lose the false positives, but we also lose one of the true positives, the person to the right wearing a hat; and when we change it to 1.25, we lose even more true positives. This is actually a really interesting phenomenon in machine learning and artificial intelligence: there's a tradeoff between not only how accurate a model is, but how the inaccuracy actually happens. So which of these three models do you think is best?
Well, the answer to that question is: it depends. It depends on why you're trying to detect faces and what you're going to do with them. If you think these issues are interesting, you might want to check out the Applied Data Science with Python specialization Michigan offers here on Coursera. OK, beyond an opportunity to advertise: did you notice anything else that happened when we changed the scale factor? It's subtle, but the processing took longer at smaller scale factors. This is because more sub-images are being considered at those scales. This could also affect which settings we might use. Jupyter has nice support for timing commands; you might have seen this before. A line that starts with a percentage sign in Jupyter is called a magic function. This isn't normal Python; it's actually a shorthand way of invoking a function which Jupyter has predefined. It looks a lot like the decorators we talked about in a previous lecture, but magic functions were around long before decorators were part of the Python language. One of the built-in magic functions in Jupyter is called timeit, and this runs a piece of Python code a number of times and tells you the average speed it took to complete. Let's time the speed of detectMultiScale when using a scale of 1.05: %timeit to invoke the magic function, then our normal Python code, face_cascade.detectMultiScale(cv_img, 1.05). OK, now let's compare that to the speed at scale 1.15, same thing: %timeit face_cascade.detectMultiScale(cv_img, 1.15). You can see this is a dramatic difference, roughly 2.5 times slower when using the smaller scale. Let's wrap up our discussion of detecting faces in OpenCV. You'll see that, like OCR, this is not a foolproof process, but we can build on the work others have done in machine learning and leverage powerful libraries to bring us closer to a turnkey, Python-based solution. And remember that the detection mechanism isn't specific to faces.
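Outside of Jupyter, the plain timeit module from the standard library does the same job as the %timeit magic; the work function here is a toy stand-in for detectMultiScale, not the real thing.

```python
import timeit

def work(scale):
    # A toy stand-in: a smaller "scale" means more iterations,
    # mimicking the extra sub-images considered at small scale factors
    return sum(range(int(100_000 / scale)))

# timeit.timeit calls the function `number` times and returns the
# total elapsed time in seconds
elapsed_small = timeit.timeit(lambda: work(1.05), number=50)
elapsed_large = timeit.timeit(lambda: work(1.25), number=50)
print(elapsed_small > 0 and elapsed_large > 0)   # True: both runs completed
```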
That's just the Haar cascade training data we used; on the web, you'll be able to find other training data to detect other objects, including eyes, animals, and so forth. One of the nice things about using the Jupyter Notebook system is that there's a rich set of contributed plugins that seek to extend it. In this lecture, I want to introduce you to one such plugin, called ipywebrtc. WebRTC is a fairly new protocol for real-time communication on the web; yep, I'm talking about chatting. This widget brings that to Jupyter Notebook systems. First, let's import from the library the two classes which we're going to use in the demo, one for the camera and one for images: from ipywebrtc import CameraStream, ImageRecorder. Then let's take a look at the CameraStream object with help(CameraStream). We see from the docs that it's easy to get a camera facing the user, and that we can turn audio on or off; we don't need audio for this demo, so camera = CameraStream.facing_user(audio=False). The next object we want to look at is the ImageRecorder, so help(ImageRecorder). The ImageRecorder lets us actually grab images from the camera stream, and there are features for downloading and using the image as well. We see that the default format is a PNG file. Let's hook the ImageRecorder up to our stream: image_recorder = ImageRecorder(stream=camera). Now, the docs are a little unclear about how to use this within Jupyter, but if we call the download function, it will actually store the results of the camera it's hooked up to in image_recorder.image. Let's try it out. First, let's tell the image recorder to start capturing data, image_recorder.recording = True; now let's download the image with image_recorder.download(), and then let's inspect the type of the image: type(image_recorder.image). OK, so the object it stores is an ipywidgets.widgets.widget_media.Image. How do we do something useful with
this? Well, an inspection of the object shows that there's a handy value field which actually holds the bytes behind the image, and we know how to display those. So let's import the PIL Image module, import PIL.Image, and then let's import io. Now let's create a PIL image from the bytes: image = PIL.Image.open(io.BytesIO(image_recorder.image.value)), using io.BytesIO to wrap the bytes in a stream. Wow, that's a lot of dots. And let's render it to the screen, which is just display(image), because it's a plain PIL image now. Great, you see a picture! Hopefully you're following along in one of the notebooks and have been able to try this out for yourself. So what can you do with this? Well, this is a great way to get started with a bit of computer vision. You already know how to identify a face in the webcam picture, or you could try to capture the text within the picture using OpenCV. There's any number of things you can do simply with a webcam, the Jupyter notebooks, and Python.
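The bytes-to-image trick at the heart of this demo works with any PNG bytes, so you can try it without a webcam; the buffer below is a made-up stand-in for image_recorder.image.value.

```python
import io
from PIL import Image

# Produce some PNG bytes in memory, standing in for the raw bytes
# that ipywebrtc exposes as image_recorder.image.value
buffer = io.BytesIO()
Image.new("RGB", (4, 4), (0, 128, 255)).save(buffer, format="PNG")
png_bytes = buffer.getvalue()

# The same trick as in the lecture: wrap the raw bytes in a stream
# and let PIL decode them
img = Image.open(io.BytesIO(png_bytes))
print(img.size, img.format)   # (4, 4) PNG
```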