This week's topic is software engineering, and in our class, we'll focus on two main ideas. One, what are some of the tools and techniques that professional software developers use to create real projects? And two, what are the theoretical principles behind software engineering that help us manage the complexity of, and understand, large software systems with lots of developers? In this video, I'm going to show you an example of one of many tools that developers use. It's called Meld, and it allows you to compare the differences between files and directories. Now the reason I'm showing you this utility is so that you understand how it works, so that we can then look at its source code and see what a program looks like when it's written across many different files and directories. Then I'm going to show you some of the tools in the book, the different types of diagrams in UML that the book talks about, and take a look at some of that in the context of the Meld utility. I have some examples here on my desktop, bottles1.py and bottles2.py. These are sample solutions to the lab we did a couple weeks ago with the bottles of pop on the wall. So this first solution is the one I've already given you on Canvas, which has a while loop and an if statement. This other one I just made up quickly, it has a while loop and two if statements, but essentially it should print the same results. Something you end up doing a lot as a software developer is comparing different versions of files that you've written. Perhaps you've been working on a project for a little while, and you want to know what changes you have made to this file and whether you want to keep those changes permanently. A nice utility for this in Linux, also available on other operating systems, is a visual diff. To run this diff utility, you just select the files you'd like to compare, right-click, and choose the compare option. By the way, diff means difference, it's just an abbreviation of that term. 
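The idea behind a diff can be sketched in a few lines of Python with the standard `difflib` module. This is only an illustration, not how Meld itself is written, and the two short snippets below are made-up stand-ins for bottles1.py and bottles2.py (the real lab solutions are not shown in this talk).

```python
import difflib

# Hypothetical contents standing in for the two lab solutions. They differ
# only in the number that `verse` is compared against.
old_version = [
    "while verse > 0:\n",
    "    if verse > 2:\n",
    "        print(verse, 'bottles')\n",
    "    verse -= 1\n",
]
new_version = [
    "while verse > 0:\n",
    "    if verse > 1:\n",
    "        print(verse, 'bottles')\n",
    "    verse -= 1\n",
]

# unified_diff yields header lines, then context lines (prefixed with a
# space), removed lines (prefixed "-") and added lines (prefixed "+").
diff_lines = list(
    difflib.unified_diff(
        old_version, new_version, fromfile="bottles1.py", tofile="bottles2.py"
    )
)

# Keep only the lines that actually changed, skipping the "---"/"+++" headers.
changed = [
    line
    for line in diff_lines
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]

print("".join(diff_lines))
```

A visual tool shows the same information side by side with colors instead of `+` and `-` prefixes.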
So this is a program called Meld that runs on Linux, and it basically puts two files side by side, one on the left, one on the right, and it highlights the differences between the files. You can see, for example, in this file I have verse greater than two, whereas in the other file I have verse greater than one. And it highlights the background where that character was changed on that line. And then also it shows, okay, these lines are the same here and here, but this chunk is different between those two files. So that's kind of a handy utility. And if you explore this in the lab you can see what all the other menu options do. Meld also works on a pair of directories. So here I have some pretend directories of files left over from another tech talk in the past. I can select those two and compare. And basically what it does is show you a list of the directory contents on the left versus the other directory on the right. Here's a new file, for example, that comes up in green and it's struck through on the left, whereas this file here, bash quickref.pdf, I've deleted out of the directory on the right. Finally, there's a copy of Grimm's Fairy Tales. It's a text file. And that one shows up in red because that file has been changed. And if I double click on that file it will show me any differences between the two. So on the left it says the Brothers Grimm. On the right, the Brothers Mayfield. I'm not trying to plagiarize that work, I just want to show you that if you make a change to a file you can detect that change using a utility like Meld. This utility also comes in handy when computer science professors want to compare your code submissions to see if you copied from other students in the class. So please don't cheat in your programming classes. It's not very difficult for a computer scientist to detect that kind of thing. 
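A directory comparison like the one just described can be sketched with Python's standard `filecmp` module. This is a rough illustration under stated assumptions, not Meld's own code: the helper `compare_directories` and all of the file names below are hypothetical, and the directories are created in a temporary folder so the example cleans up after itself.

```python
import filecmp
import tempfile
from pathlib import Path


def compare_directories(left, right):
    """Return (only on left, only on right, present in both but different)."""
    result = filecmp.dircmp(str(left), str(right))
    return (
        sorted(result.left_only),
        sorted(result.right_only),
        sorted(result.diff_files),
    )


with tempfile.TemporaryDirectory() as tmp:
    left, right = Path(tmp, "left"), Path(tmp, "right")
    left.mkdir()
    right.mkdir()

    # A file that was deleted from the right-hand directory.
    (left / "quickref.txt").write_text("only on the left\n")
    # A new file that exists only in the right-hand directory.
    (right / "new_file.txt").write_text("only on the right\n")
    # A file that exists in both directories but was edited on the right.
    (left / "tales.txt").write_text("by the Brothers Grimm\n")
    (right / "tales.txt").write_text("by the Brothers Mayfield\n")
    # A file that is identical on both sides.
    (left / "same.txt").write_text("unchanged\n")
    (right / "same.txt").write_text("unchanged\n")

    left_only, right_only, changed_files = compare_directories(left, right)

print("only on left: ", left_only)
print("only on right:", right_only)
print("changed:      ", changed_files)
```

The three lists correspond roughly to the three kinds of highlighting in the demo: deleted, new, and changed files.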
Well, the reason why I showed you Meld is that I want you to be familiar with what features this software has, because now we're going to take a look at the source code for the Meld utility. So I've cleared off my desktop, let me bring up my web browser again and search for Meld on Google. It should be the first link, hopefully it's the same on yours. So here's the website for Meld. Again, it just shows you what the features are and so on. So I'm going to go ahead and download the latest version of Meld. It downloads as a tar.xz file, which a Linux computer can extract, so I'll just go ahead and right click that downloaded file and say extract here. And now what I have is a copy of the source code of the Meld program. Now I'm not going to go into all the details of how this code is implemented. Our goal today is to look at modularity and understand what the concepts in Chapter 7 of the textbook are talking about. So what you see here, if I go into the meld directory, is all the source files for the Meld program. Now I've also got other files like the binary and some data, maybe the help documentation. If you ever see the term PO, those files are used to translate an application into other languages. So here are all the PO files, or they're called translation files. And then some tools that go with that as well. But you'll notice that the Meld application consists of a number of Python programs, or not Python programs, Python modules. So here we've got matchers.py, merge.py, misc.py. And there are some subdirectories as well, like UI, which must stand for user interface. So maybe these are all the programs that make all the different text boxes and buttons work. Then there's utility functions, and VC, which in terms of Meld stands for version control. There are a number of version control systems that Meld automatically integrates with. So there's all of those. 
So one thing you'll notice in a medium sized software project like meld, all of the source code is in different directories. You've got a number of different files and they each have their own role in the application. So let's take a look at some of these files in more detail. The first file you may notice that looks strange is this double underscore init double underscore dot py file. In fact, if I open that file it's just a blank file. To understand what this means, let's go to the Python tutorial. I'll search for Python modules on Google. That will take me directly to chapter six of the tutorial. Now I don't expect you to read this entire tutorial, but I do want you to know that it exists. This might be a useful page just to skim through for your own enrichment. And you'll notice that what it says about Python modules is it's basically a way to organize an application into multiple directories. This is really useful if you have multiple people working on the project and they each have their own space or their own set of files that they make changes to. A requirement of the Python language is that each directory needs to contain an initialization module. So this double underscore init dot py file says if I go to import code out of this directory, you need to run this initialization first. Now, although that's a requirement of a language, you don't necessarily have to include things there. So for example, if I go back to the meld source directory and look at that init dot py file, for the most part they're blank. There's another blank file. There's another blank file. However, in the version control directory, the init dot py file has a bunch of extra stuff in there for initializing that version control system. Anyway, you might notice if we look at, let's say, dir diff dot py, this must be the file that does directory diffs or directory comparisons. 
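To see the initialization module in action, here is a small sketch that builds a throwaway package in a temporary directory and imports from it. The package name `demo_pkg`, the module `helpers`, and the function `greet` are all made up for this illustration. One caveat worth knowing: Python 3.3 and later can also import from a directory that has no `__init__.py` at all, but regular packages like the ones in this talk still use the file, and it does run first when it is present.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp, "demo_pkg")  # hypothetical package name
    package_dir.mkdir()

    # The initialization module. It is often blank, but any code placed
    # here runs the first time something is imported from the package.
    (package_dir / "__init__.py").write_text("INITIALIZED = True\n")

    # An ordinary module that lives inside the package directory.
    (package_dir / "helpers.py").write_text(
        "def greet():\n    return 'hello from helpers'\n"
    )

    # Make the temporary directory importable for the duration of the demo.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        helpers = importlib.import_module("demo_pkg.helpers")
        package = sys.modules["demo_pkg"]  # loaded as a side effect
        greeting = helpers.greet()
        initialized = package.INITIALIZED
    finally:
        sys.path.remove(tmp)

print(greeting)
print("package initialized:", initialized)
```

Importing `demo_pkg.helpers` also loads `demo_pkg` itself, which is why the value set in `__init__.py` is available afterwards.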
And again, the goal here is not to understand all this code, but to get a feel for what a real software application might look like at a small scale. Here's another one, file diff. So here's the code for comparing the differences between two files. And of course, at the top of this file, we have a whole bunch of import statements because there are all these other modules that need to be included in the code. Finally, let's take a look at misc dot py. So here are some miscellaneous things. You can see I've got some statements, some definitions like shell join, run dialog, open uri, position menu under widget. I kind of like these names here. However, if I go back to file diff or dir diff dot py, here we have classes like a stat item. Or let's see, there are some other definitions. Here's a dir diff tree store or a canonical listing. Those are all class blueprints for different objects in this application, a cached sequence matcher, for example, or cursor details or task entry. There are a lot of details that go into this type of application, even if I just have a simple task like compare the contents of two directories or compare the contents of two files. So now that we've seen a little bit of what the code looks like, again without understanding the individual details of the code, let's define what these different concepts in Chapter 7 of the textbook mean. So in this segment, I'm going to refer to three of the figures in the textbook. These are all in Chapter 7 starting on page 309. So the first figure is Figure 7.3, where we have a structure chart. Now a structure chart is used to understand how the code is organized in terms of which function calls which function, or how the control goes from one place to the other. So for example, you might see here, I'm back in my dir diff file. And as I go along, you'll see at some point it will call other functions like tree view dot connect or focus out events dot append and so forth. 
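One rough way to get the raw material for a structure chart is to let Python read the code and list which function calls which. The sketch below uses the standard `ast` module on a tiny made-up game program; the function names in `SOURCE` and the helper `build_call_map` are assumptions for illustration only. It is deliberately simple: it only notices calls made by plain name, so method calls like the tree view ones mentioned above would not show up.

```python
import ast

# A hypothetical four-function game, small enough to chart by hand.
SOURCE = '''
def play_game():
    setup()
    take_turn()
    show_score()

def setup():
    pass

def take_turn():
    show_score()

def show_score():
    pass
'''


def build_call_map(source):
    """Map each top-level function name to the names it calls directly."""
    tree = ast.parse(source)
    call_map = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Walk the function body and collect every call of the form name(...).
            called = {
                inner.func.id
                for inner in ast.walk(node)
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name)
            }
            call_map[node.name] = sorted(called)
    return call_map


call_map = build_call_map(SOURCE)
for caller, callees in call_map.items():
    print(caller, "->", ", ".join(callees) or "(nothing)")
```

Each printed line is one box in the chart with arrows to the boxes it calls.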
Now this would make a very large structure chart. And of course, again, I don't want you to go through and fully understand some big project in five minutes. But on a small scale, you might have a game that has these four different functions and you call each one. And the point is, can you go through a large software project and come up with a diagram that helps you understand how you get from one place to the other? This is like a map for the execution through that code. Let's take a look at Figure 7.4 now. And this is a different type of diagram. This is called a class diagram, or a structure of, let's say, a class and all of its instances. So for example, going back to the Meld application, I have a class for file diffs and a class for directory diffs. Where did that class go? Here we go. So there's this nice big comment right here that helps you find it. So the directory diff class has all of these things, and it's got some kind of a map and state actions, and it has all these other different buttons and it has all of these UI tools. You can imagine having in the diagram over here a list of all of the properties of that class. So if I were to have multiple diffs going at the same time, which I can, I can compare multiple sets of files at the same time, those are all going to be instances of the same class. And so a diagram like this helps you understand what the objects in the application are and what they are all doing. Finally, if you want to look at how the application itself is running, you need something like an interaction diagram. So in the textbook example, going through this tennis game, we have different players and a judge and then some kind of a class that keeps track of the score. And player A will let the judge know that it served, and then the judge will let player B know that the ball was hit to them. And then B will say, well, did I hit it back or not? 
And the judge will let player A know, and at some point the match ends and the judge updates the score. So this allows you to see both the flow of execution and the different actors or objects that are involved. And the way you read this diagram is from the top to the bottom. As time goes on, we move further down the chart. So when you have a project this complex, right, if there are this many files involved, and I think the Meld project is developed by, I don't know, about half a dozen or so people as an open source project, it's kind of nice to keep everything independent, right? So if I need to fix something in this file or in this file, I don't want to break something in this file or this file by accident. So the way we do that is with a number of techniques described in the book. Let me pull up a document to type some of these. So one of them is the idea of coupling, or in other words, we want to minimize coupling. Now, coupling is when one module affects another. So for example, in the beginning of this presentation, when I was showing you how Meld worked, one of the things you can do during a directory diff is double click on files that have changed, and that will bring up a file diff, right? So there's some kind of coupling between these two files, and if I change one, it may have a side effect on the other one. Now, there are actually two kinds of coupling. One is control coupling, and one is data coupling. So control coupling is basically when you have procedure or function calls between the two of them. Let me go ahead and bold this term. Now, of course, this could be between two or three or however many modules, but a couple means two, right? So if I have two modules and one module calls a procedure or function of another module, then those, in terms of the flow of control of the program, are coupled. That's what control coupling refers to. 
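Here is a small sketch of what control coupling looks like in code. The two functions below are hypothetical stand-ins for a file-comparison module and a directory-comparison module; they are not taken from Meld. The single call from `directory_diff` to `file_diff` is the coupling: if someone changes what `file_diff` expects or returns, `directory_diff` may break.

```python
def file_diff(left_text, right_text):
    """Stand-in for a file-diff module: return the line numbers that differ."""
    left_lines = left_text.splitlines()
    right_lines = right_text.splitlines()
    longest = max(len(left_lines), len(right_lines))
    differing = []
    for index in range(longest):
        # A line missing from the shorter file counts as a difference.
        left_line = left_lines[index] if index < len(left_lines) else None
        right_line = right_lines[index] if index < len(right_lines) else None
        if left_line != right_line:
            differing.append(index + 1)
    return differing


def directory_diff(left_files, right_files):
    """Stand-in for a directory-diff module.

    Each directory is modeled as a dict of file name -> file contents.
    """
    report = {}
    for name in sorted(left_files.keys() & right_files.keys()):
        # Control coupling: this module hands control to the other one.
        changed_lines = file_diff(left_files[name], right_files[name])
        if changed_lines:
            report[name] = changed_lines
    return report


left = {"tales.txt": "The Brothers Grimm\nOnce upon a time\n", "notes.txt": "same\n"}
right = {"tales.txt": "The Brothers Mayfield\nOnce upon a time\n", "notes.txt": "same\n"}

report = directory_diff(left, right)
print(report)
```

Notice that the coupling goes through one narrow, well-defined call, which is about as little coupling as these two tasks can have.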
Now data coupling, as you may guess from the name, talks about sharing information or data between modules. So if I go back here and look at the source code, I think it wouldn't take long until I see that there's some kind of data structure that may be shared between the two modules. And if I just take a look at the list of all these files, the meld window may have nothing to do with, say, the undo button. Actually, that's a bad example. Undo is probably part of the window features. But there's other things like debus service that might not have anything to do with the miscellaneous features. And so as we design the software project, we want to minimize coupling. We want to keep control between several modules that need each other, but there shouldn't be a dependency between every file to every other file. And same thing with data. We want to keep the data organized in different places so that if there needs to be a change, I have to go to the minimal number of places to make that change. Now the other task, or sorry, the other concept that's introduced in Chapter 7 is that of cohesion. Now this is the other side of the problem. I actually want good cohesion. I want to avoid coupling, but we want to maximize cohesion. So I'll tell you what, I'm going to make this green because that's a good thing. Dark green is probably good enough. And I'll make this red or crimson because that's a bad thing. Okay, so what is cohesion? Well, the book refers to cohesion as glue. The glue that holds a module together. In other words, as I'm designing a module, I want everything in that module to be related. So for example, the file diff module should only have code that is necessary for doing the difference comparison between two files. I don't need any code here that deals with directories. I don't need code in here that deals with, well, I don't know what other files are in this program. Maps of links or buffer management or different paths of things or the undo button. 
That's not necessarily what it means to compare two files. So as I'm trying to build this module, I only want everything that's related. So glue in this sense is what's the underlying abstract concept that makes all of these things the same. I think most of you as you organize large projects do this anyway, right? You organize things if they're related. So save to file name, save file, make patch. These are all part of doing the differences between files. Now one technique that we can use to protect our cohesion, and this is the last section now in 7.4 that I'd like you to study before Wednesday is called information hiding. Let me unitalicize the word and. So information hiding. Now some of you have seen this in the Java intro courses if you've learned about public and private. So the basic idea of information hiding is that the details of how a module works should not matter. So in other words, if I'm going to do a file diff and some other module like the directory diff needs to do a file diff, the directory diff doesn't need to know, well, what all variables did I have to write to make that happen? So looking at the example in the textbook when it talks about that tennis game, the judge doesn't need to know what are all the data types of the players like their name and their current speed and the trajectory. The judge only needs to know did the ball cross over the net or not, right? And by making something private or hidden in the details of how you write code, it's not like you don't want anyone else to ever see that code. What you don't want is them to accidentally access that code, right? We don't want to have data coupling. We don't want one module to try to access data of another module accidentally. So by hiding that data with mechanisms in the programming language such as making it private, we prevent the coupling of two modules accidentally. 
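The tennis example can be sketched in Python to show information hiding. Everything here is an assumption made for illustration: the class names, the `returns_ball` method, and the scripted list of hits are not from the textbook. Also note that Python has no `private` keyword like Java does; a leading underscore is only a convention that tells other programmers to keep out.

```python
class Player:
    """A player whose internal details are hidden from the judge."""

    def __init__(self, name, return_plan):
        # Leading underscores mark these as internal details. The plan is a
        # made-up stand-in for things like speed and trajectory.
        self._name = name
        self._return_plan = list(return_plan)

    def returns_ball(self):
        """Public interface: did the ball make it back over the net?"""
        if self._return_plan:
            return self._return_plan.pop(0)
        return False


class Judge:
    """Decides a point using only the players' public interface."""

    def __init__(self, server, receiver):
        self._server = server
        self._receiver = receiver

    def play_point(self):
        """Return 'server' or 'receiver', whoever wins the point."""
        # After the serve, the receiver hits first, then they alternate.
        players = [("receiver", self._receiver), ("server", self._server)]
        turn = 0
        while True:
            label, hitter = players[turn % 2]
            if not hitter.returns_ball():
                # The hitter missed, so the other player wins the point.
                return players[(turn + 1) % 2][0]
            turn += 1


player_a = Player("A", [True])         # returns the ball once, then misses
player_b = Player("B", [True, False])  # returns once, then misses
winner = Judge(server=player_a, receiver=player_b).play_point()
print("point goes to the", winner)
```

The judge never reads `_name` or `_return_plan`, so the player class can change how it decides a hit without the judge needing any changes.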
You know, people make mistakes and these things that we want to minimize, we use cohesion and information hiding to prevent that from happening. So clearly there's a lot more to this chapter than these concepts, but this is what I would like you to focus on between now and Wednesday. Be sure to read the textbook section 7.4 and 7.5 so you can get the details of what this information means. I've mainly wanted to give you a bird's eye view just picking up some source code that I've never looked at before. I know most of you probably haven't looked at this code before, but it's not that intimidating if we use tools to understand how things are coupled, where the data flows, and I hope you'll take a look at the UML tutorial because we're going to draw some diagrams on Wednesday to do just that. So until then, I'll be signing off and I'll see you on Wednesday.